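name: nextflow-development
description: Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use for analyzing RNA-seq, WGS/WES, or ATAC-seq data, whether from local FASTQ files or public datasets from GEO/SRA. Triggers include nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.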

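nf-core Pipeline Deployment

Run nf-core bioinformatics pipelines on local or public sequencing data.

Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses: differential expression, variant calling, or chromatin accessibility analysis.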

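Workflow Checklist

- [ ] Step 0: Acquire data (if from GEO/SRA)
- [ ] Step 1: Environment check (must pass)
- [ ] Step 2: Select pipeline (confirm with user)
- [ ] Step 3: Run test profile (must pass)
- [ ] Step 4: Create samplesheet
- [ ] Step 5: Configure and run (confirm genome with user)
- [ ] Step 6: Verify outputs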

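Step 0: Acquire Data (GEO/SRA only)

Skip this step if the user has local FASTQ files.

For public datasets, fetch the data from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.

Quick start:

# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004

# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i

# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv

Decision point: After fetching the study info, confirm with the user:

- Which sample subset to download (if the study contains multiple data types)
- The suggested genome and pipeline

Then continue to Step 1.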
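The accession types this skill responds to (GSE, GSM, SRR) can be told apart by prefix before anything is downloaded. The following is a minimal sketch, assuming each accession is a fixed prefix followed only by digits. `classify_accession` is a hypothetical helper for illustration and is not part of `scripts/sra_geo_fetch.py`. Apart from GSE110004, the accessions in the demo loop are placeholders.

```python
import re

# Prefix patterns for the accession types this skill mentions.
# Assumption: each accession is a fixed prefix followed by digits only.
ACCESSION_PATTERNS = {
    "GEO series": re.compile(r"^GSE\d+$"),
    "GEO sample": re.compile(r"^GSM\d+$"),
    "SRA run": re.compile(r"^SRR\d+$"),
}


def classify_accession(accession: str) -> str:
    """Return the accession type, or 'unknown' if no pattern matches."""
    cleaned = accession.strip().upper()
    for label, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(cleaned):
            return label
    return "unknown"


if __name__ == "__main__":
    # GSM1234567 and SRR1234567 are placeholder accessions.
    for acc in ("GSE110004", "GSM1234567", "SRR1234567", "sample_01"):
        print(acc, "->", classify_accession(acc))
```

A check like this could decide whether Step 0 applies at all: anything classified as "unknown" is more likely a local file or sample name.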

Step 1: Environment Check

Run this first. The pipelines will fail if the environment check does not pass.

python scripts/check_environment.py

All critical checks must pass. If any fail, provide the fix instructions:

Docker issues

Problem	Fix
Not installed	Install from https://docs.docker.com/get-docker/
Permission denied	`sudo usermod -aG docker $USER`, then log out and back in
Daemon not running	`sudo systemctl start docker`

Nextflow issues

Problem	Fix
Not installed	`curl -s https://get.nextflow.io | bash && mv nextflow ~/bin/`
Version < 23.04	`nextflow self-update`

Java issues

Problem	Fix
Not installed / < 11	`sudo apt install openjdk-11-jdk`

Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
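The version thresholds in the tables above (Nextflow 23.04, Java 11) come down to comparing dotted version numbers. The following is a minimal sketch of that comparison, not the logic of `scripts/check_environment.py`, whose internals are not shown here. `parse_version` and `meets_minimum` are hypothetical helpers, and the version strings in the usage note are illustrative.

```python
import re

# Minimum versions stated in the tables above.
MIN_NEXTFLOW = (23, 4)
MIN_JAVA = (11,)


def parse_version(text: str) -> tuple:
    """Extract the first dotted number sequence from text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version_text: str, minimum: tuple) -> bool:
    """True if the version parsed from version_text is at least the minimum."""
    version = parse_version(version_text)
    # Pad the shorter tuple with zeros so both have the same length.
    width = max(len(version), len(minimum))
    padded_version = version + (0,) * (width - len(version))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum
```

For example, `meets_minimum("nextflow version 22.10.6", MIN_NEXTFLOW)` returns `False`. A legacy Java string such as `java version "1.8.0_292"` parses to `(1, 8, 0)` and also fails the Java 11 check.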

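Step 2: Select Pipeline

Decision point: Confirm with the user before proceeding.

Data type	Pipeline	Version	Goal
RNA-seq	`rnaseq`	3.22.2	Gene expression
WGS/WES	`sarek`	3.7.1	Variant calling
ATAC-seq	`atacseq`	2.1.2	Chromatin accessibility

Auto-detect from the data: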

python scripts/detect_data_type.py /path/to/data

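Pipeline-specific details: see references/pipelines/rnaseq.md, references/pipelines/sarek.md, and references/pipelines/atacseq.md.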

Step 3: Run Test Profile

Validate the environment with small test data. This must pass before running on real data.

nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output

Pipeline	Command
rnaseq	`nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq`
sarek	`nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek`
atacseq	`nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq`
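The three test commands above differ only in the pipeline name and its pinned version, so they can be generated from one lookup table. The following is a minimal sketch. `build_test_command` is a hypothetical helper, and the `test_<pipeline>` output directory naming simply mirrors the table.

```python
import shlex

# Pinned versions copied from the pipeline table in Step 2.
PIPELINE_VERSIONS = {
    "rnaseq": "3.22.2",
    "sarek": "3.7.1",
    "atacseq": "2.1.2",
}


def build_test_command(pipeline: str, container: str = "docker") -> list:
    """Return the argv list for a pipeline's test-profile run."""
    if pipeline not in PIPELINE_VERSIONS:
        raise ValueError(f"unsupported pipeline: {pipeline}")
    return [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", PIPELINE_VERSIONS[pipeline],
        "-profile", f"test,{container}",
        "--outdir", f"test_{pipeline}",
    ]


if __name__ == "__main__":
    for name in PIPELINE_VERSIONS:
        print(shlex.join(build_test_command(name)))
```

Returning an argv list rather than a single string means the command can be passed straight to `subprocess.run` without shell quoting concerns.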

Verify (replace test_output with the --outdir you used):

ls test_output/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log

If the test fails, see references/troubleshooting.md.

Step 4: Create Samplesheet

Generate automatically

python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv

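The script:

- Discovers FASTQ/BAM/CRAM files
- Pairs R1/R2 reads
- Infers sample metadata
- Validates before writing

For sarek: the script prompts for tumor/normal status if it is not auto-detected.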
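The R1/R2 pairing step listed above can be illustrated with a file-name pattern. This is a minimal sketch under a stated assumption about naming, not the logic of `scripts/generate_samplesheet.py`. It assumes mates are marked by `_R1`/`_R2` (or `_1`/`_2`) just before the extension, optionally followed by `_001`. `pair_fastqs` and `READ_PATTERN` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumption: mates are marked by _R1/_R2 (or _1/_2) just before the extension,
# optionally followed by an Illumina-style _001 chunk suffix.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<read>[12])(?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_fastqs(filenames):
    """Group FASTQ file names into {sample: {"1": r1_name, "2": r2_name}}."""
    pairs = defaultdict(dict)
    for name in sorted(filenames):
        match = READ_PATTERN.match(name)
        if match is None:
            continue  # not a recognised FASTQ name; skipped in this sketch
        pairs[match["sample"]][match["read"]] = name
    return dict(pairs)
```

A sample whose entry has only a `"1"` key is single-end or missing its mate. The sketch does not merge multi-lane files: two lanes that resolve to the same sample name would overwrite each other.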

Validate an existing samplesheet

python scripts/generate_samplesheet.py --validate samplesheet.csv <pipeline>

Samplesheet formats

rnaseq:

sample,fastq_1,fastq_2,strandedness
SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto

sarek:

patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1
patient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0

atacseq:

sample,fastq_1,fastq_2,replicate
CONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1
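A quick structural check of a samplesheet against the formats above can catch a missing column before a run starts. The following is a minimal sketch, not the validation performed by `scripts/generate_samplesheet.py --validate`. `check_samplesheet` is a hypothetical helper. The absolute-path test on `fastq_1` is an assumption that mirrors the `/abs/path/...` examples.

```python
import csv
import io

# Required columns per pipeline, copied from the formats above.
REQUIRED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def check_samplesheet(csv_text: str, pipeline: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS[pipeline] if col not in header]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Assumption of this sketch: local FASTQ paths should be absolute.
        if not (row["fastq_1"] or "").startswith("/"):
            problems.append(f"line {line_no}: fastq_1 is not an absolute path")
    return problems
```

The function takes the CSV as text so that it can be exercised without touching the filesystem. For a real file, pass `Path("samplesheet.csv").read_text()`.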

Step 5: Configure and Run

5a. Check genome availability

python scripts/manage_genomes.py check <genome>
# If not installed:
python scripts/manage_genomes.py download <genome>

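Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fruit fly)

5b. Decision points

Decision point: Confirm with the user:

- Genome: which reference to use
- Pipeline-specific options:
  - rnaseq: aligner (star_salmon recommended; hisat2 for low memory)
  - sarek: tools (haplotypecaller for germline, mutect2 for somatic)
  - atacseq: read length (50, 75, 100, or 150)

5c. Run the pipeline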

nextflow run nf-core/<pipeline> \
    -r <version> \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome <genome> \
    -resume

Key flags:

- -r: pin the pipeline version
- -profile docker: use Docker (or singularity on HPC)
- --genome: iGenomes key
- -resume: continue from the last checkpoint

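Resource limits (if needed). These --max_* parameters come from older nf-core templates; newer pipeline releases may expect limits to be set through process.resourceLimits in a custom config instead, so check the documentation for the pinned version: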

--max_cpus 8 --max_memory '32.GB' --max_time '24.h'

Step 6: Verify Outputs

Check completion

ls results/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log
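The two shell checks above (report present, success line in the log) can be combined into one function when the verification needs to be scripted. The following is a minimal sketch. `run_completed` is a hypothetical helper, and the report location is taken directly from the `ls` command above.

```python
from pathlib import Path

# Marker string taken from the grep command above.
SUCCESS_MARKER = "Pipeline completed successfully"


def run_completed(outdir: str, log_path: str = ".nextflow.log") -> bool:
    """True if the MultiQC report exists and the log holds the success marker."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    log = Path(log_path)
    if not report.is_file() or not log.is_file():
        return False
    # errors="replace" keeps the check from failing on stray non-UTF-8 bytes.
    return SUCCESS_MARKER in log.read_text(errors="replace")
```

Calling `run_completed("results")` from the launch directory mirrors the two commands. A `False` result says only that one of the two conditions failed, so the log still needs to be read to find out why.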

Key outputs by pipeline

rnaseq:

- results/star_salmon/salmon.merged.gene_counts.tsv - gene counts
- results/star_salmon/salmon.merged.gene_tpm.tsv - TPM values

sarek:

- results/variant_calling/*/ - VCF files
- results/preprocessing/recalibrated/ - BAM files

atacseq:

- results/macs2/narrowPeak/ - peak calls
- results/bwa/mergedLibrary/bigwig/ - coverage tracks
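The key outputs listed above can be checked programmatically by globbing for each location under the results directory. The following is a minimal sketch. `missing_outputs` is a hypothetical helper, and the patterns are the paths from this section with the `results/` prefix and trailing slashes removed.

```python
from pathlib import Path

# Key output locations per pipeline, taken from the list above
# (relative to the results directory, trailing slashes removed).
KEY_OUTPUTS = {
    "rnaseq": [
        "star_salmon/salmon.merged.gene_counts.tsv",
        "star_salmon/salmon.merged.gene_tpm.tsv",
    ],
    "sarek": ["variant_calling/*", "preprocessing/recalibrated"],
    "atacseq": ["macs2/narrowPeak", "bwa/mergedLibrary/bigwig"],
}


def missing_outputs(results_dir: str, pipeline: str) -> list:
    """Return the key-output patterns that match nothing under results_dir."""
    root = Path(results_dir)
    return [
        pattern
        for pattern in KEY_OUTPUTS[pipeline]
        if not any(root.glob(pattern))
    ]
```

An empty list from `missing_outputs("results", "rnaseq")` means every listed location exists. It does not confirm that the files inside are complete or correct.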

Quick Reference

For common exit codes and fixes, see references/troubleshooting.md.

Resume a failed run

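# Re-run the original command with the same options, adding -resume
nextflow run nf-core/<pipeline> -r <version> -profile docker --input samplesheet.csv --outdir results --genome <genome> -resume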

References

- references/geo-sra-acquisition.md - Downloading public GEO/SRA data
- references/troubleshooting.md - Common problems and fixes
- references/installation.md - Environment setup
- references/pipelines/rnaseq.md - RNA-seq pipeline details
- references/pipelines/sarek.md - Variant calling details
- references/pipelines/atacseq.md - ATAC-seq details

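Disclaimer

This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq) and serves as a foundation that the community can extend to cover the full set of nf-core pipelines.

It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring that their computing environment meets the pipeline requirements and for verifying analysis results.

Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.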

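Attribution

When publishing results, cite the appropriate pipeline. Citations are available in the CITATIONS.md file of each nf-core repository (for example, https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).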

Licenses

- nf-core pipelines: MIT License (https://nf-co.re/about)
- Nextflow: Apache License, Version 2.0 (https://www.nextflow.io/about-us.html)
- NCBI SRA Toolkit: Public Domain (https://github.com/ncbi/sra-tools/blob/master/LICENSE)