参考文章:
MACS2 Call Peak 参数详细学习
1. 软件说明:
macs2 callpeak --help
usage: macs2 callpeak -t TFILE # treat组[-c [CFILE]] # control 或 mock(非特异性抗体,如IgG)组# control:input DNA,没有经过免疫共沉淀处理;# mock:1)未使用抗体富集与蛋白结合的DNA片段;2)非特异性抗体,如IgG[-f] # MACS2读入文件格式,默认自动检测输入文件格式[-g GSIZE] # 有效基因组大小(可比对基因组大小),人类hs,小鼠mm[-s TSIZE] # 测序读长;如果不设定,MACS 利用输入的前10个序列自动检测[--outdir OUTDIR] # MACS2结果文件保存路径[-n NAME] # 为MACS2输出文件命名[-B] # 保留the fragment pileup, control lambda, -log10pvalue 和 -log10qvalue scores到bedGraph 文件。 [-q QVALUE | -p PVALUE] # qvalue (minimum FDR)设定call significant regions的阈值;# 默认,0.01,对于 broad marks(组蛋白修饰的chipseq),可以使用0.05;# Q-values are calculated from p-values using Benjamini-Hochberg procedure.# 设定p值时, qvalue不再起作用。
2. 操作记录:
(base) zexing@DNA:~/projects/daizhongye/ChIP_data/bam.sort$ macs2 callpeak -t E14-2i-Lif-H3K4me3Q5ser_FKDL192548278-1a.bam.sort -c E14-2i-Lif-input_FKDL192548276-1a.bam.sort -f BAM -g mm --outdir /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/ -n E14-2i-Lif -B -q 0.01
INFO @ Wed, 15 Jul 2020 14:18:51:
# Command line: callpeak -t E14-2i-Lif-H3K4me3Q5ser_FKDL192548278-1a.bam.sort -c E14-2i-Lif-input_FKDL192548276-1a.bam.sort -f BAM -g mm --outdir /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/ -n E14-2i-Lif -B -q 0.01
# ARGUMENTS LIST:
# name = E14-2i-Lif
# format = BAM
# ChIP-seq file = ['E14-2i-Lif-H3K4me3Q5ser_FKDL192548278-1a.bam.sort']
# control file = ['E14-2i-Lif-input_FKDL192548276-1a.bam.sort']
# effective genome size = 1.87e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 1.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is offINFO @ Wed, 15 Jul 2020 14:18:51: #1 read tag files...
INFO @ Wed, 15 Jul 2020 14:18:51: #1 read treatment tags...
INFO @ Wed, 15 Jul 2020 14:18:57: 1000000
INFO @ Wed, 15 Jul 2020 14:19:03: 2000000
INFO @ Wed, 15 Jul 2020 14:19:09: 3000000
INFO @ Wed, 15 Jul 2020 14:19:14: 4000000
INFO @ Wed, 15 Jul 2020 14:19:20: 5000000
INFO @ Wed, 15 Jul 2020 14:19:25: 6000000
INFO @ Wed, 15 Jul 2020 14:19:31: 7000000
INFO @ Wed, 15 Jul 2020 14:19:37: 8000000
INFO @ Wed, 15 Jul 2020 14:19:42: 9000000
INFO @ Wed, 15 Jul 2020 14:19:48: 10000000
INFO @ Wed, 15 Jul 2020 14:19:54: 11000000
INFO @ Wed, 15 Jul 2020 14:19:59: 12000000
INFO @ Wed, 15 Jul 2020 14:20:05: 13000000
INFO @ Wed, 15 Jul 2020 14:20:10: 14000000
INFO @ Wed, 15 Jul 2020 14:20:16: 15000000
INFO @ Wed, 15 Jul 2020 14:20:22: 16000000
INFO @ Wed, 15 Jul 2020 14:20:27: 17000000
INFO @ Wed, 15 Jul 2020 14:20:33: 18000000
INFO @ Wed, 15 Jul 2020 14:20:39: 19000000
INFO @ Wed, 15 Jul 2020 14:20:44: 20000000
INFO @ Wed, 15 Jul 2020 14:20:50: 21000000
INFO @ Wed, 15 Jul 2020 14:20:56: 22000000
INFO @ Wed, 15 Jul 2020 14:21:07: 22602577 reads have been read.
INFO @ Wed, 15 Jul 2020 14:21:07: #1.2 read input tags...
INFO @ Wed, 15 Jul 2020 14:21:13: 1000000
INFO @ Wed, 15 Jul 2020 14:21:19: 2000000
INFO @ Wed, 15 Jul 2020 14:21:25: 3000000
INFO @ Wed, 15 Jul 2020 14:21:31: 4000000
INFO @ Wed, 15 Jul 2020 14:21:37: 5000000
INFO @ Wed, 15 Jul 2020 14:21:43: 6000000
INFO @ Wed, 15 Jul 2020 14:21:48: 7000000
INFO @ Wed, 15 Jul 2020 14:21:55: 8000000
INFO @ Wed, 15 Jul 2020 14:22:01: 9000000
INFO @ Wed, 15 Jul 2020 14:22:07: 10000000
INFO @ Wed, 15 Jul 2020 14:22:13: 11000000
INFO @ Wed, 15 Jul 2020 14:22:19: 12000000
INFO @ Wed, 15 Jul 2020 14:22:25: 13000000
INFO @ Wed, 15 Jul 2020 14:22:31: 14000000
INFO @ Wed, 15 Jul 2020 14:22:37: 15000000
INFO @ Wed, 15 Jul 2020 14:22:45: 15104730 reads have been read.
INFO @ Wed, 15 Jul 2020 14:22:45: #1 tag size is determined as 150 bps
INFO @ Wed, 15 Jul 2020 14:22:45: #1 tag size = 150.0
INFO @ Wed, 15 Jul 2020 14:22:45: #1 total tags in treatment: 22602577
INFO @ Wed, 15 Jul 2020 14:22:45: #1 user defined the maximum tags...
INFO @ Wed, 15 Jul 2020 14:22:45: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
INFO @ Wed, 15 Jul 2020 14:22:45: #1 tags after filtering in treatment: 18059817
INFO @ Wed, 15 Jul 2020 14:22:45: #1 Redundant rate of treatment: 0.20
INFO @ Wed, 15 Jul 2020 14:22:45: #1 total tags in control: 15104730
INFO @ Wed, 15 Jul 2020 14:22:45: #1 user defined the maximum tags...
INFO @ Wed, 15 Jul 2020 14:22:45: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
INFO @ Wed, 15 Jul 2020 14:22:45: #1 tags after filtering in control: 12304280
INFO @ Wed, 15 Jul 2020 14:22:45: #1 Redundant rate of control: 0.19
INFO @ Wed, 15 Jul 2020 14:22:45: #1 finished!
INFO @ Wed, 15 Jul 2020 14:22:45: #2 Build Peak Model...
INFO @ Wed, 15 Jul 2020 14:22:45: #2 looking for paired plus/minus strand peaks...
INFO @ Wed, 15 Jul 2020 14:22:48: #2 number of paired peaks: 45538
INFO @ Wed, 15 Jul 2020 14:22:48: start model_add_line...
INFO @ Wed, 15 Jul 2020 14:22:49: start X-correlation...
INFO @ Wed, 15 Jul 2020 14:22:49: end of X-cor
INFO @ Wed, 15 Jul 2020 14:22:49: #2 finished!
INFO @ Wed, 15 Jul 2020 14:22:49: #2 predicted fragment length is 223 bps
INFO @ Wed, 15 Jul 2020 14:22:49: #2 alternative fragment length(s) may be 223 bps
INFO @ Wed, 15 Jul 2020 14:22:49: #2.2 Generate R script for model : /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_model.r
WARNING @ Wed, 15 Jul 2020 14:22:49: #2 Since the d (223) calculated from paired-peaks are smaller than 2*tag length, it may be influenced by unknown sequencing problem!
WARNING @ Wed, 15 Jul 2020 14:22:49: #2 You may need to consider one of the other alternative d(s): 223
WARNING @ Wed, 15 Jul 2020 14:22:49: #2 You can restart the process with --nomodel --extsize XXX with your choice or an arbitrary number. Nontheless, MACS will continute computing.
INFO @ Wed, 15 Jul 2020 14:22:49: #3 Call peaks...
INFO @ Wed, 15 Jul 2020 14:22:49: #3 Pre-compute pvalue-qvalue table...
INFO @ Wed, 15 Jul 2020 14:23:45: #3 In the peak calling step, the following will be performed simultaneously:
INFO @ Wed, 15 Jul 2020 14:23:45: #3 Write bedGraph files for treatment pileup (after scaling if necessary)... E14-2i-Lif_treat_pileup.bdg
INFO @ Wed, 15 Jul 2020 14:23:45: #3 Write bedGraph files for control lambda (after scaling if necessary)... E14-2i-Lif_control_lambda.bdg
INFO @ Wed, 15 Jul 2020 14:23:45: #3 Pileup will be based on sequencing depth in control.
INFO @ Wed, 15 Jul 2020 14:23:45: #3 Call peaks for each chromosome...
INFO @ Wed, 15 Jul 2020 14:24:43: #4 Write output xls file... /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_peaks.xls
INFO @ Wed, 15 Jul 2020 14:24:43: #4 Write peak in narrowPeak format file... /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_peaks.narrowPeak
INFO @ Wed, 15 Jul 2020 14:24:44: #4 Write summits bed file... /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_summits.bed
INFO @ Wed, 15 Jul 2020 14:24:44: Done!
确定代码:
macs2 callpeak -t IgGold.bam.sort -c Input/IgG.bam.sort -f BAM -g mm --outdir /f/.../macs2_callpeak/ -n NAME -B -q 0.01
3. 使用shell script进行批量
#! /bin/bash
#上面一行宣告这个script的语法使用bash语法,当程序被执行时,能够载入bash的相关环境配置文件。
# Program:
# This program is used for peakcalling of ChIP-seq data by macs2.
#History:
# 2020/07/15 zexing First release
#macs2命令为peakcalling命令
#简写代码:macs2 callpeak -t IgGold.bam.sort -c Input/IgG.bam.sort -f BAM -g mm --outdir /f/.../macs2_callpeak/ -n NAME -B -q 0.01
#调用程序macs2 callpeak,参数-t 为treat组,-c为control(Input DNA)或mock组(未使用抗体富集的DNA或IgG抗体),-f为指定输入文件格式,默认为自动检测格>式,-g为有效基因组大小,人类hs,小鼠mm;--outdir指定结果保存路径,-n指定输出文件名的前缀,-q设定call significant regions的值,默认为0.01,对于broad marks(组蛋白修饰的chipseq)可以使用0.05。
#对变量${i}利用 for ${i} in A B C D 的方式遍历指定
for i in E14-2i-Lif E14-S-Lif
do
macs2 callpeak -t /f/xudonglab/zexing/projects/daizhongye/ChIP_data/bam.sort/${i}-H3K4me3Q5ser.bam.sort \
-c /f/xudonglab/zexing/projects/daizhongye/ChIP_data/bam.sort/${i}-input.bam.sort \
-f BAM -g mm -B -q 0.01 \
--outdir /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/ -n ${i}
done