当前位置: 代码迷 >> 综合 >> CHIP-seq流程学习笔记(4)-call peak 软件macs2
  详细解决方案

CHIP-seq流程学习笔记(4)-call peak 软件macs2

热度:54   发布时间:2024-02-21 09:20:31.0

参考文章:

MACS2 Call Peak 参数详细学习

1. 软件说明:

macs2 callpeak --help
usage: macs2 callpeak -t TFILE     # treat组[-c [CFILE]] # control 或 mock(非特异性抗体,如IgG)组# control:input DNA,没有经过免疫共沉淀处理;# mock:1)未使用抗体富集与蛋白结合的DNA片段;2)非特异性抗体,如IgG[-f]         # MACS2读入文件格式,默认自动检测输入文件格式[-g GSIZE]   # 有效基因组大小(可比对基因组大小),人类hs,小鼠mm[-s TSIZE]   # 测序读长;如果不设定,MACS 利用输入的前10个序列自动检测[--outdir OUTDIR] # MACS2结果文件保存路径[-n NAME]    # 为MACS2输出文件命名[-B]  # 保留the fragment pileup, control lambda, -log10pvalue 和 -log10qvalue scores到bedGraph 文件。 [-q QVALUE | -p PVALUE] # qvalue (minimum FDR)设定call significant regions的阈值;# 默认,0.01,对于 broad marks(组蛋白修饰的chipseq),可以使用0.05;# Q-values are calculated from p-values using Benjamini-Hochberg procedure.# 设定p值时, qvalue不再起作用。

2. 操作记录:

(base) zexing@DNA:~/projects/daizhongye/ChIP_data/bam.sort$ macs2 callpeak -t E14-2i-Lif-H3K4me3Q5ser_FKDL192548278-1a.bam.sort -c E14-2i-Lif-input_FKDL192548276-1a.bam.sort -f BAM -g mm --outdir /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/ -n E14-2i-Lif -B -q 0.01
INFO  @ Wed, 15 Jul 2020 14:18:51:
# Command line: callpeak -t E14-2i-Lif-H3K4me3Q5ser_FKDL192548278-1a.bam.sort -c E14-2i-Lif-input_FKDL192548276-1a.bam.sort -f BAM -g mm --outdir /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/ -n E14-2i-Lif -B -q 0.01
# ARGUMENTS LIST:
# name = E14-2i-Lif
# format = BAM
# ChIP-seq file = ['E14-2i-Lif-H3K4me3Q5ser_FKDL192548278-1a.bam.sort']
# control file = ['E14-2i-Lif-input_FKDL192548276-1a.bam.sort']
# effective genome size = 1.87e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 1.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is offINFO  @ Wed, 15 Jul 2020 14:18:51: #1 read tag files...
INFO  @ Wed, 15 Jul 2020 14:18:51: #1 read treatment tags...
INFO  @ Wed, 15 Jul 2020 14:18:57:  1000000
INFO  @ Wed, 15 Jul 2020 14:19:03:  2000000
INFO  @ Wed, 15 Jul 2020 14:19:09:  3000000
INFO  @ Wed, 15 Jul 2020 14:19:14:  4000000
INFO  @ Wed, 15 Jul 2020 14:19:20:  5000000
INFO  @ Wed, 15 Jul 2020 14:19:25:  6000000
INFO  @ Wed, 15 Jul 2020 14:19:31:  7000000
INFO  @ Wed, 15 Jul 2020 14:19:37:  8000000
INFO  @ Wed, 15 Jul 2020 14:19:42:  9000000
INFO  @ Wed, 15 Jul 2020 14:19:48:  10000000
INFO  @ Wed, 15 Jul 2020 14:19:54:  11000000
INFO  @ Wed, 15 Jul 2020 14:19:59:  12000000
INFO  @ Wed, 15 Jul 2020 14:20:05:  13000000
INFO  @ Wed, 15 Jul 2020 14:20:10:  14000000
INFO  @ Wed, 15 Jul 2020 14:20:16:  15000000
INFO  @ Wed, 15 Jul 2020 14:20:22:  16000000
INFO  @ Wed, 15 Jul 2020 14:20:27:  17000000
INFO  @ Wed, 15 Jul 2020 14:20:33:  18000000
INFO  @ Wed, 15 Jul 2020 14:20:39:  19000000
INFO  @ Wed, 15 Jul 2020 14:20:44:  20000000
INFO  @ Wed, 15 Jul 2020 14:20:50:  21000000
INFO  @ Wed, 15 Jul 2020 14:20:56:  22000000
INFO  @ Wed, 15 Jul 2020 14:21:07: 22602577 reads have been read.
INFO  @ Wed, 15 Jul 2020 14:21:07: #1.2 read input tags...
INFO  @ Wed, 15 Jul 2020 14:21:13:  1000000
INFO  @ Wed, 15 Jul 2020 14:21:19:  2000000
INFO  @ Wed, 15 Jul 2020 14:21:25:  3000000
INFO  @ Wed, 15 Jul 2020 14:21:31:  4000000
INFO  @ Wed, 15 Jul 2020 14:21:37:  5000000
INFO  @ Wed, 15 Jul 2020 14:21:43:  6000000
INFO  @ Wed, 15 Jul 2020 14:21:48:  7000000
INFO  @ Wed, 15 Jul 2020 14:21:55:  8000000
INFO  @ Wed, 15 Jul 2020 14:22:01:  9000000
INFO  @ Wed, 15 Jul 2020 14:22:07:  10000000
INFO  @ Wed, 15 Jul 2020 14:22:13:  11000000
INFO  @ Wed, 15 Jul 2020 14:22:19:  12000000
INFO  @ Wed, 15 Jul 2020 14:22:25:  13000000
INFO  @ Wed, 15 Jul 2020 14:22:31:  14000000
INFO  @ Wed, 15 Jul 2020 14:22:37:  15000000
INFO  @ Wed, 15 Jul 2020 14:22:45: 15104730 reads have been read.
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 tag size is determined as 150 bps
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 tag size = 150.0
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 total tags in treatment: 22602577
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 user defined the maximum tags...
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 tags after filtering in treatment: 18059817
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 Redundant rate of treatment: 0.20
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 total tags in control: 15104730
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 user defined the maximum tags...
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 tags after filtering in control: 12304280
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 Redundant rate of control: 0.19
INFO  @ Wed, 15 Jul 2020 14:22:45: #1 finished!
INFO  @ Wed, 15 Jul 2020 14:22:45: #2 Build Peak Model...
INFO  @ Wed, 15 Jul 2020 14:22:45: #2 looking for paired plus/minus strand peaks...
INFO  @ Wed, 15 Jul 2020 14:22:48: #2 number of paired peaks: 45538
INFO  @ Wed, 15 Jul 2020 14:22:48: start model_add_line...
INFO  @ Wed, 15 Jul 2020 14:22:49: start X-correlation...
INFO  @ Wed, 15 Jul 2020 14:22:49: end of X-cor
INFO  @ Wed, 15 Jul 2020 14:22:49: #2 finished!
INFO  @ Wed, 15 Jul 2020 14:22:49: #2 predicted fragment length is 223 bps
INFO  @ Wed, 15 Jul 2020 14:22:49: #2 alternative fragment length(s) may be 223 bps
INFO  @ Wed, 15 Jul 2020 14:22:49: #2.2 Generate R script for model : /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_model.r
WARNING @ Wed, 15 Jul 2020 14:22:49: #2 Since the d (223) calculated from paired-peaks are smaller than 2*tag length, it may be influenced by unknown sequencing problem!
WARNING @ Wed, 15 Jul 2020 14:22:49: #2 You may need to consider one of the other alternative d(s): 223
WARNING @ Wed, 15 Jul 2020 14:22:49: #2 You can restart the process with --nomodel --extsize XXX with your choice or an arbitrary number. Nontheless, MACS will continute computing.
INFO  @ Wed, 15 Jul 2020 14:22:49: #3 Call peaks...
INFO  @ Wed, 15 Jul 2020 14:22:49: #3 Pre-compute pvalue-qvalue table...
INFO  @ Wed, 15 Jul 2020 14:23:45: #3 In the peak calling step, the following will be performed simultaneously:
INFO  @ Wed, 15 Jul 2020 14:23:45: #3 Write bedGraph files for treatment pileup (after scaling if necessary)... E14-2i-Lif_treat_pileup.bdg
INFO  @ Wed, 15 Jul 2020 14:23:45: #3 Write bedGraph files for control lambda (after scaling if necessary)... E14-2i-Lif_control_lambda.bdg
INFO  @ Wed, 15 Jul 2020 14:23:45: #3 Pileup will be based on sequencing depth in control.
INFO  @ Wed, 15 Jul 2020 14:23:45: #3 Call peaks for each chromosome...
INFO  @ Wed, 15 Jul 2020 14:24:43: #4 Write output xls file... /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_peaks.xls
INFO  @ Wed, 15 Jul 2020 14:24:43: #4 Write peak in narrowPeak format file... /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_peaks.narrowPeak
INFO  @ Wed, 15 Jul 2020 14:24:44: #4 Write summits bed file... /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/E14-2i-Lif_summits.bed
INFO  @ Wed, 15 Jul 2020 14:24:44: Done!

确定代码:

macs2 callpeak -t IgGold.bam.sort -c Input/IgG.bam.sort -f BAM -g mm --outdir /f/.../macs2_callpeak/ -n NAME -B -q 0.01

3. 使用shell script进行批量

#! /bin/bash
#上面一行宣告这个script的语法使用bash语法,当程序被执行时,能够载入bash的相关环境配置文件。
# Program:
# This program is used for peakcalling of ChIP-seq data by macs2.
#History:
# 2020/07/15 zexing First release
#macs2命令为peakcalling命令
#简写代码:macs2 callpeak -t IgGold.bam.sort -c Input/IgG.bam.sort -f BAM -g mm --outdir /f/.../macs2_callpeak/ -n NAME -B -q 0.01
#调用程序macs2 callpeak,参数-t 为treat组,-c为control(Input DNA)或mock组(未使用抗体富集的DNA或IgG抗体),-f为指定输入文件格式,默认为自动检测格>式,-g为有效基因组大小,人类hs,小鼠mm;--outdir指定结果保存路径,-n指定输出文件名的前缀,-q设定call significant regions的值,默认为0.01,对于broad marks(组蛋白修饰的chipseq)可以使用0.05。
#对变量${i}利用 for ${i} in A B C D 的方式遍历指定
for i in E14-2i-Lif E14-S-Lif
do
macs2 callpeak -t /f/xudonglab/zexing/projects/daizhongye/ChIP_data/bam.sort/${i}-H3K4me3Q5ser.bam.sort \
-c /f/xudonglab/zexing/projects/daizhongye/ChIP_data/bam.sort/${i}-input.bam.sort \
-f BAM -g mm -B -q 0.01 \
--outdir /f/xudonglab/zexing/projects/daizhongye/ChIP_data/macs2_callpeak/ -n ${i}
done
  相关解决方案