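Reference: Benjamin Buchfink, Chao Xie & Daniel H. Huson, Fast and Sensitive Protein Alignment using DIAMOND, Nature Methods, 12, 59–60 (2015) doi:10.1038/nmeth.3176.
Software download: https://github.com/bbuchfink/diamond
Key feature: speed. The paper reports about 20,000 times the speed of blastx on short reads, with a similar degree of sensitivity.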
Quick usage (aligning nucleotide queries against a protein reference):
Build the index:
diamond makedb --in nr.fa -d nr
--in : reference protein sequences (FASTA format)
-d : prefix of the index (database) file name
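Since --in expects a FASTA file, a quick sanity check before building the index can save a failed run. The following is only a minimal sketch: fasta_records and build_db are hypothetical helper names (not part of DIAMOND), and the check simply counts header lines starting with ">" rather than validating the file fully.

```shell
# fasta_records and build_db are illustrative helpers, not DIAMOND commands.

fasta_records() {
    # Print the number of FASTA header lines (lines starting with ">") in file $1.
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

build_db() {
    # $1: FASTA file with reference sequences, $2: database prefix
    if [ "$(fasta_records "$1")" -gt 0 ]; then
        diamond makedb --in "$1" -d "$2"
    else
        echo "no FASTA records found in $1" >&2
        return 1
    fi
}
```

With these definitions, build_db nr.fa nr would run the same makedb command as above, but only when nr.fa contains at least one record.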
Alignment:
diamond blastx -e 1e-5 --db $ref/nr -q $query.fa -o $out.diamond -p 20 -f 6 qseqid qlen qstart qend qcovhsp slen sstart send score evalue positive length ppos sseqid stitle nident mismatch gaps gapopen bitscore pident
-e : maximum E-value (expect value) of alignments to report
--db : index of the reference database (the prefix given to makedb)
-q : query sequences to align
-o : output file
-p : number of CPU threads
-f : output file format
Value 6 (BLAST tabular format) may be followed by a space-separated list of these keywords:
qseqid means Query Seq - id (ID of the query sequence)
qlen means Query sequence length (length of the query sequence)
sseqid means Subject Seq - id
sallseqid means All subject Seq - id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query (where the alignment starts on the query sequence)
qend means End of alignment in query (where the alignment ends on the query sequence)
sstart means Start of alignment in subject (where the alignment starts on the reference sequence)
send means End of alignment in subject (where the alignment ends on the reference sequence)
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive - scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive - scoring matches
qframe means Query frame
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
Default: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
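A common follow-up to the blastx command above is to keep only the best hit per query. The sketch below assumes the output was written with exactly the 21-keyword list used in that command, so that column 1 is qseqid and column 20 is bitscore. best_hits is a hypothetical helper name, not a DIAMOND feature, and "best" here simply means the highest bit score.

```shell
# best_hits is an illustrative helper, not part of DIAMOND.
# Assumed column layout (from the -f 6 keyword list in the command above):
#   1 qseqid ... 14 sseqid, 15 stitle ... 20 bitscore, 21 pident

best_hits() {
    # $1: tab-separated DIAMOND output file
    # Prints one line per query: the hit with the highest bit score,
    # in the order in which each query first appears in the file.
    awk -F'\t' '
        {
            if (!($1 in best)) {
                # first hit seen for this query
                order[++n] = $1
                best[$1] = $20
                line[$1] = $0
            } else if ($20 + 0 > best[$1] + 0) {
                # a later hit with a strictly higher bit score
                best[$1] = $20
                line[$1] = $0
            }
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$1"
}
```

For example, best_hits $out.diamond > $out.best would write the filtered table to a new file. Splitting on tabs rather than whitespace matters because stitle values usually contain spaces.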