MUSCLE: multiple sequence alignment with high accuracy and high throughput

Cited by: 37663
Author
Edgar, RC
Affiliation
[1] Mill Valley, CA 94941
Keywords
DOI
10.1093/nar/gkh340
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
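The "distance estimation using kmer counting" step named in the abstract can be illustrated with a minimal sketch. It assumes one common k-mer similarity (shared k-mer count divided by the number of k-mers in the shorter sequence) and k = 3. The function names, the value of k, and the use of the plain amino acid alphabet are illustrative assumptions, not the program's actual code or defaults.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Return 1 - (shared k-mers / k-mers in the shorter sequence).

    Hypothetical helper: an alignment-free distance estimate, assumed
    here for illustration only.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((ca & cb).values())
    return 1.0 - shared / denom
```

Because no alignment is computed, a distance of this kind costs time roughly linear in sequence length per pair, which is what makes it attractive as a first pass over thousands of sequences before a guide tree is built.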
Pages: 1792-1797
Page count: 6