A comprehensive comparison of multiple sequence alignment programs

Cited by: 530
Authors
Thompson, JD [1]
Plewniak, F [1]
Poch, O [1]
Affiliation
[1] ULP, INSERM, CNRS,Lab Biol Struct, Inst Genet & Biol Mol & Cellulaire, F-67404 Illkirch, France
DOI
10.1093/nar/27.13.2682
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10-20% residue identity, the best programs were capable of correctly aligning on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy, though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, causing the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies, depending on the nature of a particular set of sequences. The employment of more than one program based on different alignment techniques should significantly improve the quality of automatic protein sequence alignment methods. The results also indicate guidelines for improvement of alignment algorithms.
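The abstract reports the fraction of residues aligned correctly relative to a benchmark alignment. As a rough illustration of how such a figure could be computed, the sketch below scores a test alignment by the share of reference residue pairs it reproduces. This is an assumed pairwise measure written for illustration, not the scoring procedure used in the paper; the function names `aligned_pairs` and `pair_score` are hypothetical, and alignments are assumed to be lists of equal-length gapped strings with `-` as the gap character.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index),
    where the residue index counts non-gap characters only.
    """
    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, char in enumerate(column):
            if char != "-":
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in the same column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def pair_score(test, reference):
    """Fraction of reference residue pairs also aligned in the test."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


if __name__ == "__main__":
    reference = ["AC-GT", "ACCGT"]
    test = ["ACG-T", "ACCGT"]
    print(pair_score(test, reference))
```

In this toy case the test alignment shifts one residue of the first sequence by a column, so three of the four reference pairs are recovered.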
Pages: 2682-2690
Number of pages: 9