An algorithm for progressive multiple alignment of sequences with insertions

被引:703
作者
Löytynoja, A [1 ]
Goldman, N [1 ]
机构
[1] European Bioinformat Inst, European Mol Biol Lab, Hinxton CB10 1SD, England
基金
英国惠康基金;
关键词
insertion/deletion; progressive algorithm; sequence alignment;
D O I
10.1073/pnas.0409137102
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Dynamic programming algorithms guarantee to find the optimal alignment between two sequences. For more than a few sequences, exact algorithms become computationally impractical, and progressive algorithms iterating pairwise alignments are widely used. These heuristic methods have a serious drawback because pairwise algorithms do not differentiate insertions from deletions and end up penalizing single insertion events multiple times. Such an unrealistically high penalty for insertions typically results in overmatching of sequences and an underestimation of the number of insertion events. We describe a modification of the traditional alignment algorithm that can distinguish insertion from deletion and avoid repeated penalization of insertions and illustrate this method with a pair hidden Markov model that uses an evolutionary scoring function. In comparison with a traditional progressive alignment method, our algorithm infers a greater number of insertion events and creates gaps that are phylogenetically consistent but spatially less concentrated. Our results suggest that some insertion/deletion "hot spots" may actually be artifacts of traditional alignment algorithms.
引用
收藏
页码:10557 / 10562
页数:6
相关论文
共 20 条
[1]  
Durbin R., 1998, BIOL SEQUENCE ANAL
[2]   Profile hidden Markov models [J].
Eddy, SR .
BIOINFORMATICS, 1998, 14 (09) :755-763
[3]   MUSCLE: a multiple sequence alignment method with reduced time and space complexity [J].
Edgar, RC .
BMC BIOINFORMATICS, 2004, 5 (1) :1-19
[4]   EVOLUTIONARY TREES FROM DNA-SEQUENCES - A MAXIMUM-LIKELIHOOD APPROACH [J].
FELSENSTEIN, J .
JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) :368-376
[5]  
Gonnet G. H., 1996, Algorithm Theory - SWAT '96. 5th Scandinavian Workshop on Algorithm Theory. Proceedings, P380
[6]   AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES [J].
GOTOH, O .
JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) :705-708
[7]   DATING OF THE HUMAN APE SPLITTING BY A MOLECULAR CLOCK OF MITOCHONDRIAL-DNA [J].
HASEGAWA, M ;
KISHINO, H ;
YANO, TA .
JOURNAL OF MOLECULAR EVOLUTION, 1985, 22 (02) :160-174
[8]  
HEIN J, 1989, MOL BIOL EVOL, V6, P649
[9]   LINEAR SPACE ALGORITHM FOR COMPUTING MAXIMAL COMMON SUBSEQUENCES [J].
HIRSCHBERG, DS .
COMMUNICATIONS OF THE ACM, 1975, 18 (06) :341-343
[10]   THE ALIGNMENT OF SETS OF SEQUENCES AND THE CONSTRUCTION OF PHYLETIC TREES - AN INTEGRATED METHOD [J].
HOGEWEG, P ;
HESPER, B .
JOURNAL OF MOLECULAR EVOLUTION, 1984, 20 (02) :175-186