ProbCons: Probabilistic consistency-based multiple sequence alignment

Citations: 786
Authors
Do, CB [1]
Mahabhashyam, MSP [1]
Brudno, M [1]
Batzoglou, S [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
DOI
10.1101/gr.2821705
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present ProbCons, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark data sets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, ProbCons achieves statistically significant improvement over other leading methods while maintaining practical speed. ProbCons is publicly available as a Web resource.
Pages: 330-340
Page count: 11