Pattern-constrained multiple polypeptide sequence alignment

Cited by: 4
Authors
Du, ZH [1]
Lin, F [1]
Affiliations
[1] Nanyang Technol Univ, BioInformat Res Ctr, Singapore 639798, Singapore
Keywords
multiple sequence alignment; Prosite databank; structural information; domain knowledge; regular expression
DOI
10.1016/j.compbiolchem.2005.06.002
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Multiple sequence alignment (MSA) is one of the fundamental research topics in computational biology. Alignments help to infer functional assignments, evolutionary history, and conserved regions. Previous methods rely on a substitution matrix and do not incorporate knowledge of the sequences being aligned. They therefore do not guarantee that similar structures and common patterns in the sequences are aligned. We have been investigating a solution to this problem in multiple sequence alignment that makes use of knowledge of the sequences being aligned, including patterns in the Prosite databank and the Blocks+ and eBlocks databases, as well as motif and structural information. A pattern-constrained algorithm has been developed. Experiments with protein sequences have shown that incorporating the domain knowledge available in the sequences yields more accurate alignments. © 2005 Elsevier Ltd. All rights reserved.
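The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea of using a Prosite-style pattern as an alignment constraint, not the authors' method. It translates a pattern into a regular expression, locates the first match in each sequence, and left-pads the sequences with gaps so that the matched motifs start in the same column. The function names, the toy sequences, and the simple padding step are all assumptions made for illustration. The translation covers only the common pattern elements (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression.

    Hypothetical helper: handles x, [..], {..}, (n), (n,m), < and > only.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = (
            element.replace("{", "[^").replace("}", "]")  # forbidden residues
            .replace("(", "{").replace(")", "}")          # repeat counts
            .replace("x", ".")                            # any residue
            .replace("<", "^").replace(">", "$")          # terminal anchors
        )
        parts.append(element)
    return "".join(parts)


def find_anchors(sequences: dict, pattern: str) -> dict:
    """Return the (start, end) span of the first pattern match per sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    anchors = {}
    for name, seq in sequences.items():
        match = regex.search(seq)
        anchors[name] = match.span() if match else None
    return anchors


def pad_to_anchor(sequences: dict, anchors: dict) -> dict:
    """Left-pad with gaps so that every matched motif starts in one column."""
    starts = [span[0] for span in anchors.values() if span is not None]
    column = max(starts, default=0)
    padded = {}
    for name, seq in sequences.items():
        span = anchors[name]
        # Sequences without a match are left unconstrained in this sketch.
        padded[name] = seq if span is None else "-" * (column - span[0]) + seq
    return padded


# Toy data, invented for illustration only.
sequences = {
    "seq1": "MKAGESGAGKSTLLA",
    "seq2": "TTGPPGSGKTT",
    "seq3": "MALWTRLLPL",
}
pattern = "[AG]-x(4)-G-K-[ST]."

anchors = find_anchors(sequences, pattern)
padded = pad_to_anchor(sequences, anchors)
for name, seq in padded.items():
    print(name, seq)
```

A real pattern-constrained aligner would treat the matched spans as fixed columns and still run a scoring-based alignment on the regions between them. The padding step here only demonstrates how a shared pattern can pin residues to the same column.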
Pages: 303-307
Number of pages: 5