Accurate detection of very sparse sequence motifs

Cited by: 14
Authors
Heger, A
Lappe, M
Holm, L
Affiliations
[1] Univ Helsinki, Inst Biotechnol, FIN-00014 Helsinki, Finland
[2] EMBL EBI, Cambridge, England
[3] Univ Helsinki, Dept Genet, SF-00100 Helsinki, Finland
Keywords
protein sequence alignment; consistency; protein sequence motifs; transitive alignment
DOI
10.1089/1066527042432242
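Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];
Abstract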
Protein sequence alignments are more reliable the shorter the evolutionary distance. Here, we align distantly related proteins using many closely spaced intermediate sequences as stepping stones. Such transitive alignments can be generated between any two proteins in a connected set, whether they are direct or indirect sequence neighbors in the underlying library of pairwise alignments. We have implemented a greedy algorithm, MaxFlow, using a novel consistency score to estimate the relative likelihood of alternative paths of transitive alignment. In contrast to traditional profile models of amino acid preferences, MaxFlow models the probability that two positions are structurally equivalent and retains high information content across large distances in sequence space. Thus, MaxFlow is able to identify sparse and narrow active-site sequence signatures that are embedded in high-entropy sequence segments in the structure-based multiple alignment of large, diverse enzyme superfamilies. In a challenging benchmark based on the urease superfamily, MaxFlow yields better reliability and twice the coverage of available sequence alignment software. This promises to increase information returns from functional and structural genomics, where reliable sequence alignment is a bottleneck to transferring the functional or structural characterization of model proteins to entire protein superfamilies.
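The idea of a transitive alignment can be illustrated with a small sketch. It is not the MaxFlow algorithm or its consistency score, neither of which is specified in this record. It only assumes that each pairwise alignment is stored as a mapping from residue positions in one sequence to positions in another, chains two such mappings through an intermediate sequence, and counts how many intermediates support each resulting residue pair. The function names `compose` and `transitive_support` and the toy data are hypothetical.

```python
from collections import Counter


def compose(ab, bc):
    """Chain two residue maps: position in A -> position in B -> position in C.

    Positions of A whose partner in B is unaligned in the B-to-C map are dropped.
    """
    return {a: bc[b] for a, b in ab.items() if b in bc}


def transitive_support(paths):
    """Count how many intermediate paths support each (a, c) residue pair.

    `paths` is a list of (A-to-B map, B-to-C map) tuples, one per intermediate.
    This simple vote count is an illustrative stand-in, not the paper's score.
    """
    support = Counter()
    for ab, bc in paths:
        for pair in compose(ab, bc).items():
            support[pair] += 1
    return support


# Toy data (assumed): two intermediates linking sequence A to sequence C.
via_b1 = ({0: 0, 1: 1, 2: 2}, {0: 5, 1: 6, 2: 7})
via_b2 = ({0: 0, 1: 1, 2: 3}, {0: 5, 1: 6, 3: 8})

support = transitive_support([via_b1, via_b2])
for (a, c), votes in sorted(support.items()):
    print(f"A[{a}] ~ C[{c}] supported by {votes} path(s)")
```

In this toy case both intermediates agree on the first two residue pairs, while the third position of A is mapped to two different positions of C, one per path. Resolving such conflicts between alternative paths is what the abstract attributes to the consistency score.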
Pages: 843-857
Page count: 15