Sensitive pattern discovery with 'fuzzy' alignments of distantly related proteins

Cited by: 35
Authors
Heger, Andreas [1]
Holm, Liisa [1]
Affiliations
[1] Univ Helsinki, Inst Biotechnol, FIN-00014 Helsinki, Finland
DOI
10.1093/bioinformatics/btg1017
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Evolutionary comparison leads to efficient functional characterisation of hypothetical proteins. Here, our goal is to map specific sequence patterns to putative functional classes. The evolutionary signal stands out most clearly in a maximally diverse set of homologues. This diversity, however, leads to a number of technical difficulties. The targeted patterns, as gleaned from structure comparisons, are too sparse for statistically significant signals of sequence similarity and accurate multiple sequence alignment.
Results: We address this problem by a fuzzy alignment model, which probabilistically assigns residues to structurally equivalent positions (attributes) of the proteins. We then apply multivariate analysis to the 'attributes × proteins' matrix. The dimensionality of the space is reduced using non-negative matrix factorization. The method is general, fully automatic and works without assumptions about pattern density, minimum support, explicit multiple alignments, phylogenetic trees, etc. We demonstrate the discovery of biologically meaningful patterns in an extremely diverse superfamily related to urease.
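The dimensionality-reduction step named in the abstract can be illustrated with a small sketch of non-negative matrix factorization. The abstract does not say which NMF algorithm or cost function the authors use, so the multiplicative updates for squared error below (in the style of Lee and Seung) are an assumption, and the 4 × 4 matrix `V` is made-up toy data standing in for an 'attributes × proteins' matrix of fuzzy assignment probabilities. The function names `nmf`, `matmul`, `transpose` and `squared_error` are hypothetical helpers, not part of the published method.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Sum of squared differences between v and the product w * h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix v by w * h with non-negative factors.

    Assumption: multiplicative updates for the squared-error objective;
    the source abstract does not specify the algorithm.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Strictly positive random start, so every update is well defined.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy data (invented for illustration): rows are attributes (aligned
# positions), columns are proteins, entries are assignment probabilities.
V = [
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.7],
    [0.1, 0.0, 0.8, 0.9],
]

W, H = nmf(V, rank=2)
print("reconstruction error:", squared_error(V, W, H))
```

In this sketch each column of `W` can be read as a candidate pattern over attributes, and each row of `H` as the weight of that pattern in each protein. The rank (here 2) is chosen by hand for the toy data; how the authors select it is not stated in the abstract.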
Pages: i130-i137
Page count: 8