Towards a covering set of protein family profiles

Cited by: 43
Authors
Heger, A [1]
Holm, L [1]
Affiliations
[1] EMBL EBI, Struct Genom Grp, Cambridge CB10 1SD, England
Keywords
clustering; domains; homology; sequence alignment; structural genomics
DOI
10.1016/S0079-6107(00)00013-4
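Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract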
Evolutionary classification leads to an economical description of the protein sequence universe because attributes of function and structure are inherited in protein families. Efficient strategies of functional and structural genomics therefore target one representative from each family. Enumerating all families and establishing family membership consistently based on sequence similarities are nontrivial computational problems. Emerging concepts and caveats of global sequence clustering are reviewed. Explicit multiple alignments coupled with neighbourhood analysis lead to domain segmentation, and hierarchical unification helps to resolve conflicts and validate clusters. Eventually, every part of every sequence will be assigned to a domain family which is uniquely associated with a fold and a molecular function. (C) 2000 Elsevier Science Ltd. All rights reserved.
Pages: 321-337
Page count: 17