Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits

Cited by: 43
Authors
Dessimoz, Christophe [1]
Boeckmann, Brigitte
Roth, Alexander C. J.
Gonnet, Gaston H.
Affiliations
[1] ETH, Inst Computat Sci, CH-8092 Zurich, Switzerland
[2] CMU, Swiss Inst Bioinformat, CH-1211 Geneva, Switzerland
Keywords
DOI
10.1093/nar/gkl433
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Pages: 3309-3316
Page count: 8
Related papers
30 in total
[1]   The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 [J].
Boeckmann, B ;
Bairoch, A ;
Apweiler, R ;
Blatter, MC ;
Estreicher, A ;
Gasteiger, E ;
Martin, MJ ;
Michoud, K ;
O'Donovan, C ;
Phan, I ;
Pilbout, S ;
Schneider, M .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :365-370
[2]   The Escherichia coli RNA degradosome: structure, function and relationship to other ribonucleolytic multienzyme complexes [J].
Carpousis, AJ .
BIOCHEMICAL SOCIETY TRANSACTIONS, 2002, 30 :150-155
[3]   The DEAD-box RNA helicase SrmB is involved in the assembly of 50S ribosomal subunits in Escherichia coli [J].
Charollais, J ;
Pflieger, D ;
Vinh, J ;
Dreyfus, M ;
Iost, I .
MOLECULAR MICROBIOLOGY, 2003, 48 (05) :1253-1265
[4]   Multiple sequence alignment with the Clustal series of programs [J].
Chenna, R ;
Sugawara, H ;
Koike, T ;
Lopez, R ;
Gibson, TJ ;
Higgins, DG ;
Thompson, JD .
NUCLEIC ACIDS RESEARCH, 2003, 31 (13) :3497-3500
[5]  
Dessimoz C, 2005, LECT NOTES COMPUT SC, V3678, P61
[6]   Escherichia coli DbpA is a 3′ → 5′ RNA helicase [J].
Diges, CM ;
Uhlenbeck, OC .
BIOCHEMISTRY, 2005, 44 (21) :7903-7911
[7]   CONVERGENT EVOLUTION - THE NEED TO BE EXPLICIT [J].
DOOLITTLE, RF .
TRENDS IN BIOCHEMICAL SCIENCES, 1994, 19 (01) :15-18
[8]   MUSCLE: a multiple sequence alignment method with reduced time and space complexity [J].
Edgar, RC .
BMC BIOINFORMATICS, 2004, 5 (1) :1-19
[9]  
Felsenstein J., 1993, PHYLIP PHYLOGENY INF
[10]   DISTINGUISHING HOMOLOGOUS FROM ANALOGOUS PROTEINS [J].
FITCH, WM .
SYSTEMATIC ZOOLOGY, 1970, 19 (02) :99-&