Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

Times cited: 415
Authors
Karp, PD
Ouzounis, CA
Moore-Kochlacs, C
Goldovsky, L
Kaipa, P
Ahrén, D
Tsoka, S
Darzentas, N
Kunin, V
López-Bigas, N
Affiliations
[1] SRI Int, Bioinformat Res Grp, Menlo Pk, CA 94025 USA
[2] EMBL Cambridge Outstn, European Bioinformat Inst, Computat Genom Grp, Cambridge CB10 1SD, England
DOI
10.1093/nar/gki892
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome of a single organism and its predicted metabolic network, which is inferred from the MetaCyc database, a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection serves as a unique resource for computational systems biology, namely for global and comparative analyses of genomes and metabolic networks, and as a supplement to the BioCyc resource of curated PGDBs. The Omics Viewer, available through the BioCyc website, allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range in the number of pathways present in these organisms and the most frequently observed pathways. We invite scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.
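The aggregate analysis mentioned in the abstract (the range in the number of pathways per organism and the most frequently observed pathways) reduces to simple counting once each PGDB is summarized as a set of pathway identifiers. The Python sketch below assumes that representation. The organism and pathway names are hypothetical toy data, not values from BioCyc, and the helpers `pathway_count_range` and `most_frequent_pathways` are illustrative only, not part of any BioCyc or Pathway Tools API.

```python
from collections import Counter

# Hypothetical toy input: each PGDB reduced to the set of pathway
# identifiers predicted for that organism. Names are illustrative only.
pgdb_pathways = {
    "organism_A": {"glycolysis", "TCA cycle", "histidine biosynthesis"},
    "organism_B": {"glycolysis", "TCA cycle"},
    "organism_C": {"glycolysis", "pentose phosphate pathway",
                   "TCA cycle", "histidine biosynthesis"},
}


def pathway_count_range(pgdbs):
    """Return (min, max) number of predicted pathways across all PGDBs."""
    counts = [len(pathways) for pathways in pgdbs.values()]
    return min(counts), max(counts)


def most_frequent_pathways(pgdbs, top_n=3):
    """Count how many PGDBs contain each pathway; return the top_n most common."""
    frequency = Counter()
    for pathways in pgdbs.values():
        # Each organism is a set, so it contributes a given pathway at most once.
        frequency.update(pathways)
    return frequency.most_common(top_n)


if __name__ == "__main__":
    print(pathway_count_range(pgdb_pathways))  # → (2, 4)
    print(most_frequent_pathways(pgdb_pathways, top_n=2))
```

Counting set membership rather than raw occurrences means the frequency of a pathway equals the number of organisms predicted to have it, which is the quantity a "most frequently observed pathways" ranking needs. Ties in `most_common` are not ordered by name, so compare results as counts rather than by position.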
Pages: 6083-6089
Page count: 7