SEQUENCE SIMILARITY ANALYSIS OF ESCHERICHIA-COLI PROTEINS - FUNCTIONAL AND EVOLUTIONARY IMPLICATIONS

Cited by: 88
Authors
KOONIN, EV
TATUSOV, RL
RUDD, KE
Affiliations
[1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda
Keywords
PROTEIN SEQUENCE SIMILARITY; ESCHERICHIA COLI GENOME; PARALOGOUS PROTEIN CLUSTERS; ANCIENT CONSERVED REGIONS;
DOI
10.1073/pnas.92.25.11921
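Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification codes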
07 ; 0710 ; 09 ;
Abstract
A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins (86%) shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or archaeal proteins. For >90% of the E. coli proteins, functional information, sequence similarity, or both are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters, namely permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and the methods for screening them, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.
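The abstract describes paralog clusters "defined on the basis of pairwise similarity" but does not say how pairs are merged into clusters. As a minimal sketch only, assuming a single-linkage rule (any two proteins connected by a chain of similar pairs end up in the same cluster), the grouping could look like the following. The function name `cluster_paralogs`, the protein identifiers, and the similarity pairs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_paralogs(protein_ids, similar_pairs):
    """Group proteins by single linkage over pairs judged significantly similar.

    Assumption: single linkage via union-find; the paper's exact rule may differ.
    """
    parent = {p: p for p in protein_ids}

    def find(p):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for p in protein_ids:
        groups[find(p)].add(p)

    # A protein with no similar partner is a singleton, not a paralog cluster.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy input: five proteins, two significant pairwise hits.
proteins = ["protA", "protB", "protC", "protD", "protE"]
pairs = [("protA", "protB"), ("protB", "protC")]

clusters = cluster_paralogs(proteins, pairs)
fraction_clustered = sum(len(c) for c in clusters) / len(proteins)
print(clusters, fraction_clustered)
```

In this toy input, protA and protC share a cluster although they are not directly paired, which is the defining behaviour of single linkage. The clustered fraction is the same kind of quantity as the 46% figure reported in the abstract.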
Pages: 11921-11925
Number of pages: 5