Computational identification of promoters and first exons in the human genome

Citations: 300
Authors
Davuluri, RV [1]
Grosse, I [1]
Zhang, MQ [1]
Affiliations
[1] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
Funding
National Institutes of Health (USA)
DOI
10.1038/ng780
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The identification of promoters and first exons has been one of the most difficult problems in gene-finding. We present a set of discriminant functions that can recognize structural and compositional features such as CpG islands, promoter regions and first splice-donor sites. We explain the implementation of the discriminant functions into a decision tree that constitutes a new program called FirstEF. By using different models to predict CpG-related and non-CpG-related first exons, we showed by cross-validation that the program could predict 86% of the first exons with 17% false positives. We also demonstrated the prediction accuracy of FirstEF at the genome level by applying it to the finished sequences of human chromosomes 21 and 22 as well as by comparing the predictions with the locations of the experimentally verified first exons. Finally, we present the analysis of the predicted first exons for all of the 24 chromosomes of the human genome.
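The abstract names CpG islands as one of the compositional features the discriminant functions recognize, but it does not say how they are scored. As a rough illustration only, the sketch below applies the commonly cited Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) to a single sequence window. This is not FirstEF's method: the function names `cpg_stats` and `looks_like_cpg_island` and the default thresholds are assumptions chosen for the example.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G occurred independently in the window.
    expected = c * g / n
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Hypothetical helper: apply the three thresholds to one window.

    The defaults follow the Gardiner-Garden and Frommer criteria and are
    an assumption of this sketch, not parameters taken from FirstEF.
    """
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp
```

A genome-scale scan would slide such a window along the sequence and merge adjacent hits. The paper itself combines this kind of feature with promoter and splice-donor signals in a decision tree, which this sketch does not attempt.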
Pages: 412-417
Page count: 6