Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs

Cited by: 1393
Authors
Okazaki, Y
Furuno, M
Kasukawa, T
Adachi, J
Bono, H
Kondo, S
Nikaido, I
Osato, N
Saito, R
Suzuki, H
Yamanaka, I
Kiyosawa, H
Yagi, K
Tomaru, Y
Hasegawa, Y
Nogami, A
Schönbach, C
Gojobori, T
Baldarelli, R
Hill, DP
Bult, C
Hume, DA
Quackenbush, J
Schriml, LM
Kanapin, A
Matsuda, H
Batalov, S
Beisel, KW
Blake, JA
Bradt, D
Brusic, V
Chothia, C
Corbani, LE
Cousins, S
Dalla, E
Dragani, TA
Fletcher, CF
Forrest, A
Frazer, KS
Gaasterland, T
Gariboldi, M
Gissi, C
Godzik, A
Gough, J
Grimmond, S
Gustincich, S
Hirokawa, N
Jackson, IJ
Jarvis, ED
Kanai, A
Affiliations
[1] RIKEN Yokohama Inst, RIKEN Genom Sci Ctr, Lab Genome Explorat Res Grp, Tsurumi KU, Yokohama, Kanagawa 2300045, Japan
[2] Discovery & Res Inst, Genome Sci Lab, Wako, Saitama 3510198, Japan
[3] NTT Software Corp, Naka Ku, Kanagawa 2318554, Japan
[4] Yokohama City Univ, Grad Sch Integrated Sci, Div Genom Informat Resources Sci Biol Supramol Sy, Tsurumi Ku, Yokohama, Kanagawa 2300045, Japan
[5] Keio Univ, Inst Adv Biosci, Yamagata 9970017, Japan
[6] Univ Tsukuba, Inst Basic Med Sci, Tsukuba, Ibaraki 3058577, Japan
[7] RIKEN Yokohama Inst, RIKEN Genom Sci Ctr, Bioinformat Grp, Biomed Knowledge Discovery Team, Kanagawa 2300045, Japan
[8] Natl Inst Genet, DNA Data Bank Japan, Shizuoka 4118540, Japan
[9] Natl Inst Genet, Ctr Informat Biol, Shizuoka 4118540, Japan
[10] Jackson Lab, Mouse Genome Informat Grp, Bar Harbor, ME 04609 USA
[11] Univ Queensland, Inst Mol Biosci, Brisbane, Qld 4072, Australia
[12] Univ Queensland, ARC Special Res Ctr Funct & Appl Genom, Brisbane, Qld 4072, Australia
[13] Inst Genom Res TIGR, Rockville, MD 20850 USA
[14] NIH, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
[15] European Bioinformat Inst, Cambridge CB10 1SD, England
[16] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka 5608531, Japan
[17] Novartis Res Fdn, Genom Inst, San Diego, CA 92121 USA
[18] Boys Town Natl Res Hosp, Omaha, NE 68131 USA
[19] Labs Informat Technol, Singapore 119613, Singapore
[20] MRC, Mol Biol Lab, Cambridge CB2 2QH, England
[21] LNCIB, Funct Genom, I-34012 Trieste, Italy
[22] Ist Tumori Milano, I-20133 Milan, Italy
[23] The Scripps Res Inst, Res Inst, La Jolla, CA 92037 USA
[24] Univ Oregon, Zebrafish Int Resource Ctr, Eugene, OR 97403 USA
[25] Rockefeller Univ, Lab Computat Genom, New York, NY 10021 USA
[26] Univ Milan, I-20133 Milan, Italy
[27] Burnham Inst, La Jolla, CA 92037 USA
[28] Harvard Univ, Sch Med, Dept Neurobiol, Boston, MA 02115 USA
[29] Univ Tokyo, Grad Sch Med, Bunkyo Ku, Tokyo 1130033, Japan
[30] MRC, Human Genet Unit, Edinburgh, Midlothian, Scotland
[31] Duke Univ, Med Ctr, Dept Neurobiol, Durham, NC 27710 USA
[32] Univ Texas, SW Med Ctr, Dept Mol Genet, Howard Hughes Med Inst, Dallas, TX 75390 USA
[33] Karolinska Inst, Ctr Genom & Bioinformat, S-17177 Stockholm, Sweden
[34] Addenbrookes Hosp, Cambridge Inst Med Res, JDRF WT Diabet & Inflammat Lab, Cambridge CB2 2XY, England
[35] NHGRI, NIH, Bethesda, MD 20892 USA
[36] Canberra Hosp, Autoimmun Res Unit, Woden, ACT 2606, Australia
[37] Appl Genom Inc, Sunnyvale, CA 94085 USA
[38] Hirakata Ryoikuen, Osaka 5650874, Japan
[39] Hyogo Med Univ, Inst Adv Med Sci, Nishinomiya, Hyogo 6638501, Japan
[40] Wellcome Trust Sanger Inst, Hinxton CB10 1SA, Cambs, England
[41] Univ Calif San Diego, Sch Med, La Jolla, CA 92093 USA
[42] Univ Bonn, Dept Psychiat, D-53105 Bonn, Germany
[43] Washington Univ, Sch Med, Genome Sequencing Ctr, St Louis, MO 63108 USA
[44] Whitehead Inst MIT Ctr Genome Res, Cambridge, MA 02141 USA
DOI
10.1038/nature01266
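Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract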
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Pages: 563-573
Page count: 11