The sequence read archive: explosive growth of sequencing data

Cited by: 646
Authors
Kodama, Yuichi [1, 2]
Shumway, Martin [3]
Leinonen, Rasko [4]
Affiliations
[1] Res Org Informat & Syst, Ctr Informat Biol, Mishima, Shizuoka 4118540, Japan
[2] Res Org Informat & Syst, DNA Data Bank Japan, Natl Inst Genet, Mishima, Shizuoka 4118540, Japan
[3] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[4] Wellcome Trust Genome Campus, European Bioinformat Inst, Cambridge CB10 1SD, England
Funding
Wellcome Trust (UK)
DOI
10.1093/nar/gkr854
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.
Pages: D54-D56
Page count: 3