MS1, MS2, and SQT - three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications

Cited by: 286
Authors
McDonald, WH
Tabb, DL
Sadygov, RG
MacCoss, MJ
Venable, J
Graumann, J
Johnson, JR
Cociorva, D
Yates, JR
Affiliations
[1] Scripps Res Inst, Dept Cell Biol, La Jolla, CA 92037 USA
[2] CALTECH, Dept Biol, Pasadena, CA 91125 USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA USA
DOI
10.1002/rcm.1603
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
As the speed with which proteomic labs generate data increases along with the scale of the projects they undertake, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques, which can generate tens of thousands of spectra per instrument each day. One design choice behind many of these problems is storing the spectra, and the database identifications for each spectrum, as individual files. Although these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machines and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies. Copyright (C) 2004 John Wiley & Sons, Ltd.
Pages: 2162-2168
Number of pages: 7
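The abstract states that the MS2 format is easily parsed by machine. As a hedged illustration only, the following is a minimal Python sketch of an MS2 reader. The line layout it assumes (H lines for the file header, an S line opening each spectrum with first scan, last scan, and precursor m/z, Z lines giving a charge state and its [M+H]+ mass, optional I lines with per-spectrum information, then one "m/z intensity" pair per peak line) reflects how the format is commonly described; it is not quoted from this record. The names `Ms2Spectrum` and `parse_ms2`, and the toy input, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Ms2Spectrum:
    first_scan: int
    last_scan: int
    precursor_mz: float
    # (charge, [M+H]+ mass) pairs taken from Z lines (assumed layout)
    charges: List[Tuple[int, float]] = field(default_factory=list)
    # key/value pairs taken from optional I lines (assumed layout)
    info: Dict[str, str] = field(default_factory=dict)
    # (m/z, intensity) pairs taken from peak lines
    peaks: List[Tuple[float, float]] = field(default_factory=list)


def parse_ms2(lines: Iterable[str]) -> Tuple[Dict[str, str], List[Ms2Spectrum]]:
    """Hypothetical reader: returns the header fields and a list of spectra."""
    header: Dict[str, str] = {}
    spectra: List[Ms2Spectrum] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        tag = line[0]
        if tag in ("H", "I"):
            # Tab-separated "tag, key, value"; the value may contain spaces.
            parts = line.split("\t", 2)
            key = parts[1] if len(parts) > 1 else ""
            value = parts[2] if len(parts) > 2 else ""
            if tag == "H":
                header[key] = value
            elif spectra:
                spectra[-1].info[key] = value
        elif tag == "S":
            # A new spectrum starts: first scan, last scan, precursor m/z.
            _, first, last, mz = line.split()[:4]
            spectra.append(Ms2Spectrum(int(first), int(last), float(mz)))
        elif tag == "Z":
            # One possible charge state for the current spectrum.
            _, charge, mass = line.split()[:3]
            spectra[-1].charges.append((int(charge), float(mass)))
        else:
            # Any other line is treated as a peak: m/z followed by intensity.
            mz, intensity = line.split()[:2]
            spectra[-1].peaks.append((float(mz), float(intensity)))
    return header, spectra


# Made-up toy input, for illustration only (not data from the paper).
sample = (
    "H\tCreationDate\t2004\n"
    "S\t000002\t000002\t600.12\n"
    "Z\t2\t1199.23\n"
    "Z\t3\t1798.35\n"
    "145.1 32.0\n"
    "200.2 18.5\n"
    "S\t000003\t000003\t712.40\n"
    "Z\t2\t1423.79\n"
    "300.3 7.0\n"
).splitlines()

header, spectra = parse_ms2(sample)
for s in spectra:
    print(s.first_scan, s.precursor_mz, s.charges, len(s.peaks))
```

Because every record type is identified by its first character, a single pass over the file suffices. Keeping all spectra in one file, rather than one file per spectrum, is the design point the abstract emphasizes.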