A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases

Times cited: 157
Authors
Sadygov, RG [1]
Yates, JR [1]
Affiliation
[1] Scripps Res Inst, Dept Cell Biol, La Jolla, CA 92037 USA
DOI
10.1021/ac034157w
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
We present a new probability-based method for protein identification using tandem mass spectra and protein sequence databases. The method employs a hypergeometric distribution to model the frequencies of matches between an experimental tandem mass spectrum and the fragment ions predicted for database peptide sequences with a specific (M + H)+ value (within some mass tolerance). The hypergeometric distribution constitutes the null hypothesis: all peptide matches to a tandem mass spectrum are random. It is used to generate a score characterizing the randomness of a database sequence match to an experimental tandem mass spectrum and to determine the level of significance of the null hypothesis. For each tandem mass spectrum and database search, the peptide with the least probability of being a random match to the spectrum is identified, and the corresponding level of significance of the null hypothesis is determined. To check the validity of the hypergeometric model in describing fragment ion matches, we used a χ² test. The distribution of match frequencies and the corresponding hypergeometric probabilities are generated for each tandem mass spectrum. No proteolytic cleavage specificity is used to create the peptide sequences from the database, and no empirical probabilities are used in this method. The scores generated by the hypergeometric model do not have a significant molecular weight bias and are reasonably independent of database size. The approach has been implemented in a database search algorithm, PEP-PROBE. A false positive rate of 5% is demonstrated on a large set of tandem mass spectra derived from peptides generated by digesting a collection of known proteins with four different proteases.
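The scoring idea described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not PEP-PROBE's implementation: the m/z axis is assumed to be split into N bins of width `tol`, the experimental spectrum occupies n of them, a candidate peptide predicts m distinct fragment bins, and k of those coincide with experimental peaks. Under the null hypothesis of random matching, k then follows a hypergeometric distribution, P(K = k) = C(n, k) C(N - n, m - k) / C(N, m). Using the upper-tail probability P(K ≥ k) as the per-peptide probability, the -log10 score, the binning scheme, and all function names (`hypergeom_pmf`, `upper_tail`, `to_bins`, `score_candidates`, `chi_square_stat`) are illustrative assumptions. The χ² helper is a simplified Pearson statistic that assumes all candidates share the same m and does not pool sparse categories.

```python
from math import comb, log10
from typing import Dict, List, Sequence, Set, Tuple


def hypergeom_pmf(k: int, N: int, n: int, m: int) -> float:
    """P(K = k): exactly k of m predicted fragment bins fall into the
    n bins occupied by experimental peaks, out of N bins in total."""
    if k < max(0, m - (N - n)) or k > min(n, m):
        return 0.0
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)


def upper_tail(k: int, N: int, n: int, m: int) -> float:
    """P(K >= k) under the null hypothesis that every match is random."""
    return sum(hypergeom_pmf(j, N, n, m) for j in range(k, min(n, m) + 1))


def to_bins(mz_values: Sequence[float], mz_min: float, mz_max: float, tol: float) -> Set[int]:
    """Assumed discretisation: map in-range m/z values to bins of width tol."""
    return {int((mz - mz_min) // tol) for mz in mz_values if mz_min <= mz <= mz_max}


def score_candidates(
    spectrum_mz: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    mz_min: float,
    mz_max: float,
    tol: float,
) -> List[Tuple[str, int, int, float, float]]:
    """Return (peptide, k, m, p, score) tuples, least-random match first."""
    N = int((mz_max - mz_min) // tol) + 1   # total number of bins
    observed = to_bins(spectrum_mz, mz_min, mz_max, tol)
    n = len(observed)                       # bins holding experimental peaks
    results = []
    for peptide, fragments in candidates.items():
        predicted = to_bins(fragments, mz_min, mz_max, tol)
        m = len(predicted)                  # distinct predicted fragment bins
        k = len(predicted & observed)       # matched bins
        p = upper_tail(k, N, n, m)
        score = -log10(p) if p > 0 else float("inf")
        results.append((peptide, k, m, p, score))
    return sorted(results, key=lambda r: r[3])


def chi_square_stat(observed_counts: Sequence[int], N: int, n: int, m: int) -> float:
    """Pearson chi-square of observed match-count frequencies against the
    hypergeometric expectation; observed_counts[k] is the number of
    candidates (all assumed to have m predicted bins) with k matches."""
    total = sum(observed_counts)
    stat = 0.0
    for k, obs in enumerate(observed_counts):
        expected = total * hypergeom_pmf(k, N, n, m)
        if expected > 0:
            stat += (obs - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    spectrum = [10.2, 20.5, 30.1, 40.7, 50.3]          # toy peak list
    candidates = {"A": [10.4, 20.1, 30.9, 41.5],       # hypothetical fragments
                  "B": [11.5, 22.0, 35.5, 60.0]}
    best = score_candidates(spectrum, candidates, 0.0, 99.0, 1.0)[0]
    print(best[:3])  # ('A', 3, 4)
```

In this toy case candidate A shares three of its four predicted bins with the five-peak spectrum and is ranked first (P(K ≥ 3) ≈ 2.4 × 10⁻⁴), while B shares none. A real search would also restrict candidates to the precursor (M + H)+ window and handle fragment charge states, which this sketch omits.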
Pages: 3792-3798
Number of pages: 7