Linear discriminant analysis-based estimation of the false discovery rate for phosphopeptide identifications

Cited by: 30
Authors
Du, Xiuxia [1]
Yang, Feng [1]
Manes, Nathan P. [1]
Stenoien, David L. [1]
Monroe, Matthew E. [1]
Adkins, Joshua N. [1]
States, David J. [2]
Purvine, Samuel O. [1]
Camp, David G., II [1]
Smith, Richard D. [1]
Affiliations
[1] Pacific NW Natl Lab, Fundamental & Computat Sci Directorate, Richland, WA 99352 USA
[2] Univ Michigan, Sch Med, Ann Arbor, MI 48109 USA
Keywords
false discovery rate; phosphoproteomics; expectation maximization; linear discriminant analysis; p-value; q-value; Bayesian analysis
DOI
10.1021/pr070510t
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similarly high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores (in this case, from SEQUEST) into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides. The Phosphopeptide FDR Estimator software is freely available for download at http://ncrr.pni.gov/software/.
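As a rough illustration of the final step described in the abstract (turning per-identification p-values into q-values and an FDR-controlled list), the sketch below may help. It is not the Phosphopeptide FDR Estimator's implementation: the abstract does not say how q-values are derived, so this assumes a Benjamini-Hochberg style step-up estimate with the proportion of true nulls taken as 1. The function names `q_values` and `accepted_at_fdr` are hypothetical.

```python
def q_values(p_values):
    """Return BH-style q-values in the input order.

    Assumption: the proportion of true nulls (pi0) is taken as 1,
    which makes the estimate conservative.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the identifications sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the
    # smallest FDR estimate over all thresholds that still accept it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def accepted_at_fdr(p_values, fdr=0.05):
    """Indices of identifications whose q-value is at or below `fdr`."""
    return [i for i, q in enumerate(q_values(p_values)) if q <= fdr]
```

In the pipeline the abstract describes, the p-values passed to such a function would be computed from the discriminant score of each identification; here they are simply taken as given.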
Pages: 2195-2203
Page count: 9