An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: Sensitivity and specificity analysis

Cited by: 259
Authors
Kapp, EA
Schütz, F
Connolly, LM
Chakel, JA
Meza, JE
Miller, CA
Fenyo, D
Eng, JK
Adkins, JN
Omenn, GS
Simpson, RJ
Affiliations
[1] Ludwig Inst Canc Res, Joint Prot Lab, Melbourne Branch, Walter & Eliza Hall Inst Med Res, Parkville, Vic, Australia
[2] Agilent Technol, Santa Clara, CA USA
[3] GE Healthcare, Piscataway, NJ USA
[4] Inst Syst Biol, Seattle, WA USA
[5] Pacific NW Natl Lab, Richland, WA USA
[6] Univ Michigan, Sch Med, Ann Arbor, MI USA
Keywords
MASCOT; mass spectrometry; PeptideProphet; SEQUEST; Sonar; Spectrum Mill; X!Tandem
DOI
10.1002/pmic.200500126
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens used in the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X!Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, PeptideProphet, enhanced the overall performance of the SEQUEST algorithm, as well as provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or minimally, derived from a reversed-sequence search as demonstrated in this study based on a validated data set. The availability of open-source search algorithms, such as X!Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.
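The abstract recommends two ideas: deriving a score threshold from a reversed-sequence search at a specified FP rate, and using "consensus scoring" across at least two search algorithms. The short Python sketch below illustrates both. It is not the procedure used in the study. It rests on several assumptions: higher scores are better; the forward and reversed-sequence databases were searched separately; and the FP rate at a cutoff is estimated as the number of reversed-sequence hits at or above the cutoff divided by the number of forward hits at or above it. The function names, engine labels and peptide data are hypothetical.

```python
from collections import Counter


def threshold_at_fp_rate(target_scores, decoy_scores, max_fp_rate):
    """Return the lowest score cutoff whose estimated FP rate is <= max_fp_rate.

    Assumption (not from the study): higher scores are better, forward and
    reversed-sequence (decoy) databases were searched separately, and
    FP rate(t) ~= (# decoy scores >= t) / (# target scores >= t).
    Returns None if no cutoff meets the requested rate.
    """
    # Try each observed forward score as a cutoff, lowest (most sensitive) first.
    for cutoff in sorted(set(target_scores)):
        n_target = sum(1 for s in target_scores if s >= cutoff)  # always >= 1
        n_decoy = sum(1 for s in decoy_scores if s >= cutoff)
        if n_decoy / n_target <= max_fp_rate:
            return cutoff
    return None


def consensus(ids_by_engine, min_engines=2):
    """Keep (spectrum, peptide) pairs reported by at least `min_engines` engines.

    ids_by_engine maps a (hypothetical) engine label to a dict of
    spectrum ID -> peptide sequence that already passed that engine's threshold.
    """
    votes = Counter()
    for assignments in ids_by_engine.values():
        for spectrum, peptide in assignments.items():
            votes[(spectrum, peptide)] += 1
    return {pair for pair, n in votes.items() if n >= min_engines}


if __name__ == "__main__":
    # Hypothetical scores: forward-database hits and reversed-sequence hits.
    target = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    decoy = [5, 12, 15, 25, 35]
    print(threshold_at_fp_rate(target, decoy, 0.05))  # 40
    print(threshold_at_fp_rate(target, decoy, 0.20))  # 30

    # Hypothetical peptide assignments from three engines.
    ids = {
        "engine_a": {"s1": "PEPTIDEK", "s2": "LLSEAR", "s3": "AAAK"},
        "engine_b": {"s1": "PEPTIDEK", "s2": "QQQR"},
        "engine_c": {"s2": "LLSEAR", "s3": "GGGK"},
    }
    print(sorted(consensus(ids)))  # [('s1', 'PEPTIDEK'), ('s2', 'LLSEAR')]
```

Choosing the lowest cutoff that meets the requested rate favors sensitivity. Requiring agreement between at least two engines then trades some of that sensitivity for fewer false positives, which is the rationale the abstract gives for consensus scoring.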
Pages: 3475-3490
Page count: 16