A statistical basis for testing the significance of mass spectrometric protein identification results

Cited by: 91
Authors
Eriksson, J [1]
Chait, BT [1]
Fenyö, D [1]
Affiliations
[1] Rockefeller Univ, New York, NY 10021 USA
DOI
10.1021/ac990792j
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
A method for testing the significance of mass spectrometric (MS) protein identification results is presented. MS proteolytic peptide mapping and genome database searching provide a rapid, sensitive, and potentially accurate means for identifying proteins. Database search algorithms detect matches between proteolytic peptide masses from an MS peptide map and theoretical proteolytic peptide masses of the proteins in a genome database. The number of matching masses is used to compute a score, S, for each protein, and the protein that yields the best score is taken as the identification result. There is a risk of obtaining a false result, because masses determined by MS are not unique; i.e., each mass in a peptide map can randomly match one or several proteins in a genome database. A false result is obtained when the score, S, due to random matching cannot be discerned from the score due to matching with a real protein in the sample. We therefore introduce the frequency function, f(S), for false (random) identification results as a basis for testing at what significance level, α, one can reject a null hypothesis, H_0: "the result is false". The significance is tested by comparing an experimental score, S_E, with a critical score, S_C, required for a significant result at the level α. If S_E ≥ S_C, H_0 is rejected. Both f(S) and S_C were obtained by simulations utilizing random tryptic peptide maps generated from a genome database. The critical score, S_C, was studied as a function of the number of masses in the peptide map, the mass accuracy, the degree of incomplete enzymatic cleavage, the protein mass range, and the size of the genome. With S_C known for a variety of experimental constraints, significance testing can be fully automated and integrated with database searching software used for protein identification.
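The testing procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy database of uniformly distributed masses, the simple match-count score, and all function names (`make_toy_database`, `count_matches`, `simulate_false_scores`, `critical_score`, `is_significant`) are assumptions made for illustration only. The sketch builds an empirical f(S) from random peptide maps, derives S_C for a chosen α, and applies the rule "reject H_0 if S_E ≥ S_C".

```python
import bisect
import random


def make_toy_database(rng, n_proteins, peptides_per_protein):
    """Hypothetical stand-in for a genome database: each protein is a
    sorted list of peptide masses (Da) drawn uniformly at random."""
    return [
        sorted(rng.uniform(800.0, 3000.0) for _ in range(peptides_per_protein))
        for _ in range(n_proteins)
    ]


def count_matches(peptide_map, protein_masses, tol):
    """Score S (assumed here to be a plain match count): number of map
    masses lying within +/- tol of some mass in the sorted protein list."""
    hits = 0
    for m in peptide_map:
        i = bisect.bisect_left(protein_masses, m - tol)
        if i < len(protein_masses) and protein_masses[i] <= m + tol:
            hits += 1
    return hits


def best_score(peptide_map, database, tol):
    """Best score over all proteins, i.e. the would-be identification."""
    return max(count_matches(peptide_map, p, tol) for p in database)


def simulate_false_scores(database, n_masses, tol, n_trials, rng):
    """Empirical sample of f(S): best scores for random peptide maps whose
    masses are drawn from the whole database, not from one protein."""
    pool = [m for protein in database for m in protein]
    return [
        best_score(rng.sample(pool, n_masses), database, tol)
        for _ in range(n_trials)
    ]


def critical_score(false_scores, alpha):
    """Smallest S_C such that the fraction of false scores >= S_C is <= alpha."""
    n = len(false_scores)
    for s_c in range(min(false_scores), max(false_scores) + 2):
        if sum(s >= s_c for s in false_scores) / n <= alpha:
            return s_c


def is_significant(s_e, s_c):
    """Reject H_0 ("the result is false") when S_E >= S_C."""
    return s_e >= s_c


if __name__ == "__main__":
    rng = random.Random(0)
    db = make_toy_database(rng, n_proteins=200, peptides_per_protein=30)
    false_scores = simulate_false_scores(db, n_masses=15, tol=0.5,
                                         n_trials=200, rng=rng)
    s_c = critical_score(false_scores, alpha=0.05)
    print("critical score S_C:", s_c)
    print("S_E = 10 significant?", is_significant(10, s_c))
```

In this sketch, varying `n_masses`, `tol`, or the database size and recomputing `critical_score` corresponds to studying S_C as a function of the experimental constraints listed in the abstract; a realistic version would replace the toy database with in silico tryptic digests of actual protein sequences.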
Pages: 999-1005
Number of pages: 7