AMINO-ACID SUBSTITUTION MATRICES FROM AN INFORMATION THEORETIC PERSPECTIVE

Cited by: 419
Author
ALTSCHUL, SF
Affiliation
[1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda
Keywords
HOMOLOGY; SEQUENCE COMPARISON; STATISTICAL SIGNIFICANCE; ALIGNMENT ALGORITHMS; PATTERN RECOGNITION;
DOI
10.1016/0022-2836(91)90193-A
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Protein sequence alignments have become an important tool for molecular biologists. Local alignments are frequently constructed with the aid of a "substitution score matrix" that specifies a score for aligning each pair of amino acid residues. Over the years, many different substitution matrices have been proposed, based on a wide variety of rationales. Statistical results, however, demonstrate that any such matrix is implicitly a "log-odds" matrix, with a specific target distribution for aligned pairs of amino acid residues. In the light of information theory, it is possible to express the scores of a substitution matrix in bits and to see that different matrices are better adapted to different purposes. The most widely used matrix for protein sequence comparison has been the PAM-250 matrix. It is argued that for database searches the PAM-120 matrix generally is more appropriate, while for comparing two specific proteins with suspected homology the PAM-200 matrix is indicated. Examples discussed include the lipocalins, human α1B-glycoprotein, the cystic fibrosis transmembrane conductance regulator and the globins. © 1991.
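The log-odds reading of a substitution matrix described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-letter alphabet with made-up background frequencies `p` and target frequencies `q` (none of these values come from the paper), computes each score in bits as log2(q_ab / (p_a p_b)), and then takes the target-weighted average score, which is the relative entropy of the target distribution with respect to the background.

```python
import math

def log_odds_bits(q, p):
    """Score for each pair (a, b) in bits: log2(q[a][b] / (p[a] * p[b]))."""
    return {a: {b: math.log2(q[a][b] / (p[a] * p[b])) for b in q[a]} for a in q}

def relative_entropy_bits(q, scores):
    """Average score per aligned pair under the target distribution q."""
    return sum(q[a][b] * scores[a][b] for a in q for b in q[a])

def background_expected_score(p, scores):
    """Average score per pair when residues are paired at random."""
    return sum(p[a] * p[b] * scores[a][b] for a in p for b in p)

# Hypothetical toy values, chosen only for illustration.
p = {"A": 0.5, "B": 0.5}                      # background residue frequencies
q = {"A": {"A": 0.4, "B": 0.1},               # target frequencies of aligned pairs
     "B": {"A": 0.1, "B": 0.4}}               # (entries sum to 1)

scores = log_odds_bits(q, p)
H = relative_entropy_bits(q, scores)
E = background_expected_score(p, scores)

print(f"match score    = {scores['A']['A']:.3f} bits")
print(f"mismatch score = {scores['A']['B']:.3f} bits")
print(f"relative entropy H = {H:.3f} bits per aligned pair")
print(f"expected score under background = {E:.3f} bits")
```

In this toy case the matching pairs score positive, the mismatched pairs score negative, and the expected score for randomly paired residues is negative, which is the condition a matrix needs for local alignment scoring to be meaningful.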
Pages: 555-565
Page count: 11