FLEXIBLE PROTEIN-SEQUENCE PATTERNS - A SENSITIVE METHOD TO DETECT WEAK STRUCTURAL SIMILARITIES

Cited by: 114
Authors
BARTON, GJ
STERNBERG, MJE
Affiliations
[1] UNIV LONDON BIRKBECK COLL, DEPT CRYSTALLOG, MOLEC BIOL LAB, LONDON WC1E 7HX, ENGLAND
[2] IMPERIAL CANC RES FUND, BIOMED COMP UNIT, LONDON WC2A 3PX, ENGLAND
[3] IMPERIAL CANC RES FUND, BIOMOLEC MODELLING LAB, LONDON WC2A 3PX, ENGLAND
DOI
10.1016/0022-2836(90)90133-7
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The concept of a flexible protein sequence pattern is defined. In contrast to conventional pattern matching, template or sequence alignment methods, flexible patterns allow residue patterns typical of a complete protein fold to be developed in terms of residue positions (elements), separated by gaps of defined range. An efficient dynamic programming algorithm is presented to enable the best alignment(s) of a pattern with a sequence to be identified. The flexible pattern method is evaluated in detail by reference to the globin protein family, and by comparison to alignment techniques that exploit single sequence, multiple sequence and secondary structural information. A flexible pattern derived from seven globins aligned on structural criteria successfully discriminates all 345 globins from non-globins in the Protein Identification Resource database. Furthermore, a pattern that uses helical regions from just human α-haemoglobin identified 337 globins, compared to 318 for the best non-pattern global alignment method. Patterns derived from successively fewer yet more highly conserved positions in a structural alignment of seven globins show that as few as 38 residue positions (25 buried hydrophobic, 4 exposed and 9 others) may be used to uniquely identify the globin fold. The study suggests that flexible patterns gain discriminating power both by discarding regions known to vary within the protein family and by defining gaps within specific ranges. Flexible patterns therefore provide a convenient and powerful bridge between regular expression pattern matching techniques and more conventional local and global sequence comparison algorithms. © 1990.
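The abstract describes aligning a pattern of elements, separated by gaps of defined range, against a sequence by dynamic programming. The sketch below illustrates that general idea only; it is not the authors' algorithm or scoring scheme. The function names, the representation of an element as a string of allowed residues, and the +1/-1 scoring are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

NEG_INF = float("-inf")


def element_score(allowed: str, residue: str) -> int:
    """Hypothetical scoring: +1 if the residue is allowed at this element, else -1."""
    return 1 if residue in allowed else -1


def align_flexible_pattern(
    elements: List[str],
    gaps: List[Tuple[int, int]],
    sequence: str,
) -> Optional[Tuple[int, List[int]]]:
    """Return (best score, sequence index of each element), or None if nothing fits.

    elements[i] is the set of residues allowed at pattern element i.
    gaps[i] is the (min, max) number of residues between elements i and i + 1.
    """
    n, m = len(elements), len(sequence)
    if n == 0 or m == 0:
        return None

    # best[i][j]: best score with elements 0..i placed and element i at position j.
    best = [[NEG_INF] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]

    for j in range(m):
        best[0][j] = element_score(elements[0], sequence[j])

    for i in range(1, n):
        lo, hi = gaps[i - 1]
        for j in range(m):
            for g in range(lo, hi + 1):
                k = j - 1 - g  # position of the previous element for gap length g
                if k < 0:
                    break
                if best[i - 1][k] == NEG_INF:
                    continue
                cand = best[i - 1][k] + element_score(elements[i], sequence[j])
                if cand > best[i][j]:
                    best[i][j] = cand
                    back[i][j] = k

    end = max(range(m), key=lambda j: best[n - 1][j])
    if best[n - 1][end] == NEG_INF:
        return None

    # Trace back through the stored predecessors to recover element positions.
    positions = [end]
    for i in range(n - 1, 0, -1):
        positions.append(back[i][positions[-1]])
    positions.reverse()
    return int(best[n - 1][end]), positions
```

For example, `align_flexible_pattern(["AG", "LIV", "H"], [(1, 2), (0, 1)], "MKAQLHTT")` places three elements with one or two residues allowed between the first pair and zero or one between the second. Restricting each gap to a range is what keeps the inner loop short: the cost grows with the width of the gap range rather than with the full sequence length.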
Pages: 389-402
Number of pages: 14
Related papers
35 in total
[1]   RAPID SEARCHES FOR COMPLEX PATTERNS IN BIOLOGICAL MOLECULES [J].
ABARBANEL, RM ;
WIENEKE, PR ;
MANSFIELD, E ;
JAFFE, DA ;
BRUTLAG, DL .
NUCLEIC ACIDS RESEARCH, 1984, 12 (01) :263-280
[2]   ASSESSMENT OF PROTEIN SECONDARY STRUCTURE PREDICTION METHODS BASED ON AMINO-ACID SEQUENCE [J].
ARGOS, P ;
SCHWARZ, J ;
SCHWARZ, J .
BIOCHIMICA ET BIOPHYSICA ACTA, 1976, 439 (02) :261-273
[3]   EVALUATION AND IMPROVEMENTS IN THE AUTOMATIC ALIGNMENT OF PROTEIN SEQUENCES [J].
BARTON, GJ ;
STERNBERG, MJE .
PROTEIN ENGINEERING, 1987, 1 (02) :89-94
[4]   A STRATEGY FOR THE RAPID MULTIPLE ALIGNMENT OF PROTEIN SEQUENCES - CONFIDENCE LEVELS FROM TERTIARY STRUCTURE COMPARISONS [J].
BARTON, GJ ;
STERNBERG, MJE .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 198 (02) :327-337
[5]  
BARTON GJ, 1987, THESIS U LONDON
[6]   DETERMINANTS OF A PROTEIN FOLD - UNIQUE FEATURES OF THE GLOBIN AMINO-ACID-SEQUENCES [J].
BASHFORD, D ;
CHOTHIA, C ;
LESK, AM .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 196 (01) :199-216
[7]  
BLUNDELL TL, 1987, NATURE, V326, P326
[8]  
BOSWELL DR, 1988, COMPUT APPL BIOSCI, V4, P345
[9]  
BROWNE WJ, 1969, J MOL BIOL, V120, P97
[10]  
COLLINS JF, 1988, COMPUT APPL BIOSCI, V4, P67