Weighting hidden Markov models for maximum discrimination

Cited by: 23
Authors
Karchin, R [1]
Hughey, R [1]
Affiliations
[1] Univ Calif Santa Cruz, Jack Baskin Sch Engn, Dept Comp Engn, Santa Cruz, CA 95064 USA
DOI
10.1093/bioinformatics/14.9.772
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Hidden Markov models can efficiently and automatically build statistical representations of related sequences. Unfortunately, training sets are frequently biased toward one subgroup of sequences, leading to an insufficiently general model. This work evaluates sequence weighting methods based on the maximum-discrimination idea. Results: One good method scales sequence weights by an exponential that ranges between 0.1 for the best-scoring sequence and 1.0 for the worst. Experiments with a curated data set show that while training with one or two sequences performed worse than single-sequence Probabilistic Smith-Waterman (PSW), training with five or ten sequences reduced errors by 20% and 51%, respectively. This new version of the SAM HMM suite outperforms HMMer (17% reduction over PSW for 10 training sequences), Meta-MEME (28% reduction), and unweighted SAM (31% reduction).
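The abstract states only the endpoints of the weighting scheme (0.1 for the best-scoring sequence, 1.0 for the worst) and that the scaling is exponential; it does not give the formula. The following is a minimal sketch of one way such a mapping could look, assuming a geometric interpolation between the two endpoints over the observed score range. The function name, parameters, and the interpolation itself are illustrative assumptions, not the SAM implementation.

```python
def exponential_weights(scores, w_best=0.1, w_worst=1.0, lower_is_better=True):
    """Hypothetical helper: map scores to weights on an exponential curve.

    The best-scoring sequence receives w_best and the worst receives
    w_worst, matching the endpoints quoted in the abstract. The curve
    between them is an ASSUMPTION (geometric interpolation).
    """
    best = min(scores) if lower_is_better else max(scores)
    worst = max(scores) if lower_is_better else min(scores)
    if best == worst:
        # All sequences score the same, so there is nothing to discriminate.
        return [w_worst for _ in scores]
    weights = []
    for s in scores:
        t = (s - best) / (worst - best)  # 0 at the best score, 1 at the worst
        weights.append(w_best * (w_worst / w_best) ** t)
    return weights


# Made-up example scores (lower is better here), for illustration only.
example_scores = [-50.0, -30.0, -10.0]
example_weights = exponential_weights(example_scores)
```

Under this reading, sequences the model already scores well contribute little to the next training round, while poorly scoring family members are emphasized, which is the intent of countering a training set biased toward one subgroup.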
Pages: 772-782
Page count: 11