ProfilePSTMM: capturing tree-structure motifs in carbohydrate sugar chains

Cited by: 17
Authors
Aoki-Kinoshita, Kiyoko F. [1]
Ueda, Nobuhisa [1]
Mamitsuka, Hiroshi [1]
Kanehisa, Minoru [1]
Affiliations
[1] Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Kyoto, Japan
DOI
10.1093/bioinformatics/btl244
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Carbohydrate sugar chains, or glycans, are considered the third major class of biomolecules after DNA and proteins. They consist of branching monosaccharides, starting from a single monosaccharide. They are vital to the development and functioning of multicellular organisms because they are recognized by various proteins, which allows those proteins to perform specific functions. Our motivation is to study this recognition mechanism by applying informatics techniques to the available data. Previously, we introduced a probabilistic sibling-dependent tree Markov model (PSTMM), which we showed could be efficiently trained on sibling-dependent tree structures and return the most likely state paths. However, it had limitations: the extra dependency between siblings caused overfitting, and retrieving patterns from the trained model required extracting them manually from the most likely state paths. We therefore introduce the profilePSTMM model, which avoids these problems by incorporating a novel concept of different types of state transitions that handle parent-child and sibling dependencies differently.

Results: Our new algorithms are more efficient and extract patterns more easily. We tested the profilePSTMM model on both synthetic (controlled) data and glycan data from the KEGG GLYCAN database. Additionally, we tested it on glycans that are known to be recognized and bound by proteins at various binding affinities, and we show that our results correlate with results published in the literature.
Pages: E25-E34
Number of pages: 10