Incorporation of biological knowledge into distance for clustering genes

Cited by: 4
Authors
Boratyn, Grzegorz M. [1 ]
Datta, Susmita [2 ]
Datta, Somnath [2 ]
Affiliations
[1] Univ Louisville, Clin Prote Ctr, Louisville, KY 40202 USA
[2] Univ Louisville, Dept Bioinformat & Biostat, Louisville, KY 40202 USA
Funding
US National Science Foundation;
Keywords
knowledge; distance; clustering; genes; expression;
DOI
10.6026/97320630001396
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
In this paper we propose a data-based algorithm to marry existing biological knowledge (e.g., functional annotations of genes) with experimental data (gene expression profiles) in creating an overall dissimilarity that can be used with any clustering algorithm that operates on a general dissimilarity matrix. We explore this idea with two publicly available gene expression data sets and functional annotations, and compare the results with clustering results that use only the experimental data. Although more elaborate evaluations might be called for, the present paper makes a strong case for utilizing existing biological information in the clustering process.
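The idea in the abstract, a single dissimilarity built from both expression profiles and functional annotations, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give its details, so everything below is an assumption made for illustration. In particular, the correlation-based expression distance, the Jaccard distance on annotation sets, the fixed mixing weight `weight`, the toy gene names and annotation terms, and the helper names (`correlation_distance`, `jaccard_distance`, `combined_dissimilarity`, `average_linkage`) are all hypothetical.

```python
from math import sqrt

def correlation_distance(x, y):
    """(1 - Pearson correlation) / 2, so the value lies in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.5  # flat profile: correlation undefined, treated as uncorrelated (assumption)
    return (1 - cov / (sx * sy)) / 2

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of annotation terms."""
    union = a | b
    if not union:
        return 1.0  # two unannotated genes: treated as maximally dissimilar (assumption)
    return 1 - len(a & b) / len(union)

def combined_dissimilarity(profiles, annotations, weight):
    """Convex combination of expression and annotation distances.

    weight = 0 uses expression data only; weight = 1 uses annotations only.
    A fixed weight is an assumption of this sketch, not the paper's method.
    """
    genes = sorted(profiles)
    d = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            d_expr = correlation_distance(profiles[g], profiles[h])
            d_annot = jaccard_distance(annotations[g], annotations[h])
            d[(g, h)] = d[(h, g)] = (1 - weight) * d_expr + weight * d_annot
    return genes, d

def average_linkage(genes, d, k):
    """Naive agglomerative clustering that needs only pairwise dissimilarities."""
    clusters = [[g] for g in genes]

    def dist(c1, c2):
        return sum(d[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Toy data (invented for illustration only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.2, 1.8, 1.1],
}
annotations = {
    "g1": {"sporulation"},
    "g2": {"sporulation", "meiosis"},
    "g3": {"ribosome"},
    "g4": {"ribosome"},
}

genes, d = combined_dissimilarity(profiles, annotations, weight=0.5)
clusters = average_linkage(genes, d, k=2)
print(clusters)
```

Because the clustering step consumes only the pairwise dissimilarities, it could be swapped for any other method that accepts a general dissimilarity matrix, which is the property the abstract emphasizes. Setting `weight=0` reproduces an expression-only baseline of the kind the abstract compares against.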
Pages: 396-405
Number of pages: 10