Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

Cited by: 15
Authors
Costa, Ivan G. [1]
Krause, Roland [1,3]
Opitz, Lennart [2]
Schliep, Alexander [1]
Affiliations
[1] Max Planck Inst Mol Genet, Dept Computat Mol Biol, Berlin, Germany
[2] Univ Gottingen, Abt Entwicklungsbiochem, Gottingen, Germany
[3] Max Planck Inst Infect Biol, Dept Cellular Microbiol, Berlin, Germany
DOI
10.1186/1471-2105-8-S10-S3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large-scale data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns.

Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data, for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results.

Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.
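The Results paragraph describes clustering temporal expression profiles with a mixture model while pairwise constraints, derived automatically from in situ images, bias which genes end up together. The following is a minimal sketch of that idea under assumptions that are not taken from the paper: each gene's time course is modelled by a spherical Gaussian mixture, each in situ image has already been reduced to a numeric feature vector, constraints come from a hypothetical correlation-threshold rule (`pos_thresh`, `neg_thresh`), and constraints enter the E-step as a mean-field-style penalty scaled by a hypothetical strength `lam`. The function names `image_constraints` and `constrained_em` are illustrative only; this is a generic constrained-EM example, not the authors' implementation.

```python
import numpy as np


def image_constraints(features, pos_thresh=0.8, neg_thresh=0.0):
    """Hypothetical rule: derive pairwise constraint weights from image features.

    features: (n_genes, n_features) array, one row per gene's in situ image.
    Returns (w_pos, w_neg): symmetric, non-negative (n_genes, n_genes) weights.
    """
    corr = np.corrcoef(features)  # gene-by-gene Pearson correlation
    np.fill_diagonal(corr, 0.0)   # a gene carries no constraint with itself
    w_pos = np.where(corr > pos_thresh, corr, 0.0)   # positive (must-link-like) weights
    w_neg = np.where(corr < neg_thresh, -corr, 0.0)  # negative (cannot-link-like) weights
    return w_pos, w_neg


def constrained_em(X, K, w_pos, w_neg, lam=1.0, n_iter=50, seed=0):
    """Spherical Gaussian mixture EM with an assumed mean-field-style constraint penalty.

    X: (n_genes, n_timepoints) expression profiles (the primary data).
    Returns hard labels and the (n_genes, K) responsibility matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-point initialisation of the component means.
    mu = [X[rng.integers(n)]]
    for _ in range(1, K):
        d2 = np.min([((X - m) ** 2).sum(axis=1) for m in mu], axis=0)
        mu.append(X[d2.argmax()])
    mu = np.array(mu)
    var = np.full(K, X.var() + 1e-6)  # one variance per component
    pi = np.full(K, 1.0 / K)
    R = np.full((n, K), 1.0 / K)
    for _ in range(n_iter):
        # E-step: log pi_k + log N(x_i | mu_k, var_k * I)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = np.log(pi) - 0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var)
        # One mean-field sweep: reward components where positively linked genes
        # currently sit, penalise those holding negatively linked genes.
        logits += lam * (w_pos @ R - w_neg @ R)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixing weights, means and spherical variances.
        Nk = R.sum(axis=0) + 1e-12
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (R * sq).sum(axis=0) / (d * Nk) + 1e-6
    return R.argmax(axis=1), R
```

With `lam=0` the penalty vanishes and the procedure reduces to ordinary EM for the mixture; larger values let image similarity pull genes with ambiguous time courses toward the components occupied by their positively linked partners.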
Pages: 15