Clustering gene expression profile data by selective shrinkage

Cited by: 7
Authors
Ishwaran, Hemant [1]
Rao, J. Sunil
Affiliations
[1] Cleveland Clin, Cleveland, OH 44106 USA
[2] Case Western Reserve Univ, Cleveland, OH 44106 USA
DOI
10.1016/j.spl.2008.01.003
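Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract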
Clustering of gene expression profiles is a widely used approach for finding macroscopic data structure. A complication in such analyses is that not all genes are informative for forming clusters and different clusters might have different transcription regulation. Driven by these considerations, we present a novel two-stage clustering approach. The first stage identifies informative genes by adaptive variable selection using pseudo-samples modeled by a high-dimensional multigroup ANOVA model. Variables are selected using a rescaled spike and slab Bayesian hierarchical model having a special selective shrinkage property. The second stage uses output from the first stage for clustering. We demonstrate why selective shrinkage occurs, and by extension, why it is useful for the clustering paradigm. We analyze a human gene atlas expression dataset where the question of interest is to look for tissue-specific transcription regulation and investigate whether tissues can be grouped together due to similar genomic control. © 2008 Elsevier B.V. All rights reserved.
Pages: 1490-1497
Page count: 8