Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization

Cited by: 41
Authors
Liu, Jin [1]
Ma, Shuangge [1]
Huang, Jian [2]
Affiliations
[1] Yale Univ, Sch Publ Hlth, New Haven, CT 06520 USA
[2] Univ Iowa, Dept Stat & Actuarial Sci & Biostat, Iowa City, IA 52242 USA
Keywords
composite penalization; cancer diagnosis studies; gene expression; integrative analysis; GENE-EXPRESSION PATTERNS; VARIABLE SELECTION; IDENTIFICATION; CLASSIFICATION; CONVERGENCE; REGRESSION;
DOI
10.1111/j.1467-9469.2012.00816.x
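Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code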
020208 ; 070103 ; 0714 ;
Abstract
In cancer diagnosis studies, high-throughput gene profiling has been extensively conducted, searching for genes whose expressions may serve as markers. Data generated from such studies have the 'large d, small n' feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single data sets can be unsatisfactory. A cost-effective remedy is to conduct integrative analysis of multiple heterogeneous data sets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty. Under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulation and analysis of practical cancer data sets, show satisfactory performance of the proposed methods.
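The composite penalty structure described in the abstract can be illustrated with a minimal sketch. It uses the standard MCP formula, with value `lam*|t| - t^2/(2*gamma)` for `|t| <= gamma*lam` and `gamma*lam^2/2` otherwise, as the outer penalty applied to a per-gene inner penalty computed across data sets. The function names, the coefficient layout (genes by data sets), the use of the L2 norm for the ridge-type inner penalty, and the absence of any data-set-size scaling are assumptions made for illustration; the abstract does not give the exact formulation or tuning used in the article.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Standard minimax concave penalty evaluated elementwise at |t|."""
    a = np.abs(np.asarray(t, dtype=float))
    inside = lam * a - a ** 2 / (2.0 * gamma)   # concave part, |t| <= gamma * lam
    flat = 0.5 * gamma * lam ** 2               # constant part, |t| > gamma * lam
    return np.where(a <= gamma * lam, inside, flat)

def inner_ridge(beta):
    """Assumed ridge-type inner penalty: L2 norm of each gene's coefficients."""
    return np.sqrt(np.sum(beta ** 2, axis=1))

def inner_lasso(beta):
    """Lasso inner penalty: L1 norm of each gene's coefficients."""
    return np.sum(np.abs(beta), axis=1)

def composite_penalty(beta, lam, gamma, inner):
    """Outer MCP applied to the inner penalty of each gene.

    beta has shape (d, M): d genes (rows) by M data sets (columns).
    This layout is a hypothetical convention for this sketch.
    """
    return float(np.sum(mcp(inner(beta), lam, gamma)))

# Two genes, two data sets: the first gene is inactive everywhere.
beta = np.array([[0.0, 0.0],
                 [3.0, 4.0]])
penalty_ridge = composite_penalty(beta, lam=1.0, gamma=3.0, inner=inner_ridge)
penalty_lasso = composite_penalty(beta, lam=1.0, gamma=3.0, inner=inner_lasso)
```

In this sketch a gene whose coefficients are zero in every data set contributes nothing to the penalty, while a gene with large coefficients contributes a bounded amount, which is the property that makes MCP attractive for marker selection.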
Pages: 87-103
Number of pages: 17