Identification of cancer genomic markers via integrative sparse boosting

Cited by: 21
Authors
Huang, Yuan [2 ]
Huang, Jian [3 ]
Shia, Ben-Chang [4 ]
Ma, Shuangge [1 ]
Affiliations
[1] Yale Univ, Sch Publ Hlth, New Haven, CT 06520 USA
[2] Penn State Univ, Dept Stat, State Coll, PA 16801 USA
[3] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52242 USA
[4] Fu Jen Catholic Univ, Dept Stat & Informat Sci, Taipei 24205, Taiwan
Funding
National Institutes of Health (USA); National Science Foundation (USA);
Keywords
Cancer genomics; Marker identification; Sparse boosting; GENE-EXPRESSION; TUMOR CLASSIFICATION; MICROARRAY DATA; METAANALYSIS; PROFILES; REGULARIZATION; SELECTION;
DOI
10.1093/biostatistics/kxr033
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer from a lack of reproducibility because of small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time-consuming. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple data sets is challenging because of the high dimensionality of genomic measurements and heterogeneity among studies. In this article, we propose a sparse boosting approach for marker identification in integrative analysis of multiple heterogeneous cancer diagnosis studies with gene expression measurements. The proposed approach can effectively accommodate the heterogeneity among multiple studies and identify markers with consistent effects across studies. Simulation shows that the proposed approach has satisfactory identification results and outperforms alternatives including an intensity approach and meta-analysis. The proposed approach is used to identify markers of pancreatic cancer and liver cancer.
Pages: 509-522
Number of pages: 14