Incorporating pathway information into boosting estimation of high-dimensional risk prediction models

Times cited: 62
Authors
Binder, Harald [1 ,2 ]
Schumacher, Martin [1 ]
Affiliations
[1] Univ Med Ctr Freiburg, Dept Med Biometry & Biostat, D-79104 Freiburg, Germany
[2] Univ Freiburg, Freiburg Ctr Data Anal & Modeling, D-79104 Freiburg, Germany
Keywords
NETWORK-CONSTRAINED REGULARIZATION; VARIABLE SELECTION; REGRESSION-MODELS; GENOMIC DATA; SURVIVAL;
DOI
10.1186/1471-2105-10-18
CLC classification: Q5 [Biochemistry]
Subject classification codes: 071010; 081704
Abstract
Background: There are several techniques for fitting risk prediction models to high-dimensional data arising from microarrays. However, biological knowledge about relations between genes is only rarely taken into account. One recent approach incorporates pathway information, available, e.g., from the KEGG database, by augmenting the penalty term in Lasso estimation for continuous response models.
Results: As an alternative, we extend componentwise likelihood-based boosting techniques to incorporate pathway information in a larger number of model classes, such as generalized linear models and the Cox proportional hazards model for time-to-event data. In contrast to Lasso-like approaches, no further assumptions for explicitly specifying the penalty structure are needed, as pathway information is incorporated by adapting the penalties for single microarray features in the course of the boosting steps. This is shown to result in improved prediction performance when the coefficients of connected genes have opposite sign. The properties of the fitted models resulting from this approach are then investigated in two application examples with microarray survival data.
Conclusion: The proposed approach results not only in improved prediction performance but also in structurally different model fits. Incorporating pathway information in the suggested manner is therefore seen to be beneficial in several ways.
Pages: 11
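To make the mechanism described in the abstract more concrete, the following is a minimal illustrative sketch of componentwise boosting with pathway-adapted penalties, written for a simple Gaussian linear model. It is not the authors' algorithm: the paper covers generalized linear models and the Cox model and uses its own penalty-modification scheme, whereas here the penalty of genes connected to the currently selected gene is simply decreased by a constant factor. All names (pathway_boost, connection_factor, etc.) are hypothetical.

```python
import numpy as np


def pathway_boost(X, y, adjacency, n_steps=100, penalty=None, connection_factor=0.5):
    """Componentwise boosting sketch for a Gaussian response, where the
    ridge-type penalties of pathway-connected genes are relaxed after
    each boosting step.

    X                 : (n, p) standardized gene-expression matrix
    y                 : (n,) centered continuous response
    adjacency         : (p, p) 0/1 matrix of pathway connections between genes
    penalty           : initial penalty shared by all genes (default: n)
    connection_factor : multiplicative penalty decrease for connected genes
    """
    n, p = X.shape
    beta = np.zeros(p)
    lam = np.full(p, float(penalty) if penalty is not None else float(n))
    xx = np.sum(X ** 2, axis=0)                    # x_j' x_j for every gene

    for _ in range(n_steps):
        resid = y - X @ beta                       # current residuals
        score = X.T @ resid                        # x_j' r for every gene
        gamma = score / (xx + lam)                 # penalized one-parameter update
        gain = 2.0 * gamma * score - gamma ** 2 * xx  # drop in residual sum of squares
        j = int(np.argmax(gain))                   # best single gene in this step
        beta[j] += gamma[j]

        # pathway step: genes connected to the selected gene become
        # cheaper to update in later boosting steps
        neighbours = adjacency[j] > 0
        lam[neighbours] *= connection_factor

    return beta
```

In this sketch the penalty itself provides the shrinkage of each boosting step, so no separate step-size factor is used; lowering the penalty of connected genes makes them more likely to receive larger updates later on, which is the general idea of feeding pathway structure into the stepwise fitting process rather than into an explicitly specified penalty matrix.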