Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO

Times cited: 67
Authors
Datta, Susmita [1]
Le-Rademacher, Jennifer
Datta, Somnath
Affiliations
[1] Univ Louisville, Dept Bioinformat & Biostat, Louisville, KY 40202 USA
[2] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Keywords
cancer; gene expression; partial least squares; right censoring; survival
DOI
10.1111/j.1541-0420.2006.00660.x
CLC classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
We consider the problem of predicting survival times of cancer patients from the gene expression profiles of their tumor samples via linear regression modeling of log-transformed failure times. The partial least squares (PLS) and least absolute shrinkage and selection operator (LASSO) methodologies are used for this purpose, after first modifying the data to account for censoring. Three approaches to handling right-censored data (reweighting, mean imputation, and multiple imputation) are considered. Their performances are examined in a detailed simulation study and compared with those of PLS and LASSO applied to the full data, had there been no censoring. A major objective of this article is to investigate the performance of PLS and LASSO in the context of microarray data, where the number of covariates is very large and there are extremely few samples. We demonstrate that LASSO outperforms PLS in terms of prediction error when the list of covariates includes a moderate to large percentage of useless or noise variables; otherwise, PLS may outperform LASSO. For a moderate sample size (100 samples with 10,000 covariates), LASSO performed better than a no-covariate model (or noise-based prediction). The mean imputation method appears to track the performance of full-data PLS or LASSO most closely. The mean imputation scheme is then applied to an existing lung cancer data set. This reanalysis using mean-imputed PLS and LASSO identifies a number of genes that previous studies had found to be related to cancer or tumor activity.
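The abstract reports that mean imputation tracks full-data PLS and LASSO most closely. As an illustration only, the sketch below shows one common way to mean-impute right-censored log failure times: each censored value log c is replaced by the conditional expectation E[log T | T > c] under the Kaplan-Meier estimate of the failure-time distribution. The function names, the tie-handling convention, and the choice to treat the largest observation as an uncensored failure (so the Kaplan-Meier curve reaches zero) are assumptions of this sketch, not details confirmed by the record above.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(T > t) (hypothetical helper).

    Returns (t, S(t)) pairs at each distinct uncensored time, in increasing order.
    events[i] is 1 if times[i] is an observed failure, 0 if right censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; failures and censorings at t
        # are both counted as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def mean_impute_log_times(times, events):
    """Return log failure times with each censored value replaced by
    E[log T | T > c] under the Kaplan-Meier estimate (hypothetical helper)."""
    t_max = max(times)
    # Assumption: treat observations at the largest time as failures so the
    # KM curve drops to zero and the conditional mean is well defined.
    events = [1 if t == t_max else int(d) for t, d in zip(times, events)]
    curve = kaplan_meier(times, events)
    # Probability mass the KM estimate places at each failure time.
    masses = []
    prev = 1.0
    for t, s in curve:
        masses.append((t, prev - s))
        prev = s
    imputed = []
    for c, d in zip(times, events):
        if d:
            imputed.append(math.log(c))  # observed failure: keep log T as is
            continue
        # Renormalize the mass beyond c; this total equals S(c).
        tail = [(t, m) for t, m in masses if t > c]
        total = sum(m for _, m in tail)
        imputed.append(sum(m * math.log(t) for t, m in tail) / total)
    return imputed


# Usage with toy data (not from the paper): 1 = failure, 0 = censored.
y = mean_impute_log_times([2, 3, 5, 7, 11], [1, 0, 1, 0, 1])
print([round(v, 3) for v in y])
```

Under these assumptions, the imputed responses could then serve as the outcome in an ordinary PLS or LASSO fit on the expression matrix (for example with `sklearn.linear_model.Lasso`); that downstream step is not shown here.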
Pages: 259-271
Number of pages: 13