Sure independence screening for ultrahigh dimensional feature space

Cited by: 1841
Authors
Fan, Jianqing [1]
Lv, Jinchi [2]
Affiliations
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[2] Univ So Calif, Los Angeles, CA USA
Funding
National Science Foundation (US);
Keywords
Adaptive lasso; Dantzig selector; Dimensionality reduction; Lasso; Oracle estimator; Smoothly clipped absolute deviation; Sure independence screening; Sure screening; Variable selection;
DOI
10.1111/j.1467-9868.2008.00674.x
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
Variable selection plays an important role in high dimensional statistical modelling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1 regularization and showed that it achieves the ideal risk up to a logarithmic factor log(p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh, as the factor log(p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below the sample size, variable selection can be improved in both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, the lasso or the adaptive lasso. The connections between these penalized least squares methods are also elucidated.
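The screening step described in the abstract amounts to ranking predictors by the magnitude of their marginal correlation with the response and keeping only the top d of them, with d = floor(n / log n) being a common choice, before running any penalized least squares fit on the survivors. Below is a minimal Python sketch of that two-stage idea, assuming standardized numeric data; the function name sis, the lasso penalty alpha=0.1, and the simulated design are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sis(X, y, d=None):
    """Sketch of sure independence screening: keep the d features whose
    absolute marginal correlation with y is largest (d defaults to
    floor(n / log n))."""
    n, p = X.shape
    if d is None:
        d = int(np.floor(n / np.log(n)))
    # Standardize columns and response, then take componentwise correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = Xs.T @ ys / n
    keep = np.argsort(-np.abs(omega))[:d]
    return np.sort(keep)

# Illustrative example: screen from p = 5000 features down to d < n,
# then apply the lasso to the screened sub-model.
rng = np.random.default_rng(0)
n, p = 200, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0                      # five truly active features
y = X @ beta + rng.standard_normal(n)

selected = sis(X, y)
fit = Lasso(alpha=0.1).fit(X[:, selected], y)
active = selected[np.flatnonzero(fit.coef_)]
print("screened set size:", selected.size, "lasso-active features:", active)
```

Any other well-developed selector mentioned in the abstract (SCAD, the Dantzig selector, the adaptive lasso) could replace the lasso in the second stage; screening only shrinks the dimension to below the sample size so that such methods become fast and stable.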
Pages: 849-883
Page count: 35