BACKWARD, FORWARD AND STEPWISE AUTOMATED SUBSET-SELECTION ALGORITHMS - FREQUENCY OF OBTAINING AUTHENTIC AND NOISE VARIABLES

Cited by: 534
Authors
DERKSEN, S [1]
KESELMAN, HJ [1]
Affiliations
[1] UNIV MANITOBA,DEPT PSYCHOL,WINNIPEG R3T 2N2,MANITOBA,CANADA
Keywords
DOI
10.1111/j.2044-8317.1992.tb00992.x
Chinese Library Classification number
O1 [Mathematics];
Subject classification codes
0701 ; 070101 ;
Abstract
The use of automated subset search algorithms is reviewed and issues concerning model selection and selection criteria are discussed. In addition, a Monte Carlo study is reported which presents data regarding the frequency with which authentic and noise variables are selected by automated subset algorithms. In particular, the effects of the correlation between predictor variables, the number of candidate predictor variables, the size of the sample, and the level of significance for entry and deletion of variables were studied for three automated subset algorithms: BACKWARD ELIMINATION, FORWARD SELECTION, and STEPWISE. Results indicated that: (1) the degree of correlation between the predictor variables affected the frequency with which authentic predictor variables found their way into the final model; (2) the number of candidate predictor variables affected the number of noise variables that gained entry to the model; (3) the size of the sample was of little practical importance in determining the number of authentic variables contained in the final model; and (4) the population multiple coefficient of determination could be faithfully estimated by adopting a statistic that is adjusted by the total number of candidate predictor variables rather than the number of variables in the final model.
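The kind of Monte Carlo comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' design: the sample size `n`, number of candidates `k`, number of authentic predictors `n_auth`, equicorrelation `rho`, coefficient `beta`, and the fixed entry threshold `t_enter` (a stand-in for a significance level for entry) are all assumptions chosen for illustration. The sketch implements FORWARD SELECTION only and counts how many authentic and noise variables enter per replication. The helper `adjusted_r2` is the usual adjusted coefficient of determination; passing the total number of candidate predictors as `k`, rather than the size of the final model, mirrors the adjustment mentioned in result (4).

```python
import numpy as np


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations, charging k predictors.

    Passing the total number of candidate predictors as k (instead of the
    number retained in the final model) gives the more conservative value.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def last_coef_abs_t(A, y):
    """|t| statistic of the last column's coefficient in an OLS fit of y on A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (A.shape[0] - A.shape[1])
    var_last = sigma2 * np.linalg.inv(A.T @ A)[-1, -1]
    return abs(beta[-1]) / np.sqrt(var_last)


def forward_select(X, y, t_enter=2.0):
    """Greedy forward selection.

    At each step, add the candidate with the largest |t|, provided it exceeds
    t_enter (an assumed fixed threshold standing in for an entry alpha level).
    Returns the selected column indices in order of entry.
    """
    n, k = X.shape
    selected = []
    while len(selected) < k:
        best_j, best_t = None, t_enter
        for j in range(k):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            t = last_coef_abs_t(A, y)
            if t > best_t:
                best_j, best_t = j, t
        if best_j is None:
            break
        selected.append(best_j)
    return selected


def simulate(n=60, k=12, n_auth=3, rho=0.4, beta=0.5, reps=200, seed=0):
    """Mean number of authentic and noise variables selected per replication.

    Columns 0..n_auth-1 are authentic (nonzero coefficient); the rest are noise.
    All predictors share pairwise correlation rho (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    auth_hits = noise_hits = 0
    for _ in range(reps):
        common = rng.standard_normal((n, 1))
        X = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.standard_normal((n, k))
        y = beta * X[:, :n_auth].sum(axis=1) + rng.standard_normal(n)
        chosen = forward_select(X, y)
        auth_hits += sum(j < n_auth for j in chosen)
        noise_hits += sum(j >= n_auth for j in chosen)
    return auth_hits / reps, noise_hits / reps


if __name__ == "__main__":
    for k in (6, 12, 24):
        print(k, simulate(k=k))
```

Varying `k`, `rho`, or `n` in the loop at the bottom gives a rough feel for the factors the study manipulated; the numbers it prints depend entirely on the assumed settings and should not be read as reproducing the paper's results.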
Pages: 265-282
Number of pages: 18