Model selection and the principle of minimum description length

Cited by: 444
Authors
Hansen, MH [1]
Yu, B [1]
Affiliation
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
AIC; Bayesian methods; Bayes information criterion; cluster analysis; code length; coding redundancy; information theory; model selection; pointwise and minimax lower bounds; regression; time series
DOI
10.1198/016214501753168398
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Code
020208; 070103; 0714
Abstract
This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed attention within the statistics community. Here we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines we find many interesting interpretations of popular frequentist and Bayesian procedures. As we show, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis. Because model selection in linear regression is an extremely common problem that arises in many applications, we present detailed derivations of several MDL criteria in this context and discuss their properties through a number of examples. Our emphasis is on the practical application of MDL, and hence we make extensive use of real datasets. In writing this review, we tried to make the descriptive philosophy of MDL natural to a statistics audience by examining classical problems in model selection. In the engineering literature, however, MDL is being applied to ever more exotic modeling situations. As a principle for statistical modeling in general, one strength of MDL is that it can be intuitively extended to provide useful tools for new problems.
Pages: 746-774
Number of pages: 29
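As a rough, self-contained illustration of the abstract's central idea, namely ranking competing models by the length of the description they produce, the Python sketch below applies the classical two-stage MDL approximation (n/2)log(RSS/n) + (k/2)log(n) to choose a polynomial order in a toy regression problem. The function two_stage_mdl and the synthetic data are illustrative assumptions made here; the article itself derives several more refined MDL criteria for linear regression that are not reproduced in this sketch.

import numpy as np

def two_stage_mdl(y, X):
    """Two-stage MDL score for a Gaussian linear model:
    (n/2) * log(RSS / n) + (k/2) * log(n),
    where n is the sample size and k the number of fitted coefficients.
    Smaller values correspond to a shorter description of the data."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

# Synthetic example: a quadratic signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.2, size=x.size)

scores = {}
for order in range(1, 7):
    # Design matrix with columns 1, x, ..., x^order.
    X = np.vander(x, N=order + 1, increasing=True)
    scores[order] = two_stage_mdl(y, X)

best = min(scores, key=scores.get)
print({k: round(v, 1) for k, v in scores.items()})
print("selected polynomial order:", best)

Note that this two-stage score is the Bayes information criterion up to a factor of two, one of the connections between information-theoretic and Bayesian model selection that the review discusses.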