On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification

被引:74
作者
Biau, Gerard [1 ,2 ,3 ]
Devroye, Luc [4 ]
机构
[1] Univ Paris 06, LSTA, F-75252 Paris 05, France
[2] Univ Paris 06, LPMA, F-75252 Paris 05, France
[3] Ecole Normale Super, DMA, F-75230 Paris 05, France
[4] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 2K6, Canada
基金
加拿大自然科学与工程研究理事会;
关键词
Regression estimation; Layered nearest neighbours; One nearest neighbour estimate; Bagging; Random forests; NUMBER; MAXIMA;
D O I
10.1016/j.jmva.2010.06.019
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Let X-1 X be identically distributed random vectors in R-d, independently drawn according to some probability density. An observation Xi is said to be a layered nearest neighbour (LNN) of a point x if the hyperrectangle defined by x and Xi contains no other data points. We first establish consistency results on L(x), the number of LNN of x. Then, given a sample (X, Y), (X-1, Y-1),, (X-n, Y-n) of independent identically distributed random vectors from Rd x R, one may estimate the regression function r(x) = E[Y] X = x by the LNN estimate r(n)(x), defined as an average over the Y's corresponding to those X, which are LNN of x. Under mild conditions on r, we establish the consistency of El r (x) r(x) towards 0 as n -> infinity, for almost all x and all p >= 1, and discuss the links between r and the random forest estimates of Breiman (2001) [8]. We finally show the universal consistency of the bagged (bootstrap-aggregated) nearest neighbour method for regression and classification. (c) 0 2010 Elsevier Inc. All rights reserved.
引用
收藏
页码:2499 / 2518
页数:20
相关论文
共 24 条
[1]   Shape quantization and recognition with randomized trees [J].
Amit, Y ;
Geman, D .
NEURAL COMPUTATION, 1997, 9 (07) :1545-1588
[2]  
[Anonymous], 2001, Computing Science and Statistics
[3]  
[Anonymous], 1998, PROB APPL S, DOI 10.1007/b98894
[4]   Maxima in hypercubes [J].
Bai, ZD ;
Devroye, L ;
Hwang, HK ;
Tsai, TH .
RANDOM STRUCTURES & ALGORITHMS, 2005, 27 (03) :290-309
[5]  
Bai ZD, 1998, ANN APPL PROBAB, V8, P886
[6]   ON DISTRIBUTION OF NUMBER OF ADMISSIBLE POINTS IN A VECTOR RANDOM SAMPLE [J].
BARNDORF.O ;
SOBEL, M .
THEORY OF PROBILITY AND ITS APPLICATIONS,USSR, 1966, 11 (02) :249-&
[7]  
Biau G, 2008, J MACH LEARN RES, V9, P2015
[8]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[9]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[10]  
BREIMAN L, 2004, 670 UC BERK STAT DEP