A method for initialising the K-means clustering algorithm using kd-trees

被引:212
作者
Redmond, Stephen J. [1 ]
Heneghan, Conor [1 ]
机构
[1] Univ Coll Dublin, Dept Elect Engn, Dublin 4, Ireland
关键词
clustering; K-means algorithm; kd-tree; initialisation; density estimation;
D O I
10.1016/j.patrec.2007.01.001
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a method for initialising the K-means clustering algorithm. Our method hinges on the use of a kd-tree to perform a density estimation of the data at various locations. We then use a modification of Katsavounidis' algorithm, which incorporates this density information, to choose K seeds for the K-means algorithm. We test our algorithm on 36 synthetic datasets, and 2 datasets from the UCI Machine Learning Repository, and compare with 15 runs of Forgy's random initialisation method, Katsavounidis' algorithm, and Bradley and Fayyad's method. (c) 2007 Elsevier B.V. All rights reserved.
引用
收藏
页码:965 / 973
页数:9
相关论文
共 18 条
[11]   Cluster center initialization algorithm for K-means clustering [J].
Khan, SS ;
Ahmad, A .
PATTERN RECOGNITION LETTERS, 2004, 25 (11) :1293-1302
[12]  
Likas A, 2003, PATTERN RECOGN, V36, P451, DOI 10.1016/S0031-3203(02)00060-2
[13]   ALGORITHM FOR VECTOR QUANTIZER DESIGN [J].
LINDE, Y ;
BUZO, A ;
GRAY, RM .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1980, 28 (01) :84-95
[14]  
MacQueen J, 1967, P 5 BERKELEY S MATH, V1, P281
[15]   Density-based multiscale data condensation [J].
Mitra, P ;
Murthy, CA ;
Pal, SK .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (06) :734-747
[16]   An empirical comparison of four initialization methods for the K-Means algorithm [J].
Peña, JM ;
Lozano, JA ;
Larrañaga, P .
PATTERN RECOGNITION LETTERS, 1999, 20 (10) :1027-1040
[17]  
THIESSON B, 1997, TR9730 MICR
[18]  
Tou T., 1974, PATTERN RECOGN, DOI DOI 10.1002/ZAMM.19770570626