A method for initialising the K-means clustering algorithm using kd-trees

被引:212
作者
Redmond, Stephen J. [1 ]
Heneghan, Conor [1 ]
机构
[1] Univ Coll Dublin, Dept Elect Engn, Dublin 4, Ireland
关键词
clustering; K-means algorithm; kd-tree; initialisation; density estimation;
D O I
10.1016/j.patrec.2007.01.001
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a method for initialising the K-means clustering algorithm. Our method hinges on the use of a kd-tree to perform a density estimation of the data at various locations. We then use a modification of Katsavounidis' algorithm, which incorporates this density information, to choose K seeds for the K-means algorithm. We test our algorithm on 36 synthetic datasets, and 2 datasets from the UCI Machine Learning Repository, and compare with 15 runs of Forgy's random initialisation method, Katsavounidis' algorithm, and Bradley and Fayyad's method. (c) 2007 Elsevier B.V. All rights reserved.
引用
收藏
页码:965 / 973
页数:9
相关论文
共 18 条
[1]   New methods for the initialisation of clusters [J].
AlDaoud, MB ;
Roberts, SA .
PATTERN RECOGNITION LETTERS, 1996, 17 (05) :451-455
[2]  
Anderberg M. R., 1973, CLUSTER ANAL APPL, DOI [10.1016/C2013-0-06161-0, DOI 10.1016/C2013-0-06161-0]
[3]   A NEAR-OPTIMAL INITIAL SEED VALUE SELECTION IN K-MEANS ALGORITHM USING A GENETIC ALGORITHM [J].
BABU, GP ;
MURTY, MN .
PATTERN RECOGNITION LETTERS, 1993, 14 (10) :763-769
[4]  
Blake C.L., 1998, UCI repository of machine learning databases
[5]  
BRADLEY PS, 1998, P 15 INT C MACH LEAR, P91
[6]   Initialization of cluster refinement algorithms: A review and comparative study [J].
He, J ;
Lan, M ;
Tan, CL ;
Sung, SY ;
Low, HB .
2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, :297-302
[7]   A comparison of several vector quantization codebook generation approaches [J].
Huang, C. -M. ;
Harris, R. W. .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 1993, 2 (01) :108-112
[8]   Data clustering: A review [J].
Jain, AK ;
Murty, MN ;
Flynn, PJ .
ACM COMPUTING SURVEYS, 1999, 31 (03) :264-323
[9]   A New Initialization Technique for Generalized Lloyd Iteration [J].
Katsavounidis, Ioannis ;
Kuo, C. -C. Jay ;
Zhang, Zhen .
IEEE SIGNAL PROCESSING LETTERS, 1994, 1 (10) :144-146
[10]  
Kaufman L., 1990, Finding Groups in Data: An Introduction to Cluster Analysis, DOI DOI 10.1002/9780470316801