Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms

Cited by: 117
Authors
Fernandez, Alberto [1]
Gomez, Sergio [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Engn Informat & Matemat, E-43007 Tarragona, Spain
Keywords
agglomerative methods; cluster analysis; hierarchical classification; Lance and Williams' formula; ties in proximity
DOI
10.1007/s00357-008-9004-x
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach to this drawback has been to adopt an arbitrary criterion to break ties between distances, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that consists of grouping more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of Lance and Williams' formula which enables the implementation of the algorithm in a recursive way.
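The variable-group idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes complete linkage, recomputes cluster distances directly from the original item distances instead of using the generalized Lance and Williams recursion, and treats two distances as tied when they differ by at most a hypothetical tolerance `tol`. The function name `variable_group_clustering` and the example matrix are made up for illustration.

```python
from itertools import combinations


def variable_group_clustering(dist, tol=1e-12):
    """Agglomerate clusters, merging ALL clusters tied at the minimum distance.

    dist: symmetric matrix (list of lists) of pairwise item distances.
    Returns a list of (height, merged_subclusters) steps.
    Assumption: complete linkage, computed from the original item distances.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: largest item-to-item distance between a and b.
        return max(dist[i][j] for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        pairs = {(a, b): linkage(a, b) for a, b in combinations(clusters, 2)}
        d_min = min(pairs.values())
        tied = [p for p, d in pairs.items() if abs(d - d_min) <= tol]

        # Chain tied pairs into connected groups, so that A~B and B~C
        # yield a single group {A, B, C} instead of an arbitrary pair.
        groups = []
        for a, b in tied:
            touching = [g for g in groups if a in g or b in g]
            merged = {a, b}.union(*touching)
            groups = [g for g in groups if g not in touching] + [merged]

        # Merge every group in one step; no tie-breaking rule is needed.
        for g in groups:
            clusters = [c for c in clusters if c not in g]
            clusters.append(frozenset().union(*g))
            merges.append((d_min, sorted(sorted(c) for c in g)))
    return merges


# Items 0, 1, 2 are chained by two tied distances of 1; item 3 is far away.
example = [
    [0, 1, 2, 5],
    [1, 0, 1, 5],
    [2, 1, 0, 5],
    [5, 5, 5, 0],
]
print(variable_group_clustering(example))
# [(1, [[0], [1], [2]]), (5, [[0, 1, 2], [3]])]
```

A pair-group method would have to choose between merging {0, 1} or {1, 2} first, and under complete linkage the two choices give different trees. Here the three items are joined in one step, so the result does not depend on any tie-breaking rule.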
Pages: 43-65
Page count: 23
Related papers
12 in total
[1]   Iterative cluster analysis of protein interaction data [J].
Arnau, V ;
Mars, S ;
Marín, I .
BIOINFORMATICS, 2005, 21 (03) :364-378
[2]   Multiple UPGMA and neighbor-joining trees and the performance of some computer packages [J].
Backeljau, T ;
DeBruyn, L ;
DeWolf, H ;
Jordaens, K ;
VanDongen, S ;
Winnepenninckx, B .
MOLECULAR BIOLOGY AND EVOLUTION, 1996, 13 (02) :309-313
[3]   REVIEW OF CLASSIFICATION [J].
CORMACK, RM .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-GENERAL, 1971, 134 :321-+
[4]   Gordon A, 1999, Classification
[5]   Hart G., 1983, NATO ASI Series Series G Ecological Sciences, P254
[6]   A GENERALIZED SORTING STRATEGY FOR COMPUTER CLASSIFICATIONS [J].
LANCE, GN ;
WILLIAMS, WT .
NATURE, 1966, 212 (5058) :218-&
[7]   Ties in proximity and clustering compounds [J].
MacCuish, J ;
Nicolaou, C ;
MacCuish, NE .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2001, 41 (01) :134-146
[8]   NONUNIQUENESS AND INVERSIONS IN CLUSTER-ANALYSIS [J].
MORGAN, BJT ;
RAY, APG .
APPLIED STATISTICS-JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C, 1995, 44 (01) :117-134
[9]   Sneath P. H. A., 1973, NUMERICAL TAXONOMY P
[10]   Hierarchical clustering via joint between-within distances: Extending Ward's minimum variance method [J].
Székely, GJ ;
Rizzo, ML .
JOURNAL OF CLASSIFICATION, 2005, 22 (02) :151-183