Principal direction divisive partitioning

被引:198
作者
Boley, D [1 ]
机构
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
基金
美国国家科学基金会;
关键词
D O I
10.1023/A:1009740529316
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The documents are assembled into a matrix which is very sparse, it is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.
引用
收藏
页码:325 / 344
页数:20
相关论文
共 23 条
[1]  
Anderberg M.R., 1973, Probability and Mathematical Statistics
[2]  
[Anonymous], ACM SIGIR C
[3]   Using linear algebra for intelligent information retrieval [J].
Berry, MW ;
Dumais, ST ;
OBrien, GW .
SIAM REVIEW, 1995, 37 (04) :573-595
[4]   A hierarchical latent variable model for data visualization [J].
Bishop, CM ;
Tipping, ME .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (03) :281-293
[5]  
BOLEY D, 1998, EXPT PDDP SOFTWARE
[6]  
BOLEY D, 1998, IN PRESS AI REV
[7]  
Cheeseman P.C., 1996, ADV KNOWLEDGE DISCOV, V180, P153, DOI https://doi.org/10.5555/257938.257954
[8]  
Cutting D.R., 1992, 15 ANN INT ACM SIGIR, P318
[9]  
FRAKES W, 1992, INFORMATION RETRIEVA
[10]  
Golub G.H., 2013, MATRIX COMPUTATIONS