Principal direction divisive partitioning

被引：198

作者：

Boley, D ^{[1
]}

机构：

[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA

来源：

DATA MINING AND KNOWLEDGE DISCOVERY | 1998年 / 2卷 / 04期

基金：

美国国家科学基金会;

关键词：

D O I：

10.1023/A:1009740529316

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The documents are assembled into a matrix which is very sparse, it is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.

引用

页码：325 / 344

页数：20

共 23 条

[1]

Anderberg M.R., 1973, Probability and Mathematical Statistics

[2]

[Anonymous], ACM SIGIR C

[3] Using linear algebra for intelligent information retrieval [J].

Berry, MW ;

Dumais, ST ;

OBrien, GW .

SIAM REVIEW, 1995, 37 (04) :573-595

[4] A hierarchical latent variable model for data visualization [J].

Bishop, CM ;

Tipping, ME .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (03) :281-293

[5]

BOLEY D, 1998, EXPT PDDP SOFTWARE

[6]

BOLEY D, 1998, IN PRESS AI REV

[7]

Cheeseman P.C., 1996, ADV KNOWLEDGE DISCOV, V180, P153, DOI https://doi.org/10.5555/257938.257954

[8]

Cutting D.R., 1992, 15 ANN INT ACM SIGIR, P318

[9]

FRAKES W, 1992, INFORMATION RETRIEVA

[10]

Golub G.H., 2013, MATRIX COMPUTATIONS

← 1 2 3 →