A new differential LSI space-based probabilistic document classifier

Cited by: 19
Authors
Chen, L [1]
Tokuda, N
Nagai, A
Affiliations
[1] Univ No British Columbia, Dept Comp Sci, Prince George, BC V2N 4Z9, Canada
[2] Sunflare Co, Shinjuku Ku, Tokyo 1600004, Japan
[3] Utsunomiya Univ, Adv Media Network Ctr, Utsunomiya, Tochigi 3218585, Japan
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
document classification; latent semantic indexing; differential document vector; information retrieval;
DOI
10.1016/j.ipl.2003.09.002
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
We have developed a new, effective probabilistic classifier for document classification by introducing the concepts of differential document vectors and DLSI (differential latent semantic indexing) spaces. Combining the projections onto, and the distances to, the DLSI spaces derived from the differential document vectors improves the adaptability of the LSI (latent semantic indexing) method by capturing the unique characteristics of documents. Using intra- and extra-document statistics, both a simple a posteriori calculation on a small example and an experiment on the large Reuters-21578 collection demonstrate that the DLSI space-based probabilistic classifier classifies better than the LSI space-based classifier. (C) 2003 Elsevier B.V. All rights reserved.
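The abstract only summarises the method, so the following Python/NumPy sketch illustrates the idea under stated assumptions; it is not the authors' implementation. It assumes that (i) a differential intra-document vector is a document vector minus the centroid of its own class, and a differential extra-document vector is a document vector minus the centroid of another class; (ii) each DLSI space is the top-k left singular subspace of the matrix of those differential vectors; and (iii) the likelihood of a differential vector combines a Gaussian on its projection onto the space with an isotropic Gaussian on its squared distance to the space, in the style of the probabilistic matching of Moghaddam, Wahid and Pentland. A query is assigned to the class whose centroid difference has the highest posterior odds of being an intra-document difference. The names `DLSISpace`, `DLSIClassifier`, `k_intra`, `k_extra`, `p_intra` and the toy data are hypothetical.

```python
import math

import numpy as np


class DLSISpace:
    """A k-dimensional subspace fitted, via SVD, to differential vectors.

    Assumption: the likelihood uses an in-space Gaussian plus an isotropic
    residual Gaussian (Moghaddam et al. style); the DLSI paper's exact
    normalisation may differ.
    """

    def __init__(self, diff_vectors, k):
        D = np.asarray(diff_vectors, dtype=float).T   # terms x samples
        n = D.shape[1]
        U, s, _ = np.linalg.svd(D, full_matrices=False)
        lam = s ** 2 / n                               # variance along each axis
        r = int(np.sum(lam > 1e-12))                   # numerical rank
        if r < 2:
            raise ValueError("need at least two independent differential vectors")
        self.k = max(1, min(k, r - 1))                 # keep >= 1 residual axis
        self.U = U[:, : self.k]                        # basis of the DLSI space
        self.lam = lam[: self.k]
        self.rho = float(np.mean(lam[self.k : r]))     # mean residual variance
        self.m = r - self.k                            # residual dimensions

    def log_likelihood(self, x):
        x = np.asarray(x, dtype=float)
        y = self.U.T @ x                               # projection onto the space
        eps2 = max(float(x @ x - y @ y), 0.0)          # squared distance to it
        in_space = -0.5 * (np.sum(y ** 2 / self.lam)
                           + np.sum(np.log(2 * math.pi * self.lam)))
        residual = -0.5 * (eps2 / self.rho
                           + self.m * math.log(2 * math.pi * self.rho))
        return float(in_space + residual)


class DLSIClassifier:
    """Hypothetical classifier built from one intra- and one extra-DLSI space."""

    def __init__(self, docs_by_class, k_intra=2, k_extra=2, p_intra=0.5):
        self.centroids = {c: np.mean(np.asarray(d, dtype=float), axis=0)
                          for c, d in docs_by_class.items()}
        intra, extra = [], []
        for c, docs in docs_by_class.items():
            for d in np.asarray(docs, dtype=float):
                # Assumed intra definition: document minus its own class centroid.
                intra.append(d - self.centroids[c])
                # Assumed extra definition: document minus every other centroid.
                extra.extend(d - mu for c2, mu in self.centroids.items() if c2 != c)
        self.intra = DLSISpace(intra, k_intra)
        self.extra = DLSISpace(extra, k_extra)
        self.log_prior_odds = math.log(p_intra) - math.log(1.0 - p_intra)

    def log_odds(self, q, c):
        """log P(D_I | x) - log P(D_E | x) for x = q - centroid of class c."""
        x = np.asarray(q, dtype=float) - self.centroids[c]
        return (self.intra.log_likelihood(x) - self.extra.log_likelihood(x)
                + self.log_prior_odds)

    def posterior_intra(self, q, c):
        """P(D_I | x) by Bayes' rule, as a numerically stable logistic."""
        t = self.log_odds(q, c)
        if t >= 0:
            return 1.0 / (1.0 + math.exp(-t))
        return math.exp(t) / (1.0 + math.exp(t))

    def classify(self, q):
        # Pick the class whose differential vector is most likely intra-document.
        return max(self.centroids, key=lambda c: self.log_odds(q, c))


if __name__ == "__main__":
    # Toy 4-term count vectors; not data from the paper.
    clf = DLSIClassifier({
        "grain": [[10, 2, 0, 1], [12, 0, 1, 0], [11, 3, 0, 0]],
        "crude": [[0, 1, 10, 3], [1, 0, 12, 1], [0, 2, 11, 2]],
    })
    print(clf.classify([12, 0, 1, 0]))  # → grain
```

Comparing log-odds rather than raw likelihoods avoids the underflow that products of small Gaussian factors would cause; `posterior_intra` converts the odds back into a probability only for reporting.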
Pages: 203-212
Page count: 10
References
12 in total
[1] Aizawa, A. P 6 NAT LANG PROC PA, 2001, p. 307.
[2] Berry, MW; Drmac, Z; Jessup, ER. Matrices, vector spaces, and information retrieval [J]. SIAM Review, 1999, 41(2): 335-362.
[3] Farkas, J. CAN C EL COMP ENG, 1994, vol. 2, p. 710.
[4] Lee, DL; Chuang, H; Seamons, K. Document ranking and the vector-space model [J]. IEEE Software, 1997, 14(2): 67-75.
[5] Lewis, DD. 9193 U MASS COMP SCI, 1992.
[6] Moghaddam, B; Wahid, W; Pentland, A. Beyond eigenfaces: Probabilistic matching for face recognition [C]. Automatic Face and Gesture Recognition: Third IEEE International Conference Proceedings, 1998: 30-35.
[7] Nigam, K; McCallum, AK; Thrun, S; Mitchell, T. Text classification from labeled and unlabeled documents using EM [J]. Machine Learning, 2000, 39(2-3): 103-134.
[8] Schutze, H. Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1997, p. 74. DOI 10.1145/278459.258539.
[9] Sirovich, L; Kirby, M. Low-dimensional procedure for the characterization of human faces [J]. Journal of the Optical Society of America A: Optics, Image Science and Vision, 1987, 4(3): 519-524.
[10] Tan, CM; Wang, YF; Lee, CD. The use of bigrams to enhance text categorization [J]. Information Processing & Management, 2002, 38(4): 529-546.