AUTOMATED LEARNING OF DECISION RULES FOR TEXT CATEGORIZATION

Times cited: 463

Authors:

APTE, C [1]

DAMERAU, F [1]

WEISS, SM [1]

Affiliation:

[1] RUTGERS STATE UNIV,DEPT COMP SCI,NEW BRUNSWICK,NJ 08903

Source:

ACM TRANSACTIONS ON INFORMATION SYSTEMS | 1994 / Vol. 12 / No. 3

Keywords:

EXPERIMENTATION; MEASUREMENT; PERFORMANCE;

DOI:

10.1145/183422.183423

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline code:

0812;

Abstract:

We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to automatically discover classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based systems, requiring many man-years of development effort, have been successfully built to "read" documents and assign topics to them. We show that machine-generated decision rules appear comparable to human performance while using the identical rule-based representation. In comparison with other machine-learning techniques, results on a key benchmark from the Reuters collection show a large gain in performance, from a previously reported 67% recall/precision breakeven point to 80.5%. In the context of a very high-dimensional feature space, several methodological alternatives are examined, including universal versus local dictionaries and binary versus frequency-related features.
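The benchmark figures above (67% versus 80.5%) are reported as recall/precision breakeven points, the value at which recall equals precision. The sketch below shows one common way to approximate that value for a single category. It makes several assumptions that do not come from the record. Each document gets a numeric score, a threshold is swept over those scores, and the breakeven is taken as the mean of precision and recall at the cutoff where the two are closest. The helper names `precision_recall` and `breakeven_point` are hypothetical. A rule-based classifier such as the one described may reach its recall/precision trade-offs by other means, and the paper's averaging and interpolation may differ, so this is only an illustration of the metric.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for one category from parallel yes/no assignments."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)      # correctly assigned
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)  # wrongly assigned
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def breakeven_point(scores: Sequence[float], actual: Sequence[bool]) -> float:
    """Approximate the recall/precision breakeven point (hypothetical helper).

    Assumption: a document is assigned the category when its score is at or
    above the threshold. Each distinct score is tried as a threshold, and the
    function returns the mean of precision and recall at the threshold where
    they are closest to each other.
    """
    best_gap, best_value = float("inf"), 0.0
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        p, r = precision_recall(predicted, actual)
        if abs(p - r) < best_gap:
            best_gap, best_value = abs(p - r), (p + r) / 2
    return best_value


if __name__ == "__main__":
    # Toy data (invented for illustration): six documents, three truly in the category.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    actual = [True, True, False, True, False, False]
    print(round(breakeven_point(scores, actual), 4))  # prints 0.6667
```

In this toy run, a cutoff that keeps the top three documents yields two true positives, one false positive, and one miss. Precision and recall are then both 2/3, so that value is the breakeven point.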

Pages: 233-251

Page count: 19
