A WEIGHTED NEAREST NEIGHBOR ALGORITHM FOR LEARNING WITH SYMBOLIC FEATURES

Cited by: 417
Authors
COST, S
SALZBERG, S
Keywords
NEAREST NEIGHBOR; EXEMPLAR-BASED LEARNING; PROTEIN STRUCTURE; TEXT PRONUNCIATION; INSTANCE-BASED LEARNING
DOI
10.1007/BF00993481
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of the feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.
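The abstract does not give the formulas behind its distance tables or instance weights. What follows is a minimal sketch of one plausible reading, not the authors' code. It assumes a value-difference-style table, in which two symbolic values of a feature are close when they predict similar class distributions. It also assumes an instance distance of w_x * w_y * sum of delta^r, and weights equal to (times used as nearest neighbor) / (times correct), collected in a leave-one-out pass. The function names, the exponents k and r, the starting counts of 1, the query weight of 1, and the fallback distance for unseen values are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

UNSEEN_DISTANCE = 2.0  # hypothetical fallback: the largest possible delta when k = 1


def build_value_tables(examples, labels, k=1):
    """Build one distance table per feature position (assumed value-difference metric).

    delta(v1, v2) = sum over classes c of |P(c | v1) - P(c | v2)| ** k,
    so two symbolic values are close when they predict similar class distributions.
    """
    classes = sorted(set(labels))
    tables = []
    for f in range(len(examples[0])):
        counts = defaultdict(Counter)  # value -> Counter mapping class -> count
        for x, y in zip(examples, labels):
            counts[x[f]][y] += 1
        totals = {v: sum(c.values()) for v, c in counts.items()}
        tables.append({
            (v1, v2): sum(
                abs(counts[v1][c] / totals[v1] - counts[v2][c] / totals[v2]) ** k
                for c in classes
            )
            for v1 in counts
            for v2 in counts
        })
    return tables


def instance_distance(x, y, tables, w_x=1.0, w_y=1.0, r=2):
    """Weighted distance w_x * w_y * sum_f delta_f(x_f, y_f) ** r."""
    total = sum(
        tables[f].get((a, b), UNSEEN_DISTANCE) ** r
        for f, (a, b) in enumerate(zip(x, y))
    )
    return w_x * w_y * total


def exemplar_weights(examples, labels, tables):
    """Leave-one-out weights: w = times used as nearest neighbor / times correct.

    Counts start at 1 (an assumption) so that no weight divides by zero.
    """
    n = len(examples)
    uses, correct = [1] * n, [1] * n
    for i in range(n):
        # Nearest stored exemplar to example i, excluding i itself (unweighted pass).
        j = min((cand for cand in range(n) if cand != i),
                key=lambda cand: instance_distance(examples[i], examples[cand], tables))
        uses[j] += 1
        correct[j] += int(labels[i] == labels[j])
    return [u / c for u, c in zip(uses, correct)]


def classify(query, examples, labels, tables, weights):
    """1-nearest-neighbor prediction; the query itself gets weight 1 (assumed)."""
    best = min(range(len(examples)),
               key=lambda j: instance_distance(query, examples[j], tables, w_y=weights[j]))
    return labels[best]


if __name__ == "__main__":
    # Toy data: feature 0 determines the class, feature 1 carries no class information.
    data = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    classes = ["P", "P", "N", "N"]
    tables = build_value_tables(data, classes)
    weights = exemplar_weights(data, classes, tables)
    print(tables[0][("a", "b")], tables[1][("x", "y")])  # 2.0 0.0
    print(classify(("a", "y"), data, classes, tables, weights))  # P
```

In this sketch the irrelevant feature gets a table entry of 0 and so drops out of the distance automatically. A weight above 1 inflates an exemplar's distance to every query, so stored instances that often mislead their neighbors lose influence without being deleted.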
Pages: 57-78
Number of pages: 22