Dual-domain Hierarchical Classification of Phonetic Time Series

Cited by: 14
Authors
Hamooni, Hossein [1]
Mueen, Abdullah [1]
Affiliations
[1] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
Source
2014 IEEE International Conference on Data Mining (ICDM) | 2014
DOI
10.1109/ICDM.2014.92
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Phonemes are the smallest units of sound produced by a human being. Automatic classification of phonemes is a well-researched topic in linguistics due to its potential for robust speech recognition. With the recent advancement of phonetic segmentation algorithms, it is now possible to generate datasets of millions of phonemes automatically. Phoneme classification on such datasets is a challenging data mining task because of the large number of classes (over a hundred) and the complexity of existing methods. In this paper, we introduce the phoneme classification problem as a data mining task. We propose a dual-domain (time and frequency) hierarchical classification algorithm. Our method uses a Dynamic Time Warping (DTW) based classifier in the top layers and time-frequency features in the lower layer. We cross-validate our method on phonemes from three online dictionaries and achieve up to a 35% improvement in classification over existing techniques. We provide case studies on classifying accented phonemes and on speaker-invariant phoneme classification.
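To make the DTW-based classifier mentioned in the abstract more concrete, the following is a minimal sketch of textbook DTW combined with a 1-nearest-neighbor rule. It is an illustration only: the abstract does not specify the cost function, warping constraints, or hierarchy used by the authors, so the squared pointwise cost and the function names `dtw_distance` and `nearest_neighbor_label` are assumptions made here.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with a squared pointwise cost (assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def nearest_neighbor_label(query, labeled_series):
    """Return the label of the training series closest to `query` under DTW."""
    best_label, best_dist = None, math.inf
    for series, label in labeled_series:
        dist = dtw_distance(query, series)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For example, `nearest_neighbor_label([0, 1, 2, 1, 0], [([0, 1, 1, 2, 1, 0], "a"), ([5, 5, 5], "b")])` picks the label of the series that differs only by a local stretch in time, which is the property that makes DTW attractive for phonemes spoken at different speeds.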
Pages: 160-169
Page count: 10