Support vector machines with adaptive Lq penalty

Cited by: 71
Authors
Liu, Yufeng [1 ]
Zhang, Hao Helen
Park, Cheolwoo
Ahn, Jeongyoun
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, Carolina Ctr Genome Sci, Chapel Hill, NC 27515 USA
[2] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[3] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Funding
US National Science Foundation;
Keywords
adaptive penalty; classification; shrinkage; support vector machine; variable selection;
DOI
10.1016/j.csda.2007.02.006
CLC classification
TP39 [Computer applications];
Discipline codes
081203 ; 0835 ;
Abstract
The standard support vector machine (SVM) minimizes the hinge loss function subject to the L-2 penalty or the roughness penalty. Recently, the L-1 SVM was suggested for variable selection because it produces sparse solutions [Bradley, P., Mangasarian, O., 1998. Feature selection via concave minimization and support vector machines. In: Shavlik, J. (Ed.), ICML'98. Morgan Kaufmann, Los Altos, CA; Zhu, J., Hastie, T., Rosset, S., Tibshirani, R., 2003. 1-norm support vector machines. Neural Inform. Process. Systems 16]. These learning methods are non-adaptive since their penalty forms are pre-determined before looking at the data, and each tends to perform well only in certain situations. For instance, the L-2 SVM generally works well except when there are too many noise inputs, while the L-1 SVM is preferred in the presence of many noise variables. In this article we propose and explore an adaptive learning procedure called the L-q SVM, where the best q > 0 is automatically chosen by the data. Both two- and multi-class classification problems are considered. We show that the new adaptive approach combines the benefits of a class of non-adaptive procedures and achieves the best performance of this class across a variety of situations. Moreover, we observe that the proposed L-q penalty is more robust to noise variables than the L-1 and L-2 penalties. An iterative algorithm is suggested to solve the L-q SVM efficiently. Simulations and real data applications support the effectiveness of the proposed procedure. (C) 2007 Elsevier B.V. All rights reserved.
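To make the idea concrete, the following is a minimal sketch, not the authors' iterative algorithm: a linear SVM fit by subgradient descent on mean hinge loss plus a smoothed Lq penalty lam * sum_j |w_j|^q, with q picked by validation error as a stand-in for the paper's data-driven choice. All names (`lq_svm_fit`, `lq_svm_select_q`) and the solver itself are hypothetical illustrations.

```python
import numpy as np

def lq_svm_fit(X, y, q=1.0, lam=0.1, lr=0.01, n_iter=2000, eps=1e-8):
    """Linear SVM with an Lq penalty, fit by plain subgradient descent.

    Objective (illustrative, not the paper's solver):
        mean(max(0, 1 - y_i (x_i . w + b))) + lam * sum_j |w_j|^q
    The |w|^q term is smoothed by eps near zero so q < 1 is usable.
    Labels y must be in {-1, +1}.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        active = margins < 1                        # margin violators
        grad_w = -(y[active] @ X[active]) / n       # hinge subgradient
        grad_b = -y[active].sum() / n
        # subgradient of lam * |w|^q (zero at w_j = 0 via sign(0) = 0)
        grad_w += lam * q * np.sign(w) * (np.abs(w) + eps) ** (q - 1)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def lq_svm_select_q(X, y, Xv, yv, qs=(0.5, 1.0, 1.5, 2.0), lam=0.1):
    """Choose q from a grid by validation misclassification rate."""
    best = None
    for q in qs:
        w, b = lq_svm_fit(X, y, q=q, lam=lam)
        err = np.mean(np.sign(Xv @ w + b) != yv)
        if best is None or err < best[0]:
            best = (err, q, w, b)
    return best[1], best[2], best[3]
```

Choosing q on held-out data is what makes the penalty adaptive: with many noise inputs a small q (sparser) wins, while with mostly informative inputs a larger q (closer to ridge) wins, matching the trade-off the abstract describes.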
Pages: 6380-6394
Page count: 15
Cited references
23 items
[1] [Anonymous], 1999, SUPPORT VECTOR MACHI
[2] [Anonymous], ICML '98
[3] Antoniadis, A., Fan, J. Regularization of wavelet approximations - Rejoinder [J]. Journal of the American Statistical Association, 2001, 96(455):964-967.
[4] Boser, B., 1992, 5 ANN C COMP LEARN T, p. 142
[5] Chen, S.S.B., Donoho, D.L., Saunders, M.A. Atomic decomposition by basis pursuit [J]. SIAM Journal on Scientific Computing, 1998, 20(1):33-61.
[6] Crammer, K., 2001, Journal of Machine Learning Research, Vol. 2, p. 265
[7] Donoho, D.L., Johnstone, I.M. Ideal spatial adaptation by wavelet shrinkage [J]. Biometrika, 1994, 81(3):425-455.
[8] Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. Least angle regression - Rejoinder [J]. Annals of Statistics, 2004, 32(2):494-499.
[9] Fan, J.Q., Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties [J]. Journal of the American Statistical Association, 2001, 96(456):1348-1360.
[10] Frank, I.E., Friedman, J.H. A statistical view of some chemometrics regression tools [J]. Technometrics, 1993, 35(2):109-135.