Comparison of discrimination methods for the classification of tumors using gene expression data

被引:1731
作者
Dudoit, S [1 ]
Fridlyand, J
Speed, TP
机构
[1] Univ Calif Berkeley, Div Biostat, Berkeley, CA 94720 USA
[2] Univ Calif San Francisco, Ctr Canc, San Francisco, CA 94143 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
关键词
cancer; discriminant analysis; microarray experiment; supervised learning; tumor classification; variable selection;
D O I
10.1198/016214502753479248
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. cDNA microarrays and high-density oligonucleotide chips are novel biotechnologies increasingly used in cancer research. By allowing the monitoring of expression levels in cells for thousands of genes simultaneously, microarray experiments may lead to a more complete understanding of the molecular variations among tumors and hence to a finer and more informative classification. The ability to successfully distinguish between tumor classes (already known or yet to be discovered) using gene expression data is an important aspect of this novel approach to cancer classification. This article compares the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods include nearest-neighbor classifiers, linear discriminant analysis, and classification trees. Recent machine learning approaches, such as bagging and boosting, are also considered. The discrimination methods are applied to datasets from three recently published cancer gene expression studies.
引用
收藏
页码:77 / 87
页数:11
相关论文
共 29 条
[1]   Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[2]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]  
[Anonymous], 1979, Multivariate analysis
[4]   The secular variations of skull characters in four series of Egyptian skulls [J].
Barnard, MM .
ANNALS OF EUGENICS, 1935, 6 :352-371
[5]  
Breiman L, 1998, ANN STAT, V26, P801
[6]   Bagging predictors [J].
Breiman, L .
MACHINE LEARNING, 1996, 24 (02) :123-140
[7]  
BREIMAN L, 1998, 513 U CAL DEP STAT
[8]  
Breiman L., 1984, BIOMETRICS, DOI DOI 10.2307/2530946
[9]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[10]   Identifying marker genes in transcription profiling data using a mixture of feature relevance experts [J].
Chow, ML ;
Moler, EJ ;
Mian, IS .
PHYSIOLOGICAL GENOMICS, 2001, 5 (02) :99-111