Tissue classification with gene expression profiles

被引:450
作者
Ben-Dor, A
Bruhn, L
Friedman, N
Nachman, I
Schummer, M
Yakhini, Z
机构
[1] Agilent Labs, Chem & Biol Syst Dept, Palo Alto, CA 94304 USA
[2] Hebrew Univ Jerusalem, Sch Engn & Comp Sci, IL-91904 Jerusalem, Israel
[3] Hebrew Univ Jerusalem, Ctr Computat Neurosci, IL-91904 Jerusalem, Israel
[4] Univ Washington, Seattle, WA 98105 USA
[5] Agilent Labs, IL-31905 Haifa, Israel
关键词
tissue classification; gene expression analysis; ovarian cancer; colon cancer;
D O I
10.1089/106652700750050943
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine three sets of gene expression data measured across sets of tumor(s) and normal clinical samples: The first set consists of 2,000 genes, measured in 62 epithelial colon samples (Alon et al., 1999). The second consists of approximate to 100,000 clones, measured in 32 ovarian samples (unpublished extension of data set described in Schummer et al, (1999)). The third set consists of approximate to 7,100 genes, measured in 72 bone marrow and peripheral brood samples (Golub et al., 1999). We examine the use of scoring methods, measuring separation of tissue type (e.g., tumors from normals) using individual gene expression levels. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of performing leave-one-one cross validation (LOOCV) experiments on the three data sets, employing nearest neighbor classifier, SVM (Cortes and Vapnik, 1995), AdaBoost (Freund and Schapire, 1997) and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate success rate of at least 90% in tumor versus normal classification, using sets of selected genes, with, as well as without, cellular-contamination-related members. These results are insensitive to the exact selection mechanism, over a certain range.
引用
收藏
页码:559 / 583
页数:25
相关论文
共 30 条
[1]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[2]   Clustering gene expression patterns [J].
Ben-Dor, A ;
Shamir, R ;
Yakhini, Z .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1999, 6 (3-4) :281-297
[3]  
Bishop C. M., 1995, NEURAL NETWORKS PATT
[4]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[5]   A tutorial on Support Vector Machines for pattern recognition [J].
Burges, CJC .
DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) :121-167
[6]   The transcriptional program of sporulation in budding yeast [J].
Chu, S ;
DeRisi, J ;
Eisen, M ;
Mulholland, J ;
Botstein, D ;
Brown, PO ;
Herskowitz, I .
SCIENCE, 1998, 282 (5389) :699-705
[7]  
CLARKE PA, 1999, P NAT GEN MICR M 99, P39
[8]  
CORTES C, 1995, MACH LEARN, V20, P273, DOI 10.1023/A:1022627411411
[9]  
DERISI J, 1997, SCIENCE, V282, P699
[10]   Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868