Metagenes and molecular pattern discovery using matrix factorization

Cited by: 1458
Authors
Brunet, JP
Tamayo, P
Golub, TR
Mesirov, JP
Affiliations
[1] MIT, Eli & Edythe L Broad Inst, Cambridge, MA 02141 USA
[2] Harvard Univ, Cambridge, MA 02141 USA
[3] Dana Farber Canc Inst, Boston, MA 02115 USA
[4] Harvard Univ, Sch Med, Boston, MA 02115 USA
DOI
10.1073/pnas.0308531101
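Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract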
We describe here the use of nonnegative matrix factorization (NMF), an algorithm based on decomposition by parts that can reduce the dimension of expression data from thousands of genes to a handful of metagenes. Coupled with a model selection mechanism, adapted to work for any stochastic clustering algorithm, NMF is an efficient method for identification of distinct molecular patterns and provides a powerful method for class discovery. We demonstrate the ability of NMF to recover meaningful biological information from cancer-related microarray data. NMF appears to have advantages over other methods such as hierarchical clustering or self-organizing maps. We found it less sensitive to a priori selection of genes or initial conditions and able to detect alternative or context-dependent patterns of gene expression in complex biological systems. This ability, similar to semantic polysemy in text, provides a general method for robust molecular pattern discovery.
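The reduction from genes to metagenes described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Euclidean-norm multiplicative updates of Lee and Seung, since the abstract does not say which update rule the paper uses. The function name nmf, its parameters, and the toy data are all hypothetical. A is a nonnegative genes-by-samples matrix, the columns of W are the k metagenes, and each column of H gives one sample's metagene expression pattern.

```python
import numpy as np

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix A (genes x samples) as W @ H.

    W (genes x k) holds k metagenes; H (k x samples) holds each
    sample's metagene expression. Uses multiplicative updates for the
    squared Euclidean error (an assumed choice, see the lead-in).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = A.shape
    # Random positive start; results depend on this initialization,
    # which is why the algorithm counts as stochastic.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (hypothetical): 100 genes, 12 samples, two planted groups.
rng = np.random.default_rng(1)
true_W = rng.random((100, 2))
true_H = np.zeros((2, 12))
true_H[0, :6] = 1.0
true_H[1, 6:] = 1.0
A = true_W @ true_H + 0.01 * rng.random((100, 12))

W, H = nmf(A, k=2)
# Assign each sample to the metagene with the largest coefficient.
labels = H.argmax(axis=0)
```

Because the factorization depends on the random start, repeating the run with different seeds and comparing the resulting sample assignments is one way to judge how stable a given k is, in the spirit of the model selection mechanism the abstract mentions.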
Pages: 4164-4169
Page count: 6