Email surveillance using non-negative matrix factorization

Cited by: 77
Authors
Berry M.W. [1]
Browne M. [1]
Affiliations
[1] Department of Computer Science, University of Tennessee, Knoxville
Keywords
Constrained least squares; Electronic mail; Enron collection; Non-negative matrix factorization; Surveillance; Topic detection
DOI
10.1007/s10588-005-5380-5
Abstract
In this study, we apply a non-negative matrix factorization approach for the extraction and detection of concepts or topics from electronic mail messages. For the publicly released Enron electronic mail collection, we encode sparse term-by-message matrices and use a low rank non-negative matrix factorization algorithm to preserve natural data non-negativity and avoid subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Results in topic detection and message clustering are discussed in the context of published Enron business practices and activities, and benchmarks addressing the computational complexity of our approach are provided. The resulting basis vectors and matrix projections of this approach can be used to identify and monitor underlying semantic features (topics) and message clusters in a general or high-level way without the need to read individual electronic mail messages. © Springer Science + Business Media, Inc. 2006.
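As a rough illustration of the approach the abstract describes, the sketch below factors a tiny term-by-message matrix A into non-negative factors W (term weights per topic) and H (topic weights per message), then reads topics from the columns of W and message clusters from the columns of H. Everything in it is an assumption made for illustration: the six terms, the four messages, the choice of k = 2, and the use of the standard Lee-Seung multiplicative update rule. The abstract does not state which update scheme, term weighting, or rank the authors used, so this should not be read as their implementation.

```python
import numpy as np

def nmf(A, k, iters=500, eps=1e-9, seed=0):
    """Approximate A (terms x messages) as W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    This is a generic textbook scheme, assumed here for illustration.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps  # basis vectors (one column per topic)
    H = rng.random((k, n)) + eps  # encodings (one column per message)
    for _ in range(iters):
        # Multiplying by non-negative ratios keeps W and H non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy vocabulary and term counts (rows: terms, columns: messages).
terms = ["gas", "pipeline", "energy", "game", "football", "season"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

W, H = nmf(A, k=2)

# Assign each message to the topic with the largest weight in its column of H.
clusters = H.argmax(axis=0)

for j in range(W.shape[1]):
    top = np.argsort(W[:, j])[::-1][:3]  # three heaviest terms of topic j
    print(f"topic {j}:", [terms[i] for i in top])
print("message clusters:", clusters.tolist())
```

Because both factors stay non-negative, each topic is a purely additive combination of terms, which is what lets the heaviest terms in a column of W be read as a topic label without opening any individual message.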
Pages: 249-264
Page count: 15
Related papers
19 in total
[1] Berry M., Browne M., Understanding Search Engines: Mathematical Modeling and Text Retrieval, (2005)
[2] Berry M., Drmac Z., Jessup E., Matrices, Vector Spaces, and Information Retrieval, SIAM Review, 41, 2, pp. 335-362, (1999)
[3] Donoho D., Stodden V., When does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?, (2003)
[4] Giles J., Wo L., Berry M., GTP (General Text Parser) Software for Text Mining, in Statistical Data Mining and Knowledge Discovery, pp. 455-471, (2003)
[5] Grieve T., The Decline and Fall of the Enron Empire, Slate, (2003)
[6] Guillamet D., Vitria J., Determining a Suitable Metric when Using Non-Negative Matrix Factorization, Sixteenth International Conference on Pattern Recognition (ICPR'02), 2, (2002)
[7] Hoyer P., Non-Negative Sparse Coding, Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, (2002)
[8] Hyvarinen A., Hoyer P., Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces, Neural Computation, 12, 7, pp. 1705-1720, (2000)
[9] Jolliffe I., Principal Component Analysis, (2002)
[10] Keila P., Skillicorn D., Structure in the Enron Email Dataset, Proceedings of the Link Analysis, Counterterrorism, and Security Workshop, Fifth SIAM International Conference on Data Mining, pp. 55-64, (2005)