Clustering of document collection - A weighting approach

Cited by: 35
Author
Aliguliyev, Ramiz M. [1]
Affiliation
[1] Natl Acad Sci Azerbaijan, Inst Informat Technol, AZ-1141 Baku, Azerbaijan
Keywords
Text mining; Weighted partitional clustering; Adjusted cosine similarity measure; Validity index; Differential evolution; Genetic algorithm; Validity; Optimization; Kernel
DOI
10.1016/j.eswa.2008.11.017
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering algorithms are used to assess the interaction among documents by organizing them into clusters such that documents within a cluster are more similar to each other than to documents belonging to different clusters. Document clustering has traditionally been investigated as a means of improving search engine performance by pre-clustering the entire corpus, and as a post-retrieval document browsing technique. It has also long been studied as a post-retrieval document visualization technique. The purpose of the present paper is to show that assigning weights to documents improves the clustering solution. © 2008 Elsevier Ltd. All rights reserved.
Pages: 7904-7916
Number of pages: 13