Analyzing Knowledge Communities Using Foreground and Background Clusters

Times cited: 19
Authors
Kandylas, Vasileios [1]
Upham, S. Phineas [2]
Ungar, Lyle H. [1]
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] Univ Penn, Wharton Sch, Philadelphia, PA 19104 USA
Keywords
Text mining; clustering; knowledge communities; community evolution; citation analysis; COUNT DATA; SCIENCE; MODEL; COCITATIONS; IMPACT
DOI
10.1145/1754428.1754430
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Insight into the growth (or shrinkage) of "knowledge communities" of authors that build on each other's work can be gained by studying the evolution over time of clusters of documents. We cluster documents based on the documents they cite in common using the Streemer clustering method, which finds cohesive foreground clusters (the knowledge communities) embedded in a diffuse background. We build predictive models with features based on the citation structure, the vocabulary of the papers, and the affiliations and prestige of the authors and use these models to study the drivers of community growth and the predictors of how widely a paper will be cited. We find that scientific knowledge communities tend to grow more rapidly if their publications build on diverse information and use narrow vocabulary and that papers that lie on the periphery of a community have the highest impact, while those not in any community have the lowest impact.
Pages: 35