Multi-Document Automatic Summarization Based on the LDA Topic Model

Cited by: 24

Authors:

Yang Xiao (杨潇) [1]

Ma Jun (马军) [2]

Yang Tongfeng (杨同峰) [2]

Du Yanqi (杜言琦) [2]

Shao Haimin (邵海敏) [2]

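Affiliations:

[1] School of Information Management, Shandong Economic University

[2] School of Computer Science and Technology, Shandong University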

Source:

CAAI Transactions on Intelligent Systems (智能系统学报) | 2010 / Vol. 5 / No. 02

Keywords:

multi-document automatic summarization; sentence score computation; topic model; LDA; number of topics

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.1 [Text information processing]

Discipline classification codes:

081203; 0835

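Abstract:

In recent years, representing the multi-document summarization problem with probabilistic topic models has attracted growing research attention. LDA (latent Dirichlet allocation) is one of the most representative probabilistic generative topic models. This paper proposes an LDA-based summarization method. The method selects the number of topics in the LDA model by perplexity. It uses Gibbs sampling to obtain each sentence's probability distribution over topics and each topic's probability distribution over words. It measures the importance of each topic by summing that topic's weights over the sentences. Finally, building on the topic and sentence probability distributions of the LDA model, it proposes two different sentence-weighting models. The experiments use the ROUGE evaluation metrics on DUC2002, a test set for generic multi-document summarization. They compare the method against SumBasic, a state-of-the-art method, and against two other LDA-based multi-document summarization methods. The results show that the proposed method outperforms SumBasic on every ROUGE metric and also compares favorably with the other LDA-based summarization methods.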
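The pipeline summarized in the abstract can be illustrated with a short, self-contained Python sketch. It follows a textbook setup and is not the authors' code. Each sentence is treated as one LDA "document". A standard collapsed Gibbs sampler with symmetric priors `alpha` and `beta` estimates the sentence-topic matrix `theta` and the topic-word matrix `phi`. The default values of `alpha` and `beta` are assumptions. Topic importance is computed by summing each topic's weight over all sentences, as the abstract describes. All function names are hypothetical. `sentence_scores` is a hypothetical weighting, not either of the paper's two sentence-weight models, whose formulas are not given in the abstract. For brevity, perplexity is computed on the same sentences used for fitting. In practice it should be measured on held-out text, because training-set perplexity tends to fall as the number of topics grows.

```python
import math
import random


def gibbs_lda(sentences, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA, treating each sentence as a document.

    Returns (theta, phi, vocab): theta[s][k] estimates p(topic k | sentence s)
    and phi[k][w] estimates p(word vocab[w] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    docs = [[index[w] for w in sent] for sent in sentences]
    K, V = num_topics, len(vocab)
    n_sk = [[0] * K for _ in docs]      # topic counts per sentence
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # token count per topic
    z = []                              # current topic of every token
    for s, doc in enumerate(docs):      # random initial assignment
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[s]):
            n_sk[s][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for s, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[s][i]
                # Remove the token's current assignment from the counts.
                n_sk[s][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(n_sk[s][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[s][i] = k
                n_sk[s][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = [[(n_sk[s][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for s, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def topic_importance(theta):
    """Each topic's weight summed over all sentences, normalized to sum to 1."""
    totals = [sum(row[k] for row in theta) for k in range(len(theta[0]))]
    total = sum(totals)
    return [t / total for t in totals]


def sentence_scores(theta, importance):
    """HYPOTHETICAL scoring rule, not the paper's exact model:
    score(s) = sum_k importance[k] * theta[s][k]."""
    return [sum(i * p for i, p in zip(importance, row)) for row in theta]


def perplexity(sentences, theta, phi, vocab):
    """exp(-mean log p(w | s)), where p(w | s) = sum_k theta[s][k] * phi[k][w]."""
    index = {w: i for i, w in enumerate(vocab)}
    log_lik, n = 0.0, 0
    for s, sent in enumerate(sentences):
        for w in sent:
            p = sum(theta[s][k] * phi[k][index[w]] for k in range(len(phi)))
            log_lik += math.log(p)
            n += 1
    return math.exp(-log_lik / n)


def choose_num_topics(sentences, candidates, **kwargs):
    """Return the candidate K with the lowest perplexity (here, on the fitted sentences)."""
    def pp(k):
        theta, phi, vocab = gibbs_lda(sentences, k, **kwargs)
        return perplexity(sentences, theta, phi, vocab)
    return min(candidates, key=pp)


if __name__ == "__main__":
    docs = ["the storm hit the coast", "the storm caused floods on the coast",
            "the team won the final", "fans cheered the team after the final"]
    sentences = [d.split() for d in docs]
    K = choose_num_topics(sentences, [2, 3], iters=100)
    theta, phi, vocab = gibbs_lda(sentences, K, iters=100)
    scores = sentence_scores(theta, topic_importance(theta))
    for s in sorted(range(len(docs)), key=lambda s: -scores[s]):
        print(f"{scores[s]:.3f}  {docs[s]}")
```

The example ranks the toy sentences by score. An extractive summary could then take the top-ranked sentences until a length budget is reached. This sketch does not include redundancy control, and it does not include the preprocessing (stop-word removal, stemming) that a DUC2002 experiment would normally use.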


Pages: 169-176

Number of pages: 8
