A brief survey of Web data extraction tools

Cited by: 57
Authors
Laender, AHF [1]
Ribeiro-Neto, BA [1]
da Silva, AS [1]
Teixeira, JS [1]
Affiliation
[1] Univ Fed Minas Gerais, Dept Comp Sci, BR-31270901 Belo Horizonte, MG, Brazil
DOI
10.1145/565117.565137
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities, which makes a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. We hope this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.
Pages: 84-93
Number of pages: 10