Methods for evaluating and creating data quality

Cited by: 57
Author
Winkler, WE [1]
Affiliation
[1] US Bur Census, Div Stat Res, Washington, DC 20233 USA
Keywords
integer programming; set covering; data cleaning; approximate string comparison; unsupervised and supervised learning
DOI
10.1016/j.is.2003.12.003
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
This paper provides a survey of two classes of methods that can be used in determining and improving the quality of individual files or groups of files. The first are edit/imputation methods for maintaining business rules and for imputing for missing data. The second are methods of data cleaning for finding duplicates within files or across files. Published by Elsevier Ltd.
Pages: 531-550
Page count: 20