A practical approximation algorithm for optimal k-anonymity

Cited by: 25
Authors
Kenig, Batya [1 ]
Tassa, Tamir [1 ]
Affiliations
[1] Open Univ, Div Comp Sci, Raanana, Israel
Keywords
Privacy-preserving data mining; k-Anonymity; l-Diversity; Approximation algorithms for NP-hard problems; Frequent generalized itemsets; ANONYMIZATION;
DOI
10.1007/s10618-011-0235-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
k-Anonymity is a privacy-preserving method for limiting disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves an algorithm due to Aggarwal et al. (Proceedings of the international conference on database theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD '07: proceedings of the 2007 ACM SIGMOD international conference on management of data, 2007) that was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited only to the theory of k-anonymization. The proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues ℓ-diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.
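As a rough illustration of the property the abstract describes, the following Python sketch checks whether a table is k-anonymous and measures a simple loss after generalization by suppression. It is not the approximation algorithm of the paper. The `*` marker, the function names, the toy table, and the "fraction of suppressed entries" loss measure are all assumptions made for this example.

```python
from collections import Counter

SUPPRESSED = "*"  # hypothetical marker for a suppressed entry


def suppress(record, columns):
    """Return a copy of record with the given column indices suppressed."""
    return tuple(SUPPRESSED if i in columns else v for i, v in enumerate(record))


def is_k_anonymous(table, k):
    """True if every (generalized) record occurs at least k times in the table."""
    counts = Counter(table)
    return all(c >= k for c in counts.values())


def suppression_loss(table):
    """Fraction of suppressed entries: a simple stand-in for an information-loss measure."""
    total = sum(len(r) for r in table)
    suppressed = sum(v == SUPPRESSED for r in table for v in r)
    return suppressed / total if total else 0.0


# Toy table of quasi-identifiers (age, zip code, gender); values are made up.
raw = [
    ("34", "43210", "F"),
    ("34", "43211", "F"),
    ("47", "43210", "M"),
    ("47", "43212", "M"),
]

# Suppress the zip code column (index 1) in every record.
generalized = [suppress(r, {1}) for r in raw]

print(is_k_anonymous(raw, 2))          # every raw record is unique
print(is_k_anonymous(generalized, 2))  # records now fall into two groups of two
print(suppression_loss(generalized))   # one of three columns was suppressed
```

Suppressing a whole column is the crudest way to reach k-anonymity. The optimization problem in the abstract asks for the generalization that satisfies the same check with the least information loss.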
Pages: 134-168
Page count: 35
Related papers
28 in total
[1]  
Aggarwal G, 2005, LECT NOTES COMPUT SC, V3363, P246
[2]  
Agrawal R, 2000, SIGMOD REC, V29, P439, DOI 10.1145/335191.335438
[3]  
Agrawal R., P 20 INT C VERY LARG
[4]  
[Anonymous], 2006, P 32 INT C VER LARG
[5]  
[Anonymous], 2005, P 2005 ACM SIGMOD IN
[6]  
[Anonymous], 2000, SIGMOD INT WORKSHOP
[7]  
Bayardo RJ, 2005, PROC INT CONF DATA, P217
[8]  
Byun JW, 2007, LECT NOTES COMPUT SC, V4443, P188
[9]   A Framework for Efficient Data Anonymization under Privacy and Accuracy Constraints [J].
Ghinita, Gabriel ;
Karras, Panagiotis ;
Kalnis, Panos ;
Mamoulis, Nikos .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2009, 34 (02)
[10]   K-anonymization revisited [J].
Gionis, Aristides ;
Mazza, Arnon ;
Tassa, Tamir .
2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, :744-+