APPROXIMATION ALGORITHMS FOR DATA PLACEMENT PROBLEMS

被引:142
作者
Baev, Ivan [1 ]
Rajaraman, Rajmohan [2 ]
Swamy, Chaitanya [3 ]
机构
[1] Hewlett Packard Corp, Java Compilers & Tools Lab, Cupertino, CA 95014 USA
[2] Northeastern Univ, Coll Comp Sci, Boston, MA 02115 USA
[3] Univ Waterloo, Waterloo, ON N2L 3G1, Canada
关键词
approximation algorithms; data placement; facility location; linear programming;
D O I
10.1137/080715421
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where the nodes may both issue requests for data objects and have capacity for storing data objects so as to minimize the average data-access cost. We introduce the data placement problem to model this problem. We have a set of caches F, a set of clients D, and a set of data objects O. Each cache i can store at most u(i) data objects. Each client j is an element of D has demand d(j) for a specific data object o(j) is an element of O and has to be assigned to a cache that stores that object. Storing an object o in cache i incurs a storage cost of f(i)(o), and assigning client j to cache i incurs an access cost of d(j)c(ij). The goal is to find a placement of the data objects to caches respecting the capacity constraints, and an assignment of clients to caches so as to minimize the total storage and client access costs. We present a 10-approximation algorithm for this problem. Our algorithm is based on rounding an optimal solution to a natural linear-programming relaxation of the problem. One of the main technical challenges encountered during rounding is to preserve the cache capacities while incurring only a constant-factor increase in the solution cost. We also introduce the connected data placement problem to capture settings where write-requests are also issued for data objects, so that one requires a mechanism to maintain consistency of data. We model this by requiring that all caches containing a given object be connected by a Steiner tree to a root for that object, which issues a multicast message upon a write to (any copy of) that object. The total cost now includes the cost of these Steiner trees. We devise a 14-approximation algorithm for this problem. We show that our algorithms can be adapted to handle two variants of the problem: (a) a k-median variant, where there is a specified bound on the number of caches that may contain a given object, and (b) a generalization where objects have lengths and the total length of the objects stored in any cache must not exceed its capacity.
引用
收藏
页码:1411 / 1429
页数:19
相关论文
共 42 条
[1]  
Awerbuch B., 1993, Proceedings. 34th Annual Symposium on Foundations of Computer Science (Cat. No.93CH3368-8), P22, DOI 10.1109/SFCS.1993.366885
[2]   Distributed paging for general networks [J].
Awerbuch, B ;
Bartal, Y ;
Fiat, A .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 1998, 28 (01) :67-104
[3]  
Baev ID, 2001, SIAM PROC S, P661
[4]   Improved approximation algorithms for Broadcast Scheduling [J].
Bansal, Nikhil ;
Coppersmith, Don ;
Sviridenko, Maxim .
PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, :344-353
[5]  
Bartal Y, 1995, J COMPUT SYST SCI, V51, P341, DOI 10.1006/jcss.1995.1073
[6]  
Byrka J, 2007, LECT NOTES COMPUT SC, V4627, P29
[7]   A constant-factor approximation algorithm for the k-median problem [J].
Charikar, M ;
Guha, S ;
Tardos, E ;
Shmoys, DB .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2002, 65 (01) :129-149
[8]   Improved approximation algorithms for capacitated facility location problems [J].
Chudak, FA ;
Williamson, DP .
MATHEMATICAL PROGRAMMING, 2005, 102 (02) :207-222
[9]   Improved approximation algorithms for the uncapacitated facility location problem [J].
Chudak, FA ;
Shmoys, DB .
SIAM JOURNAL ON COMPUTING, 2003, 33 (01) :1-25
[10]  
DOWDY LW, 1982, COMPUT SURV, V14, P287, DOI 10.1145/356876.356883