Composing schema mappings: Second-order dependencies to the rescue

Cited by: 131
Authors
Fagin, R
Kolaitis, PG
Popa, L
Tan, WC
Affiliations
[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
[2] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
Source
ACM Transactions on Database Systems | 2005 / Vol. 30 / No. 4
Keywords
algorithms; theory; data exchange; data integration; composition; schema mapping; certain answers; conjunctive queries; dependencies; chase; computational complexity; query answering; second-order logic; universal solution; metadata model management
DOI
10.1145/1114244.1114249
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). A fundamental problem is composing schema mappings: given two successive schema mappings, derive a schema mapping between the source schema of the first and the target schema of the second that has the same effect as applying successively the two schema mappings. In this article, we give a rigorous semantics to the composition of schema mappings and investigate the definability and computational complexity of the composition of two schema mappings. We first study the important case of schema mappings in which the specification is given by a finite set of source-to-target tuple-generating dependencies (source-to-target tgds). We show that the composition of a finite set of full source-to-target tgds with a finite set of tgds is always definable by a finite set of source-to-target tgds, but the composition of a finite set of source-to-target tgds with a finite set of full source-to-target tgds may not be definable by any set (finite or infinite) of source-to-target tgds; furthermore, it may not be definable by any formula of least fixed-point logic, and the associated composition query may be NP-complete. After this, we introduce a class of existential second-order formulas with function symbols and equalities, which we call second-order tgds, and make a case that they are the "right" language for composing schema mappings. Specifically, we show that second-order tgds form the smallest class (up to logical equivalence) that contains every source-to-target tgd and is closed under conjunction and composition. Allowing equalities in second-order tgds turns out to be of the essence, even though the "obvious" way to define second-order tgds does not require equalities. We show that second-order tgds without equalities are not sufficiently expressive to define the composition of finite sets of source-to-target tgds. Finally, we show that second-order tgds possess good properties for data exchange and query answering: the chase procedure can be extended to second-order tgds so that it produces polynomial-time computable universal solutions in data exchange settings specified by second-order tgds.
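As a small illustration of the kind of formula the abstract describes, consider a source-to-target tgd composed with two full source-to-target tgds. The relation names Emp, Mgr1, Mgr, and SelfMgr and the function symbol f are assumptions chosen for this sketch; they do not appear in the abstract. The composition below uses both a function symbol and an equality, the two features the abstract attributes to second-order tgds.

```latex
% First mapping (source-to-target tgd with an existential quantifier):
\Sigma_{12}:\quad \forall e\,\bigl(\mathrm{Emp}(e) \rightarrow \exists m\,\mathrm{Mgr1}(e,m)\bigr)

% Second mapping (full source-to-target tgds):
\Sigma_{23}:\quad \forall e\,\forall m\,\bigl(\mathrm{Mgr1}(e,m) \rightarrow \mathrm{Mgr}(e,m)\bigr),
\qquad \forall e\,\bigl(\mathrm{Mgr1}(e,e) \rightarrow \mathrm{SelfMgr}(e)\bigr)

% A second-order tgd defining the composition: f plays the role of the
% existentially chosen manager, and the equality e = f(e) captures the
% case in which that choice coincides with the employee.
\Sigma_{13}:\quad \exists f\,\Bigl(
  \forall e\,\bigl(\mathrm{Emp}(e) \rightarrow \mathrm{Mgr}(e,f(e))\bigr)
  \;\wedge\;
  \forall e\,\bigl(\mathrm{Emp}(e) \wedge e = f(e) \rightarrow \mathrm{SelfMgr}(e)\bigr)
\Bigr)
```

To see why the two sides agree: given an intermediate instance satisfying both mappings, let f(e) be any m with Mgr1(e, m); conversely, given f, the intermediate instance consisting of the facts Mgr1(e, f(e)) for each e in Emp satisfies both mappings.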
Pages: 994-1055
Page count: 62