Consistent Answers to SQL Queries

Contact Information

Jan Chomicki, Ph.D.
Department of Computer Science and Engineering, 201 Bell Hall
University at Buffalo, Buffalo, NY 14860
PHONE: (716) 645-3180 ext.103, FAX: (716) 645-3464
chomicki@cse.buffalo.edu , www.cse.buffalo.edu/~chomicki

WWW Page: www.cse.buffalo.edu/~chomicki/cqa.html

Project Award Information

Keywords

inconsistency, integrity constraints, query languages, SQL, data integration

Project Summary

As the amount of information available in online data sources explodes, there is a growing concern about the consistency and quality of answers to user queries. This project addresses the issue of using logical integrity constraints to gauge the consistency and quality of query answers. Although it is impractical to enforce global integrity constraints across different data sources and correct integrity violations by updating individual sources, integrity constraints capture important semantic properties of data. This project studies the formal notions of database repair and consistent query answer: a consistent answer is true in every minimal repair of the database. The information about answer consistency serves as an important indication of its quality and reliability.

A variety of procedures for computing consistent query answers in the context of the relational data model and SQL are developed, and their computational complexity analyzed. The procedures exploit the properties of specific subsets of SQL and specific classes of integrity constraints. By providing information about query answer consistency, such procedures will enhance the functionality of existing DBMS in a non-intrusive way, particularly in the context of data integration applications.

Goals, Objectives, and Targeted Activities

Together with Jerzy Marcinkowski, Wroclaw University, Poland, the PI obtained a complete characterization of the computational complexity of consistent query answers for first-order queries and integrity constraints that are denial constraints or functional dependencies. They have studied the impact of the number of constraints, as well as the type and the size of the query on the complexity of this problem. They have identified several new tractable cases (compared to [5]) and proposed new algorithms. The results are reported in the paper [1], currently under submission to a journal.

The PI expanded (in collaboration with several researchers) the conference papers [3] and [6] to journal versions. The paper [2] studies the issue of computing consistent query answers for aggregation queries and functional dependencies. A new notion of consistent answer, suitable to such queries, is proposed. A complete classification of tractable/intractable cases of the problem of computing consistent query answers is provided. The paper [7] considers logical specification of repairs as answer sets of logic programs.

Together with Leopoldo Bertossi, Carleton University, Ottawa, Canada, the PI is completing a survey [4] summarizing the up-to-date research on computing consistent query answers. The survey will appear in the book "Logics for Emerging Applications of Databases," to be published by Springer-Verlag. It is interesting to note that the notion of consistent query answer proposed in the course of this research has been used and extended by a research group at the University of Calabria, Italy, led by Sergio Greco.

Indication of Success

The ultimate goal of this research is to develop methods to compute consistent query answers for practically important classes of SQL queries and constraints. In the past year, the PI and his collaborators have made a significant progress towards this goal by clarifying the computational complexity picture, identifying new tractable cases, and developing new algorithms.

Project Impact

GPRA Outcome Goals

Discoveries at and across the frontier of science and engineering.

There are numerous practical scenarios, listed below, in which databases may fail to satisfy integrity constraints. In all those cases, the approach of this project provides an alternative to - manual or automatic - conflict resolution and supplies a way to use integrity constraints even when they cannot be enforced.

Integration of autonomous data sources. The sources may separately satisfy the constraints, but when they are integrated together they constraints may stop to hold. Moreover, since the sources are autonomous they can not be simply fixed to satisfy the dependency by removing one of the conflicting tuples.

Unenforced integrity constraints. Even though integrity constraints capture an important part of the semantics of a given application, they may still fail to be enforced for a variety of reasons.

Temporary inconsistencies. It may often be the case that the database consistency is only temporarily violated and further updates or transactions are expected to restore it.

Project References

[1] Jan Chomicki, Jerzy Marcinkowski. On the Computational Complexity of Consistent Query Answers Submitted for journal publication. CoRR paper cs.DB/0204010.

[2] Marcelo Arenas, Leopoldo Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, Jeremy Spinrad. Scalar Aggregation in Inconsistent Databases. To appear in a special issue of Theoretical Computer Science, consisting of selected papers from ICDT 2001.

[3] Marcelo Arenas, Leopoldo Bertossi, Jan Chomicki. Scalar Aggregation in FD-Inconsistent Databases. Proc. 8th International Conference on Database Theory (ICDT), London ,UK, January 2001, Springer-Verlag, LNCS 1973, pp. 39-53, pp. 39-53.

[4] Leopoldo Bertossi, Jan Chomicki.Consistent Query Answers in Inconsistent Databases. To appear in Logics for Emerging Applications of Databases, J. Chomicki, R. van der Meyden, G. Saake, editors, Springer-Verlag, 2003.

[5] Marcelo Arenas, Leopoldo Bertossi, Jan Chomicki. Consistent Query Answers in Inconsistent Databases.Proc. 18th ACM Symposium on Principles of Database Systems (PODS), June 1999, Philadelphia, Pennsylvania, pp. 68-79.

[6] Marcelo Arenas, Leopoldo Bertossi, Jan Chomicki. Specifying and Querying Database Repairs Using Logic Programs with Exceptions. Proc. 4th International Conference on Flexible Query Answering Systems, Warsaw, Poland, October 2000, Springer-Verlag, pp.27-41.

[7] Marcelo Arenas, Leopoldo Bertossi, Jan Chomicki. Answer Sets for Consistent Query Answering in Inconsistent Databases. Submitted for journal publication.

Area Background

The problem of obtaining consistent information from inconsistent databases has been studied in the context of multiple-source data integration [8], disjunctive databases [9], and conflict resolution [10]. For a more complete discussion, see the references [5] and [2]. Our approach is unique in that it is not limited to specific application scenarios or particular classes of constraints or queries, and addresses detailed computational problems.

[8] A. Motro. Multiplex: A Formal Model for Multidatabases and Its Implementation. Proc. NGITS'99, Springer-Verlag, LNCS 1649, pp. 138--158.

[9] S. Agarwal, A.M. Keller, G. Wiederhold, and K. Saraswat. Flexible Relation: An Approach for Integrating Data from Multiple, Possibly Inconsistent Databases. Proc. IEEE International Conference on Data Engineering, 1995.

[10] J. Lin and A. O. Mendelzon. Merging Databases under Constraints. International Journal of Cooperative Information Systems, 7(1):55--76, 1996.

Potential Related Projects

Data Integration and Cleaning, Query Answering Using Distributed Data Sources, Materialized Views, Semantic Query Optimization.