A variety of procedures for computing consistent query answers in the context of the relational data model and SQL are developed, and their computational complexity analyzed. The procedures exploit the properties of specific subsets of SQL and specific classes of integrity constraints. By providing information about query answer consistency, such procedures will enhance the functionality of existing DBMS in a non-intrusive way, particularly in the context of data integration applications.
[1] Jan Chomicki, J. Marcinkowski. Minimal-Change Integrity Maintenance Using Tuple Deletions. Information and Computation, to appear. Early version: CoRR paper cs.DB/0212004.
[2] Jan Chomicki, J. Marcinkowski, S. Staworko. Computing Consistent Query Answers using Conflict Hypergraphs. Proc. Thirteenth Conference on Information and Knowledge Management (CIKM), 2004, to appear. Short version in IIWeb'04.
[3] Jan Chomicki, J. Marcinkowski. On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases. To appear in Inconsistency Tolerance, L. Bertossi, A. Hunter, T. Schaub, editors, Springer, 2005, 32pp.
[4] L. Bertossi, Jan Chomicki. Query Answering in Inconsistent Databases. In Logics for Emerging Applications of Databases, J. Chomicki, R. van der Meyden, G. Saake, editors, Springer-Verlag, 2003, 41pp.
[5] M. Arenas, L. Bertossi, Jan Chomicki. Consistent Query Answers in Inconsistent Databases. Proc. ACM Symposium on Principles of Database Systems (PODS), 1999, pp. 68-79.
[6] M. Arenas, L. Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, Jeremy Spinrad. Scalar Aggregation in Inconsistent Databases. Theoretical Computer Science 296(3), 2003, pp. 405-434. Early version: ICDT'01.
[7] M. Arenas, L. Bertossi, Jan Chomicki. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Practice of Logic Programming, 3(4&5), July/September 2003, pp. 393-424. arXiv.org paper cs.DB/0207094. Early version: FQAS'00.
[8] L. Bertossi, Jan Chomicki, Alvaro Cortes, Claudio Gutierrez. Consistent Answers from Integrated Data Sources. Proc. International Conference on Flexible Query Answering Systems, Springer-Verlag, 2002, LNCS 2522, pp.71-85.
Human Resources. Research assistants at the University at Buffalo: Slawomir Staworko (Fall 03, Summer 04), Priyanka Kasliwal (Summer 04).
Education and curriculum development. The PI has incorporated the results of the project in the syllabus of an advanced graduate database course on Data Integration. The course is now offered at the University at Buffalo every academic year. The PI also gave several talks about the project.
Other. The PI is one of the editors of the book "Logics for Emerging Applications of Databases," published by Springer-Verlag in August 2003. He presented a keynote talk on consistent query answering at the IFIP TC-11 WG 11.5 Working Conference on Integrity and Internal Control in Information Systems, Nov. 2003, Lausanne, Switzerland. The notion of consistent query answer proposed in the course of this research has been applied and extended by research groups at 6 European and 2 Canadian universities.
The PI, together with S. Staworko (a Ph.D. student supported by the current project) has extended the algorithms proposed in [1] to form the foundation of HIPPO - a front-end system for the computation of consistent query answers. The system uses PostgreSQL as a backend and scales up to large databases. It handles projection-free queries (projection leads to co-NP-completeness [1]) and denial constraints. The PI and S. Staworko have developed optimization techniques that reduce the number of backend accesses. Those techniques, the implementation of HIPPO, and some experimental results are described in [2]. The current version of HIPPO is operational and has been demonstrated at the IDM'03 workshop and EDBT'04. HIPPO will be made available to the public.
Together with J. Marcinkowski, Wroclaw University, Poland, the PI obtained a complete characterization of the computational complexity of consistent query answers (and related problems) for first-order queries and integrity constraints that are denial constraints, functional dependencies, and inclusion dependencies. The results are reported in [1] and [3] (a survey of the area).
Current work in the project includes: computing consistent query answers in the presence of preferences, dealing with more general queries and constraints, repairing XML documents, experimental comparisons with other systems that compute consistent query answers, generation of synthetic inconsistent databases.
The problem of obtaining consistent information from inconsistent databases has been studied in the context of multiple-source data integration [9], disjunctive databases [10], and conflict resolution [11]. For a more complete discussion, see the references [4] and [5]. Our approach is unique in that it is not limited to specific application scenarios or particular classes of constraints or queries, and addresses relevant computational problems in detail.
[9] A. Motro. Multiplex: A Formal Model for Multidatabases and Its Implementation. Proc. NGITS'99, Springer-Verlag, LNCS 1649, pp. 138--158.
[10] S. Agarwal, A.M. Keller, G. Wiederhold, and K. Saraswat. Flexible Relation: An Approach for Integrating Data from Multiple, Possibly Inconsistent Databases. Proc. IEEE International Conference on Data Engineering, 1995.
[11] J. Lin and A. O. Mendelzon. Merging Databases under Constraints. International Journal of Cooperative Information Systems, 7(1):55--76, 1996.
Data Integration and Cleaning, Query Answering Using Distributed Data Sources, Materialized Views, Semantic Query Optimization.