A variety of procedures for computing consistent query answers in the context of the relational data model and SQL are developed, and their computational complexity analyzed. The procedures exploit the properties of specific subsets of SQL and specific classes of integrity constraints. By providing information about query answer consistency, such procedures will enhance the functionality of existing DBMS in a non-intrusive way, particularly in the context of data integration applications.
[1] Jan Chomicki, J. Marcinkowski. Minimal-Change Integrity Maintenance Using Tuple Deletions. Submitted for journal publication, December 2002, 26pp. CoRR paper cs.DB/0212004.
[2] M. Arenas, L. Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, Jeremy Spinrad. Scalar Aggregation in Inconsistent Databases. Theoretical Computer Science 296(3), 2003, pp. 405-434.
[3] M. Arenas, L. Bertossi, Jan Chomicki. Scalar Aggregation in FD-Inconsistent Databases. Proc. International Conference on Database Theory (ICDT), Springer-Verlag, 2001, LNCS 1973, pp. 39-53.
[4] L. Bertossi, Jan Chomicki. Query Answering in Inconsistent Databases. To appear in Logics for Emerging Applications of Databases, J. Chomicki, R. van der Meyden, G. Saake, editors, Springer-Verlag, 2003, 41pp.
[5] M. Arenas, L. Bertossi, Jan Chomicki. Consistent Query Answers in Inconsistent Databases. Proc. ACM Symposium on Principles of Database Systems (PODS), 1999, pp. 68-79.
[6] M. Arenas, L. Bertossi, Jan Chomicki. Specifying and Querying Database Repairs Using Logic Programs with Exceptions. Proc. International Conference on Flexible Query Answering Systems, Springer-Verlag, 2000, pp.27-41.
[7] M. Arenas, L. Bertossi, Jan Chomicki. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Practice of Logic Programming, 3(4&5), July/September 2003, pp. 393-424. arXiv.org paper cs.DB/0207094.
[8] L. Bertossi, Jan Chomicki, Alvaro Cortes, Claudio Gutierrez. Consistent Answers from Integrated Data Sources. Proc. International Conference on Flexible Query Answering Systems, Springer-Verlag, 2002, LNCS 2522, pp.71-85.
Human Resources. Two research assistants at the University at Buffalo were funded by the project for the total of one year.
Education and curriculum development. The PI has incorporated the results of the project in the syllabus of an advanced graduate database course on Data Integration. The course has become permanent and is offered at the University at Buffalo every academic year. It has been taught twice so far, in fall 2001 and spring 2003, with the total enrollment of 28. The PI also gave talks about the project at a Dagstuhl seminar and 7 universities in U.S., Canada, and Europe.
Other. The PI is one of the editors of the book "Logics for Emerging Applications of Databases," to be published by Springer-Verlag. He also co-organized the IJCAI 2001 Workshop on Inconsistency in Data and Knowledge and served on the program committee of the FLOC 2002 Workshop on Paraconsistent Computational Logic. He has been invited to present a talk at the workshop on Logic-based Methods for Information Integration, Aug. 2003, Vienna, Austria, and a keynote talk at the Conference on Integrity and Internal Control in Information Systems, Nov. 2003, Lausanne, Switzerland. The notion of consistent query answer proposed in the course of this research has been applied and extended by research groups at 4 European and 1 Canadian universities.
Together with J. Marcinkowski, Wroclaw University, Poland, the PI obtained a complete characterization of the computational complexity of consistent query answers (and related problems) for first-order queries and integrity constraints that are denial constraints, functional dependencies, and inclusion dependencies. They have studied the impact of the number of constraints, as well as the type and the size of the query on the complexity of this problem. They have identified several new tractable cases (compared to [5]) and proposed new algorithms. The results are reported in the paper [1], currently under submission to a journal. The PI has extended the algorithms proposed in [1] to form the foundation of HIPPO - a front-end system for the computation of consistent query answers. The first version of HIPPO, developed by Slawomir Staworko (a Ph. D. student supported by the current project), is operational. It uses the PostgresSQL back-end. Some preliminary experimental results have been obtained. Current work includes: optimizing HIPPO, broadening the class of queries that it can handle, experimental comparison with other approaches to the computation of consistent query answers.
In collaboration with several researchers, the PI has studied the computation of consistent query answers in the presence of distributed data sources [8]. The PI expanded (in collaboration with several researchers) the conference papers [3] and [6] to journal versions. The paper [2] studies the issue of computing consistent query answers for aggregation queries and functional dependencies. The paper [7] considers logical specification of repairs as answer sets of logic programs. Together with L. Bertossi, Carleton University, Ottawa, Canada, the PI has completed a survey [4] summarizing the up-to-date research on computing consistent query answers.
The problem of obtaining consistent information from inconsistent databases has been studied in the context of multiple-source data integration [9], disjunctive databases [10], and conflict resolution [11]. For a more complete discussion, see the references [5] and [2]. Our approach is unique in that it is not limited to specific application scenarios or particular classes of constraints or queries, and addresses detailed computational problems.
[9] A. Motro. Multiplex: A Formal Model for Multidatabases and Its Implementation. Proc. NGITS'99, Springer-Verlag, LNCS 1649, pp. 138--158.
[10] S. Agarwal, A.M. Keller, G. Wiederhold, and K. Saraswat. Flexible Relation: An Approach for Integrating Data from Multiple, Possibly Inconsistent Databases. Proc. IEEE International Conference on Data Engineering, 1995.
[11] J. Lin and A. O. Mendelzon. Merging Databases under Constraints. International Journal of Cooperative Information Systems, 7(1):55--76, 1996.
Data Integration and Cleaning, Query Answering Using Distributed Data Sources, Materialized Views, Semantic Query Optimization.