Principal Investigator: Jan Chomicki, SUNY at Buffalo.
Project Title: Curation And Integration Of Inconsistent And Incomplete Temporal Data
NSF Award: 1524469.
In today's digital world, incomplete and inconsistent information about an entity can often be found in multiple data sources, or in different versions of the same source at different times. To harness the wealth of available information, many organizations curate and integrate public (and private) data over time to create a comprehensive temporal view of relevant entities. This project has the potential to make a significant societal impact through the development of a software prototype for data curation and integration. Such a prototype will enable users to construct robust, application-specific, consistent views of temporal data on demand, which will lead to new classes of applications, particularly those involving profiling, tracking, monitoring, archiving and understanding of the evolution of entities over time. This project will also provide the opportunity to train students in several critical research areas central to temporal databases, data analytics and big data. The results obtained in this investigation will be incorporated into advanced graduate courses on large-scale data integration offered at both SUNY at Buffalo and UC Santa Cruz. The software artifacts and datasets produced will be made freely available for broad dissemination and sharing.
This project will advance the state of the art in data curation and integration over temporal data by addressing three fundamental challenges. First, unlike incompleteness in non-temporal data, incompleteness in temporal data may be time-varying, which will necessitate the introduction of time-dependent nulls. The second challenge is that inconsistency can also occur in temporal data. Inconsistency will be handled by declaratively specifying temporal preferences and by using algorithms that resolve conflicts based on those preferences. Alternatively, under the assumption that the inconsistent temporal database is left as is, this project will consider how consistent query answers can be computed over such a database. This project will also consider, for the first time, data with both incompleteness and inconsistency in a single formal framework, which will require significant conceptual and practical advances. The third challenge is to achieve temporal data independence through a conceptual framework that uses a formal, two-tier approach to manage temporal data: the abstract view provides the semantics, and the concrete view provides the physical representation of temporal data. The views hide lower-level details and allow for declarative, logical specification of queries, mappings, and dependencies.
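As a purely illustrative sketch, and not the project's actual design, the two-tier idea can be pictured as follows. The sketch assumes that the concrete view stores interval-stamped facts and that the abstract view is one snapshot per time point; it also assumes that an unknown value holding over an interval expands into a distinct null at each time point. All names (Null, ConcreteFact, to_abstract, conflicts) and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Null:
    """Hypothetical labeled null: an unknown value."""
    label: str
    time: Optional[int] = None  # set per time point in the abstract view


Value = Union[str, Null]


@dataclass(frozen=True)
class ConcreteFact:
    """Concrete view (assumed): a fact stamped with the interval [start, end)."""
    entity: str
    value: Value
    start: int  # inclusive
    end: int    # exclusive


def to_abstract(facts: List[ConcreteFact]) -> Dict[int, Set[Tuple[str, Value]]]:
    """Expand interval-stamped facts into one snapshot per time point."""
    snapshots: Dict[int, Set[Tuple[str, Value]]] = {}
    for fact in facts:
        for t in range(fact.start, fact.end):
            value = fact.value
            if isinstance(value, Null):
                # Assumption: the unknown value may differ at each time point.
                value = Null(value.label, t)
            snapshots.setdefault(t, set()).add((fact.entity, value))
    return snapshots


def conflicts(snapshots: Dict[int, Set[Tuple[str, Value]]]) -> List[Tuple[int, str]]:
    """List (time, entity) pairs where an entity has several known values."""
    found: List[Tuple[int, str]] = []
    for t, facts in snapshots.items():
        known: Dict[str, Set[str]] = {}
        for entity, value in facts:
            if not isinstance(value, Null):
                known.setdefault(entity, set()).add(value)
        found.extend((t, e) for e, vals in known.items() if len(vals) > 1)
    return sorted(found)


# Made-up sample data: two sources disagree about Ada's employer in 2012,
# and Bob's employer is unknown from 2011 up to (not including) 2013.
facts = [
    ConcreteFact("Ada", "IBM", 2010, 2013),
    ConcreteFact("Ada", "Intel", 2012, 2014),
    ConcreteFact("Bob", Null("N1"), 2011, 2013),
]
abstract = to_abstract(facts)
```

In this sketch, conflicts(abstract) reports the single time point (2012) at which the two sources overlap and disagree. Resolving such a conflict by preferences, or answering queries consistently while leaving it in place, would be layered on top of the abstract view.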
This is a collaborative grant with the University of California, Santa Cruz (Wang-Chiew Tan, collaborator PI).