University at Buffalo, The State University of New York


CSE 636: Data Integration

Fall 2008 (#486324)


What's New

Please check the newsgroup for announcements. Lecture notes will be posted on this web page. A substantial part of the site will be in HTML and PDF formats.

Staff

Newsgroup

sunyab.cse.636

Course Description

Data integration has been recognized as a research topic of great practical importance. The availability of integrated data from multiple independent, heterogeneous data sources is crucial for many applications. Data integration requires combining and matching information from different sources and resolving a variety of conflicts. XML is becoming a de facto data integration standard. With the number of data sources growing rapidly, data integration is bound to become even more important in the future. This course surveys selected theoretical and practical issues arising in data integration.

The course consists of lectures by the instructor and by guest lecturers, which together cover most of the course material.

Schedule

Lecture: Mon, Wed, Fri, 12:00pm - 12:50pm, 200G Baldy Hall
Instructor Office Hours: Wed, Fri, 1:00pm - 2:00pm, 210 Bell Hall

The following is a tentative schedule of lectures. Changes will be posted to the newsgroup.

The lectures are being recorded; here is the link to the videos.

Each week is labeled with the date of its Monday; topics are listed in order across that week's Monday, Wednesday, and Friday lectures.

  • Week 1 (08/25): Introduction & Overview (Lecture Slides: Set 1, Set 2); XML Data Model / Document Type Definition (DTD) (Lecture Slides: Set 1)
  • Week 2 (09/01): No Lecture (Labor Day); XML Schema (Lecture Slides: Set 1)
  • Week 3 (09/08): XPath (Lecture Slides: Set 1); XQuery
  • Week 4 (09/15): XQuery (Lecture Slides: Set 1; the XML files used on the XQuery slides)
  • Week 5 (09/22): Project Discussion
  • Week 6 (09/29): Global-As-View / Local-As-View (Lecture Slides: Set 1)
  • Week 7 (10/06): Global-As-View Example (Example Files); Distributed Query Processing (Lecture Slides: Set 1)
  • Week 8 (10/13): Distributed Query Processing (Lecture Slides: Set 2, referring to this paper); Limited Source Capabilities / Web Services (Lecture Slides: Set 1, Set 2)
  • Week 9 (10/20): Datalog (Lecture Slides: Set 1); Query Containment (Lecture Slides: Set 1)
  • Week 10 (10/27): Query Containment; Answering Queries Using Views Algorithms (Overview) (Lecture Slides: Set 1); Answering Queries Using Views Algorithms (Bucket Algorithm) (Lecture Slides: Set 1)
  • Week 11 (11/03): Answering Queries Using Views Algorithms (Bucket Algorithm); Answering Queries Using Views Algorithms (MiniCon Algorithm) (Lecture Slides: Set 1)
  • Week 12 (11/10): Interactive Query Formulation (Lecture Slides: Set 1); Project Discussion; Interactive Query Formulation
  • Week 13 (11/17): SchemaSQL (Lecture Slides: Set 1); SchemaSQL (Lecture Slides: Set 2); Consistent Query Answering, Guest Lecture by Jan Chomicki (Lecture Slides)
  • Week 14 (11/24): Consistent Query Answering, Guest Lecture by Jan Chomicki; No Lectures (Fall Recess)
  • Week 15 (12/01): SchemaSQL Exercise; Schema Matching (Lecture Slides: Set 1); Final Review

Final Exam Due: Monday 12/15, 11:00am, Bell Hall 210

Prerequisites

An introductory database course equivalent to CSE 560 or CSE 562

Text

None required. Recommended books:

Grade Computation

  • Assignments: 15% (set of 3, 5% each)
  • Final: 20%
  • Project: 60%
  • Participation: 5%
  • Grades will be posted here...
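The weights above add up to 100% (3 × 5% + 20% + 60% + 5%). As an illustration of how they combine, here is a minimal Java sketch. The class name GradeSketch, the assumption that every component is scored out of 100, and the sample scores in main are all hypothetical; this is not the official grading procedure.

```java
public class GradeSketch {
    // Weights from the Grade Computation list, as fractions of the course total.
    static final double ASSIGNMENT_WEIGHT = 0.05;    // each of the 3 assignments
    static final double FINAL_WEIGHT = 0.20;
    static final double PROJECT_WEIGHT = 0.60;
    static final double PARTICIPATION_WEIGHT = 0.05;

    // Assumption: every score is a percentage in [0, 100].
    // Returns the weighted course total, also in [0, 100].
    static double total(double[] assignments, double finalExam,
                        double project, double participation) {
        if (assignments.length != 3) {
            throw new IllegalArgumentException("expected 3 assignment scores");
        }
        double sum = 0.0;
        for (double a : assignments) {
            sum += ASSIGNMENT_WEIGHT * a;
        }
        sum += FINAL_WEIGHT * finalExam;
        sum += PROJECT_WEIGHT * project;
        sum += PARTICIPATION_WEIGHT * participation;
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical scores, for illustration only.
        double[] assignments = {90, 80, 100};
        System.out.println(total(assignments, 75, 85, 100));
    }
}
```

With the sample scores in main, the total works out to 4.5 + 4 + 5 + 15 + 51 + 5 = 84.5, up to floating-point rounding. The project alone carries 60 of those 100 points.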

Problems

Project

  • Phase 1 Specification (Updated on Friday, November 7th)
    1. Deliverable 1
      • Here is the JavaCC Parser Generator web page. Here is a very simple XQueryParser you can use as a starting point, and here is a test XQuery expression.
    2. Deliverable 2
      • Here is a skeleton of an XPathProcessor you can use as a starting point, and here is a test XPath expression and a test XML file.
      • To access XML files, you should use the standard DOM interface. There are a number of XML DOM parser implementations; we recommend the DOM parser that comes with Java (1.5 and above). The relevant packages are: The W3C DOM specification is located here, but we believe it is write-only (perhaps nobody has ever managed to read it in full). The JavaDocs for the above-mentioned packages are a good reference, but the easiest way to get started with DOM is to walk through the DOMPrinter sample program (which uses these packages) and Sun's DOM Tutorial.
      • For test data, download Shakespeare's play, Julius Caesar, in XML form. Here is the associated DTD and here is a test XQuery expression.
  • Phase 2 Specification (Updated on Friday, November 14th)
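For Deliverable 2, the following is a minimal sketch of the recommended approach: parse XML with the DOM parser bundled with Java (the javax.xml.parsers and org.w3c.dom packages) and walk the tree by hand. The class DomSketch, its methods, and the tiny inline fragment are hypothetical, and the element names PLAY, ACT, SCENE, SPEECH and SPEAKER are assumed to match the Julius Caesar test file. The childPath method handles only plain child steps such as PLAY/ACT/SCENE (no //, attributes, or predicates), so it is not a substitute for the XPathProcessor skeleton.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomSketch {

    // Parses an XML string into a DOM Document with the JDK's built-in parser.
    // Checked parser exceptions are wrapped so callers need no throws clause.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Follows element names separated by '/', e.g. "PLAY/ACT/SCENE",
    // starting at the document element. Only child steps are supported.
    static List<Element> childPath(Document doc, String path) {
        String[] steps = path.split("/");
        List<Element> current = new ArrayList<Element>();
        Element root = doc.getDocumentElement();
        if (!root.getTagName().equals(steps[0])) {
            return current; // the first step must name the root element
        }
        current.add(root);
        for (int i = 1; i < steps.length; i++) {
            List<Element> next = new ArrayList<Element>();
            for (Element parent : current) {
                NodeList kids = parent.getChildNodes();
                for (int k = 0; k < kids.getLength(); k++) {
                    Node n = kids.item(k);
                    // Keep only element children whose tag matches this step.
                    if (n.getNodeType() == Node.ELEMENT_NODE
                            && ((Element) n).getTagName().equals(steps[i])) {
                        next.add((Element) n);
                    }
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // A tiny made-up fragment in the assumed shape of the Julius Caesar file.
        String xml = "<PLAY><TITLE>The Tragedy of Julius Caesar</TITLE>"
                + "<ACT><SCENE><SPEECH><SPEAKER>FLAVIUS</SPEAKER></SPEECH>"
                + "<SPEECH><SPEAKER>MARULLUS</SPEAKER></SPEECH></SCENE></ACT></PLAY>";
        Document doc = parse(xml);
        for (Element speaker : childPath(doc, "PLAY/ACT/SCENE/SPEECH/SPEAKER")) {
            System.out.println(speaker.getTextContent());
        }
    }
}
```

Running main prints the two speaker names in document order. To read the downloaded play instead of an inline string, DocumentBuilder.parse(java.io.File) can be used in place of the byte-stream overload.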

Resources

  • XML & DTDs
  • XML Schema
  • XQuery/XPath
  • Web Services
  • Datalog

Rules & Policies

Exam Rules

Group study and discussion are encouraged, but exams must be your own work. For coding assignments, if you use code that you borrowed from elsewhere rather than wrote yourself, mark it with a comment saying so. There is zero tolerance for plagiarism and cheating: consult the University Code of Conduct for details on the consequences of academic misconduct, and see also the academic integrity policy of the CSE department.

Grading Policies

Write clear arguments. Be neat and precise. Getting the right answer may not be enough: the derivation and the quality of the writing count! Don't write many different things in the hope that you'll get the points if one of them is right. In fact, you will lose points if you follow such a policy.