CSE 636: Data Integration
Fall 2008 (#486324)
What's New
Please check the newsgroup for announcements.
Lecture notes will be posted on this web page. A substantial part of
the site will be in HTML and PDF formats.
Staff
Newsgroup
sunyab.cse.636
Course Description
Data integration has been recognized as a research
topic of great practical importance. The availability of integrated data
from multiple independent, heterogeneous data sources is crucial for
many applications. Data integration requires combining and matching
information from different sources and resolving a variety of
conflicts. XML is becoming a de facto data integration standard. With
the number of data sources growing very quickly, data integration is
bound to become even more important in the future. This course will
survey selected theoretical and practical issues arising in data
integration.
The course is based on the instructor's lectures and
guest lectures. The lectures cover most of the material in the course.
Schedule
. | Mon | Tue | Wed | Thu | Fri |
Lecture | 12:00pm - 12:50pm, 200G Baldy Hall | . | 12:00pm - 12:50pm, 200G Baldy Hall | . | 12:00pm - 12:50pm, 200G Baldy Hall |
Instructor Office Hours | . | . | 1:00pm - 2:00pm, 210 Bell Hall | . | 1:00pm - 2:00pm, 210 Bell Hall |
The following is a tentative schedule of lectures.
Changes will be posted to the newsgroup.
Since the lectures are being recorded, here is the
link for the videos.
| Week | Monday Lecture | Wednesday Lecture | Friday Lecture |
1 | 08/25 | Introduction & Overview; Lecture Slides: Set 1, Set 2 | XML Data Model / Document Type Definition (DTD); Lecture Slides: Set 1 |
2 | 09/01 | No Lecture (Labor Day) | XML Schema; Lecture Slides: Set 1 |
3 | 09/08 | XPath; Lecture Slides: Set 1 | XQuery |
4 | 09/15 | XQuery; Lecture Slides: Set 1; The XML files used on the XQuery slides |
5 | 09/22 | Project Discussion |
6 | 09/29 | Global-As-View / Local-As-View; Lecture Slides: Set 1 |
7 | 10/06 | Global-As-View Example; Example Files | Distributed Query Processing; Lecture Slides: Set 1 |
8 | 10/13 | Distributed Query Processing; Lecture Slides: Set 2, referring to this paper | Limited Source Capabilities / Web Services; Lecture Slides: Set 1, Set 2 |
9 | 10/20 | Datalog; Lecture Slides: Set 1 | Query Containment; Lecture Slides: Set 1 |
10 | 10/27 | Query Containment | Answering Queries Using Views Algorithms (Overview); Lecture Slides: Set 1 | Answering Queries Using Views Algorithms (Bucket Algorithm); Lecture Slides: Set 1 |
11 | 11/03 | Answering Queries Using Views Algorithms (Bucket Algorithm) | Answering Queries Using Views Algorithms (MiniCon Algorithm); Lecture Slides: Set 1 |
12 | 11/10 | Interactive Query Formulation; Lecture Slides: Set 1 | Project Discussion | Interactive Query Formulation |
13 | 11/17 | SchemaSQL; Lecture Slides: Set 1 | SchemaSQL; Lecture Slides: Set 2 | Consistent Query Answering; Guest Lecture by Jan Chomicki; Lecture Slides |
14 | 11/24 | Consistent Query Answering; Guest Lecture by Jan Chomicki | No Lectures (Fall Recess) |
15 | 12/01 | SchemaSQL Exercise | Schema Matching; Lecture Slides: Set 1 | Final Review |
Final Exam Due: Monday 12/15 11:00am, Bell Hall 210 |
Prerequisites
An introductory database course equivalent to CSE
560 or CSE 562
Text
None required. A list of recommended books goes as
follows:
Grade Computation
- Assignments: 15% (set of 3, 5% each)
- Final: 20%
- Project: 60%
- Participation: 5%
- Grades will be posted here...
Problems
Project
- Phase 1
Specification (Updated on Friday, November 7th)
- Deliverable 1
- Here is the JavaCC Parser Generator web page. Here is a very simple
XQueryParser you can use as a starting point, and here is a test
XQuery expression.
- Deliverable 2
- Here is a skeleton of an XPathProcessor you can use
as a starting point, and here is a test XPath expression and a test XML file.
- To access XML files, you should use the standard DOM
interface. There are a number of XML DOM parser implementations. We
recommend the DOM parser that comes with Java (1.5 and above). The
relevant packages are:
The W3C spec of DOM is located here, but we
believe it is write-only (nobody may have been able to fully read
it). The JavaDocs for the above-mentioned packages are a good
reference, but the easiest way to get started with DOM is to walk
through the DOMPrinter sample program, which uses these packages,
and Sun's DOM Tutorial.
- For test data, download Shakespeare's play, Julius Caesar, in XML form.
Here is the associated DTD and here is a test XQuery expression.
- Phase 2
Specification (Updated on Friday, November 14th)
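As a rough illustration of the DOM access recommended under Deliverable 2, the sketch below parses a small XML string with the parser bundled in the JDK and walks the tree recursively. It is not the course's DOMPrinter or XPathProcessor skeleton. The class name DomWalkSketch, the helper methods parse and textOf, and the tiny inline document are all hypothetical, chosen only for this example. It also assumes Java 7 or later (for StandardCharsets and the diamond operator), which is newer than the 1.5 minimum stated above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example class; not part of the course materials.
public class DomWalkSketch {

    /** Parses an XML string into a DOM Document using the JDK's built-in parser. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers stay simple.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns the trimmed text of every element named tagName at or below node, in document order. */
    public static List<String> textOf(Node node, String tagName) {
        List<String> result = new ArrayList<>();
        collect(node, tagName, result);
        return result;
    }

    // Depth-first traversal: visit the node itself, then each child in order.
    private static void collect(Node node, String tagName, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals(tagName)) {
            out.add(node.getTextContent().trim());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), tagName, out);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document; the tag names are illustrative only.
        String xml = "<PLAY><TITLE>Sample Play</TITLE>"
                   + "<ACT><TITLE>ACT I</TITLE>"
                   + "<SPEECH><SPEAKER>FIRST</SPEAKER></SPEECH></ACT></PLAY>";
        Document doc = parse(xml);
        System.out.println(textOf(doc, "SPEAKER"));
        System.out.println(textOf(doc.getDocumentElement(), "TITLE"));
    }
}
```

A recursive walk over getChildNodes() is close in spirit to evaluating a descendant step in XPath, which is why it may be a convenient starting point before adding predicates or other axes. To read a file instead of a string, DocumentBuilder.parse also accepts a java.io.File.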
Resources
XML & DTDs
XML Schema
XQuery/XPath
Web Services
Datalog
Rules & Policies
Exam Rules
Group study and discussion are encouraged, but exams
must be your own work. For coding assignments, if you use a piece of
code which you borrowed from elsewhere and therefore did not write
yourself, make sure you comment it to show this. Zero tolerance on
plagiarism/cheating: consult the University Code of Conduct for
details on consequences of academic misconduct, and see also the academic
integrity policy of the CSE department.
Grading Policies
Write clear arguments. Be neat and precise. Getting
the right answer may not be enough. The derivation and quality of
writing count! Don't write many different things in the hope that you'll
get the points if one of them is the right one. Indeed, you will lose
points if you follow such a policy.