This seminar course will cover a range of topics related to storing and querying large datasets. Specific topics include a variety of distributed systems and primitives: data processing, synchronization, key-value stores, and stream processors, as well as full SQL database systems.
This course meets WEEKLY on MONDAYS from 9:00 AM to 10:50 AM in Davis 113A.
Office hours are Monday and Thursday from 2:00 to 4:00, or by appointment.
Grading is S/U. (Yes, the system says letter grades; grading will be S/U regardless.)
All students are expected to submit a short weekly abstract and critical analysis of the week's papers, and to participate in class discussion of the papers. The weekly report is due before class starts, via email to okennedy at buffalo, or in class. Students may miss up to 2 weeks' worth of abstracts (out of a total of 10) without penalty.
The top 3 abstracts for each week will be posted on the site.
Students enrolled for at least 2 credits MUST contact the instructor to sign up to present and lead a discussion about at least one of the papers below. Students enrolled for 1 credit may also choose to present, pending availability of the desired papers.
Students enrolled for 3 credits will also be required to submit a simple experimental project and a short report/presentation on their results. Computing resources will be provided.
Student project ideas should be approved by the instructor by the beginning of October.
Several resources will be made available for student use and testing. See me for access details.
When writing a critique, I like to distill the essence of each paper by asking myself a few questions (presented here with some example answers for Map/Reduce):
Questions 1 and 2 should help with the summary, and a good critique is based on your answers to 3 and 4.
Week | Presenter | Theme | System (url) | Notes |
---|---|---|---|---|
August 27 | No Class - Oliver away | | | |
Sept 3 | No Class - Labor Day | | | |
Sept 10 | Oliver (slides) | Data Flow | Course Introduction; Dryad | |
Sept 17 | No Class - Rosh Hashanah | | | |
Sept 24 | Jon L. (slides); Ying Y. (slides) | Map/Reduce | MapReduce (Original Paper); HDFS | |
Oct 1 | Raghav A. (slides); Ying Y. (slides) | Extremely Parallel Query Languages 1 | Hive (Demo Paper); HadoopDB (Demo Paper) | Project proposals due; Top Critiques |
Oct 8 | Gomathivinayagam M. (slides); Raghav A. (slides) | Extremely Parallel Query Languages 2 | Pig; Dremel | Top Critiques |
Oct 15 | Janhavi D. (slides); Niccolo M. (slides) | Column Stores | MonetDB; DataCyclotron | Top Critiques |
Oct 22 | Ravi M. (slides); Kyungho J. (slides) | NoSQL Databases | Cassandra; BigTable/HBase | Top Critiques |
Oct 29 | Mike O. (slides); Dinesh R. (slides) | Distributed Consistency | Percolator; ZooKeeper | First Project Milestone (code); Top Critiques |
Nov 5 | Gomathivinayagam M. (slides); Sakthi G. (slides) | Distributed Hash Tables | Chord; Dynamo | Top Critiques |
Nov 12 | Niccolo M. (slides); Ravi M. (slides) | WAN Datastores | PNUTS; Spanner | Top Critiques |
Nov 19 | Kyungho J. (slides); Oliver (slides) | Stream Processors | Borealis; DBToaster and Laasie (no paper) | |
Nov 26 | Janhavi D. (slides); Sakthi G. (slides) | Misc Topics | PIQL; Lipstick | Second Project Milestone (benchmarks) |
Dec 3 | Student Presentations | | | |