CSE 707 Seminar: Select Topics on Modern Database Systems (Fall 2022)
Modern database systems differ from traditional RDBMSs in many aspects of their design. In this seminar, we will review and discuss a range of papers on modern database system design. Topics include query processing, transaction processing, and indexing and storage, among others.
Prerequisites: CSE 462/562 Database Systems or equivalent.
Logistics
Location and Time: Davis Hall 113A, Wednesday 10:00 am to 12:50 pm.
Instructor: Zhuoyue Zhao, zzhao35 [at] buffalo [dot] edu, Davis Hall 338I.
Office hours are on demand. Email me for an appointment.
No required textbook.
Optional readings: Readings in Database Systems, 5th Edition, by Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker. Available online.
Attendance is required (see grading policy).
We'll be using Piazza for discussion and Q&A.
Find our class page here.
Please make all required assignment submissions to UBLearns.
Course Requirements (Please Read)
- Read the required paper before coming to each lecture.
- List three questions before each lecture and submit
them to UBLearns one day in advance (due every Tuesday at
10:00 AM).
- Write a short paper summary after each lecture and submit
it to UBLearns within two days (due every Friday at
10:00 AM). The only exception is weeks with two
presentations, when the deadline is extended to Saturday at
11:00 PM.
- Present one of the selected papers. Your presentation will
be graded on its soundness and completeness. Please
submit your presentation slides no later than one day prior
to your presentation (due Tuesday at 10:00 AM). You may also
be required to make changes to your slides and resubmit them
before the presentation. Therefore, make sure you are reachable
through Piazza before your presentation to avoid losing points.
The presentation should be longer than a usual conference
talk (plan for 1 hour presentation + 30 min discussion),
and discuss the topics in depth, including a brief
introduction to the background, the problem solved, the
solution, as well as experimental evaluation if available
in the paper.
Note that the presentation slides may be based on the existing
work of others (e.g., the conference talk slides from the
paper authors), as long as there are proper citations and
acknowledgments. However, in most cases you should add your own
content to the slides because 1) the slides need to reflect
your own opinions; and 2) conference talk slides usually do not
cover enough technical detail due to the time limit of
conference talks.
Some useful tips for presentation preparation:
You are encouraged to start early (e.g., one or two weeks in
advance), especially if you are not familiar with the topic,
because you may need a few days to go through the background
and the technical details of the paper. You are also encouraged
to discuss the paper with me. Feel free to post a private
message on Piazza with any questions or to schedule office hours.
Grading
- 8% for the pre-lecture questions (0.5% each).
- 32% for the paper summaries (2% each).
- 30% for the presentation.
- 30% for participation (evaluated based on your attendance, in-class discussion, and random quizzes).
The seminar is graded on an S/U basis. A score >= 75% is satisfactory; a score < 75% is unsatisfactory.
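As a rough illustration of how the components above add up, here is a minimal Python sketch. It rests on several assumptions: 16 sets of pre-lecture questions and 16 summaries (inferred from 8% / 0.5% and 32% / 2%), full credit for every submitted item, and presentation and participation expressed as fractions between 0 and 1. The function names are hypothetical, and this is not an official grade calculator.

```python
def final_score(questions_submitted, summaries_submitted,
                presentation, participation):
    """Return the total score in percent.

    presentation and participation are fractions in [0, 1] (an assumption;
    the syllabus does not say how they are scored internally).
    """
    # 0.5% per set of pre-lecture questions, capped at 8% (16 sets, inferred).
    questions = min(questions_submitted, 16) * 0.5
    # 2% per paper summary, capped at 32% (16 summaries, inferred).
    summaries = min(summaries_submitted, 16) * 2.0
    return questions + summaries + 30.0 * presentation + 30.0 * participation


def su_grade(score):
    """Map a percentage to S (satisfactory) or U (unsatisfactory)."""
    return "S" if score >= 75.0 else "U"


if __name__ == "__main__":
    score = final_score(16, 16, presentation=0.5, participation=0.5)
    print(score, su_grade(score))
```

Under these assumptions, submitting every question set and summary contributes 40%, so the remaining 35% needed for an S has to come from the presentation and participation.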
Course Schedule
For UB students: Some of the following links may require an ACM
Digital Library subscription. The UB library has a paid
subscription available to all students, so you do not need to pay.
You may either connect to eduroam when you're on campus, or replace dl.acm.org
with dl-acm-org.gate.lib.buffalo.edu
and enter your UBIT login credentials when you're off campus.
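The off-campus host replacement described above can be sketched in a few lines of Python. This is only an illustration: the function name `to_ub_proxy` and the example path are hypothetical, and the sketch assumes the proxy accepts the same path and query string as dl.acm.org.

```python
from urllib.parse import urlsplit, urlunsplit

ACM_HOST = "dl.acm.org"
PROXY_HOST = "dl-acm-org.gate.lib.buffalo.edu"


def to_ub_proxy(url: str) -> str:
    """Rewrite an ACM Digital Library URL to go through the UB library proxy.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != ACM_HOST:
        return url
    # Swap only the host; keep the scheme, path, query, and fragment as-is.
    return urlunsplit(parts._replace(netloc=PROXY_HOST))


if __name__ == "__main__":
    # "EXAMPLE" is a placeholder path, not a real DOI.
    print(to_ub_proxy("https://dl.acm.org/doi/EXAMPLE"))
```

You will still be asked for your UBIT login credentials when opening the rewritten link.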
* Paper summary deadline is extended to Saturday at 11:00 PM
for the weeks with two presentations.
The following schedule is subject to change due to student
add/drop. Please double-check the latest schedule when
completing your assignments.
| Date | Topic | # | Required Readings | Presenter |
|---|---|---|---|---|
| 8/31/2022 | Logistics and Introduction | | Lecture Slides | N/A |
| Online Analytical Query Processing | | | | |
| 9/7/2022 | Columnar store | 10 | Andrew Lamb, et al. The Vertica Analytic Database: C-Store 7 Years Later. In VLDB '12. | Anirudh |
| 9/14/2022 | Vectorized query execution | 20 | Peter Boncz, et al. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR '05. | Rohit |
| 9/21/2022 | Query compilation | 30 | Thomas Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. In VLDB '11. | Rahul |
| 9/28/2022 | SIMD | 40 | Orestis Polychroniou, et al. Rethinking SIMD Vectorization for In-Memory Databases. In SIGMOD '15. | Haohua |
| Approximate Query Processing | | | | |
| *10/5/2022 (Davis 338A) | Online Aggregation | 50 | Joseph M. Hellerstein, et al. Online aggregation. In SIGMOD '97. | Chaoping |
| | Overview of Approximate Query Processing | 60 | Chaudhuri et al. Approximate Query Processing: No Silver Bullet. | Hon Ching |
| *10/12/2022 | Random Sample Generation | 70 | Frank Olken. Random Sampling From Databases, Sections 2.4 - 2.6; and Jeffrey S. Vitter. Random Sampling with a Reservoir. | Roshan |
| | | 80 | Frank Olken. Random Sampling From Databases, Section 2.7 and Section 3; and Alastair J. Walker. An Efficient Method for Generating Discrete Random Variables with General Distributions. In ACM TOMS '77. | Pratik |
| 10/19/2022 | Independent Range Sampling | 90 | Xiaocheng Hu, et al. Independent Range Sampling. In PODS '14. | Harish |
| 10/26/2022 | Join Sampling | 100 | Zhuoyue Zhao, et al. Random Sampling over Joins Revisited. In SIGMOD '18. | Karan |
| 11/2/2022 | Sample Set Selection | 110 | Sameer Agarwal, et al. BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. In EuroSys '13. | Gunjan |
| 11/9/2022 | Joins in AQP | 120 | Peter J. Haas, et al. Ripple Joins for Online Aggregation. In SIGMOD '99; and Chris Jermaine, et al. Scalable approximate query processing with the DBO engine. In SIGMOD '07. | Raman |
| 11/16/2022 | Joins in AQP (cont'd) | 130 | Feifei Li, et al. Wander Join: Online Aggregation via Random Walks. In SIGMOD '16. | Gang |
| 11/23/2022 | Fall recess, no lecture today. | | | |
| 11/30/2022 | Joins in AQP (cont'd) | 140 | Yu Chen, et al. Two-Level Sampling for Join Size Estimation. In SIGMOD '17. | Sphoorthi |
| *12/7/2022 | ML-based AQP (cont'd) | 150 | Qingzhi Ma, et al. DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models. In SIGMOD '19. | Jiaheng |
Academic Integrity
All assignments (pre-lecture questions, paper summaries, and
presentation slides) must be prepared and written independently
and reflect the student's own opinions. Simply paraphrasing
other students' work is considered plagiarism. We will take
the recommended actions for any discovered academic integrity
violation per Departmental and University policies, including
an F grade or other appropriate penalties depending on the severity of the violation.
While discussion among students before
and after lectures is allowed, it is your responsibility
to ensure that your submission is not substantially similar to
any other student's submission.
Note that it is generally acceptable to use part or all of the
presentation slides found on the conference website or the
authors' website, as long as there are proper citations and
acknowledgments.