CSE 707 Seminar: Select Topics on Modern Database Systems (Fall 2022)

Modern database systems differ from traditional RDBMSs in many aspects of their design. In this seminar, we will review and discuss a range of papers on modern database system design. Topics include query processing, transaction processing, indexing, and storage, among others.
Prerequisites: CSE 462/562 Database Systems or equivalents.

Logistics

Location and Time: Davis Hall 113A, Wednesday 10:00 am to 12:50 pm.
Instructor: Zhuoyue Zhao, zzhao35 [at] buffalo [dot] edu, Davis Hall 338I.
Office hours are on demand. Email me for an appointment.
No required textbook.
Optional readings: Readings in Database Systems, 5th Edition, by Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker. Available online.
Attendance is required (see grading policy).
We'll be using Piazza for discussion and Q&A. Find our class page here.
Please make all required assignment submissions to UBLearns.

Course Requirements (Please Read)

Grading

The seminar is graded on an S/U basis. A score of 75% or higher is satisfactory; a score below 75% is unsatisfactory.

Course Schedule

For UB students: Some of the following links may require an ACM Digital Library subscription. UB library has a paid subscription available for all students so you do not need to pay. You may either connect to eduroam when you're on campus, or replace dl.acm.org with dl-acm-org.gate.lib.buffalo.edu and enter your UBIT login credentials when you're off campus.

* Paper summary deadline is extended to Saturday at 11:00 PM for the weeks with two presentations.

The following schedule is subject to change due to student add/drop. Please double-check the latest schedule when completing your assignments.

Date | Topic | # | Required Readings | Presenter
8/31/2022 | Logistics and Introduction | | Lecture Slides | N/A
Online Analytical Query Processing
9/7/2022 | Columnar store | 10 | Andrew Lamb, et al. The Vertica Analytic Database: C-Store 7 Years Later. In VLDB '12. | Anirudh
9/14/2022 | Vectorized query execution | 20 | Peter Boncz, et al. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR '05. | Rohit
9/21/2022 | Query compilation | 30 | Thomas Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. In VLDB '11. | Rahul
9/28/2022 | SIMD | 40 | Orestis Polychroniou, et al. Rethinking SIMD Vectorization for In-Memory Databases. In SIGMOD '15. | Haohua
Approximate Query Processing
*10/5/2022 (Davis 338A) | Online Aggregation | 50 | Joseph M. Hellerstein, et al. Online aggregation. In SIGMOD '97. | Chaoping
*10/5/2022 (Davis 338A) | Overview of Approximate Query Processing | 60 | Chaudhuri et al. Approximate Query Processing: No Silver Bullet. | Hon Ching
*10/12/2022 | Random Sample Generation | 70 | Frank Olken. Random Sampling From Databases, Sections 2.4-2.6; and Jeffrey S. Vitter. Random Sampling with a Reservoir. | Roshan
*10/12/2022 | Random Sample Generation | 80 | Frank Olken. Random Sampling From Databases, Section 2.7 and Section 3; and Alastair J. Walker. An Efficient Method for Generating Discrete Random Variables with General Distributions. In ACM TOMS '77. | Pratik
10/19/2022 | Independent Range Sampling | 90 | Xiaocheng Hu, et al. Independent Range Sampling. In PODS '14. | Harish
10/26/2022 | Join Sampling | 100 | Zhuoyue Zhao, et al. Random Sampling over Joins Revisited. In SIGMOD '18. | Karan
11/2/2022 | Sample Set Selection | 110 | Sameer Agarwal, et al. BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. In EuroSys '13. | Gunjan
11/9/2022 | Joins in AQP | 120 | Peter J. Haas, et al. Ripple Joins for Online Aggregation. In SIGMOD '99; and Chris Jermaine, et al. Scalable approximate query processing with the DBO engine. In SIGMOD '07. | Raman
11/16/2022 | Joins in AQP (cont'd) | 130 | Feifei Li, et al. Wander Join: Online Aggregation via Random Walks. In SIGMOD '16. | Gang
11/23/2022 | Fall recess, no lecture.
11/30/2022 | Joins in AQP (cont'd) | 140 | Yu Chen, et al. Two-Level Sampling for Join Size Estimation. In SIGMOD '17. | Sphoorthi
*12/7/2022 | ML-based AQP (cont'd) | 150 | Qingzhi Ma, et al. DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models. In SIGMOD '19. | Jiaheng

Academic Integrity

All submitted work (pre-lecture questions, assignments, and presentation slides) must be prepared and written independently and must reflect the student's own opinions. Simply paraphrasing other students' work is considered plagiarism. We take the recommended actions for any discovered academic integrity violation per Departmental and University policies, which may include an F grade or other appropriate penalties depending on the severity of the violation. While discussion among students before and after lectures is allowed, it is your responsibility to ensure that your submission is not substantially similar to any other student's submission. Note that it is generally acceptable to use part or all of the presentation slides found on the conference website or the authors' website, as long as there are proper citations and acknowledgments.