CSE 462/562: Database Systems (Spring 2022)

Project 1: Project sign-up and File I/O interface

Assigned: Tuesday, 2/1/2022
Project sign-up (lab0) due: Tuesday, 2/8/2022, 11:59 pm EST
Project (lab1) due: Tuesday, 2/15/2022, 11:59 pm EST
No write-up due
Grading break-down: RepoSetup (0.5%) + FSFile (4.5%)
Last updated: 1/26/2022

0. Project overview

In this course, you will work in teams of up to 2 people to build a mini database system, Taco-DB, in C++11. There are 5 projects throughout the semester, covering the various layers of Taco-DB from the bottom up. Here's a list of the projects: storage (I/O, buffer manager, heap file), indexing (B-tree), query processing (relational operators, join, external sorting), and query optimization (access path selection, size estimation). These cover most of the topics we discuss in the lectures, with the exception of transaction processing, concurrency control, and crash recovery, which are left out due to time constraints. Each project except Project 1 will have a single submission deadline.

Working in a two-person team is only meant to reduce your coding workload, not to excuse you from working on any part of the projects. In other words, you will first need to come up with a design and implementation plan together with your teammate; clearly and fairly divide and/or coordinate your coding responsibilities; and finish your share of the work responsibly. Remember, it is your team's code submission that gets graded, not each individual's. If you do not complete your share of the tasks, your teammate will also lose the points for them.

In addition, each student will need to independently complete and submit a write-up for each project (except Project 1), without sharing it with or copying from your teammate. The write-up should include a brief description of your solution and your division of coding responsibilities, as well as answers to any additional questions we may ask. If you do not submit the write-up on time, you (but not your teammate) will receive no points for the entire project.

You will need to set up a private repository on Github and grant us access to it in order to make submissions. We will use Autolab to pull your code and test it. Please refer to lab 0 below for details on how to set it up and how the grading works. The independently completed write-ups should be submitted to UBLearns in PDF format by the project deadline + 2 days, unless otherwise posted. Note that you may not make any of your project code publicly available during or after this semester, or make it available privately to any current or future students who may take the course. Please carefully review the academic integrity policy on the course homepage for more details.

In this project, your tasks are 1) lab 0: sign up for the project and set up your Github repository; and 2) lab 1: implement a simple File interface based on Linux I/O syscalls. This project is less about databases; instead, the goal is to help you set up your coding environment and get familiar with C++ and Linux programming if you are not already. At the end of this project, you should be familiar with Git and Github, with the project codebase and the cmake build system, and with the semantics of the Linux I/O syscalls.

1. Lab 0 - Project sign-up


Your task is to set up a working repository and your build environment on your local machine or CSE student server. You'll complete the project sign-up by making a successful submission on Autolab - lab0.

Special instruction for lab 0: Each team only has to set up a single repository -- some of the steps below only have to be completed by one person in a team, while some of them need to be done by both.

You should also get familiar with basic shell and git operations, the basic Github workflow, and C++11 if you are not already. Good resources are available for each of these: POSIX shell, git, Github, and C++11.

System requirements: You'll need a desktop or laptop with an x86-64 CPU and a reasonably recent Linux distribution installed. The code is tested on Ubuntu 20.04 (on either a bare-metal machine or a virtual machine, including Windows Subsystem for Linux 2) and Fedora 35. It may or may not compile on non-x86-64 or non-Linux systems, and even if it compiles, it may not function as intended (especially if you are using a laptop with an ARM CPU such as the Apple M1 or the Microsoft SQ1 or SQ2, and/or an operating system like macOS or Windows Win32).

For CSE students: if you do not have access to a required system, you may remotely access these two CSE student servers using SSH: timberlake.cse.buffalo.edu and metallica.cse.buffalo.edu. They are x86-64 systems installed with CentOS-7, and configured with all the required software packages (see alternative guides in lab 0). However, they are shared among all CSE students, so you should expect them to be slower and should not run any resource intensive jobs on them.

For non-CSE students: if you do not have access to the CSE student servers and do not have access to a required system, please send a private message on Piazza as soon as possible.

The following is a list of software packages you need to install:

The autograder and our final test server are both installed with Ubuntu 20.04 and gcc/g++ 9.3.

Step 1: Create a private repository on Github using your personal account. Follow the guides below.
[Figure: create a new repository on Github]

Step 2: Add buffalo-cse562-sp22 and your teammate's Github user (if any) as collaborators in settings -> Manage access -> Add people. Do not add anyone else as collaborators. The grading script will reject the submission if you have more than your group size + 1 collaborators (including yourself).

Step 3: Download the lab0.tar.xz tarball from Autolab and put it in the place where you'd like to set up your local Git repository. Then enter the following in a shell (e.g., bash), with <github-username> and <repo-name> replaced with your actual Github user name and repository name:


    # change into the directory where you put lab0.tar.xz
    cd <dir-where-you-put-lab0.tar.xz>
    # extract the tarball
    tar xf lab0.tar.xz
    # rename the extracted directory to the same as your repository
    mv lab0 <repo-name>
    # change into the directory
    cd <repo-name>
    # setting up the git repo
    ./setup_repo.sh git@github.com:<github-username>/<repo-name>

                

If everything goes well, the script will print "Repo setup is finished. Here are a few post-setup steps to follow:...". Please follow the post-setup steps.

Your teammate should also be able to clone the repository with the imported code and follow the remaining steps at this time.

Step 4: Build the code. We use cmake as the build system. Most likely you will not need to modify any of the CMakeLists.txt files, but you should know how to invoke cmake.

Here's how to create a debug build in the build directory:


    cd <dir-to-local-repository>
    cmake -B build . # don't miss the dot at the end or cmake will report errors
    cd build
    make
                

The first build will also build a few dependencies in the external/ directory: Abseil, GoogleTest, and Jemalloc. Future builds will be faster as long as you do not remove the external/ directory locally. You don't have to run cmake again for future builds either -- just run make in the build/ directory.

Alternative guides for those who are using CSE student servers: you should use the following command instead to configure the cmake build directory, because the default toolchain on timberlake and metallica won't work:


    cmake -B build -DCMAKE_C_COMPILER=/util/gcc-7.2.0/bin/gcc -DCMAKE_CXX_COMPILER=/util/gcc-7.2.0/bin/g++ -DPKG_CONFIG_EXECUTABLE=/util/pkg-config-0.29.2/bin/pkg-config .
                

Step 5: Run the tests locally with ctest (which comes with cmake):


    cd build
    ctest -V
                

There is only one test in lab 0: BasicTestRepoCompilesAndRuns.TestShouldAlwaysSucceed, which, as its name suggests, should always succeed without modifying any source code.

Alternative guides for those who are using CSE student servers: you need to add /util/gcc-7.2.0/lib64 to the LD_LIBRARY_PATH environment variable for the dynamic link loader to locate the newer version of glibc. If you're using the default tcsh and never configured your rc file, you may do so by finding and replacing these lines in your ${HOME}/.cshrc:


if (! $?LD_LIBRARY_PATH) then
    setenv LD_LIBRARY_PATH /extra
else
    setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/extra
endif
                

with the following:


if (! $?LD_LIBRARY_PATH) then
    setenv LD_LIBRARY_PATH /util/gcc-7.2.0/lib64
else
    setenv LD_LIBRARY_PATH /util/gcc-7.2.0/lib64:${LD_LIBRARY_PATH}
endif
                

Step 6: Make a submission for lab 0 on Autolab. We will only count the last submission made on Autolab by each team as your final submission. Later on, the final submission is the code we'll test offline on a different server.

We disabled group options in Autolab due to a technical issue with the system, so you do not need to form a group there. You are still working in teams, but your teammate's submissions and grades are not visible to you. Please coordinate with each other in private. We will keep track of the team members using the lab 0 submission form below. Your teammate's code submissions still count towards yours, and your final grade for the coding component will be based on the last submission made by your group, even if you don't have a grade in your gradebook (on Autolab).

To make a submission, fill in the form (shown below) with your teammate's UBIT name (or leave it empty if you'd like to work individually) and your repository information, and then click submit. If you're not sure what your or your teammate's UBIT name is, that is the username part in your @buffalo.edu email address (e.g., your UBIT name is xxxx, if your email address is xxxx@buffalo.edu). Our submission system on our offline test server will reject any UBIT name it does not recognize, but once it accepts the UBIT name, you'll have to contact the TAs and the instructor to make changes. You can make as many submissions as you need to in order to correct your repository settings (though we do enforce a maximum submission rate of 10 submissions per hour per team).

For students who enrolled close to or after the add/drop deadline: please send a private message on Piazza, under the requests folder, to all TAs and the instructor so that we can manually add your UBIT name to our backend server before you submit lab 0. There's no late penalty for that, but you won't be able to submit later projects without a successful lab 0 submission.
[Figure: Autolab lab 0 form]

If you did not receive the full score, you may click on the score to check its output log in order to identify issues.
[Figure: Autolab lab 0, click on score for feedback]

And the test log for a failed submission should contain enough information for debugging (similar to below).
[Figure: Autolab lab 0 failed submission log]

You may change your repository link until the lab 0 deadline, but do not enter any repository link that does not belong to your team! You won't be able to change the repository link once the lab 0 deadline passes. We may keep a log of all the submissions you have made, and submitting a repository link that does not belong to your team is an academic integrity (AI) violation.

2. Lab 1 - File interface


This is supposed to be an easy lab that walks you through the process of importing new files into the repository, building, testing, debugging, and submitting the labs. We also provide somewhat more detailed instructions here than in later labs.

Importing supplemental files into your repository: Download lab1.tar.xz and extract its content in your local repository root using tar xf. Then run the script lab1/import_supplemental_files.sh. If it is successful, you will see an output similar to below:
[Figure: importing supplemental files in lab 1]

The code should build successfully at this point, but none of the new tests should pass. We recommend that you make a new branch for each lab using git branch <branch-name> and/or create a tag using git tag <tag-name> for the starting point, in case you need to revert to an earlier point. At this point, you may commit and push the new code to Github using git push. If you made a tag, you'll also need to run git push --tags.

Task: Use the Linux I/O syscalls (e.g., open(2), close(2), pread(2), pwrite(2), etc.) to implement the FSFile class in src/storage/FSFile.cpp and include/storage/FSFile.h. See the header and source file for requirements and hints.
Note: you may ignore all thread-safety related requirements, since we will be building a single-threaded database.
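A point worth keeping in mind about the syscalls named above: pread(2) and pwrite(2) may transfer fewer bytes than requested, and may fail with EINTR if interrupted by a signal. The following is a minimal sketch of one way to loop until a full range is transferred. It is not the required FSFile design; ReadFully and WriteFully are hypothetical helper names that do not come from the Taco-DB codebase, and the header comments in include/storage/FSFile.h remain the authority on the expected behavior.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of Taco-DB): writes exactly `count` bytes
// from `buf` to `fd` at byte offset `offset`. Returns true on success.
bool WriteFully(int fd, const char *buf, size_t count, off_t offset) {
    assert(buf != nullptr);
    while (count > 0) {
        ssize_t n = pwrite(fd, buf, count, offset);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted: retry the call
            return false;                  // real error, errno is set
        }
        if (n == 0) return false;          // no progress: give up
        // Short write: advance past what was written and try again.
        buf += n;
        count -= static_cast<size_t>(n);
        offset += n;
    }
    return true;
}

// Hypothetical helper (not part of Taco-DB): reads exactly `count` bytes
// into `buf` from `fd` at byte offset `offset`. In this sketch, hitting
// end of file before `count` bytes are read is treated as a failure.
bool ReadFully(int fd, char *buf, size_t count, off_t offset) {
    assert(buf != nullptr);
    while (count > 0) {
        ssize_t n = pread(fd, buf, count, offset);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted: retry the call
            return false;                  // real error, errno is set
        }
        if (n == 0) return false;          // end of file reached too early
        buf += n;
        count -= static_cast<size_t>(n);
        offset += n;
    }
    return true;
}
```

For instance, after opening a file with open(2) in read-write mode, calling WriteFully(fd, page, 4096, 8192) and then ReadFully(fd, buf, 4096, 8192) should give back the same 4096 bytes. Neither call moves the file offset, which is why the offset is passed explicitly each time.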

Testing and debugging: For lab 1, there are 14 basic tests BasicTestFSFile.XXX defined in tests/storage/BasicTestFSFile.cpp and another 14 basic tests BasicTestFSFile.XXXNoFallocate, which run with an additional --test_never_call_fallocate argument. The NoFallocate variants test the fallback implementation of FSFile::Allocate() for the case where fallocate(2) is not supported by your OS or file system. Your implementation should pass each of these 28 tests within tens of milliseconds. There are also 2 hidden system tests, SystemTestFSFile.TestAllocateLarge and SystemTestFSFile.TestAllocateLargeNoFallocate; their code is not visible to you, and you can only see their results when you submit your code to Autolab. In later labs, the hidden tests are those that are more complex, run on larger data, and/or have a longer running time, and are thus more likely to fail on Autolab. We will rerun all the test cases offline after the deadline and, for each test case, take the higher of the Autolab score and the offline score. Nevertheless, the hidden tests should not fail as long as your code is implemented per the specification.
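To illustrate what a fallocate(2) fallback can look like in general, here is a minimal sketch under stated assumptions. ExtendFile and its never_call_fallocate parameter are hypothetical names chosen for this example; the actual interface and required semantics of FSFile::Allocate() are specified in the header, and this sketch does not claim to match them. The idea is to try fallocate(2) first and, if the kernel or file system reports that it is unsupported, extend the file by writing zeros instead.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of Taco-DB): grows the file behind `fd`
// by `len` zero-filled bytes past its current end. If `never_call_fallocate`
// is true, the fallocate(2) path is skipped, mimicking an unsupported call.
bool ExtendFile(int fd, off_t len, bool never_call_fallocate) {
    assert(fd >= 0);
    if (len <= 0) return len == 0;  // nothing to do; negative is an error

    struct stat st;
    if (fstat(fd, &st) != 0) return false;
    const off_t old_size = st.st_size;

    if (!never_call_fallocate) {
        int rc;
        do {
            // Mode 0 allocates the range and extends the file size.
            rc = fallocate(fd, 0, old_size, len);
        } while (rc != 0 && errno == EINTR);
        if (rc == 0) return true;
        // Only fall back when the operation itself is unsupported.
        if (errno != EOPNOTSUPP && errno != ENOSYS) return false;
    }

    // Fallback: append zeros in fixed-size chunks.
    static const char zeros[4096] = {};
    off_t written = 0;
    while (written < len) {
        const off_t remaining = len - written;
        const size_t chunk = remaining < static_cast<off_t>(sizeof(zeros))
                                 ? static_cast<size_t>(remaining)
                                 : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, old_size + written);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted: retry
            return false;
        }
        if (n == 0) return false;          // no progress: give up
        written += n;
    }
    return true;
}
```

Either path should leave the file len bytes longer, with the new bytes reading back as zeros. The fallback is slower because it actually writes the data, which is one reason a test of a large allocation may take noticeably longer without fallocate(2).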

You may use ctest -V to run all the tests as in the previous lab. Alternatively, you may test and/or debug an individual test by invoking its binary directly. For instance, the following shows how to test BasicTestFSFile.TestCreateFile:


    cd build
    # to test BasicTestFSFile.TestCreateFile
    ./tests/storage/BasicTestFSFile --gtest_filter=BasicTestFSFile.TestCreateFile
    # to list more options
    ./tests/storage/BasicTestFSFile --help
                

You may debug your code using GDB and/or log messages. We provide a LOG() macro which you can use to print any messages in your code (see include/base/logging.h for details). As a concrete example, you may use the following code to print the value of a variable x:


    int x = 0;
    // do something with x
    LOG(kInfo, "the value of x is %d", x);
                

Note that the log messages are all disabled by default in the tests, because we do not want to see too many log messages when we are expecting errors in the function calls during tests. To enable the log messages, run the test with an additional --disable_logs=false argument, e.g.,


    cd build
    # run BasicTestFSFile.TestCreateFile with log enabled
    ./tests/storage/BasicTestFSFile --disable_logs=false --gtest_filter=BasicTestFSFile.TestCreateFile
                

Submitting your code: Starting from lab 1, you only need to provide a branch or a tag name that references the git commit you want to test. If you have tested your implementation locally and want to submit the latest commit, you may create a tag as follows:


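    # commit all changes to tracked files
    git commit -a
    # push the commit to Github
    git push
    # tag the commit you want tested, then push the tag
    git tag <some-tag-name>
    git push --tags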
                

Then make a submission at the lab 1 assessment page on Autolab with the tag name:
[Figure: submitting a tag or branch in lab 1]