CSE 462/562: Database Systems (Spring 2022)

Project 2: Buffer Manager and Heap File

Assigned: Thursday, 2/17/2022
Project due: Thursday, 3/10/2022, 11:59 pm EST (extended)
Write-up due: Saturday, 3/12/2022, 11:59 pm EST (extended)
Last updated: 3/6/2022

0. Getting Started

This project builds upon your code for project 1 (FSFile). To get started, download lab2.tar.xz from Autolab and import the supplemental files as you did in project 1. The code should build without compilation errors once the supplemental files are imported, but most of the additional tests are likely to fail. You may list all the tests in your build directory using the ctest -N command. As before, you may make any number of code submissions to Autolab, project 2 (the same hourly submission rate limit applies). To do so, please tag or create a new branch of the commit you want to submit, and provide the tag or branch name on Autolab.

Grading policy: Your grade will be based on the testing result of your code and an independently completed write-up (see the write-up section for more details). There are no hidden tests in project 2. Once you submit your code on Autolab, it will run the tests on its VM and give you a list of scores for individual test cases. Because some of the tests may time out on the VM due to its unstable I/O performance, we will also run an offline test on your final submission after the deadline; for each individual test case we take the higher of the offline score and the Autolab score, and the sum of these is your final grade for the coding component. For example, if your code passes all the tests except BasicTestEvictionPolicy.TestMemoryLimit on Autolab but passes it in our offline test, you will still get the full score for the coding component.

A few useful hints:

  1. We strongly encourage you to read through the entire document to get a feel for the high-level design of the storage layer in Taco-DB before you start writing code. You might also find it useful to review the lecture slides on the database storage layer. The project requires you to write a fair amount of code, so you might want to start early.
  2. The description below is meant to be a brief overview of the tasks. Always refer to the special document blocks in the headers and source files for a more detailed specification of how the functions should behave.
  3. You may add any definitions/declarations to the classes or functions you're implementing, but do not change the signatures of the public functions. Also, do not edit any source files not listed under the "source files to modify" below, as they will be replaced during tests.
  4. You may ignore any function/class that's related to multi-threading and concurrency control since we are building a single-threaded DBMS and your implementation doesn't have to be thread safe.
  5. We provide a list of tasks throughout this document as a suggested order of implementation, such that your code may pass a few more tests after you complete each task. It is not the only possible order, however, and you may find an alternative order that makes more sense to you. It is also possible that your code still cannot pass a certain test it is supposed to pass because of some unexpected dependency on a later task, in which case you might want to refer to the test code in the tests/ directory.
  6. If you're working in a team of two, you might want to first go through the list of tasks, agree on the big picture of the design, and divide the implementation into independent pieces. One strategy is to implement tasks that may be tested independently. For instance, the tasks in the buffer manager and those in the data page may be tested completely independently, while those in the heap file depend on a correct implementation of the buffer manager and the data page. Alternatively, you could design the algorithms and data structures together and identify independent components to complete. You may also come up with better ways of collaboration that make more sense to you. In any case, the grade for the code implementation will be based on the test result of your group's final submission.

1. Project overview


Figure 1: Taco-DB Storage Layer Overview

In this project, you will work on the storage layer of Taco-DB (see Figure 1). More specifically, the goal is to implement the buffer manager, the data page, and the heap file on top of a disk space management component, the file manager, which we provide. There is also an exercise to help you better understand the record layout in a DBMS: read the code of the record layout and the Schema object implementation in Taco-DB, and answer a few questions in your write-up.

1.1. File Manager

[Back to Project Overview]

Source files to modify: none. You don't have to implement anything in the file manager and the following is only provided as an overview that might be useful to you.


Figure 2: File Manager

Taco-DB manages its data in a flat main data storage space organized as fixed-size pages (see Figure 2), through the FileManager interface defined in include/storage/FileManager.h. The size of each page is fixed to PAGE_SIZE = 4096 bytes. Page numbers are represented as the unsigned 32-bit integer type PageNumber, where page number 0 (INVALID_PID) is an invalid page number reserved for file manager use only. Recall that you have implemented an interface FSFile for file I/O over the file system. The file manager breaks up the entire space into up to 256 files stored on your file system (usually in build/tmp/tmpd.XXXXXX/main when you're running tests). There is no need to worry about the internals of the file manager, but it is useful to understand how the file manager manages the pages for debugging purposes.

There is a single global instance of file manager g_fileman defined in include/dbmain/Database.h.

Note: the FileManager also manages temporary data files, but each temporary file has its own page number space that is independent of the main data storage space. Temporary files are not used in this project, so we will cover their details in later projects that use them. Within the context of this project, a file page refers to a page in the main data storage space.


Figure 3: File manager managed (virtual) file

The file manager further divides the entire main data storage space into (virtual) files (see Figure 3). Each file is simply a linked list of pages in the main data storage space; files may be created and opened through the file manager, and you may find the first page, find the last page, or extend a file through the File interface. Each data page has a fixed-size header PageHeaderData at the beginning that is managed by the file manager, which stores the next and previous page numbers and some other information internal to the file manager. When you access a page in this project, you must not modify the area occupied by the PageHeaderData outside the file manager.

1.2. Buffer Manager

[Back to Project Overview]

Source files to modify:

While one could read/write pages directly through FileManager::ReadPage() and FileManager::WritePage(), that is not very efficient because each read/write incurs an I/O syscall into the operating system. As we covered in the lecture, a DBMS uses a page buffer manager to keep a set of pages in memory in order to reduce I/O syscalls and increase system throughput. In Taco-DB, the BufferManager is built on top of the FileManager. It handles all the page access requests from other system components and maintains a buffer of disk pages in a fixed number of buffer frames, by issuing I/O calls on the global file manager instance g_fileman.

There is a single global instance of the buffer manager, g_bufman, defined in include/dbmain/Database.h, which is used everywhere except in the buffer manager tests.


Figure 4: Buffer Manager PinPage() call

Figure 4 is a simplified sketch of how BufferManager works. It allocates a number of consecutive page frames in memory, indexed from 0 by the unsigned integer type BufferId, with INVALID_BUFID defined as std::numeric_limits<BufferId>::max(). The buffer frames must be aligned to 512-byte boundaries (using aligned_alloc()), because the FileManager uses O_DIRECT I/Os over the file system. For each buffer frame, the buffer manager maintains some meta information in a separate structure, including the current pin count, a dirty bit, the page number the frame is currently holding, as well as any information needed by the eviction policy. In addition, the buffer manager should maintain a hash table that maps page numbers to buffer IDs to speed up lookups.
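
As a concrete (but not prescriptive) illustration, the per-frame meta information and the lookup table could be organized as below. This is only a sketch: the struct name, field names, and container choices are our own suggestions, not part of the provided headers.

  // Hypothetical per-frame bookkeeping (needs <cstdint>, <vector>, <unordered_map>).
  // PageNumber, BufferId, INVALID_PID and PAGE_SIZE come from the Taco-DB headers;
  // everything else here is illustrative only.
  struct FrameMeta {
      PageNumber pid = INVALID_PID;   // page currently cached in this frame
      uint32_t   pin_count = 0;       // outstanding pins; evictable only when 0
      bool       dirty = false;       // page modified since it was read in
      bool       ref = false;         // extra state for the eviction policy
  };

  // Possible members inside BufferManager:
  //   char *m_frames;                            // aligned_alloc'd, n * PAGE_SIZE bytes
  //   std::vector<FrameMeta> m_meta;             // one entry per buffer frame
  //   std::unordered_map<PageNumber, BufferId> m_pid_to_bufid;   // page -> frame lookup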

Task 1: Add the necessary class members to BufferManager and implement:

Your code should still pass BasicTestBufferManager.TestNormalInitAndDestroy at this point (it passes without adding any code!).

Upon a PinPage() call, the buffer manager should look up the page number in the hash table. If the page is found in some buffer frame, it increments the pin count and returns that frame's buffer ID. Otherwise, the buffer manager selects an unpinned buffer frame using some eviction policy (e.g., LRU, MRU, Clock, etc., but don't use a random eviction policy!) and writes back the page currently in the selected frame if it holds a dirty page. Then the requested page is read into the selected buffer frame, which is returned to the caller with its pin count incremented. Note that a buffer frame may have a pin count greater than 1. An UnpinPage() call simply decrements the specified buffer frame's pin count.
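
In rough C++ terms, the control flow could look like the sketch below. It reuses the hypothetical members from the previous sketch; the FindVictimFrame() helper and the exact argument order of the FileManager calls are also assumptions, so always follow the specification in include/storage/BufferManager.h rather than this sketch.

  // Sketch only: not the required implementation.
  BufferId BufferManager::PinPage(PageNumber pid, char **frame) {
      auto it = m_pid_to_bufid.find(pid);
      if (it != m_pid_to_bufid.end()) {
          // Hit: the page is already buffered, just add another pin.
          BufferId bufid = it->second;
          ++m_meta[bufid].pin_count;
          *frame = m_frames + (size_t) bufid * PAGE_SIZE;
          return bufid;
      }

      // Miss: pick an unpinned victim frame (this is where the eviction
      // policy of Task 3 comes in).
      BufferId bufid = FindVictimFrame();           // hypothetical helper
      FrameMeta &meta = m_meta[bufid];
      char *buf = m_frames + (size_t) bufid * PAGE_SIZE;
      if (meta.pid != INVALID_PID) {
          if (meta.dirty) {
              // Write the old page back before reusing the frame.
              g_fileman->WritePage(meta.pid, buf);  // argument order assumed
          }
          m_pid_to_bufid.erase(meta.pid);
      }

      // Read the requested page into the frame and pin it.
      g_fileman->ReadPage(pid, buf);                // argument order assumed
      meta.pid = pid;
      meta.pin_count = 1;
      meta.dirty = false;
      m_pid_to_bufid[pid] = bufid;
      *frame = buf;
      return bufid;
  }

  void BufferManager::UnpinPage(BufferId bufid) {
      // Simply drop one pin; the page stays buffered until it is evicted.
      --m_meta[bufid].pin_count;
  }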

Task 2: Implement BufferManager::PinPage() and BufferManager::UnpinPage() without a working eviction policy.

Your code is likely to pass the following tests at this point:

Task 3: Implement a reasonably good eviction policy of your choice (e.g., LRU, MRU, Clock, etc.).
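
For example, the clock (second-chance) policy only needs a per-frame reference bit (set whenever the page is pinned or hit) and a "clock hand" index. A minimal sketch of the victim-selection step, under the same assumed member names as above:

  // Sketch of clock-based victim selection; m_clock_hand, m_nframes and the
  // ref bit are hypothetical members, and error handling is simplified.
  BufferId BufferManager::FindVictimFrame() {
      // Two sweeps are enough: the first clears reference bits, the second
      // is then guaranteed to find an unpinned frame if one exists.
      for (size_t scanned = 0; scanned < 2 * m_nframes; ++scanned) {
          BufferId bufid = m_clock_hand;
          m_clock_hand = (m_clock_hand + 1) % m_nframes;

          FrameMeta &meta = m_meta[bufid];
          if (meta.pin_count > 0)
              continue;             // pinned frames may never be evicted
          if (meta.ref) {
              meta.ref = false;     // recently used: give it a second chance
              continue;
          }
          return bufid;             // unpinned and not recently used
      }
      // Every frame is pinned: report this as the out-of-memory error
      // described in the header's specification.
      return INVALID_BUFID;
  }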

Your code is likely to pass the following tests at this point. Note that these tests do not test the efficiency of your eviction policy yet.

The buffer manager in Taco-DB does not flush pages back unless a page is evicted or the database shuts down, as opposed to a more sophisticated DBMS where the buffer pool may flush pages from time to time for performance and recovery reasons. In Taco-DB, there is a BufferManager::Flush() function that flushes all the dirty pages back to disk. It should only be called during database shutdown from BufferManager::Destroy(). When that happens, all the buffer frames (whether clean or dirty) should have zero pin counts. It is considered a fatal error if some pin count is not zero, because a non-zero pin count indicates bugs in other components.
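
A minimal sketch of the flush logic, under the same assumed member names as above (how the fatal error is reported is specified in the header, not here):

  // Sketch of Flush(): write back every dirty page that is still buffered.
  void BufferManager::Flush() {
      for (size_t i = 0; i < m_nframes; ++i) {
          FrameMeta &meta = m_meta[i];
          if (meta.pid == INVALID_PID)
              continue;                       // empty frame, nothing to do
          if (meta.pin_count != 0) {
              // A non-zero pin count at shutdown indicates a bug in some
              // other component: report a fatal error per the specification.
          }
          if (meta.dirty) {
              g_fileman->WritePage(meta.pid, m_frames + i * PAGE_SIZE);
              meta.dirty = false;
          }
      }
  }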

Task 4: Implement BufferManager::Flush() and add a call to it in BufferManager::Destroy().

Your code should pass all the tests in BasicTestBufferManager and BasicTestEvictionPolicy at this point. The two tests in BasicTestEvictionPolicy check whether your eviction policy is working and reasonably good (i.e., not random eviction). They take a bit longer to run (tens of seconds in a debug build), so their timeouts are set to higher values on Autolab and in the offline tests.

1.3 Schema and Record Layout

[Back to Project Overview]

The next thing you'll need to understand is how fields are laid out given a particular schema in Taco-DB, and to answer a set of questions in a write-up submitted along with your code. Specifically, you should read the following source files:

The first thing to understand is that Taco-DB stores data and type information separately, both in memory and on disk, and they are accessed differently. A Datum (see include/base/datum.h) stores an object of any type in memory as a sequence of bytes. You may think of it as a union of values of different types (e.g., integers, floats, strings), but internally it has different ways of storing things. In general, a fixed-length object of length 1, 2, 4 or 8 bytes is stored inline in a Datum object (e.g., INT4, INT8), while a variable-length object (e.g., VARCHAR(5)) or a fixed-length object of any other size is stored in a buffer to which the Datum keeps a pointer. Depending on where this buffer is located (on a buffer page or in some malloc'd memory), the buffer may or may not be owned by the Datum. Note that, because of this, copying a Datum is not allowed. To reference a Datum, either use const Datum& or the DatumRef class. When an object is stored on a data page, it is always stored as a flat array of bytes without pointers or references. Note that a fixed-length object may be stored as a variable-length datum (e.g., CHAR(5)), which should be accessed using Datum::GetVarlenBytes(), Datum::GetVarlenSize() or Datum::GetVarlenAsStringView() rather than Datum::GetFixedlenBytes().
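
As a small illustration (not part of the provided code; it only assumes a Datum that already holds a CHAR(5) value, e.g., one obtained through the Schema's field accessors), the variable-length accessors are the right ones to use even though CHAR(n) is a fixed-length SQL type:

  #include <iostream>

  // Illustration only: d holds a CHAR(5) value. CHAR(n) is stored as a
  // variable-length datum, so Datum::GetFixedlenBytes() does not apply here.
  void PrintChar5(const Datum &d) {
      auto sv = d.GetVarlenAsStringView();    // the stored bytes as a string view
      std::cout << "len=" << d.GetVarlenSize()
                << " value=" << sv << std::endl;
  }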

On the other hand, Taco-DB relies on the type information in the catalog to interpret these values. The type information is looked up from the catalog table Type. These types are SQL types rather than C++ types, and they can be used to find information about how to manipulate these objects. A few important fields of the Type catalog table (src/catalog/systables/Type.inc) are explained below:

Finally, a built-in function is a C++ function that accepts a list of Datum of certain SQL types and returns a Datum of a certain SQL type. These built-in functions are managed by a centralized function catalog, and can be looked up and called dynamically through the FunctionCall method. For instance, the typlenfunc for a predefined SQL type is a built-in function that accepts a UINT8 Datum and returns an INT2 length: it calculates the binary length of the given type based on the type parameter. You may find the details of built-in functions in include/base/fmgr.h and include/utils/builtin_funcs.h. In later projects, we will add additional catalog information to identify functions that may operate on certain data types, such as numeric operations, string length, etc. For the task below, the typlenfunc of CHAR returns the type parameter itself, i.e., the maximum length of the string.

Task 5 (optional, 1 pt): Read the implementation of the Schema class and answer the following questions in your write-up (this must be completed individually, not in teams!). Note that there is a clarification in a Piazza note about a comment block that contradicts the code implementation.

Consider the following table schema defined in SQL:


  CREATE TABLE A(
	  f1 INT2 NOT NULL,
	  f2 VARCHAR(5),
	  f3 CHAR(10),
	  f4 INT4 NOT NULL,
	  f5 VARCHAR(10),
	  f6 UINT8);
				
  1. Draw a data layout diagram for the record {f1: -10, f2: 'abc', f3: 'abcde12345', f4: 15645134, f5: null, f6: 126}. Please include essential information, including where each field is laid out (byte range), the important bytes stored in this layout, and the meta-information stored along with the record (if any) in your diagram.
  2. Briefly explain how the function GetField(field_id, payload) returns the datum for the field field_id in a binary-encoded record payload. Use getting field 4 (f4), field 6 (f6) and field 2 (f2) from the record given in the first question as an example. Specifically, go through the essential code logic of how the offset and length of each field are acquired.

1.4 Data Page

[Back to Project Overview]

Source files to modify:

How to run parameterized tests individually:

The tests for the data page are parameterized. You won't be able to run each individual test case separately with ctest. To run a test with a specific parameter, e.g., the page initialization test with min_reclen = 8, max_reclen = 8 and usr_data_sz = 0, run it directly using the test executable with the command: ./tests/storage/BasicTestVarlenDataPage --gtest_filter='*TestInitialize/8_8_0'

Taco-DB manages its data in fixed-size pages, where we may store a number of records as well as some additional information needed by the database. Taco-DB supports variable-length data types (e.g., VARCHAR), as well as nullable table fields, and thus it needs the capability of storing variable-length records on data pages using a slotted data page design. In this project, a Record is simply an array of bytes of a certain length. Note that a record must be aligned to an 8-byte address; you may use MAXALIGN() for that (e.g., MAXALIGN(13) = 16). While it is not required for this project, you may find the details of the variable-length record layout in Taco-DB in include/catalog/Schema.h and src/catalog/Schema.cpp.


Figure 5: Data Page that supports variable-length records

The layout of a data page is shown in Figure 5. The page begins with a 24-byte header VarlenDataPageHeader, which includes the PageHeaderData at offset 0. Following that is an optional fixed-size user data area (you'll need it for the later B-tree file project). Slots on the page are indexed by the SlotId type from 1 upwards (with INVALID_SID = 0). The slot array starts at the end of the page and grows backwards. Hence the SlotData of slot sid may be accessed as ((SlotData*)(pagebuf + PAGE_SIZE))[-(int) sid], if char *pagebuf points to the beginning of the page. The actual record payload for slot sid is stored somewhere between the end of the user data area and the begin offset of the free space. To insert a record, find a free existing slot or create a new slot in the slot array, and find some empty space into which to copy the record payload. To erase a record, set its slot to be invalid and possibly remove all the invalid slots at the end of the slot array, if any.
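
For instance, a small helper that maps a slot id to its slot entry can follow the expression above directly (a sketch only; in your implementation the page buffer is presumably kept as a member of VarlenDataPage):

  // Slot sid (1-based) is the sid-th SlotData entry counting backwards from
  // the end of the page, because the slot array grows toward lower addresses
  // starting at offset PAGE_SIZE.
  static SlotData *GetSlotPtr(char *pagebuf, SlotId sid) {
      return &((SlotData *) (pagebuf + PAGE_SIZE))[-(int) sid];
  }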

Task 6: Implement the page initialization and a few page meta information functions:

Your code is likely to pass the tests */BasicTestVarlenDataPage.TestInitialize/* at this point.

Task 7: Implement the slot query and insertion functions: VarlenDataPage::InsertRecord(), VarlenDataPage::IsOccupied(), and VarlenDataPage::GetRecordBuffer().

Your code is likely to pass the tests */BasicTestVarlenDataPage.TestInsertRecord/* at this point.

Task 8: Implement the record erasure function: VarlenDataPage::EraseRecord().

Your code is likely to pass the tests */BasicTestVarlenDataPage.TestEraseRecord/* at this point.

Task 9: Implement the record update function: VarlenDataPage::UpdateRecord().

Your code is likely to pass the tests */BasicTestVarlenDataPage.TestUpdateRecord/* and */BasicTestVarlenDataPage.TestSidNotValidErrors/* at this point.

As you might have noticed, there could be unoccupied space between offsets m_ph_sz and m_fs_begin (i.e., holes) after an erasure or an update. Hence, the data page should try to compact that space by moving the record payloads so that they are consecutive. This process is called page compaction. During a page compaction, the SlotData entries in the slot array may have to be updated, but the slots themselves must not be moved. In other words, a valid slot remains valid and still refers to the same record; an invalid slot remains invalid; and the total number of slots remains unchanged after a compaction. To avoid trying to compact a page every time an insertion or an out-of-place update happens, you should maintain a bit m_has_hole in the page header to indicate whether it is possible (not certain) that there are holes in the occupied space.
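
One possible structure for the compaction is sketched below. The helpers GetMaxSlotId(), SlotOffset(), SlotLength(), SetSlotOffset() and the member m_pagebuf are assumptions made for this sketch (as is the exact location of m_has_hole); only m_ph_sz, m_fs_begin, MAXALIGN() and the slot array itself come from the description above.

  // Sketch of page compaction (needs <algorithm>, <cstring>, <vector>):
  // slide the occupied payloads together so the free space becomes one
  // contiguous region, without moving any slot entries.
  void VarlenDataPage::CompactSpace() {
      // 1. Collect the occupied slots, ordered by their current payload offset.
      std::vector<SlotId> occupied;
      for (SlotId sid = 1; sid <= GetMaxSlotId(); ++sid)   // assumed helper
          if (IsOccupied(sid))
              occupied.push_back(sid);
      std::sort(occupied.begin(), occupied.end(), [&](SlotId a, SlotId b) {
          return SlotOffset(a) < SlotOffset(b);
      });

      // 2. Move each payload down so the payloads become consecutive, starting
      //    right after the header (plus the user data area, if any). Only the
      //    offsets stored in the slot entries change; slot ids do not.
      size_t next_off = m_ph_sz;    // adjust if the page has a user data area
      for (SlotId sid : occupied) {
          size_t off = SlotOffset(sid), len = SlotLength(sid);
          if (off != next_off) {
              std::memmove(m_pagebuf + next_off, m_pagebuf + off, len);
              SetSlotOffset(sid, next_off);
          }
          next_off = MAXALIGN(next_off + len);
      }

      // 3. The free space now begins right after the last payload; no holes remain.
      m_fs_begin = next_off;
      m_has_hole = false;
  }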

Task 10: Implement the page compaction and update the page update functions accordingly.

You should pass all the tests in BasicTestVarlenDataPage at this point. Hint: an incorrect compaction implementation is highly likely to fail only the last three tests that involve compaction, so you might want to prioritize your effort appropriately to maximize your score.

Optional: there are two functions for maintaining a sorted array of records on the page: VarlenDataPage::InsertRecordAt() and VarlenDataPage::RemoveSlot(). These are not required in this project and we do not provide tests for them yet. You'll only need to implement them in a later project on B-trees.

1.5 Heap File (Table interface)

[Back to Project Overview]

Source files to modify:

Tips on the tests:

There are two versions of each test for the heap files: BasicTestTable and SystemTestTable. You might want to debug your implementation using the basic tests first. Their test code is actually the same (see tests/storage/TestTable_common.inc). The difference is that the basic test uses the BootstrapCatCache to create the table descriptor and thus does not depend on your heap file implementation, while the system test uses the PersistentCatCache, which initializes a catalog using your heap file implementation and is thus more likely to fail to initialize if your implementation is buggy.

A heap file is simply an unordered (multi-)set of records. It is the most basic way of organizing records in a DBMS. In Taco-DB, a heap file is implemented on top of the virtual files managed by the file manager and stores records in arbitrary order in the file's list of pages. A table descriptor TabDesc stores essential information about a table, including the file ID of the table. The Table class should open the file through the global file manager g_fileman, make calls to the global buffer manager g_bufman to access the pages, and update or retrieve records on a page using VarlenDataPage. It should support inserting, updating, deleting, and iterating over records. To simplify the implementation, you only need to allocate pages when there is no space for an insertion, and you do not have to reuse free space on earlier pages (but you may if you'd like to). In addition, you should leave pages as they are when they become empty rather than freeing them in the file manager, because of nuances related to crash recovery and concurrency control (which you do not need to build this semester). The Table::Iterator class provides a forward-only iterator over all the records, and it should keep at most one page pinned at a time.
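
To make the expected call pattern concrete, here is a rough sketch of one possible insertion path. The member m_insert_pid, the InsertionSucceeded() check, the MarkDirty() call, and the exact buffer manager signatures are all assumptions of this sketch; the authoritative interfaces are in the corresponding headers.

  // Sketch only: not the required implementation.
  void Table::InsertRecord(Record &rec) {
      // 1. Pin the page we expect to have room, e.g., the page we last
      //    inserted into (or the last page of the file).
      char *frame;
      BufferId bufid = g_bufman->PinPage(m_insert_pid, &frame);

      // 2. Try to place the record on that page through VarlenDataPage.
      VarlenDataPage dp(frame);
      dp.InsertRecord(rec);

      if (!InsertionSucceeded(rec)) {   // hypothetical check; see the header
          // 3. No room: unpin this page, extend the file with a fresh page
          //    through the File interface, initialize it as a VarlenDataPage,
          //    and insert there instead. (Free space on earlier pages does
          //    not have to be reused in this project.)
      }

      // 4. Mark the page dirty and unpin it before returning.
      g_bufman->MarkDirty(bufid);       // name assumed; check BufferManager.h
      g_bufman->UnpinPage(bufid);
  }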

You might want to pay attention to the specifications in the header file in order to pass all the tests, especially the system tests.

Task 11: Implement the table initialization and construction functions:

Your code is likely to pass BasicTestTable.TestEmptyTable at this point.

Task 12: Implement Table::InsertRecord(), Table::StartScan(), Table::StartScanBefore() and the Table::Iterator class. You may change, delete, or add to the existing members in Table and Table::Iterator if necessary, as well as the signatures of any private functions -- they only exist as a suggestion.

Your code is likely to pass BasicTestTable.TestInsertRecord and BasicTestTable.TestStartScanBefore at this point.

Task 13: Implement Table::EraseRecord().

Your code is likely to pass BasicTestTable.TestEraseRecord at this point.

Task 14: Implement Table::UpdateRecord().

Your code should pass all the tests in BasicTestTable and SystemTestTable at this point.

Write-up requirement

Each student needs to submit an independent write-up to UBLearns covering the project design and the division of implementation responsibility (no more than one letter-size page for design and coding responsibility), and optionally answering the follow-up questions in Task 5. Please find the due date of the write-up at the beginning of this web page and submit a PDF document to UBLearns by the deadline. Please find a more detailed description of the requirements and a sample write-up on UBLearns.