CSE 462/562: Database Systems (Spring 2023)

Project 2: Buffer Manager

Released: Tuesday, 2/7/2023
Project due: Thursday, 2/16/2023, 01:00 am EST
Write-up due: Saturday, 2/18/2023, 01:00 am EST
Last updated: 2/10/2023

0. Getting Started

This project builds upon your code for project 1 (FSFile). If you need the solution code for project 1, please download it from UBBox (link released on Piazza, under Homework Solutions). To get started, extract /local/Spring_2023/cse562/lab2.tar.xz on the student servers and import the supplemental files as you did in project 1. The code should build without compilation errors once the supplemental files are imported, but most of the additional tests are likely to fail. You may list all the tests in your build directory using the ctest -N command, or view them in the Run and Debug window in VSCode. As before, you may attempt any number of code submissions to project 2 on Autolab (the same hourly submission rate limit applies). To do so, please tag the commit you want to submit and provide the tag in Autolab.

Grading policy: Your grade will be based on the testing results of your code and an independently completed write-up (see the write-up section for more details). There are no hidden tests in project 2. Once you submit your code on Autolab, it will run the tests on its VM and give you a list of scores for the individual test cases. As some of the tests may time out on the VM due to its unstable performance, we will run an offline test on your final submission after the deadline. Your final score for each test case will be the higher of the score from your last submission on Autolab and the score from the offline test. For example, if your code passes all the tests except BasicTestEvictionPolicy.TestMemoryLimit on Autolab (which may randomly fail due to timeout from time to time) but passes it in the offline test, you will still get the full score for the coding component.

A few useful hints:

  1. We strongly encourage you to read through the entire document for the high-level design of the storage layer in Taco-DB before you start writing code. The project requires you to write a fair amount of code, so you might want to start early.
  2. The description below is meant to be a brief overview of the tasks. Always refer to the special documentation blocks in the headers and source files for a more detailed specification of how the functions should behave.
  3. You may add any definitions/declarations to the classes or functions you're implementing, but do not change the signatures of the public functions. Also, do not edit any source files that are not listed under the source files to modify below, as they will be replaced during tests.
  4. You may ignore any function/class that's related to multi-threading and concurrency control since we are building a single-threaded DBMS. Your implementation doesn't have to be thread safe.
  5. We provide a list of tasks throughout this document, suggesting an order of implementation such that your code may pass a few more tests after you complete each task. It is not the only possible order, however, and you may find an alternative order that makes more sense to you. It is also possible that your code fails a test it is supposed to pass because of unexpected dependencies on earlier/later tasks, in which case you might want to refer to the test code implementation in the tests/ directory.
  6. If you're working in a team of two, you might want to first go through the list of tasks, agree on a big picture of the design, and divide the implementation tasks into independent pieces. One strategy is to implement tasks that may be tested independently. For instance, the buffer eviction policy and the buffer pin logic can be implemented independently as long as you both agree on the same data layouts. Alternatively, you could do pair programming. You may also come up with better ways of collaboration that make more sense to you. In any case, the grade for the code implementation will be based on the test result of your group's final submission.

1. Project overview

Figure 1: Taco-DB Storage Layer Overview

Figure 1 shows the storage layer of Taco-DB. We have provided the implementation of the file manager, which manages disk space in a set of data files on the file system. You don't have to modify any code of the file manager, but you should read Section 1.1 in order to know its APIs. The goal of project 2 is to implement the buffer manager (see Section 1.2). There is also an exercise to help you better understand the record layout in a DBMS: read the code of the record layout and the Schema object implementation in Taco-DB, and answer a few questions in your write-up.

1.1. File Manager

[Back to Project Overview]

Source files to modify: none.

Source files to read:

You don't have to implement anything in the file manager. You only need to skim through the file manager API at this stage. You may always revisit this section and the implementation when it's needed.

Figure 2: File Manager

Taco-DB manages its data in a flat main data storage space organized as fixed-size pages (see Figure 2), through the FileManager interface defined in include/storage/FileManager.h. The size of each page is fixed at PAGE_SIZE = 4096 bytes. Page numbers are represented by the unsigned 32-bit integer type PageNumber, where page number 0 (INVALID_PID) is an invalid page number reserved for file manager use. Recall that you have implemented an interface FSFile for file I/O over the file system. The file manager breaks up the entire space into up to 256 files stored on your file system (usually in a tmpd.XXXXXX/main directory when you're running tests). There is no need to worry about every detail of the file manager, but it is useful to understand how it manages the pages for debugging purposes.

Note: the tmpd.XXXX directory is removed upon normal exit of a test case so that your quota on the student servers will not be depleted too quickly. However, it may remain in build.Debug/tmp (or build.Release/tmp if you're building the code in release mode) if the program exits abnormally (e.g., SEGFAULT, stopped during debugging in VSCode/GDB, ...). If you're running out of disk quota (e.g., FSFile complains about running out of disk space, or tests fail because files cannot be opened), try removing everything in this directory first (DO NOT remove the tmp directory itself, or the tests will not run properly). The build script will also remove data directories created more than 1 day ago when you build your code. Later in the second week, all temporary data will be automatically placed into a local directory /local/Spring_2023/your-ubit-name/cse562_tmps instead, so that you will be less likely to run out of quota.

There is a single global instance of file manager g_fileman declared in include/dbmain/Database.h.

Note: the FileManager also manages the temporary data files, but each temporary file has its own page number space that is independent of the main data storage space. They are not used in this project so we will cover the details in later projects that use temporary files. Within the context of this project, a file page refers to a page in the main data storage space.

Figure 3: File manager managed (virtual) file

The file manager further divides the entire main data storage space into (virtual) files (see Figure 3). Each file is simply a linked list of pages in the main data storage space, which may be created and opened through the file manager. An open virtual file is returned as an object of the File class, through which you may find the first page, find the last page, or extend the file. Each data page has a fixed-size header PageHeaderData at the beginning, which is managed by the file manager and stores the next and previous page numbers and some other information internal to the file manager. When you access a page in Taco-DB, you must not modify the page header.

Question 1: How do you create/open a virtual file through the global file manager instance g_fileman? How do you get the file ID of an open virtual file? Fill in each blank in the following snippet with a C++ expression to answer this question. In your write-up submission, you only need to write the three C++ expressions in the order they appear below.


    // create a new virtual file and get its file ID
    std::unique_ptr<File> f =                
    FileID fid =                        
    f->Close(); 

    // reopen the file
    f =                   
                
1.2. Buffer Manager

[Back to Project Overview]

Source files to modify:

While one could read/write pages directly through FileManager::ReadPage() and FileManager::WritePage() on demand, that is not very efficient, as each read/write incurs a disk I/O. As we covered in the lecture, a DBMS utilizes a page buffer manager to keep a set of pages in memory in order to reduce I/O requests and improve data access throughput and latency. In Taco-DB, the BufferManager is built on top of the FileManager. It handles all page access requests from other system components and maintains a buffer of disk pages in a fixed number of buffer frames, issuing I/O calls on the global file manager instance g_fileman.

There is a single global instance of the buffer manager, g_bufman, declared in include/dbmain/Database.h, except in the buffer manager tests.

Figure 4: Buffer Manager PinPage() call

Figure 4 is a sketch of how the BufferManager works. It allocates a number of consecutive page frames in memory, indexed by the unsigned integer type BufferId starting from 0, with INVALID_BUFID defined as std::numeric_limits<BufferId>::max(). The buffer frames must be aligned to 512-byte boundaries (using aligned_alloc()), because the FileManager uses O_DIRECT I/Os over the file system. For each buffer frame, the buffer manager maintains some meta information in a separate structure, including the current pin count, a dirty bit, the page number it currently holds, as well as any information needed by the eviction policy. In addition, the buffer manager should maintain a hash table that maps page numbers to buffer IDs to speed up lookups.
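
As a concrete but non-binding illustration, the per-frame bookkeeping could be organized as follows. This is only a sketch: all of the names below (FrameMeta, m_frames, m_meta, m_pid2bufid) are our own choices for this example and are not prescribed by the headers, so feel free to structure your members differently.

    // Possible private members of BufferManager (illustrative sketch only).
    struct FrameMeta {
        PageNumber pid = INVALID_PID;   // page currently held in this frame, if any
        uint32_t   pin_count = 0;       // number of outstanding pins
        bool       dirty = false;       // modified since it was read from disk?
        // ... plus any state your eviction policy needs (timestamp, ref bit, ...)
    };

    char *m_frames;                     // n consecutive frames, n * PAGE_SIZE bytes
    std::vector<FrameMeta> m_meta;      // bookkeeping, one entry per buffer frame
    absl::flat_hash_map<PageNumber, BufferId> m_pid2bufid;   // page -> buffer ID

    // During initialization, the frame array must be 512-byte aligned because
    // the file manager issues O_DIRECT I/Os:
    //   m_frames = (char *) aligned_alloc(512, ((size_t) n) * PAGE_SIZE);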

Question 2: Since the pages are part of the virtual files, do we need to store the file IDs of the pages in the buffer meta information? Please explain why, using a line of code that reads a page with page number pid in the file with file ID fid into a page buffer pointed to by char *pagebuf through the file manager g_fileman. In your write-up submission, you only need to answer yes/no and show the line of code. (Hint: if you answer no, then fid should not appear in this line of code.)

Task 1: Add the necessary class members for the data structures described above to BufferManager and implement:

Your code should still pass BasicTestBufferManager.TestNormalInitAndDestroy at this point (it passes before you modify any source code).

Upon a PinPage() call, the buffer manager should look up the page number in the hash table. If it is found in some buffer frame, it increments the pin count and returns that frame's buffer ID. Otherwise, the buffer manager selects an unpinned buffer frame using some eviction policy (e.g., LRU, MRU, Clock, etc., but don't use a random eviction policy!) and writes back the page currently in the selected frame if it is dirty. Once the frame is clean, you may read the requested page into it. The frame is then returned to the caller with its pin count incremented. Note that a buffer frame may have a pin count greater than 1 (think: why and when?). An UnpinPage() call simply decrements the specified buffer frame's pin count.
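
The control flow above can be summarized by the following sketch, which reuses the assumed member names from the earlier sketch. The exact signatures of PinPage()/UnpinPage() are specified in the headers and may differ from what is shown here; GetFreeOrVictimFrame() is a hypothetical helper that returns a free frame or evicts an unpinned victim (writing it back first if dirty and removing its old hash table entry), and the actual file manager read call is elided.

    // Illustrative sketch of the PinPage()/UnpinPage() control flow; not the
    // required implementation.
    BufferId BufferManager::PinPage(PageNumber pid, char **frame) {
        auto it = m_pid2bufid.find(pid);
        if (it != m_pid2bufid.end()) {           // hit: page is already buffered
            BufferId bufid = it->second;
            ++m_meta[bufid].pin_count;
            *frame = m_frames + (size_t) bufid * PAGE_SIZE;
            return bufid;
        }

        BufferId bufid = GetFreeOrVictimFrame(); // may write back a dirty victim
        // ... read page `pid` into the frame through g_fileman ...
        m_meta[bufid].pid = pid;
        m_meta[bufid].pin_count = 1;
        m_meta[bufid].dirty = false;
        m_pid2bufid.emplace(pid, bufid);
        *frame = m_frames + (size_t) bufid * PAGE_SIZE;
        return bufid;
    }

    void BufferManager::UnpinPage(BufferId bufid) {
        --m_meta[bufid].pin_count;               // the caller must hold a pin
    }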

Question 3: When a buffer manager is initially created, all buffer frames are free and may be used to handle a request for a new page to be pinned. Since we only have a limited number of buffer frames, all buffer frames will be occupied by some page as long as there is sufficient number of pin page requests on distinct pages. Could you determine whether there is a free frame in O(1) time using the hash table for looking up buffer IDs by page numbers, assuming it is an absl::flat_hash_map or an std::unordered_map? You may denote the hash map as h. Write a single C++ expression for this question.

Question 4: Can you design a function that returns the buffer ID of a free buffer frame in O(1) time using the hash table h in the previous question, assuming the function is only called when there is at least one free buffer frame? Write down the code of this function and briefly explain why it is correct.

Task 2: Implement BufferManager::PinPage() and BufferManager::UnpinPage() without a working eviction policy.

Your code is likely to pass the following tests at this point:

If there are a sufficient number of pin page requests over distinct pages, we will eventually run out of free frames. In that case, the current PinPage() implementation will not work unless we implement an eviction policy for choosing a victim page to evict from the buffer pool. We have discussed several basic eviction policies (LRU, MRU, Clock) in the lecture; you may choose any of them to implement.
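
As one example, a clock (second-chance) policy only needs one extra reference bit per frame plus a rotating hand. The sketch below is just one of many acceptable designs; FindVictim(), the ref bit, and m_hand are all assumptions of this sketch (PinPage() would set the ref bit on every hit), not names from the provided interface.

    // Clock (second-chance) victim selection, sketched under the assumed
    // member names; returns INVALID_BUFID if every frame is currently pinned.
    BufferId BufferManager::FindVictim() {
        size_t scanned = 0;
        while (scanned < 2 * m_meta.size()) {    // at most two full sweeps
            BufferId cur = m_hand;
            m_hand = (m_hand + 1) % m_meta.size();
            FrameMeta &m = m_meta[cur];
            if (m.pin_count > 0) {               // pinned frames are never victims
                ++scanned;
                continue;
            }
            if (m.ref) {                         // recently used: second chance
                m.ref = false;
                ++scanned;
                continue;
            }
            return cur;                          // unpinned and not recently used
        }
        return INVALID_BUFID;
    }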

Question 5: What will eventually be the size of the hash table for mapping page numbers to buffer IDs, given a sequence of pin page requests over a sufficient number of distinct pages (more than the number of buffer frames)?

Task 3: Implement a reasonably good eviction policy of your choice (e.g., LRU, MRU, Clock, etc.).

Your code is likely to pass the following tests at this point. Note that these tests do not test the efficiency of your eviction policy yet.

The buffer manager in Taco-DB does not flush pages back unless a page is evicted or the database shuts down, as opposed to a more sophisticated DBMS where the buffer pool may flush pages from time to time for performance and recovery considerations. In Taco-DB, there is a BufferManager::Flush() function that flushes all the dirty pages back to disk. It should only be called during database shutdown from BufferManager::Destroy(). When that happens, all the buffer frames (whether clean or dirty) should have zero pin counts. You should throw a fatal error if some pin count is not zero, because a non-zero pin count indicates bugs in other components.
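
A possible shape of Flush() under the same assumed member names is sketched below; the actual write-back call goes through g_fileman, whose API is documented in the file manager header, and the fatal-error reporting should use whatever logging facility the rest of your code uses.

    // Sketch of Flush(): every frame must be unpinned at shutdown; dirty
    // frames are written back through the file manager. Member names are the
    // assumed ones from the earlier sketches.
    void BufferManager::Flush() {
        for (BufferId bufid = 0; bufid < (BufferId) m_meta.size(); ++bufid) {
            FrameMeta &m = m_meta[bufid];
            if (m.pid == INVALID_PID)
                continue;                 // this frame never held a page
            if (m.pin_count != 0) {
                // A non-zero pin count here indicates a bug in some other
                // component -- report a fatal error instead of flushing.
            }
            if (m.dirty) {
                // ... write the frame back to page m.pid through g_fileman ...
                m.dirty = false;
            }
        }
    }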

Task 4: Implement BufferManager::Flush() and add a call to it in BufferManager::Destroy().

Your code should pass all the tests in BasicTestBufferManager and BasicTestEvictionPolicy at this point. The two tests in BasicTestEvictionPolicy check whether your eviction policy is working and reasonably good (i.e., not random eviction). They take a bit longer to run (tens of seconds in a debug build), so their timeouts are set to higher values on Autolab and in the offline tests. If they run for more than 1 minute in a debug build, there is a high chance that your implementation is not efficient enough to pass the tests.

1.3. Schema and Record Layout

[Back to Project Overview]

Source files to modify: none.

Source files to read:

In this section, we will explore how fields are laid out given a particular schema in Taco-DB. This is an implementation of one of the possible record layouts discussed in the lectures. You do not have to modify the code, but you will need to read it to answer the questions in your project write-up.

Before we talk about the Schema objects and record layout, let's have a brief overview of data types and data representation in Taco-DB. You might be quite familiar with data types in common programming languages (e.g., int, float, std::string, ... in C++) -- a data type usually defines a set of possible values, a set of allowed operations, and the representation of these values in terms of low-level primitives such as bits, bytes, or words. Since we are implementing an RDBMS, it has a different type system (that of SQL) than the language it is implemented in (i.e., C++), an important distinction to make.

Each SQL type (e.g., INT4, FLOAT, VARCHAR(256)) has its own set of valid values and permitted operations. A datum of a SQL type is always stored as a fixed-length or variable-length object (a sequence of bytes in memory or on disk) in Taco-DB, without the type information attached. For instance, an INT4 is stored as a 4-byte object with its initial address aligned (cf. Data Alignment in Records of Lecture 4) to 4-byte memory address/file offset boundaries. There is a set of functions and operators that can be applied to INT4, mostly defined in src/utils/typsupp/int4.cpp. It may be conveniently stored in an int32_t in C++ to simplify the implementation. Another example is VARCHAR(n), a variable-length type, which is stored as an object whose length equals that of the string itself (up to n characters), with no special alignment (i.e., alignment = 1). Most of its functions and operators are defined in src/utils/typsupp/varchar.cpp. In C++, we pass around a pointer (char*) to the beginning of a VARCHAR(n) object when it's in memory. You may wonder how we can determine the length of a VARCHAR string, since we store neither a length with the object nor a null terminator ('\0' as in a C string). The short answer is that the length is encoded in the record layout, and you will find out how once you finish reading this section.
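
To make the alignment rule concrete (this is background, not something you need to write for this project): placing a 4-byte-aligned value such as an INT4 after an arbitrary offset means rounding the offset up to the next multiple of 4, which is typically done with the bit trick below (it also previews the hint near the end of this section).

    #include <cstdint>
    #include <cstdio>

    // Round `offset` up to the next multiple of `align`, where `align` is a
    // power of two. Shown for illustration only.
    static inline uint64_t AlignUp(uint64_t offset, uint64_t align) {
        return (offset + align - 1) & ~(align - 1);
    }

    int main() {
        // An INT4 placed after a field ending at offset 6 would start at 8.
        std::printf("%llu\n", (unsigned long long) AlignUp(6, 4));   // prints 8
        return 0;
    }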

Because the database has to operate on data whose types are specified dynamically at run time, the system must know the type information of a datum at run time in order to operate on it. Each type has its own requirements and allowed operators and functions, stored in a database catalog (a set of special tables internal to the system), and we use a unique Type ID to denote each type. Whenever we process a table in a database, we obtain a Schema object, which stores the type ID of each field. That is one of the two main purposes of the Schema objects -- to keep track of the types of data.

Question 6: Why don't we store the type information along with the data? Hint: to answer this question, please show the answers to the following two questions and briefly explain your reasoning: 1) What is the least amount of space, in kilobytes, needed to store a table A(x: INT4, y: INT4) with 10,000 rows if we store a Type ID along with each datum in the table? 2) What is the least amount of space if we instead store one Type ID per field (outside the table data storage)?

Since different data types have different representations and types are dynamic at run time, we use a Datum object (see include/base/datum.h) to store an object of any type in memory (as a sequence of bytes). You may think of it as a union of values of different types (e.g., integers, floats, strings), but internally it has different ways of storing data. In general, a fixed-length object of 1, 2, 4, or 8 bytes is stored inline in a Datum object (e.g., int32_t, int64_t), while a variable-length object (e.g., strings) or a fixed-length object of any other size is stored in a buffer and the Datum keeps a pointer (char*, which can be cast to any other pointer to a specific type structure). Depending on where this buffer is located (on a buffer page or in some malloc-allocated memory), the buffer may or may not be owned by the Datum. Note that Datum does not have a copy constructor or a copy assignment operator because of that (more on this in project 5). To reference a Datum, you may either use const Datum& or the DatumRef class. When an object is stored on a data page, it is always stored as a flat array of bytes without pointers or references. Note that a fixed-length object may be stored as a variable-length datum (e.g., CHAR(5)), which should be accessed using Datum::GetVarlenBytes(), Datum::GetVarlenSize(), or Datum::GetVarlenAsStringView() rather than Datum::GetFixedlenBytes().
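
For instance, reading the bytes of a variable-length Datum d (say, a CHAR(5) or VARCHAR value) would go through the variable-length accessors named above. The snippet below is illustrative only: `d` is assumed to hold a variable-length object, and the exact return types are left to auto since they are specified in include/base/datum.h.

    // Illustrative only: `d` is a Datum holding a variable-length object.
    auto bytes = d.GetVarlenBytes();          // pointer to the raw bytes
    auto size  = d.GetVarlenSize();           // length of the object in bytes
    auto view  = d.GetVarlenAsStringView();   // convenient read-only view
    // A fixed-length object stored inline would instead be read via accessors
    // such as Datum::GetFixedlenBytes() (see include/base/datum.h).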

On the other hand, Taco-DB relies on the type catalog information to interpret the values. The type information is looked up from the catalog table Type. These types are SQL types rather than C++ types, and they can be used to find information about how to manipulate these objects. There are a few important fields of the Type catalog table (<builddir>/generated_source/catalog/systables/Type.h -- you must have built the code to see this file) explained below:

A built-in function is a function implemented in C++ that accepts a list of Datum of certain SQL types and returns a Datum of a certain SQL type. These built-in functions are managed by a centralized function catalog and can be looked up/called dynamically through the FunctionCall method. For instance, the typlenfunc for a predefined SQL type is a built-in function that accepts a UINT8 Datum and returns an INT2 length. It calculates the binary length of the given type based on the type parameter. You may find the details of built-in functions in include/base/fmgr.h and include/utils/builtin_funcs.h. As an example, for the fixed-length string CHAR(n), its corresponding typlenfunc will return n, because we always store it as an n-byte object.

Finally, you are ready to understand how the Schema object computes the record layout, which is its second main purpose. The record layout is briefly described in the documentation comment block of the class in include/catalog/Schema.h, and the implementation is in Schema::ComputeLayoutImpl(), Schema::WritePayloadToBufferImpl(), and Schema::GetOffsetAndLength(). We will leave this as an exercise to you.

Hint: if a non-negative integer x is of some integral type T (e.g., T = int) and d is a non-negative integer, then the following identities always hold as long as the result does not overflow (a small numeric check follows the list):

  1. x << d == x * (T) pow(2,d)
  2. x >> d == x / (T) pow(2,d)
  3. x & ((1 << d) - 1) == x % (1 << d)
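
For instance, with x = 13 and d = 2, the identities can be checked with this small self-contained program:

    #include <cassert>

    int main() {
        int x = 13, d = 2;                               // a small worked example
        assert((x << d) == x * 4);                       // 52 == 13 * 2^2
        assert((x >> d) == x / 4);                       // 3  == 13 / 2^2
        assert((x & ((1 << d) - 1)) == x % (1 << d));    // 1  == 13 % 4
        return 0;
    }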

Question 7: How would you get a field as a Datum from a record in memory, given a pointer char *rec to it, its corresponding schema object Schema *sch, and its field ID FieldId fid? Write down a single C++ expression for this question.

Question 8: Assuming the Datum you get in Q7 has the type of INT4, how would you get its value as an int32_t in C++? Write down a single C++ expression for this question using rec, sch and fid in Q7. (note: this will be useful for debugging code in later projects)

Consider the following table schema defined in SQL for questions 9 - 15.


  CREATE TABLE A(
    f1 CHAR(2),
    f2 VARCHAR(100) NOT NULL,
    f3 CHAR(5) NOT NULL,
    f4 INT2 NOT NULL,
    f5 VARCHAR(2),
    f6 INT4
  );
				

Question 9: For a record in table A that does not have any NULL fields, what is the order of all the fields sorted by their offsets?

For questions 10 - 15, let t1 be a record in the table with the following values: (f1 = 'xy', f2 = '1', f3 = 'abcde', f4 = 78, f5 = NULL, f6 = 201)

Question 10: What is the offset of field f1 in record t1? Is it always at the same offset for any record in table A?

Question 11: What is the offset of field f2 in record t1? Is it always at the same offset for any record in table A?

Question 12: What is the offset of field f3 in record t1? Is it always at the same offset for any record in table A?

Question 13: What is the offset of field f4 in record t1? Is it always at the same offset for any record in table A?

Question 14: What is the offset of field f6 in record t1? Is it always at the same offset for any record in table A?

Question 15: What is the total length of record t1? Is the total length always the same as the offset of the end of the last field (i.e., the offset of the last field + the length of the last field) you listed in Q9, for any record in table A?

2. Write-up requirement

Each individual student needs to complete a write-up independently and submit it to UBLearns with answers to Questions 1 - 15. Please find the due date of the write-up at the beginning of this web page and submit a PDF document to UBLearns by the deadline. Please do not submit any other formats such as .docx, .doc, .txt -- we will not grade any non-PDF submission.