Released: Thursday, 9/5/2024
Project due: Monday, 9/16/2024, 23:59:59 EDT
Last updated: 9/12/2024
This project builds upon your code for Project 1 (FSFile).
If your code does not pass the tests and you'd like to use the
solution code for Project 1, please extract
/ro-data/labs/lab1_sol.tar.xz
(to be released after the Project 1 grace period ends on 9/11)
within your local repository root in your dev container. Then, run
./import_supplemental_files.sh
. If you run into any errors, please read the note in bold below.
To get started with project 2, extract
/ro-data/labs/lab2.tar.xz
within your local repository root in your dev container. Then import the
supplemental files using
./import_supplemental_files.sh
. The code should build without compilation errors once the
supplemental files are imported, but most of the additional tests
are likely to fail. In addition, existing test cases may also fail
due to the introduction of the additional system components; they
should pass once you correctly implement the components in project 2.
You may list all the tests in your build directory using the
ctest -N
command, or view them in the Run and Debug window in VSCode.
Note: your repository
must be free of any uncommitted or untracked files before
running the import script. If you run into an error during
import, please remove the lab2
directory and the import_supplemental_files.sh
script; commit your repo; and retry.
As before, you may attempt any number of code submissions,
subject to the same submission rate limit of 10/hour/group.
To submit, tag the commit you want to submit, then submit it
using the following command: submit test 2 <git-commit-id-or-tag-name>
.
To list your submissions, please enter: submit list-subs 2
.
To list the submission details of a submission with a specific
sequence number, please enter: submit list-sub-details 2 <seqno>
.
To find the project deadline and the last date to submit, please enter:
submit list-labs
.
Grading policy: Your grade will be based on
the testing results of your code. There is a hidden test suite
in project 2, SystemTestTable
, whose source code is
not available in the lab tarball. Once you submit your code,
the hidden test cases will also run on it, and any scores
you obtain from the hidden test cases will count towards
your grade.
A few useful hints: you may find example usages of the APIs in the
tests/
directory.
Figure 1 shows the storage layer of Taco-DB. We have provided the implementation of the file manager, which manages disk space in a set of data files on the file system. You don't have to modify any code of the file manager, but you should read Section 1.1 to learn its APIs. On top of that, we provide a buffer manager that implements the buffer pool we discussed in class, in order to reduce I/Os and improve data access latency. You do not need to implement anything there either, but you should read Section 1.2 and the associated source code to understand the implementation and interfaces.
The goal of project 2 is to implement the (slotted) data page (Section 1.4) and the heap file (Section 1.5). There is also an exercise to help you better understand the record layout in a DBMS: reading the code described in Section 1.3, the record layout and Schema object implementation in Taco-DB (no submission needed).
Source files to modify: none.
Source files to read:
include/storage/FileManager.h
src/storage/FileManager.cpp
You don't have to implement anything in the file manager. You only need to skim through the file manager API at this stage. You may always revisit this section and the implementation when it's needed.
Taco-DB manages its data in a flat main data storage space
organized as fixed-size pages (see Figure 2), through the
FileManager
interface defined in
include/storage/FileManager.h
. The size of each page
is fixed to PAGE_SIZE
= 4096 bytes. The page
numbers are represented as the unsigned 32-bit integer type
PageNumber
, where page number
0 (INVALID_PID)
is an invalid page number reserved for file manager use.
Recall that you have implemented an interface
FSFile
for file I/O over the file system. The
file manager breaks up the entire space into up to 256 files
stored on your file system (usually in a
tmpd.XXXXXX/main
directory when you're running
tests). There is no need to worry about every detail of the
file manager, but it is useful to understand how it manages
pages, for debugging purposes.
Note: the tmpd.XXXX
directory is removed upon
normal exit of a test case so that you will not deplete the
shared storage space on minsky
. However, it may
remain if the program exits abnormally (e.g., SEGFAULT, stopped
during debugging in VSCode/GDB, ...). The build script will
also remove data directories created more than 1 day ago when
you build your code.
There is a single global instance of file manager
g_fileman
declared in
include/dbmain/Database.h
.
Note: the FileManager also manages the temporary data files, but each temporary file has its own page number space that is independent of the main data storage space. They are not used in this project and we will cover the details in later projects that use temporary files. Within the context of this project, a file page refers to a page in the main data storage space.
The file manager further divides the entire main data storage
space into (virtual) files (see Figure 3). Each file is simply a
linked list of pages in the main data storage space, which may
be created and opened through the file manager. An open virtual file
is returned as an object of the File
class, through which you may
find the first page, find the last page, or extend the file.
Each data page has a fixed-size
header PageHeaderData
at the beginning that is
managed by the file manager and stores the next and previous
page numbers and some other information internal to the file
manager. When you access a page in Taco-DB, you must not
modify the page header.
Source files to modify: none
Source files to read:
While one can read/write pages directly through
FileManager::ReadPage()
and
FileManager::WritePage()
on demand, it is not very
efficient as each read/write incurs a disk I/O. As we covered
in the lecture, DBMS utilizes a page buffer manager to keep a
set of pages in the memory in order to reduce I/O requests and
improve the data access throughput and latency. In Taco-DB, the
BufferManager
is built on top of the
FileManager
. It handles all the page access
requests from other system components and maintains a buffer of
disk pages in a fixed number of buffer frames, by issuing I/O
calls on the global file manager instance
g_fileman
.
There is a single global instance of buffer manager
g_bufman
declared in
include/dbmain/Database.h
except in buffer manager
tests.
Figure 4 is a sketch of how the BufferManager works. It
allocates a number of consecutive pages in memory, indexed
by an unsigned integer type BufferId
from 0, with
INVALID_BUFID
defined as
std::numeric_limits<BufferId>::max()
. The buffer
frames must be aligned to 512-byte boundaries
(using aligned_alloc()
), because the FileManager
uses O_DIRECT I/O over the file system. For each buffer
frame, the buffer manager maintains some meta information in a
separate structure, including the current pin count, a dirty
bit, the page number it currently holds, as well as any
information needed by the eviction policy. In
addition, the buffer manager should maintain a hash table that
maps page numbers to buffer IDs to speed up lookups.
Upon a PinPage()
call, the buffer manager should
look up the page number in the hash table. If it is found in
some buffer frame, it increments the pin count and returns its
buffer ID. Otherwise, the buffer manager selects an unpinned
buffer frame using some eviction policy (e.g., LRU, MRU, Clock,
etc., but don't use a random eviction policy!) and writes back
the page currently in the selected frame if it is dirty.
Once the frame is clean, you may read
the requested page into the selected buffer frame.
The frame is then returned to the caller with its pin count
incremented. Note that a buffer frame may have a pin count
greater than 1 (think: why and when?). An UnPinPage() call
simply decrements the specified buffer frame's pin count.
If there are a sufficient number of pin page requests over
distinct pages, we will eventually run out of free frames. In
that case, the current PinPage()
implementation
will not work unless we implement an eviction policy for
choosing a victim page to evict from the buffer pool. We have
discussed several basic eviction policies (LRU, MRU, Clock) in
the lecture; you may choose any of them to implement.
The buffer manager in Taco-DB does not flush pages back unless
a page is evicted or the database shuts down, unlike a more
sophisticated DBMS, where the buffer pool may flush
pages from time to time for performance and recovery
considerations.
In Taco-DB, there is a BufferManager::Flush()
function that flushes all the dirty pages back to the disk. It
should only be called during database shutdown from
BufferManager::Destroy()
. When that happens, all
the buffer frames (whether clean or dirty) should have
zero pin counts. You should throw a fatal error if some pin
count is not zero, because a non-zero pin count indicates bugs in
other components.
Source files to modify: none.
Source files to read:
include/catalog/Schema.h
src/catalog/Schema.cpp
include/base/datum.h
In this Section, we will explore how fields are laid out given a particular schema in Taco-DB. This is a variant of one of the record layouts discussed in the lectures. Your task is to understand the details and implement a few functions to access fields in a record correctly.
Before we talk about the Schema
objects and record
layout, let's have a brief overview of data types and data
representation in Taco-DB. You might be quite familiar with
what data types are in common programming languages (e.g.,
int, float, std::string, ...
in C++) -- a
data type
usually defines a set of possible values, a set of allowed
operations as well as the representation of these values in
terms of low-level primitives such as bits, bytes, or words,
etc. Since we are implementing an RDBMS, it has a different
type system (of SQL) than the language it is implemented in
(i.e., C++), an important distinction to make.
Each SQL type (e.g., INT4, FLOAT, VARCHAR(256)
)
has its
own valid set of values and operations permitted. A datum of a
SQL type is always stored as a fixed-length or variable-length
object (a sequence of bytes in memory or on disk) in Taco-DB
without the type information attached. For instance, an
INT4
is stored as a 4-byte object with its initial
address aligned (cf. Data Alignment in Records of Lecture 4) to 4-byte
memory address/file offset boundaries. There are a set of
functions and operators that can be applied to
INT4
, mostly defined in
src/utils/typsupp/int4.cpp
. It may be conveniently
stored in an int32_t
in C++ for simplifying the
implementation. Another example is
VARCHAR(n)
, a variable-length type, which is
stored as an object with the same length as the string itself
of up to n
characters with no special alignment
(i.e., alignment = 1). Most of its functions and operators are
defined in src/utils/typsupp/varchar.cpp
. In C++,
we pass around a pointer (char*
) to the beginning
of a VARCHAR(n)
object when it's in memory. You may
wonder how we can determine the length of a
VARCHAR
string since we don't store a length with
the object or a null terminator ('\0'
, as in a C
string). The short answer is that the length is encoded in the
record layout and you will find out how once you finish reading
this section.
Because the database has to operate on data with dynamic types
specified during runtime, the system must know the type
information of a datum during run time in order to operate on
it. Each type has its own requirements and allowed operators
and functions stored in a database catalog (a set of special
tables internal to the system), and we use a unique Type
ID
to denote each type. Whenever we process a table in a
database, we obtain a Schema
object, which stores the
type IDs for each field. That is one of the two main purposes of
the Schema
objects -- to keep track of the types of
data.
Since different data types have different representation and
types are dynamic during run time, we use a Datum
object (see include/base/datum.h
) to store an
object of any type in memory (as a sequence of bytes).
You may think of it as a union
of values of
different types (e.g., integers, floats, strings), but
internally it has differnt ways of storing data. In general,
a fixed-length object of length 1, 2, 4 or 8 bytes is stored
inline in a Datum object (e.g., int32_t
,
int64_t
), while a variable-length object (e.g.,
strings) or a fixed-length object with other
number of bytes is stored in a buffer and Datum keeps a pointer
(char*
, which can be cast to any other pointers
to specific type structures).
Depending on where this buffer is located (on buffer
page or in some malloc-allocated memory), the buffer may be owned or
not owned by the Datum. Note that Datum
does not have
a copy constructor or a copy assignment operator because of
that (more on this in project 5). To reference a
Datum
, you may either use const
Datum&
or the DatumRef
class. When an
object is stored on a data page, it is always stored as a flat
array of bytes without pointers or references. Note that a
fixed-length object may be stored as a variable-length
datum (e.g., CHAR(5)
), which should
be accessed using Datum::GetVarlenBytes()
,
Datum::GetVarlenSize()
or
Datum::GetVarlenAsStringView()
rather than
Datum::GetFixedlenBytes()
.
On the other hand, Taco-DB relies on the type catalog
information to interpret values. The type information is looked up from the
catalog table Type. These types are SQL types rather than C++ types, and they
can be used to find information about how to manipulate these objects. A few
important fields of the Type catalog table
(<builddir>/generated_source/catalog/systables/Type.h
-- you must have built
the code to see this file) are explained
below:
typid
: each type has a unique object id (Oid) in the system.
typlen
: the length of a type if it is fixed-length and has no type parameter that may determine its length. If the type is fixed-length but has a type parameter that determines its length, this is undefined. If the type is variable-length, this is always -1.
typisvarlen
: whether this type is variable-length.
typalign
: a value of this type must be aligned to a typalign
-byte memory address or offset within a data page. This is 1, 2, 4, or 8.
typlenfunc
: a function id such that the function accepts a type parameter (INT8
) and returns the type's length, if there is one. This is only valid when the type is fixed-length but its length is determined by its type parameter.
A built-in function is some function implemented in C++ that
accepts a list of Datum
of certain SQL types and
returns a Datum of a certain SQL type. These built-in functions
are managed by a centralized function catalog, and can be
looked up and called dynamically through the FunctionCall
method. For instance, the typlenfunc
for a
predefined SQL type is a built-in function that accepts a
UINT8
Datum and returns an INT2
length. It calculates the binary length of the given type
based on the type parameter. You may find the details of
built-in functions in include/base/fmgr.h
and
include/utils/builtin_funcs.h
. As an example, for
the fixed-length string CHAR(n)
, its corresponding
typlenfunc
will return n
because we
always store it as an n
-byte object.
Finally, you are ready to understand how the
Schema
object computes the record layout, its second
main purpose. The record layout is briefly described in
the documentation comment block of the class in
include/catalog/Schema.h
, and the implementation
is in Schema::ComputeLayoutImpl()
,
Schema::WritePayloadToBufferImpl()
. You should
read the code to figure out the specification of the layout. As
an exercise, your task is
Task 1.3.1: Implement the record field accessor functions:
Schema::FieldIsNull()
Schema::GetOffsetAndLength()
Your code is likely to pass the tests
BasicTestSchema.*
at this point.
Hint: You might find the test cases in
tests/catalog/BasicTestSchema.cpp
and
tests/catalog/TestSchema.h
useful examples to
help you understand the layout implementation.
Source files to modify:
How to run parameterized tests individually: The tests
for the data page are parameterized (meaning each named test case
is actually a suite of test cases instantiated from a template with
different parameter values).
You won't be able to run each individual test case when running
them with ctest; instead, ctest
will run a test
template for multiple rounds with all the parameter values.
To run a test with a specific
parameter, e.g., the page initialization test with
min_reclen = 8
, max_reclen = 8
and
usr_data_sz = 0
, run:
./tests/storage/BasicTestVarlenDataPage
--gtest_filter='*TestInitialize/8_8_0'
.
If you are running the test case using gdb
in
command line, start it as usual: gdb
./tests/storage/BasicTestVarlenDataPage
. Then enter
r --gtest_filter='*TestInitialize/8_8_0'
in GDB to
start the program with a specific test case. To run a specific
test case in VSCode, open Run and Debug
(Ctrl + Shift + d
). Locate the test suite
(e.g., */BasicTestVarlenDataPage.TestInitialize/*
)
from the drop down menu and click on the settings
button right next to it. This will open .vscode/launch.json
in the text editor and the cursor will be located at the entry for
that specific test case. Update the first argument in the
args
field with the specific parameter that you
want to run (e.g., */BasicTestVarlenDataPage.TestInitialize/8_8_0
).
Brief overview of data pages:
Taco-DB manages its data in the fixed-size pages (of PAGE_SIZE
bytes), where we may
store a number of records as well as some additional
information needed by the database. Taco-DB supports
variable-length data types (e.g., VARCHAR), as well as nullable
table fields, and thus it needs to have the capability of
storing variable-length records on the data pages using a
slotted data page design. In this project, a
record is simply an array of bytes of a certain
length. Note that a record must be stored at an
8-byte-aligned offset (you may use MAXALIGN()
for
that).
The layout of a data page is shown in Figure 5. The page begins
with a 24-byte header VarlenDataPageHeader
, which
includes the PageHeaderData
at offset 0.
Following that is an optional fixed-size user data area (you'll
need it for the later B-tree project). The user of a VarlenDataPage
may request reserving the user data area for a specific size at initialization, and
your implementation must make sure the bytes in this area are not
modified by the VarlenDataPage
. Slots on the page
are indexed by SlotId
typed integers, from 1 upwards (with
INVALID_SID=0
). The slot array starts at the end
of the page and grows backwards. Hence the
SlotData
of slot sid
may be accessed
as ((SlotData*)(pagebuf + PAGE_SIZE))[-(int) sid]
,
if char *pagebuf
points to the beginning of the
page. The actual record payload for slot sid
is
stored somewhere between the end of the user data area and the
beginning of the free space. To insert a record, find a free
existing slot or create a new slot in the slot array, and find
some empty space to copy the record payload into. To erase a record,
set its slot to be invalid and possibly remove all the invalid
slots at the end of the slot array, if any. You may refer to the
class and function documentation in the header file for a more
detailed specification.
Task 1.4.1: Implement the page initialization and a few page meta information functions:
VarlenDataPage::InitializePage()
VarlenDataPage::VarlenDataPage()
VarlenDataPage::GetUserData()
VarlenDataPage::GetMaxSlotId()
VarlenDataPage::GetRecordCount()
Your code is likely to pass the tests
*/BasicTestVarlenDataPage.TestInitialize/*
at this
point.
Task 1.4.2: Implement the slot query and insertion functions:
VarlenDataPage::InsertRecord()
,
VarlenDataPage::IsOccupied()
, and
VarlenDataPage::GetRecordBuffer()
.
Your code is likely to pass the tests
*/BasicTestVarlenDataPage.TestInsertRecord/*
at
this point.
Task 1.4.3: Implement the record erasure function:
VarlenDataPage::EraseRecord()
.
Your code is likely to pass the tests
*/BasicTestVarlenDataPage.TestEraseRecord/*
at this
point.
Task 1.4.4: Implement the record update function:
VarlenDataPage::UpdateRecord()
.
Your code is likely to pass the tests
*/BasicTestVarlenDataPage.TestUpdateRecord/*
and
*/BasicTestVarlenDataPage.TestSidNotValidErrors/*
at this point.
As you might have noticed, there could be unoccupied spaces
between offsets m_ph_sz
and
m_fs_begin
(i.e., holes) after an erasure or update.
Hence, the data page may have to compact that space by moving
the record payloads to be consecutive. During a page
compaction, the SlotData entries in the slot array may have to be
updated, but the slots themselves must not be moved. In other
words, a valid slot remains valid and still refers to the same
record; an invalid slot remains invalid; and the total number
of slots remains unchanged after a compaction. To avoid trying
to compact the page every time an insertion or an out-of-place
update happens, you should maintain a bit
m_has_hole
in the page header to indicate whether
the occupied space may (but does not necessarily) contain
holes. In the tests, we assume the data page performs lazy
compaction, which means the compaction process will not be
triggered until it becomes necessary. Please refer to the class
documentation for more details.
Task 1.4.5: Implement the page compaction and update the page update functions accordingly.
You should pass all the tests in
BasicTestVarlenDataPage
at this point.
You might also have noticed that there are four additional functions for
maintaining a sorted array of records and computing the
available space: VarlenDataPage::InsertRecordAt()
,
,
VarlenDataPage::RemoveSlot()
,
VarlenDataPage::ShiftSlots()
, and
VarlenDataPage::ComputeFreeSpace()
. These are not
required in this project and we do not provide the tests for
them yet. You will need to implement some of these in future
projects. We will also provide additional test cases when you
are required to implement them.
Source files to modify:
Tips on the tests:
There are two versions of each test
for the heap files: BasicTestTable
and
SystemTestTable
. You might want to debug your
implementation using the basic test first. The code
for both tests
is actually the same (see
tests/storage/TestTable_common.inc
). The
difference is that the basic test uses
an in-memory catalog implementation,
BootstrapCatCache
, to create the table descriptors
in tests, and thus does not depend on your heap file
implementation. The hidden system test uses a heap-file-based
catalog implementation, PersistentCatCache
,
which initializes the catalog using your heap file
implementation, and thus is more likely to fail to initialize
if your implementation is buggy.
The test
BasicTestRepoCompilesAndRuns.TestShouldAlwaysSucceed
,
which also uses PersistentCatCache
, is a good
indicator for whether your heap file implementation is buggy.
If it passes, your code may or may not pass the system test.
However, it probably will not pass the system tests if this
test fails.
Brief overview of heap file:
A heap file is simply an unordered (multi-)set of records. It
is the most basic way of organizing records in DBMS. In
Taco-DB, a heap file is implemented on top of the file manager
managed virtual files and stores records in some arbitrary
order in the list of pages. A table descriptor
TabDesc
stores essential information about a
table, which includes the file ID of the table. The
Table
class should open the file through the
global file manager g_fileman
, make calls to the
global buffer manager g_bufman
to access the
pages, and update or retrieve records on the pages using
VarlenDataPage
. It should support inserting,
updating, deleting, and iterating over records. To simplify the
implementation, you only need to allocate pages when there is no
space for insertion, and do not have to reuse free space on
previous pages (but you may if you'd like to). In addition,
you should free pages in the file manager when they become
empty. The Table::Iterator
class provides a forward-only
iterator over all the records, and it should keep at most
one page pinned at a time.
Note: in a more realistic DBMS, this may cause the issue of trying to deallocate a page that is actively being scanned (possibly by the same transaction/thread, such as an UPDATE statement). You do not have to worry about this tricky case in this project, because we ensure that there will either be no concurrent scans over the file when a deletion or update happens, or the concurrent scan may only cause in-place updates of records (which can never result in empty pages).
You might want to pay attention to the specifications in the header file in order to pass all the tests, especially the system tests.
Before you start working on the code, please answer the
following questions by reading the specifications in
include/storage/Table.h
.
Task 1.5.6: Implement the table initialization and construction functions:
Table::Initialize()
Table::Create()
Table::Table()
Table::~Table()
Your code is likely to pass
BasicTestTable.TestEmptyTable
at this point.
Task 1.5.7: Implement Table::InsertRecord()
,
Table::StartScan
,
Table::StartScanBefore
and the
Table::Iterator
class. You may change, delete, or
add to the existing members in the Table
and
Table::Iterator
if necessary, and the signature of
any private functions -- they only exist as a suggestion.
Your code is likely to pass
BasicTestTable.TestInsertRecord
and
BasicTestTable.TestStartScanBefore
at this point.
Task 1.5.8: Implement Table::EraseRecord()
.
Your code is likely to pass
BasicTestTable.TestEraseRecord
at this point.
Task 1.5.9: Implement Table::UpdateRecord()
.
Your code should pass all the tests in
BasicTestTable
and SystemTestTable
at
this point.