Released: Tuesday, 2/7/2023
Project due: Thursday, 2/16/2023, 01:00 am EST
Write-up due: Saturday, 2/18/2023, 01:00 am EST
Last updated: 2/10/2023
This project will be built upon your code for project 1 FSFile.
If you need the solution code for project 1, please download it
from UBBox (link released on Piazza, under Homework Solutions).
To get started, extract
/local/Spring_2023/cse562/lab2.tar.xz
on the
student servers and import the supplemental files as your did
in project 1. The code should build without compilation errors
once the supplemental files are imported, but most of the
additional tests are likely to fail. You may list all the tests
in your build directory using the ctest -N
command, view them in the Run and Debug
window in
VSCode. As before, you may attempt any number of code
submissions at Autolab, project 2 (the same hourly submission
rate limit applies as before). To do so, please tag the commit
you want to submit, and provide the tag in Autolab.
Grading policy: Your grade will be based on
the testing result of your code, and an
independently completed write-up (see
write-up section for more details).
There're no hidden tests in project 2. Once you submit your
code on Autolab, it will run the tests on its VM and give you a
list of scores for individual test cases. As some of the tests
may timeout on the VM due to its unstable performance, we
will run an offline test on your final submission after the
deadline. You final score for each test case will be the higher
one between the last submission on Autolab and the offline
test. For example, if your code passes all the tests except
BasicTestEvictionPolicy.TestMemoryLimit
on Autolab
(which may randomly fail due to timeout from time to time)
but passes it in the offline test, you will still get the full
score for the coding component.
A few useful hints:
tests/
directory.
Figure 1 shows the storage layer of Taco-DB We have provided the implementation of file manager which manages disk space in a set of data files on the file system. You don't have to modify any code of the file manager but you should read Section 1.1 in order to know the APIs. The goal of project 2 is to implement the buffer manager (see Section 1.2). There is also an excercise in order to help you understand the record layout in DBMS better: to read the code of the record layout and Schema object implementation in Taco-DB, and answer a few questions in your write-up.
Source files to modify: none.
Source files to read:
include/storage/FileManager.h
src/storage/FileManager.cpp
You don't have to implement anything in the file manager. You only need to skim through the file manager API at this stage. You may always revisit this section and the implementation when it's needed.
Taco-DB manages its data in a flat main data storage space
opprganized as fixed-size pages (see Figure 2), through the
FileManager
interface defined in
include/storage/FileManager.h
. The size of each page
is fixed to PAGE_SIZE
= 4096 bytes. The page
numbers are represented as the unsigned 32-bit integer type
PageNumber
, where page number
0 (INVALID_PID)
is an invalid page number reserved for file manager use.
Recall that you have implemented an interface
FSFile
for file I/O over the file system. The
file manager breaks up the entire space into up to 256 files
stored on your file system (usually in a
tmpd.XXXXXX/main
when you're running tests).
There is no need to worry about every detail of the
file manager, but it is useful to understand how the file manager
manages the page for debugging purpose.
Note: the tmpd.XXXX
directory is removed upon normal exit
of a test case so that your quota on the student servers will
not be depleted very quickly. However, it may remain in
build.Debug/tmp
(or build.Release/tmp
if
you're building the code in release mode) if the program exits
abnormally (e.g., SEGFAULT, stopped during debugging in
VSCode/GDB, ...). If you're running out of disk quota (e.g.,
FSFile complains about out of disk space, or tests fail due to
failure to open files), try to remove everything in this
directory first (DO NOT remove the tmp directory, or the tests
will not run properly). The build script will also remove
data directories created more than 1 day ago when you build your code.
Later in the second week, all the temporary data will be
automatically placed into a local directory
/local/Spring_2023/your-ubit-name/cse562_tmps
instead so that you will less likely run out of the quota.
There is a single global instance of file manager
g_fileman
declared in
include/dbmain/Database.h
.
Note: the FileManager also manages the temporary data files, but each temporary file has its own page number space that is independent of the main data storage space. They are not used in this project so we will cover the details in later projects that use temporary files. Within the context of this project, a file page refers to a page in the main data storage space.
The file manager further divides the entire main data storage
space as (virtual) files (see Figure 3). Each file is simply a
linked list of pages in the main data storage space, which may
be created and opened through the file manager. An open virtual file
is returned as a object of File
class, and you may
find the first page, the last page, or extend the file through
it. Each data page has a fixed-size
header PageHeaderData
at the beginning that is
managed by the file manager and stores the next and previous
page number and some other information internal to the file
manager. When you access a page in Taco-DB, you must not
modify page header.
Question 1: How to create/open a virtual file through the global
file manager instance g_fileman
? How to get the file ID
of an open virtual file? Fill each blank in the following
snippet with a C++ expression to answer this question. In your write-up submission, you only
need to write the three C++ expressions in the order they appear below.
// create a new virtual file and get its file ID
std::unique_ptr<File> f =
FileID fid =
f->Close();
// reopen the file
f =
Source files to modify:
While one can read/write pages directly through
FileManager::ReadPage()
and
FileManager::WritePage()
on demand, it is not very
efficient as each read/write incurs a disk I/O. As we covered
in the lecture, DBMS utilizes a page buffer manager to keep a
set of pages in the memory in order to reduce I/O requests and
improve the data access throughput and latency. In Taco-DB, the
BufferManager
is built on top of the
FileManager
. It handles all the page access
requests from other system components and maintains a buffer of
disk pages in a fixed number of buffer frames, by issuing I/O
calls on the global file manager instance
g_fileman
.
There is a single global instance of buffer manager
g_bufman
declared in
include/dbmain/Database.h
except in buffer manager
tests.
Figure 4 is a sketch of how BufferManager works. It
allocates a number of consecutive pages in the memory indexed
by an unsigned integer type BufferId
from 0, with
INVALID_BUFID
defined as
std::numeric_limits
. The buffer
frames must be aligned to 512-byte boundaries
(using aligned_alloc()
), because the FileManager
uses O_DIRECT I/Os over the file system. For each buffer
frame, the buffer manager maintains some meta information in a
separate structure, including the current pin count, a dirty
bit, the page number its currently holding, as well as
necessary information needed by the eviction policy. In
addition, the buffer frame should maintain a hash table that
maps page numbers to the buffer ids to speed up lookups.
Question 2: Since the pages are part of the virtual
files, do we need to store the file IDs of the pages in the
buffer meta information? Please explain why using a
line of code that reads a page with page number
pid
in the file with file ID fid
into
a page buffer pointed by char *pagebuf
from the
file manager g_fileman
. In your write-up
submission, you only need to answer yes/no and show the line of
code. (Hint: if you answer no, then fid
should not
appear in this line of code.)
Task 1: Add the necesssary class members for the data structures described above to BufferManager and implement:
BufferManager::BufferManager()
BufferManager::~BufferManager()
BufferManager::Init()
BufferManager::Destroy()
BufferManager::MarkDirty()
BufferManager::GetPageNumber()
BufferManager::GetBuffer()
Your code should still pass
BasicTestBufferManager.TestNormalInitAndDestroy
at
this point (it passes before you modify any source code).
Upon a PinPage()
call, the buffer manager should
look up the page number in the hash table. If it is found in
some buffer frame, it increments the pin count and returns its
buffer ID. Otherwise, the buffer manager selects an unpinned
buffer frame using some eviction policy (e.g., LRU, MRU, Clock,
etc., but don't use random eviction policy!) and writes back
the page currently in the selected frame if it holds a dirty
page. Once the frame is clean, you may read
the requested page into the selected buffer frame.
It is then returned to the caller with its pin count
incremented. Note that a buffer frame may have a pin count
greater than 1 (think: why and when?). An UnPinPage() call
simply decrements the specified buffer frame's pin count.
Question 3: When a buffer manager is initially created,
all buffer frames are free and may be used to handle a request
for a new page to be pinned. Since we only have a limited
number of buffer frames, all buffer frames will be occupied by
some page as long as there is sufficient number of pin page
requests on distinct pages. Could you determine whether there
is a free frame in O(1)
time using the hash table
for looking up buffer IDs by page numbers, assuming it is an
absl::flat_hash_map
or
an std::unordered_map
?
You may denote the hash map as h
.
Write a single C++ expression for this question.
Question 4: Can you design a function that returns the
buffer ID of a free buffer frame in O(1)
time
using the hash table h
in the previous question,
assuming the function is only called when there is at least one
free buffer frame? Write down the code of this function and
briefly explain why it is correct.
Task 2: Implement BufferManager::PinPage()
and BufferManager::UnpinPage()
without a working
eviction policy.
Your code is likely to pass the following tests at this point:
BasicTestBufferManager.TestPinPageWithoutEviction
If there are a sufficient number of pin page requests over
distinct pages, we will eventually run out of free frames. In
that case, the current PinPage()
implementation
will not work unless we implement an eviction policy for
choosing a victim page to evict from the buffer pool. We have
discussed several basic eviction policies (LRU, MRU, Clock) in
the lecture, you may choose any of them to implement.
Question 5: What will be the size of the hash table for mapping page numbers to buffer IDs eventually, given a sequence of pin page requests over a sufficient number of distinct pages that is more than the number of buffer frames?
Task 3: Implement a reasonably good eviction policy of your choice (e.g., LRU, MRU, Clock, etc.).
Your code is likely to pass the following tests at this point. Note that these tests do not test the efficiency of your eviction policy yet.
BasicTestBufferManager.TestPinPageWithEviction
BasicTestBufferManager.TestMultiplePins
BasicTestBufferManager.TestNoAvailableBufferFrame
BasicTestBufferManager.TestUnpinPageNotPinned
BasicTestBufferManager.TestPinInvalidPid
The buffer manager in Taco-DB does not flush pages back unless
the page is evicted or the database shuts down, as opposed to
in a more sophisticated DBMS where the buffer pool may flush
pages from time to time for performance and recovery
consideration.
In Taco-DB, there is a BufferManager::Flush()
function that flushes all the dirty pages back to the disk. It
should only be called during database shutdown from
BufferManager::Destroy()
. When that happens, all
the buffer frames (whether it is clean or dirty) should have
zero pin counts. You should throw a fatal error if some pin
count is not zero, because non-zero pin count indicates bugs in
other components.
Task 4: Implement BufferManager::Flush()
and
add a call to it in BufferManager::Destroy()
.
Your code should pass all the tests in
BasicTestBufferManager
and
BasicTestEvictionPolicy
at this point. The two
tests in BasicTestEvictionPolicy
checks if your
eviction policy is working and reasonably good (i.e., not
random eviction). They take a bit longer to run (tens of
seconds in debug build), so their timeout are set to higher
values on Autolab and in the offline tests. If it runs for more
than 1 minute in the debug build, there is a high chance that
your implementation is not efficient enough to pass the tests.
Source files to modify: none.
Source files to read:
include/catalog/Schema.h
src/catalog/Schema.cpp
include/base/datum.h
In this Section, we will explore how fields are laid out given a particular schema in Taco-DB. This is an implementation of one possible record layout discussed in the lectures. You do not have to modify the code but you will need to read the code to answer the questions in your project write-up.
Before we talk about the Schema
objects and record
layout, let's have a brief overview of data types and data
representation in Taco-DB. You might be quite familiar with
what data types are in common programming languages (e.g.,
int, float, std::string, ...
in C++) -- a
data type
usually defines a set of possible values, a set of allowed
operations as well as the representation of these values in
terms of low-level primitives such as bits, bytes, or words,
etc. Since we are implementing an RDBMS, it has a different
type system (of SQL) than the language it is implemented in
(i.e., C++), an important distinction to make.
Each SQL type (e.g., INT4, FLOAT, VARCHAR(256)
)
has its
own valid set of values and operations permitted. A datum of a
SQL type is always stored as a fixed-length or variable-length
object (a sequence of bytes in memory or on disk) in Taco-DB
without the type information attached. For instance, an
INT4
is stored as a 4-byte object with its initial
address aligned (cf. Data Alignment in Records of Lecture 4) to 4-byte
memory address/file offset boundaries. There are a set of
functions and operators that can be applied to
INT4
, mostly defined in
src/utils/typsupp/int4.cpp
. It may be conveniently
stored in an int32_t
in C++ for simplifying the
implementation. Another example is
VARCHAR(n)
, a variable-length type, which is
stored as an object with the same length as the string itself
of up to n
characters with no special alignment
(i.e., alignment = 1). Most of its functions and operators are
defined in src/utils/typsupp/varchar.cpp
. In C++,
we pass around a pointer (char*
) to the beginning
of a VARCHAR(n)
object when it's in memory. You may
wonder how we can determine the length of a
VARCHAR
string since we don't store a length with
the object or a null-terminator ('\0'
as a C
string). The short answer is the length is encoded in the
record layout and you will find out how once you finish reading
this section.
Because the database has to operate on data with dynamic types
specified during runtime, the system must know the type
information of a datum during run time in order to operate on
it. Each type has its own requirements and allowed operators
and functions stored in a database catalog (a set of special
tables internal to the system), and we use a unique Type
ID
to denote each type. Whenever we process a table in a
database, we obtain a Schema
object, which stores the
type IDs for each field. That is one of the two main purposes of
the Schema
objects -- to keep track of the types of
data.
Question 6: Why don't we store the type information
along with data? Hint: to answer this question, please show the
answers for the following questions and briefly explain why for
Q6: 1) What is the least amount of space in KiloBytes
is needed for storing a table A(x: INT4, y:
INT4)
10,000 rows if we store a Type ID
along with each datum in the table. 2) What is the least amount of
space if we store one Type ID
per field instead
(outside the table data storage)?
Since different data types have different representation and
types are dynamic during run time, we use a Datum
object (see include/base/datum.h
) to store an
object of any type in memory (as a sequence of bytes).
You may think of it as a union
of values of
different types (e.g., integers, floats, strings), but
internally it has differnt ways of storing data. In general,
a fixed-length object of length 1, 2, 4 or 8 bytes is stored
inline in a Datum object (e.g., int32_t
,
int64_t
), while a variable-length object (e.g.,
strings) or a fixed-length object with other
number of bytes is stored in a buffer and Datum keeps a pointer
(char*
, which can be cast to any other pointers
to specific type structures).
Depending on where this buffer is located (on buffer
page or in some malloc-allocated memory), the buffer may be owned or
not owned by the Datum. Note that Datum
does not have
a copy constructor or a copy assignment operator because of
that (more on this in project 5). To reference a
Datum
, you may either use const
Datum&
or the DatumRef
class. When an
object is stored on a data page, it is always stored as a flat
array of bytes without pointers or references. Note that a
fixed-length object may be stored as a variable-length
datum (e.g., CHAR(5)
), which should
be accessed using Datum::GetVarlenBytes()
,
Datum::GetVarlenSize()
or
Datum::GetVarlenAsStringView()
rather than
Datum::GetFixedlenBytes()
.
On the other hand, Taco-DB relies on the type catalog
information to interpret the values. The type information is looked up from the
catalog table Type. These types are SQL types rather than C++ types, and they
can be used to find information about how to manipute these objects. There are
a few important fields of the Type catalog table
(<builddir>/generated_source/catalog/systables/Type.h
-- you must have buit
the code to see this file) exlained
below:
typid
: each type has a unique object id (Oid) in the systemtyplen
: the length of a type if it is fixed-length and has no type parameter that may determine its length. If this is a fixed-length record and also has a type parameter that determines its length, this is undefined. If this is a variable-length type, this is always -1.typisvarlen
: whether this type is variable-length.typalign
: a value of this type must be aligned to typalign
byte memory address or within a data page. This is 1, 2, 4, or 8.typlenfunc
: a function id such that the function accepts a type parameter (INT8
) and returns its length if there's one. This is only valid when the type is fixed-length but its length is determined by its type parameter.
A built-in function is some function implemented in C++ that
accepts a list of Datum
of certain SQL types and
returns a Datum of a certain SQL type. These built-in functions
are managed by a centralized function catalog, and can be
looked up/called dynamically through FunctionCall
method. For instance, the typlenfunc
for a
predefined SQL type is a built-in function that accepts a
UINT8
Datum and returns a INT2
length. It calculates the binary length of this given type
based on the type parameter. You may find the details of
built-in functions in include/base/fmgr.h
and
include/utils/builtin_funcs.h
. As an example, for
the fixed-length string CHAR(n)
, its corresponding
typlenfunc
will return n
because we
always store it as an n
-byte object.
Finally, you are ready for understanding how the
Schema
object computes record layout, the second
main purpose of it. The record layout is briefly described in
the documentation comment block of the class in
include/catalog/Schema.h
, and the implementation
is in Schema::ComputeLayoutImpl()
,
Schema::WritePayloadToBufferImpl()
,
Schema::GetOffsetAndLength()
. We will leave this
as an exercise to you.
Hint: if a non-negative integer x
is of some
integral type T
(e.g., T = int
) and
d
is an non-negative integer, then the following
are always true as long as the result does not overflow:
x << d == x * (T) pow(2,d)
x >> d == x / (T) pow(2,d)
x & ((1 << d) - 1) == x % (1 << d)
Question 7: How would you get a field as a
Datum
from a record in memory, given a pointer
char *rec
to it, its corresponding schema object
Schame *sch
and its field ID FieldId
fid
? Write down a single C++ expression for this
question.
Question 8: Assuming the Datum
you get in
Q7 has the type of INT4
, how would you
get its value as an int32_t
in C++? Write down a single
C++ expression for this question using rec
, sch
and fid
in Q7.
(note: this will be useful for debugging code in later projects)
Consider the following table schema defined in SQL for questions 9 - 15.
CREATE TABLE A(
f1 CHAR(2),
f2 VARCHAR(100) NOT NULL,
f3 CHAR(5) NOT NULL,
f4 INT2 NOT NULL,
f5 VARCHAR(2),
f6 INT4
);
Question 9: For a record in table A
that
does not have any NULL fields, what is the order of all the
fields sorted by their offsets?
For questions 10 - 15, let t1
be a record in the
table with the following values: (f1 = 'xy', f2 = '1',
f3 = 'abcde', f4 = 78, f5 = NULL, f6 = 201)
Question 10: What is the offset of field f1
in record t1
? Is it always at the same offset for
any record in table A
?
Question 11: What is the offset of field f2
in record t1
? Is it always at the same offset for
any record in table A
?
Question 12: What is the offset of field f3
in record t1
? Is it always at the same offset for
any record in table A
?
Question 13: What is the offset of field f4
in record t1
? Is it always at the same offset for
any record in table A
?
Question 14: What is the offset of field f6
in record t1
? Is it always at the same offset for
any record in table A
?
Question 15: What is the total length of record
t1
? Is the total length always the same as the
offset of the end of the last field (i.e., the offset of the
last field + the length of the last field) you listed in
Q9, for any record in table A
?
Each individual student needs to complete a write-up independently and submit it to UBLearns with answers to Questions 1 - 15. Please find the due date of the write-up at the beginning of this web page and submit a PDF document to UBLearns by the deadline. Please do not submit any other formats such as .docx, .doc, .txt -- we will not grade any non-PDF submission.