CSE 462/562: Database Systems (Spring 2022)

Project 3: B-Tree Index

Assigned: Thursday, 3/10/2022
Project due: Sunday, 4/10/2022, 11:59 pm EST (extended)
Write-up due:Sunday, 4/10/2022, 11:59 pm EST (extended)
Last updated: 4/3/2022

0. Getting Started

This project will be built upon your code for project 2. To get started, download lab3.tar.xz from Autolab and import the supplemental files as your did in project 2. The code should build without compilation errors once the supplemental files are imported, but most of the additional tests are likely to fail. Note: some of past test cases may also fail because we enabled new features that you haven't implemented in your system (e.g., BasicTestRepoCompilesAndRuns.TestShouldAlwaysSucceed). Once you complete a significant portion of the tasks below, these case should pass automatically. You may list all the tests in your build directory using the ctest -N command. As before, you may attempt any number of code submissions at Autolab, project 3 (the same hourly submission rate limit applies as before). To do so, please tag or create a new branch of the commit you want to submit, and provide the tag or branch name in Autolab.

A few useful hints:

  1. We strongly encourage you to read through the entire document to get some sense of tasks to do in this project. Note that some of the tasks involve the source files distributed to you in project 2. The project requires you to understand the B-Tree data structure in general and understand the specific design and requirement of the indexing layer in Taco-DB, so you might want to start early. You might also find it useful to review the lectures slides for indexing and tree-based index. Please also consider joining the office hours if you are not sure about how to proceed or have any questions about the tasks.
  2. This lab will heavily rely on your implementation from project 2. In particular, we provide preloaded B-Tree indexes in some of the test cases, where the B-Tree pages are encoded using the VarlenDataPage. Hence, you need to make sure your implementation of VarlenDataPage conforms to the specification. If you did not receive full points for one or more components in project 2 or you have trouble finding bugs in your project 2 implementation, please join the office hour to find the best way to address that. You might want to identify a few questions or a few places that you think might be buggy before joining the office hour. We will try to help you find a solution if it's minor, or provide you the solution for the implementation that significantly deviates from the specification.
  3. The description below is meant to be a brief overview of the tasks. Always refer to the special document blocks in the headers and source files for a more detailed specification of how the functions should behave. While we try to provide as many hints as possible to alleviate your burden of reading code in the provided code, there might always be some declarations or definitions you need to look for in the code docs (in the navbar of the course page), or in the source code.
  4. While you may add any definitions/declarations to the classes or functions you're implementing, it is unlikely you'll need to do that in project 3. It's likely that your design is incorrect if you do, but in case you need to add any definitions/declarations, you should not change the signatures of any public and private fucntions. Also do not edit those source files not listed in the source files to modify below, as they will be replaced during tests.
  5. You do not have to consider concurrency control for multi-threading in your implementation.
  6. We provide a list of tasks throughout this document, which is a suggestion of the order of implementation such that your code may pass a few more tests after you complete a task. It, however, is not the only possible order and you may also find alternative order that makes more sense to you. It is also possible that your code still could not pass certain test that is supposed to because of some unexpected dependencies on later tasks, in which case you might want to refer to the test code implementation in the tests/ directory.
  7. If you're working in a team of two, you might want to first go through the list of tasks, agree on a big picture of design, and divide the implementation tasks into indepedent pieces. One strategy is to implement tasks that may be tested independently. Alternatively, you could also design the algorithm and data structures together, and identify independent components to complete. You may also come up with better ways of collaboration that makes more sense to you. In any case, the grade for the code implementation will be based the test result of your group's final submission.

1. Project Overview

In this project, you will build a B-Tree index as one of the index access methods in Taco-DB. More specifically, your goal is to understand the structure of the B-Tree and implement the core pieces of insert/delete/search/scan functions. This B-Tree supports variable-length keys, thus does not have a stable fanout. Splitting and merging are based on the actual load of internal and leaf pages rather than the number of entries. To simplify our implementation and maximize code reuse, TacoDB’s B-Tree utilizes VarlenDataPage to organize all its nodes. In the lab handout, we will find basic tests for various B-Tree utility functions and some small-scale end-to-end tests. During online and offline grading, there will be larger-scale system tests (which are hidden to you) to further evaluate the correctness and efficiency of your implementation. Note that, some of the system tests are not available on Autolab and will be only tested during the offline tests, because they are too large to run on the Autolab VMs. Your grade on those test cases will be posted on UBLearns after the final submission deadline once we finish the offline tests.

1.1 Index Interface

Source files to read:

Source files to modify: none

As a general and important interface in an RDBMS, Index is the base class of all different types of indexes. Its basic public interface is defined in Index.h. The metadata information, including its base table ID, index type, virtual file FID, etc., is stored in the catalog and represented as IndexDesc in memory. The static function Index::Initialize() will do some essential checks and initialize basic physical storage layout on the virtual file specified in IndexDesc allocated for the index, while the static function Index::Create() will initialize an Index class instance so that other components of DBMS can make calls on it.

An index should override public interfaces for inserting/deleting a data entry and scanning through all the data entries (an data entry is also called an indexed item in Taco-DB). The data entries in all the indexes in TacoDB are organized as Alternative 2, where each index entry is a (key, rid) pair. Data entries and index entries are collectively called index records in Taco-DB. Note that rid is unique for each record. Thus, all data entries are unique even though keys in the records are not. Index::StartScan() will initialize an iterator that will scan through all data entries falling into a particular key range. When scanning through the iterator, you need to ensure that each data entry will be touched EXACTLY ONCE. Besides, since all indexes of the current version of TacoDB are range indexes, your index iterators should scan through data entries in the ascending order of their keys.

There is a flag in the catalog information for indexes to indicate whether this index is a unique index. Note that the indexes still have unique data entries as we discussed in the previous paragraph even if the index is not unique (because duplicate keys belong to different records with different record IDs). If an index is non-unique, you should reject an insertion only if you are inserting a duplicate data entry with the same key and same record ID. However, if an index is unique, you should reject an insertion if there exists some data entry in the index with the same key but different record id. For instance, you cannot insert a data entry (5, (5, 1)) into a unique index with another data entry (5, (5, 2)) even if these two data entries are not considered as equal in the indexes.

We provided an volatile (in-memory) tree index implemented using std::multimap in index/volatiletree as a reference implementation. If you are unsure about what the specification of a specific interface means, you may always refer to the volatile tree implementation. Any sequence of insert/delete/update/scan in your B-Tree implementation should produce the same result in the eye of the user (i.e., the same set of data entries are in the index in the same order, they will generate the same error if there's any, an index scan will produce the same set of index key and record ID pairs). However, you cannot use any STL container or Abseil container to implement your B-Tree.

We want to make a quick note about the difference between IndexKey and the key payload of a index range. IndexKey is an in-memory structure that stores an array of references to in-memory Datum objects, which are primitive C/C++ values or pointers to strings. They can be directly used in your program through the appropriate Datum::GetXXX() functions based on the run-time type information you have. A key payload in an index record is the keys of this key stored in a serialized record format (as defined by the Schema class). To use a value in the record, you must use Schema::GetField()/Schema::DissemblePayload() or other related functions to extract the field into a Datum. In this project, you don't have to worry about extracting individual fields from a record to form an index key, as the code of that is provided. However, it is important that you understand the distinctions so that you will not try to copy IndexKey into an index record -- you need to use the key schema to serialize it into a record payload (using Schema::WritePayloadToBuffer with appropriate arguments constructed from the index key).

1.2 B-Tree Structure

Source file to READ:

Source files to modify:

The physical structure of the B-Tree in TacoDB is as explained during our lectures, which contains internal nodes and leaf nodes. All the structures of a B-Tree live in a virtual File managed by FileManager, whose first page is a meta page BTreeMetaPageData containing a page number pointing to the root. Note the root node of a B-Tree can be either a leaf or internal node, depending on whether there is enough space for a leaf page to fit all the data entries.

B-tree page format

In this project, you will need to rely on your VarlenDataPage implementation in project 2 to construct both leaf and internal nodes. A B-Tree page is encoded as a VarlenDataPage with a specific user data stored in the user data area, specific requirements on the slots (see Task 2). The figure above shows the physical layout of the page representing a B-Tree node. You can see that its first 24 bytes are exactly the same as those of VarlenDataPage for general heap file data. The only differences are: (1) Index entries for leaf nodes and separator keys with links to corresponding children for internal nodes are treated as index records in this specialized VarlenDataPage. (2) B-Tree nodes store their specialized meta information in the user-defined data region. You can reserve enough space for that region through passing usr_data_sz when calling VarlenDataPage::Initialize. You can find all the fields of that region in struct BTreePageHeaderData defined in include/index/btree/BTreeInternal.h. Specifically, it includes flags indicating if this is the root node and if this is a leaf node, the pointers to its two neighbor siblings on the same level, and the total number of bytes used by index records in the page.

Task 1: Implement functions for laying out B-Tree pages and some other basic utilities:

Your code should allocate a new page and pin it after BTree::CreateNewBTreePage() is called, then return the location of pinned buffer frame. BTree::GetBTreeMetaPage() should pin the meta page of this B-Tree and return the pinned frame location. BTree::CreateLeafRecord() and BTree::CreateInternalRecord() are used for creating index records in B-Tree pages. In particular, index entries in internal nodes should be contiguous bytes containing BTreeInternalRecordHeaderData and the index key (strictly in this order with alignment). You can see the m_heap_rec_id is also the part of BTreeInternalRecordHeaderData as (key, rec_id) will always be unique even if this is not a unique index. The first record in an internal node should only contain BTreeInternalRecordHeaderData since its starting key is stored in its parent. The key of all records stored in the subtree between two neighboring keys k1 and k2 of an internal node falls in a left-closed right-open range [k1, k2). The leaf node is much simpler, where each data entry is contiguous bytes containing BTreeLeafRecordHeaderData (which is effectively just a RID) and the index key (in this order with alignment).

A B-Tree page, as a VarlenDataPage, must not have any unoccupied and valid slot. In other words, the valid SID on a B-Tree page must be in a consecutive range [GetMinSlotId(), GetMaxSlotId()] (if there's any slot). A B-Tree page may not be empty unless the entire tree is empty. In addition, the keys on a B-Tree page must be in a (ascending) sorted order. If you need to insert or remove an index record from an internal or a leaf B-tree node, you should do so by specifying the correct insert/delete slot ID in the VarlenDataPage::InsertRecordAt() and VarlenDataPage::RemoveSlot functions so that the keys are still sorted after the insertion/deletion. You should not use the VarlenDataPage::InsertRecord() and VarlenDataPage::EraseRecord() functions you implemented in project 2 on any B-Tree page. Updating an index record with VarlenDataPage::UpdateRecord() is only safe when you are sure the updated index record will fit into the page. In addition, you should also implement a utility function VarlenDataPage::ComputeFreeSpace() to compute the page usage given the total number of slots and the total length of record payloads so that you can safely implement B-Tree page split, merge, rebalance in the later tasks.

Task 2: Implement additional utility functions for variable-length data page. These utility functions are useful for maintaing a data page with sorted (index) records, especially in handling B-Tree insertion and deletion. You may find the specification for these functions in the header file include/storage/VarlenDataPage.h.

Your code is likely to pass the following tests at this point:

1.3 B-Tree Search

Source files to modify:

To compare two data entry or index entry, you should compare not only the index key but also the record ID. This is because we want to make sure all index records are unique to each other to enforce a total order in the index. You can compare two index keys by calling TupleCompare(key1, key2, key_schema, lt_func, eq_func) (Note: don't paste this line verbatim and you need to replace the arguments with the ones that make sense in the scope where you call the function!). You should compare on record ID further if the index key comparison turns out to be equal. Note that an invalid record id is assumed to be smaller than any record id (including any invalid record id and there are many different possible invalid record ids). Different from SQL, NULL values are considered to be smaller than any non-NULL values in index and NULL values compare equal in index, because we need to index NULL values as well. Further, we will enforce dictionary order for partial key matches, in B-Tree. This means a key like (1, 2) is always greater than its prefix (1, phi) (where phi means unspecified). You can compare a full key with its prefix using TupleCompare, but it will turn out to be equal. So you have to resolve that in BTree::BTreeTupleCompare() as well.

You should write a simple binary search to look up the key in a B-Tree Node. The second overload of BTree::FindLeafPage() is the core recursion logic of traversing the tree. Please be careful about pinning and unpinning the pages during the traversal. At any time during a search, there should be ONLY ONE B-Tree node pinned in the buffer.

Task 3: Implement the utility function to compare two keys and complete the recursive B-Tree point lookup logic:

Your code is likely to pass the following tests at this point:

You should use the sibling pointers on the leaf node to iterate through data entries in the sorting order. You should also check if the scan is completed (either you are running out of index entries or the next index entry falls out of query range).

Task 4: Finish up the B-Tree iterator scan logic:

You should use the sibling pointers on the leaf node to iterate through index entries in order. You should also check if the scan is completed (either you are running out of index entries or the next index entry falls out of query range).

Your code is likely to pass the following tests at this point:

1.4 B-Tree Insertion

Source files to modify:

You may reuse the B-Tree search code to locate the correct leaf page to perform the insertion. In addition, you should remember the path down by keeping track of which B-Tree nodes (a.k.a B-Tree page ID) you have touched and which slot number you took during the traversal. This information is useful in later split procedures.

The uniqueness check of the data entry for a non-unique index or key for a unique index happens in BTree::FindInsertionSlotIdOnLeaf(). You should return an appropriate slot ID if a duplicate is found. A small nuance to note here is that we allow duplicate NULL values to appear in a unique index, which is SQL-compliant but differs from the practice of serveral major DBMS. (Think about why? Hint: unique indexes are used to enforce UNIQUE constraints.)

Task 5: Implement the functions to find the correct slot on a leaf node for insertion, and insert an index record on a B-Tree page. You may just implement these functions without the logic for recursively splitting the page initially and do that in task 6 instead.

Your code is likely to pass the following tests at this point:

You should also add the logic for splitting a page in BTree:InsertRecordOnPage() when there is no space left to insert a new index record. Consider a local case of splitting a node into two. There will be two nodes originally in the tree being affected, namely the node you are splitting and its parent. During this procedure, BTree::SplitPage() will trigger another BTree::InsertRecordOnPage() at the parent node. This procedure can trigger another split on the parent node, thus can be recursive and go all the way to the root. You will need to handle this special case of root node splitting with BTree::CreateNewRoot().

Again, during this recursive procedure, you should be very careful about pinning and unpinning B-Tree pages. At any time during insertion, there should be AT MOST FOUR pages pinned at the same time. (Think about which four pages?)

Task 6: Implement node splitting and complete BTree::InsertRecordOnPage() by adding triggers for recursive splitting:

1.5 B-Tree Deletion

Source files to modify:

When we locate the record for deletion in the leaf level, we should also remember the traversal path the same as what we did in insertion. Note that, when we are searching for a leaf page to locate a key to delete without specifying a valid record ID, it might end up on a leaf page that is to the left of any matching keys, because an invalid record id is treated as smaller than any valid record id. In this case, you may have to move to the right page (m_next_page) to find a matching key for deletion (if there is any, it must be in the first slot of the right page). One tricky issue is that the parent of the original leaf page may be different from the leaf page on the right. If that happens, we ask you to simply add one to the slot ID of the last PathItem of the search path in the BTree::FindDeletionSlotIdOnLeaf(), because BTree::HandleMinPageUsage() will handle the logic of moving to the right to find the correct parent.

Task 7: Implement the function to find the location of the data entry to delete and the function for erasing an index record on a B-Tree page:

Your code is likely to pass the following tests at this point:

Merging should happen when a B-Tree node is under-utilized (it has free space of more than 60% of the page size), which is triggered by BTree::HandleMinPageUsage() called in BTree::DeleteKey(). Since node rebalancing is assigned as a bonus, you don’t have to implement it to get the full score. The test that is not part of the bonus project will use a flag to disable rebalancing so that your code on the rebalancing will not be called there.

Node merging involves two sibling pages next to each other that share the same parent node, and you should also delete an index record on the parent. This may recursively trigger another BTree::MergePages() on the parent level, which is already handled recursively in BTree::HandleMinPageUsage(). The merge may go all the way up to the root, and eventually change the root node PID stored in B-Tree meta page. You don’t have to deal with this either, since we have already handled it for you in BTree::HandleMinPageUsage(). During this recursive deletion and merging process, there should be AT MOST FOUR pages pinned at the same time. (which four pages?)

Task 8: Implement node merging and complete BTree::DeleteSlotOnPage():

Your code is likely to pass the following tests at this point:

RebalancePages() is called when we have an under-utilized page, but it can’t be merged with a sibling page because its page usage will exceed the page size if they are merged. RebalancePages() should try to move records from the page with a higher page usage to the one with a lower page usage, such that both have a page usage no less than BTREE_MIN_PAGE_USAGE (Update:) the resulting pages have a smaller difference of page usages. There are usually many choices of how many tuples to move. Different from SplitPage() which can always succeed, RebalancePages() may fail because the new separator key after the rebalancing may not fit into the parent page (as it might be longer than the old separator key). You should not make a choice such that the new separator key cannot fit into the parent page. Out of all valid choices, you should choose the one that minimizes the page usage difference between the two pages. If there’s no valid choice, you should not change either of the pages and should return false from RebalancePages().

During this rebalancing process, there should be AT MOST THREE pages pinned at the same time. (Which three pages?)

You should also implement VarlenDataPage::ShiftSlots() to help you preallocate invalid slots at the start of the slot array or remove a batch of slots at the start of the slot array. Note if ShiftSlots() is called on an empty page with no slots with n and truncate == false, it is expected to reserve n unoccupied slots on the page although that will temporarily violate the constraint that the left-most slot on the page must be occupied. This is only allowed in this very specific corner case. It is expected that the caller will immediately fill in these slots by calling InsertRecordAt() (and possibly GetMaxSlotId()) before calling any other functions on the page. That said, this corner case never appears in our test cases and should not appear in your implementation of B-Tree rebalancing, because no B-Tree page may ever become completely empty during rebalancing. As a result, whether your code passes the bonus tests does not depend on the correct handling of the corner case.

Task 9: (Bonus, optional): Implement node rebalancing and optimize BTree::DeleteSlotOnPage() with rebalancing logic

Your code is likely to pass the following tests at this point:

2. Write-up requirement

Each individual student needs to submit an independent write-up to UBLearns for the project design, division of implementation responsibility (no more than 1 page in letter size for design and coding responsibility). Please find the due date of the write-up at the beginning of this web page and submit a PDF document to UBLearns by the deadline. Please find a more detailed description of the requirements and a sample for the write-up on UBLearns.