taco-db  0.1.0
taco::BTree Class Reference

A B-tree stored in the persistent files.

#include <index/btree/BTree.h>

Inheritance: taco::BTree inherits from taco::Index.

Classes

class  Iterator
 

Public Member Functions

virtual ~BTree ()
 
void BulkLoad (BulkLoadIterator &iter) override
 Loads all the (key, record id) pairs provided by the iterator iter into an empty index.
 
bool InsertKey (const IndexKey *key, RecordId recid) override
 Inserts the (key, rid) pair into the index.
 
bool DeleteKey (const IndexKey *key, RecordId &recid) override
 Deletes an arbitrary data entry with matching key if rid is invalid.
 
std::unique_ptr< Index::Iterator > StartScan (const IndexKey *lower, bool lower_isstrict, const IndexKey *upper, bool upper_isstrict) override
 Returns a forward-iterator for all the indexed items in the specified range defined by the arguments.
 
bool IsEmpty ()
 Returns whether this tree is empty.
 
uint32_t GetTreeHeight ()
 Returns the tree height.
 
- Public Member Functions inherited from taco::Index
virtual ~Index ()
 Index object destructor.
 
const IndexDesc * GetIndexDesc () const
 Returns the index descriptor of this index.
 
const Schema * GetKeySchema () const
 Returns the key schema of this index.
 
bool InsertRecord (const Record &rec, const Schema *tabschema=nullptr)
 Inserts the (key, rid) extracted from the table record into the index.
 
bool DeleteRecord (Record &rec, const Schema *tabschema=nullptr)
 Deletes the (key, rid) extracted from the table record from the index.
 

Static Public Member Functions

static void Initialize (const IndexDesc *idxdesc)
 
static std::unique_ptr< BTree > Create (std::shared_ptr< const IndexDesc > idxdesc)
 
- Static Public Member Functions inherited from taco::Index
static void Initialize (const IndexDesc *idxdesc)
 Initializes an index described by an index descriptor.
 
static std::unique_ptr< Index > Create (std::shared_ptr< const IndexDesc > idxdesc)
 Creates an index object over an index file described by the index descriptor, which has already been initialized.
 

Private Member Functions

 BTree (std::shared_ptr< const IndexDesc > idxdesc, std::unique_ptr< File > file, std::vector< FunctionInfo > lt_funcs, std::vector< FunctionInfo > eq_funcs)
 
BufferId CreateNewBTreePage (bool isroot, bool isleaf, PageNumber prev_pid, PageNumber next_pid)
 Allocates and initializes a new BTree internal or leaf page.
 
BufferId GetBTreeMetaPage ()
 Pins the BTree meta page and returns the buffer ID of the buffer frame where it is pinned.
 
void CreateLeafRecord (const IndexKey *key, const RecordId &recid, maxaligned_char_buf &recbuf)
 Creates a new leaf record with the given key and heap recid in recbuf.
 
void CreateInternalRecord (const Record &child_recbuf, PageNumber child_pid, bool child_isleaf, maxaligned_char_buf &recbuf)
 Creates a new internal record with separator key and heap record id from child_recbuf, and the child page number child_pid in recbuf.
 
int BTreeTupleCompare (const IndexKey *key, const RecordId &recid, const char *tuplebuf, bool isleaf) const
 Compares a (key, recid) pair to a leaf or internal record in the B-tree.
 
SlotId BinarySearchOnPage (char *buf, const IndexKey *key, const RecordId &recid)
 Uses binary search to find the last slot whose key-recid pair is smaller than or equal to the (key, recid) pair.
 
BufferId FindLeafPage (const IndexKey *key, const RecordId &recid, std::vector< PathItem > *p_search_path)
 Finds a leaf page by descending from the root.
 
BufferId FindLeafPage (BufferId bufid, const IndexKey *key, const RecordId &recid, std::vector< PathItem > *p_search_path)
 Finds a leaf page, starting from the page pinned in buffer frame bufid, whose key range covers the (key, recid) pair.
 
SlotId FindInsertionSlotIdOnLeaf (const IndexKey *key, const RecordId &recid, BufferId bufid)
 Finds the slot id at which inserting the (key, recid) pair keeps the page's key and record id pairs unique and sorted.
 
void InsertRecordOnPage (BufferId bufid, SlotId insertion_sid, maxaligned_char_buf &&recbuf, std::vector< PathItem > &&search_path)
 Inserts a record in recbuf onto a leaf page already pinned in the buffer frame bufid whose key range covers the (key, recid) pair, at slot insertion_sid.
 
void CreateNewRoot (BufferId bufid, maxaligned_char_buf &&recbuf)
 Creates a new root page and updates the B-Tree meta page.
 
maxaligned_char_buf SplitPage (BufferId bufid, SlotId insertion_sid, maxaligned_char_buf &&recbuf)
 Splits a page pinned in bufid that is too full to insert an additional record in recbuf at slot insertion_sid into two sibling pages.
 
SlotId FindDeletionSlotIdOnLeaf (const IndexKey *key, const RecordId &recid, BufferId &bufid, std::vector< PathItem > &search_path)
 Finds the slot to delete for the (key, recid) pair.
 
void DeleteSlotOnPage (BufferId bufid, SlotId sid)
 Deletes a slot from a page.
 
void HandleMinPageUsage (BufferId bufid, std::vector< PathItem > &&search_path)
 Checks if a page pinned in buffer frame bufid needs to be merged or rebalanced with a sibling page sharing the same parent.
 
bool TryMergeOrRebalancePages (BufferId lbufid, BufferId rbufid, BufferId pbufid, SlotId lsid)
 Tries to merge or rebalance two sibling pages.
 
bool MergePages (BufferId lbufid, BufferId rbufid, BufferId pbufid, SlotId lsid)
 Merges two sibling pages if there is enough space.
 
bool RebalancePages (BufferId lbufid, BufferId rbufid, BufferId pbufid, SlotId lsid)
 Rebalances two sibling pages if possible, choosing the redistribution that minimizes the difference between the page usages of the two pages.
 

Private Attributes

std::unique_ptr< File > m_file
 
std::vector< FunctionInfo > m_lt_funcs
 
std::vector< FunctionInfo > m_eq_funcs
 

Friends

class TestBTree
 
class Iterator
 

Additional Inherited Members

- Protected Member Functions inherited from taco::Index
 Index (std::shared_ptr< const IndexDesc > idxdesc)
 

Detailed Description

A B-tree stored in the persistent files.

Logically, the tree is a map of (key, record ID) pairs that has a one-to-one correspondence to the heap file records in some table. The key is a record payload that only consists of the key columns as defined by the key schema, extracted from a heap tuple, and the record ID is that of the heap tuple in the table (which you may think of as a pointer into the table).

Internally, the B-tree treats each (key, recid) pair as a whole as the key. That is, two pairs with the same key but different record ids are considered different by the B-tree. Duplicate (key, recid) pairs must not appear at the same time in a B-tree index. This ensures all the pairs are unique within the index and we can uniquely identify each indexed item in the tree (which makes updating and iteration meaningful).
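The composite-key rule above can be illustrated with a small in-memory model. This is only a sketch: ToyRecordId, ToyEntry, ToyIndex and ToyInsert are hypothetical names, the key is reduced to a single integer, and ordering record ids by (pid, sid) is an assumption rather than something this page states.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical stand-in for a heap record id (page number, slot id).
struct ToyRecordId {
    uint32_t pid;
    uint16_t sid;
};

// Hypothetical index entry with a single-column integer key.
struct ToyEntry {
    int key;
    ToyRecordId rid;
};

// Orders entries by key first and breaks ties on the record id, so the
// (key, rid) pair as a whole acts as the key of the set.
struct ToyEntryLess {
    bool operator()(const ToyEntry &a, const ToyEntry &b) const {
        return std::tie(a.key, a.rid.pid, a.rid.sid) <
               std::tie(b.key, b.rid.pid, b.rid.sid);
    }
};

using ToyIndex = std::set<ToyEntry, ToyEntryLess>;

// Returns false when the exact (key, rid) pair is already present.
inline bool ToyInsert(ToyIndex &idx, int key, ToyRecordId rid) {
    return idx.insert(ToyEntry{key, rid}).second;
}
```

In this model, two entries with the same key and different record ids coexist, while inserting the same pair twice is rejected.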

The tree consists of several levels of internal pages and a level of leaf pages. Each page is managed as a VarlenDataPage. The pages on the same level are linked as a doubly-linked list through their page headers BTreePageHeaderData. Note that this header is stored in the user data area of the VarlenDataPage rather than the beginning of the page.

For the leaf pages, each (key, recid) pair is stored as a single record, which consists of a header BTreeLeafRecordHeaderData at the beginning and the actual key payload immediately following the header. We store the recid in the header instead of the payload. All the records are sorted by key and record id lexicographically (see BTreeTupleCompare() for details).

An internal page stores the child page links and separator keys as internal records. Each internal record consists of a header BTreeInternalRecordHeaderData at the beginning, immediately followed by the separator key's payload, with the exception of the very first internal record on an internal page (see below). Note that we store the child page number and the heap record id in the header rather than in the payload.

The first internal record on an internal page only has a header BTreeInternalRecordHeaderData without any payload following it, and its heap record id is undefined (meaning we should never use its value). The first internal record is assumed to be smaller than any other internal record and/or (key, recid) pair (i.e., its key is negative infinity in all fields). All the remaining records are sorted by the key and heap record id lexicographically.

For an internal record with key k_i, heap record id recid_i, and child page number c_i, let k_{i+1}, recid_{i+1} be the key and the heap record id in the internal record after it (if any). The subtree c_i points to should contain and may only contain (key, recid) pairs such that (k_{i}, recid_{i}) <= (key, recid) < (k_{i+1}, recid_{i+1}). This (key, recid) pair range is said to be covered by the subtree.
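The covering rule can be sketched as a child-selection routine over an in-memory list of internal records. ToyInternalRecord and ToyChooseChild are hypothetical names, keys are plain integers, and record ids are flattened to a single integer for brevity; the real page layout is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for an internal record.
struct ToyInternalRecord {
    int key;            // never read for the first record on the page
    uint64_t recid;     // never read for the first record on the page
    uint32_t child_pid; // page number of the child subtree
};

// Returns the child page whose subtree covers (key, recid): the child of the
// last record i with (k_i, recid_i) <= (key, recid). Record 0 acts as
// negative infinity, so it is never compared and always qualifies.
inline uint32_t ToyChooseChild(const std::vector<ToyInternalRecord> &page,
                               int key, uint64_t recid) {
    assert(!page.empty());
    const std::pair<int, uint64_t> target{key, recid};
    std::size_t chosen = 0;
    for (std::size_t i = 1; i < page.size(); ++i) {
        const std::pair<int, uint64_t> sep{page[i].key, page[i].recid};
        if (sep <= target) {
            chosen = i;
        } else {
            break; // separators are sorted, so no later record can qualify
        }
    }
    return page[chosen].child_pid;
}
```

A linear scan keeps the sketch short; a page-level search would more likely use binary search, as BinarySearchOnPage() does.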

Constructor & Destructor Documentation

◆ BTree()

taco::BTree::BTree ( std::shared_ptr< const IndexDesc >  idxdesc,
std::unique_ptr< File >  file,
std::vector< FunctionInfo >  lt_funcs,
std::vector< FunctionInfo >  eq_funcs 
)
private

◆ ~BTree()

taco::BTree::~BTree ( )
virtual

Member Function Documentation

◆ BinarySearchOnPage()

SlotId taco::BTree::BinarySearchOnPage ( char *  buf,
const IndexKey *  key,
const RecordId &  recid 
)
private

Uses binary search to find the last slot whose key-recid pair is smaller than or equal to the (key, recid) pair.

The comparison of the key-recid pair is defined by BTreeTupleCompare(), except that the first slot on an internal page is treated as negative infinity (which is smaller than any key).

For an internal page, there must be such a slot, as the key in the first slot on the page is treated as negative infinity (i.e., it is smaller than any key). For a leaf page, there may not be such a slot, in which case it should return INVALID_SID (which is MinSlotId - 1).
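The "last slot that is smaller than or equal to the target" search on a leaf page can be sketched with std::upper_bound over a sorted vector. All names here are hypothetical; in particular, kToyMinSlotId = 1 is an assumption, and only the relation INVALID_SID = MinSlotId - 1 comes from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using ToySlotId = int32_t;
constexpr ToySlotId kToyMinSlotId = 1;                  // assumed first valid slot
constexpr ToySlotId kToyInvalidSid = kToyMinSlotId - 1; // mirrors INVALID_SID

using ToyPair = std::pair<int, uint64_t>; // (key, recid)

// Returns the last slot whose pair is <= target, or kToyInvalidSid if every
// pair on the sorted leaf page is larger than target.
inline ToySlotId ToyBinarySearchOnLeaf(const std::vector<ToyPair> &sorted_pairs,
                                       const ToyPair &target) {
    assert(std::is_sorted(sorted_pairs.begin(), sorted_pairs.end()));
    // upper_bound finds the first pair > target; the slot before it is the
    // answer, and "no such slot" naturally maps to kToyMinSlotId - 1.
    auto it = std::upper_bound(sorted_pairs.begin(), sorted_pairs.end(), target);
    auto n_le = static_cast<ToySlotId>(it - sorted_pairs.begin());
    return kToyMinSlotId + n_le - 1;
}
```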

◆ BTreeTupleCompare()

int taco::BTree::BTreeTupleCompare ( const IndexKey *  key,
const RecordId &  recid,
const char *  tuplebuf,
bool  isleaf 
) const
private

Compares a (key, recid) pair to a leaf or internal record in the B-tree.

The key is first compared against the key stored in the tuplebuf lexicographically by fields. If there's any NULL field, it is considered as smaller than any non-NULL value. If the key compares equal to the key stored in tuplebuf, it then compares recid against the heap record id stored in tuplebuf.

Returns -1 if (key, recid) is smaller than the key and record id pair stored in tuplebuf; 0 if they are equal; or 1 if the former is larger than the latter.

Note that TupleCompare() treats a key that is exactly a prefix of the fields stored in a record payload as equal, whereas BTreeTupleCompare() should consider such a key smaller than the key stored in the record payload. Hence, BTreeTupleCompare() needs to handle this case after calling TupleCompare().
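A minimal sketch of these comparison rules follows, assuming integer fields with std::nullopt standing in for NULL. ToyTupleCompare and ToyBTreeTupleCompare are hypothetical stand-ins, not the real functions, and treating two NULLs as equal for ordering purposes is an assumption that the text above does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using ToyField = std::optional<int>; // std::nullopt models SQL NULL
using ToyKey = std::vector<ToyField>;

// Stand-in for TupleCompare(): compares only the first key.size() fields, so
// a key that is a prefix of the stored fields compares equal.
inline int ToyTupleCompare(const ToyKey &key, const ToyKey &stored) {
    assert(key.size() <= stored.size());
    for (std::size_t i = 0; i < key.size(); ++i) {
        const ToyField &a = key[i];
        const ToyField &b = stored[i];
        if (a.has_value() != b.has_value()) {
            return a.has_value() ? 1 : -1; // NULL sorts before non-NULL
        }
        if (a.has_value() && *a != *b) {
            return (*a < *b) ? -1 : 1;
        }
    }
    return 0;
}

// Stand-in for BTreeTupleCompare(): same as above, except that a strict
// prefix sorts before the stored key, and ties are broken on the record id.
inline int ToyBTreeTupleCompare(const ToyKey &key, uint64_t recid,
                                const ToyKey &stored, uint64_t stored_recid) {
    const int c = ToyTupleCompare(key, stored);
    if (c != 0) return c;
    if (key.size() < stored.size()) return -1;
    if (recid == stored_recid) return 0;
    return (recid < stored_recid) ? -1 : 1;
}
```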

◆ BulkLoad()

void taco::BTree::BulkLoad ( BulkLoadIterator &  iter)
overridevirtual

Loads all the (key, record id) pairs provided by the iterator iter into an empty index.

The behavior is undefined if the index is not empty, and the specific index implementation is allowed to (but not required to) throw an error when called on a non-empty index.

Upon successful return, the iterator iter is ended by calling iter->EndScan().

The default implementation calls the InsertKey() function over all the pairs returned by iter. The specific index implementation should override it with a more efficient way if there's any.

Reimplemented from taco::Index.

◆ Create()

std::unique_ptr< BTree > taco::BTree::Create ( std::shared_ptr< const IndexDesc >  idxdesc)
static

◆ CreateInternalRecord()

void taco::BTree::CreateInternalRecord ( const Record &  child_recbuf,
PageNumber  child_pid,
bool  child_isleaf,
maxaligned_char_buf &  recbuf 
)
private

Creates a new internal record with separator key and heap record id from child_recbuf, and the child page number child_pid in recbuf.

child_isleaf must be true if child_recbuf is a leaf record. Conversely, it must be false if child_recbuf is an internal record. The record buffer recbuf is cleared upon entry and contains the newly created record upon return.

◆ CreateLeafRecord()

void taco::BTree::CreateLeafRecord ( const IndexKey *  key,
const RecordId &  recid,
maxaligned_char_buf &  recbuf 
)
private

Creates a new leaf record with the given key and heap recid in recbuf.

The record buffer is cleared upon entry and contains the newly created record upon return.

◆ CreateNewBTreePage()

BufferId taco::BTree::CreateNewBTreePage ( bool  isroot,
bool  isleaf,
PageNumber  prev_pid,
PageNumber  next_pid 
)
private

Allocates and initializes a new BTree internal or leaf page.

Returns the buffer frame id where the new page is pinned.

◆ CreateNewRoot()

void taco::BTree::CreateNewRoot ( BufferId  bufid,
maxaligned_char_buf &&  recbuf 
)
private

Creates a new root page and updates the B-Tree meta page.

The old root page (which has already been split) is initially pinned in bufid and the record buffer recbuf contains an internal record that points to the right sibling of the old root page. Upon return the old root page is unpinned.

◆ DeleteKey()

bool taco::BTree::DeleteKey ( const IndexKey *  key,
RecordId &  rid 
)
overridevirtual

Deletes an arbitrary data entry with matching key if rid is invalid.

Deletes the data entry with matching (key, rid) pair if rid is valid.

Returns true if the deletion succeeds, in which case rid is updated to the record id of the deleted indexed item. Returns false if the key (if rid is invalid) or the (key, rid) pair (if rid is valid) is not found in the index, in which case rid should be set to invalid.
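The in/out behavior of rid can be sketched on a toy index backed by std::set. ToyRid, kToyInvalidRid, ToyIndex and ToyDeleteKey are hypothetical names, and using 0 as the invalid record id is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using ToyRid = uint64_t;
constexpr ToyRid kToyInvalidRid = 0; // hypothetical "invalid" sentinel
using ToyIndex = std::set<std::pair<int, ToyRid>>;

// Mirrors the DeleteKey() contract on a toy index: rid is both an input
// (which entry to delete) and an output (which entry was deleted).
inline bool ToyDeleteKey(ToyIndex &idx, int key, ToyRid &rid) {
    ToyIndex::iterator it;
    if (rid == kToyInvalidRid) {
        // Any entry with a matching key will do; take the first one.
        it = idx.lower_bound({key, kToyInvalidRid});
        if (it != idx.end() && it->first != key) it = idx.end();
    } else {
        it = idx.find({key, rid});
    }
    if (it == idx.end()) {
        rid = kToyInvalidRid; // not found: report an invalid record id
        return false;
    }
    rid = it->second; // report which entry was removed
    idx.erase(it);
    return true;
}
```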

Implements taco::Index.

◆ DeleteSlotOnPage()

void taco::BTree::DeleteSlotOnPage ( BufferId  bufid,
SlotId  sid 
)
private

Deletes a slot from a page.

If slot sid is the last slot on the root page, it should also update the header flags to indicate that the root page becomes a leaf page.

◆ FindDeletionSlotIdOnLeaf()

SlotId taco::BTree::FindDeletionSlotIdOnLeaf ( const IndexKey *  key,
const RecordId &  recid,
BufferId &  bufid,
std::vector< PathItem > &  search_path 
)
private

Finds the slot to delete for the (key, recid) pair.

Upon entry, the page pinned in buffer frame bufid should be either 1) the page whose key range covers (key, recid) if recid is valid; or 2) the left-most page on the leaf level whose key range possibly overlaps with some pair with the given key if recid is invalid.

This function may need to move to the right sibling of the original page to find the slot to delete, in which case the search path at the lowest internal page level may also need to be updated. However, it should simply add 1 to the slot id. CheckMinPageUsage() is responsible for fetching the sibling parent page if necessary. The bufid should be updated if this function moves to the right page, and the function should properly unpin the previous leaf page. It is also responsible for unpinning the page should there be any uncaught error.

Returns the slot id of the leaf record to be deleted if it has a matching key when recid is invalid, or (key, recid) pair when recid is valid. Otherwise, returns INVALID_SID.

◆ FindInsertionSlotIdOnLeaf()

SlotId taco::BTree::FindInsertionSlotIdOnLeaf ( const IndexKey *  key,
const RecordId &  recid,
BufferId  bufid 
)
private

Finds the slot id at which inserting the (key, recid) pair keeps the page's key and record id pairs unique and sorted.

If this is a unique index, it should further ensure that no key (instead of pair) on this page is equal to the insertion key. For this purpose, NULL values do not compare equal.
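The rule that NULL values do not compare equal for uniqueness checks can be sketched as follows. ToyKeysConflict is a hypothetical helper, fields are modeled as std::optional<int>, and both keys are assumed to have the same number of fields.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using ToyField = std::optional<int>; // std::nullopt models SQL NULL
using ToyKey = std::vector<ToyField>;

// Returns true if the two keys conflict under a unique index: every field
// must be non-NULL and equal. A NULL in either key never compares equal, so
// keys containing NULLs never conflict with anything.
inline bool ToyKeysConflict(const ToyKey &a, const ToyKey &b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (!a[i].has_value() || !b[i].has_value()) return false;
        if (*a[i] != *b[i]) return false;
    }
    return true;
}
```

Under this rule, a unique index can hold several entries whose keys contain NULL, while sorting still places them next to each other.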

◆ FindLeafPage() [1/2]

BufferId taco::BTree::FindLeafPage ( BufferId  bufid,
const IndexKey *  key,
const RecordId &  recid,
std::vector< PathItem > *  p_search_path 
)
private

Finds a leaf page, starting from the page pinned in buffer frame bufid whose key range covers the (key, recid) pair, such that:

1) for a null key, the found leaf page is the left-most page on the leaf level;

2) for a non-null key, the found leaf page whose key range covers the (key, recid) pair (note that there's only one such page on the leaf level).

Returns the buffer id of the buffer frame where the found leaf page is pinned, with its parent page unpinned.

If p_search_path is not null, this function stores the record ids of all the internal records in *p_search_path. Otherwise, it should not dereference p_search_path.

It is undefined if the initial page pinned in bufid does not have a key range that covers the (key, recid) pair.

◆ FindLeafPage() [2/2]

BufferId taco::BTree::FindLeafPage ( const IndexKey *  key,
const RecordId &  recid,
std::vector< PathItem > *  p_search_path 
)
private

Finds a leaf page from the root such that:

1) for a null key, the found leaf page is the left-most page on the leaf level;

2) for a non-null key, the found leaf page whose key range covers the (key, recid) pair (note that there's only one such page on the leaf level).

Returns the buffer id of the buffer frame where the found leaf page is pinned. Note that the returned leaf page is the only page that should be pinned by FindLeafPage() upon return.

If p_search_path is not null, this function stores the record ids of all the internal records in *p_search_path. Otherwise, it should not dereference p_search_path.

◆ GetBTreeMetaPage()

BufferId taco::BTree::GetBTreeMetaPage ( )
private

Pins the BTree meta page and returns the buffer ID of the buffer frame where it is pinned.

◆ GetTreeHeight()

uint32_t taco::BTree::GetTreeHeight ( )

Returns the tree height.

It is 1 + the level of the root page.

◆ HandleMinPageUsage()

void taco::BTree::HandleMinPageUsage ( BufferId  bufid,
std::vector< PathItem > &&  search_path 
)
private

Checks if a page pinned in buffer frame bufid needs to be merged or rebalanced with a sibling page sharing the same parent.

search_path contains the record ids of the internal records that lead to the leaf page where we deleted a tuple (with the exception of the last record id in the search path, which can have a slot id that is 1 larger than the maximum on the page, see FindDeletionSlotIdOnLeaf()).

The pin on bufid and any additional buffer pins obtained in this function should be dropped upon return.

◆ Initialize()

void taco::BTree::Initialize ( const IndexDesc *  idxdesc)
static

◆ InsertKey()

bool taco::BTree::InsertKey ( const IndexKey *  key,
RecordId  rid 
)
overridevirtual

Inserts the (key, rid) pair into the index.

Returns true if the insertion succeeds. Returns false if the index was declared as a unique index and a duplicate key is found, or for a non-unique index, the (key, rid) pair already exists in the index.

Implements taco::Index.

◆ InsertRecordOnPage()

void taco::BTree::InsertRecordOnPage ( BufferId  bufid,
SlotId  insertion_sid,
maxaligned_char_buf &&  recbuf,
std::vector< PathItem > &&  search_path 
)
private

Inserts a record in recbuf onto a leaf page already pinned in the buffer frame bufid whose key range covers the (key, recid) pair, at slot insertion_sid.

search_path contains a stack of record ids of the internal records when we initially searched for the insertion point.

Note that InsertRecordOnPage() may need to split the page that it is inserting into, which may in turn cause its parent or ancestor pages to split recursively. Upon return, the page it is inserting into should be unpinned and any additional pin obtained during the recursive split must also be dropped.

◆ IsEmpty()

bool taco::BTree::IsEmpty ( )

Returns whether this tree is empty.

◆ MergePages()

bool taco::BTree::MergePages ( BufferId  lbufid,
BufferId  rbufid,
BufferId  pbufid,
SlotId  lsid 
)
private

Merges two sibling pages if there is enough space.

Returns true if it successfully merged the two pages, or false if it cannot merge the two pages. Upon successful return, it should free the right page and unpin the left page. Otherwise, it should leave all three pages pinned.

Note that if MergePages() determines that it cannot merge the pages, it should not physically modify any of the three pages (even if the page still has the same set of records logically after such modification) and should not mark these buffered pages as dirty.

◆ RebalancePages()

bool taco::BTree::RebalancePages ( BufferId  lbufid,
BufferId  rbufid,
BufferId  pbufid,
SlotId  lsid 
)
private

Rebalances two sibling pages if possible, choosing the redistribution that minimizes the difference between the page usages of the two pages.

Returns true if it is successful, or false if it cannot rebalance the two pages. Upon successful return, it should unpin both the left and the right pages. Otherwise, it should leave all three pages pinned.

Note that if RebalancePages() determines that it cannot rebalance the pages, it should not physically modify any of the three pages (even if the page still has the same set of records logically after such modification) and should not mark these buffered pages as dirty.

◆ SplitPage()

maxaligned_char_buf taco::BTree::SplitPage ( BufferId  bufid,
SlotId  insertion_sid,
maxaligned_char_buf &&  recbuf 
)
private

Splits a page pinned in bufid that is too full to insert an additional record in recbuf at slot insertion_sid into two sibling pages such that:

1) the two pages have exactly all the records on the original page plus the new record in recbuf (for internal page split, the payload of the first record on the new right sibling may be purged);

2) all the (key, recid) pairs are still sorted and the old page is the left sibling page;

3) and the choice of split point minimizes the difference between the page usages of the two pages.

Returns an internal record that points to the new (right) page and its key and heap record id are correctly set such that all key and record id pairs on the old (left) page are strictly smaller than the internal record, and all key and record id pairs on the new (right) page are larger than or equal to the internal record.

Note that the buffer pin on the left page should be kept, while the pin on the right page should be dropped, upon successful return.
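Condition 3) can be sketched as a scan over candidate split points. This is a simplified model under stated assumptions: ToyChooseSplitPoint is a hypothetical helper, page usage is approximated by the sum of record sizes, and page capacity, slot overhead, and the purged payload of the first record on a new internal page are all ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

// Given the sizes (in bytes) of all records in sorted order, with the new
// record already placed at its insertion position, returns the number of
// records that stay on the left page. Both pages keep at least one record.
inline std::size_t ToyChooseSplitPoint(const std::vector<long> &rec_sizes) {
    assert(rec_sizes.size() >= 2);
    const long total = std::accumulate(rec_sizes.begin(), rec_sizes.end(), 0L);
    long left = 0;
    long best_diff = -1;
    std::size_t best_n = 1;
    for (std::size_t n = 1; n < rec_sizes.size(); ++n) {
        left += rec_sizes[n - 1];
        // Difference between left and right usage if n records stay left.
        const long diff = std::labs(left - (total - left));
        if (best_diff < 0 || diff < best_diff) {
            best_diff = diff;
            best_n = n;
        }
    }
    return best_n;
}
```

When two split points tie, this sketch keeps the earlier one; the description above does not say how ties should be broken.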

◆ StartScan()

std::unique_ptr< Index::Iterator > taco::BTree::StartScan ( const IndexKey *  lower,
bool  lower_isstrict,
const IndexKey *  upper,
bool  upper_isstrict 
)
overridevirtual

Returns a forward-iterator for all the indexed items in the specified range defined by the arguments.

A non-null lower defines the lower bound: the iterator should return the indexed items with keys that are >= lower (lower_isstrict == false) or > lower (lower_isstrict == true). A non-null upper defines the upper bound: the iterator should not return the indexed items whose keys are >= upper (upper_isstrict == true) or > upper (upper_isstrict == false). If lower (and/or upper) is null, there is no lower (upper resp.) bound for the returned indexed items, in which case lower_isstrict (upper_isstrict resp.) is ignored.

lower and upper are allowed to have fewer keys than the index schema does. In that case, the comparison between the key of an indexed item and lower or upper should be done only on the prefix of lower->GetNumKeys() or upper->GetNumKeys() keys.
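The bound semantics can be sketched as a predicate over a single key. ToyPrefixCompare and ToyInRange are hypothetical helpers, keys are vectors of integers, and NULL handling is omitted to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ToyKey = std::vector<int>; // NULLs omitted to keep the sketch short

// Three-way comparison on the first bound.size() fields of key only.
inline int ToyPrefixCompare(const ToyKey &key, const ToyKey &bound) {
    assert(bound.size() <= key.size());
    for (std::size_t i = 0; i < bound.size(); ++i) {
        if (key[i] != bound[i]) return key[i] < bound[i] ? -1 : 1;
    }
    return 0;
}

// Returns true if key falls inside the scan range. A null bound pointer means
// "unbounded" on that side, and the matching strictness flag is ignored.
inline bool ToyInRange(const ToyKey &key,
                       const ToyKey *lower, bool lower_isstrict,
                       const ToyKey *upper, bool upper_isstrict) {
    if (lower != nullptr) {
        const int c = ToyPrefixCompare(key, *lower);
        if (c < 0 || (c == 0 && lower_isstrict)) return false;
    }
    if (upper != nullptr) {
        const int c = ToyPrefixCompare(key, *upper);
        if (c > 0 || (c == 0 && upper_isstrict)) return false;
    }
    return true;
}
```

With a one-field lower bound on a two-field index, every key sharing that first field compares equal to the bound, so a strict lower bound excludes all of them.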

Not all combinations of arguments can be supported by the underlying index and it should throw an error if the underlying index cannot reasonably support that. Tree indexes (e.g., B-tree) should support range search and prefix searches, while hash indexes may just support equality searches.

Implements taco::Index.

◆ TryMergeOrRebalancePages()

bool taco::BTree::TryMergeOrRebalancePages ( BufferId  lbufid,
BufferId  rbufid,
BufferId  pbufid,
SlotId  lsid 
)
private

Tries to merge or rebalance two sibling pages.

Returns true if any of these succeed. Upon successful return, all but the parent buffer pin are dropped. If the function returns false, it does not unpin any page.

Friends And Related Function Documentation

◆ Iterator

friend class Iterator
friend

◆ TestBTree

friend class TestBTree
friend

Member Data Documentation

◆ m_eq_funcs

std::vector<FunctionInfo> taco::BTree::m_eq_funcs
private

◆ m_file

std::unique_ptr<File> taco::BTree::m_file
private

◆ m_lt_funcs

std::vector<FunctionInfo> taco::BTree::m_lt_funcs
private

The documentation for this class was generated from the following files: