taco-db
0.1.0
|
A B-tree stored in the persistent files. More...
#include <index/btree/BTree.h>
Classes | |
class | Iterator |
Public Member Functions | |
virtual | ~BTree () |
void | BulkLoad (BulkLoadIterator &iter) override |
Loads all the (key, record id) pairs provided by the iterator iter into an empty index. More... | |
bool | InsertKey (const IndexKey *key, RecordId recid) override |
Inserts the (key, rid) pair into the index. More... | |
bool | DeleteKey (const IndexKey *key, RecordId &recid) override |
Deletes an arbitrary data entry with matching key if rid is invalid. More... | |
std::unique_ptr< Index::Iterator > | StartScan (const IndexKey *lower, bool lower_isstrict, const IndexKey *upper, bool upper_isstrict) override |
Returns a forward-iterator for all the indexed items in the specified range defined by the arguments. More... | |
bool | IsEmpty () |
Returns whether this tree is empty. More... | |
uint32_t | GetTreeHeight () |
Returns the tree height. More... | |
Public Member Functions inherited from taco::Index | |
virtual | ~Index () |
Index object destructor. More... | |
const IndexDesc * | GetIndexDesc () const |
Returns the index descriptor of this index. More... | |
const Schema * | GetKeySchema () const |
Returns the key schema of this index. More... | |
bool | InsertRecord (const Record &rec, const Schema *tabschema=nullptr) |
Inserts the (key, rid) extracted from the table record into the index. More... | |
bool | DeleteRecord (Record &rec, const Schema *tabschema=nullptr) |
Deletes the (key, rid) extracted from the table record from the index. More... | |
Static Public Member Functions | |
static void | Initialize (const IndexDesc *idxdesc) |
static std::unique_ptr< BTree > | Create (std::shared_ptr< const IndexDesc > idxdesc) |
Static Public Member Functions inherited from taco::Index | |
static void | Initialize (const IndexDesc *idxdesc) |
Initializes an index described by an index descriptor. More... | |
static std::unique_ptr< Index > | Create (std::shared_ptr< const IndexDesc > idxdesc) |
Creates an index object over an index file described by the index descriptor, which has already been initialized. More... | |
Private Member Functions | |
BTree (std::shared_ptr< const IndexDesc > idxdesc, std::unique_ptr< File > file, std::vector< FunctionInfo > lt_funcs, std::vector< FunctionInfo > eq_funcs) | |
BufferId | CreateNewBTreePage (bool isroot, bool isleaf, PageNumber prev_pid, PageNumber next_pid) |
Allocates and initializes a new BTree internal or leaf page. More... | |
BufferId | GetBTreeMetaPage () |
Pins the BTree meta page and returns the buffer ID of the buffer frame where it is pinned. More... | |
void | CreateLeafRecord (const IndexKey *key, const RecordId &recid, maxaligned_char_buf &recbuf) |
Creates a new leaf record with the given key and heap recid in recbuf. More... | |
void | CreateInternalRecord (const Record &child_recbuf, PageNumber child_pid, bool child_isleaf, maxaligned_char_buf &recbuf) |
Creates a new internal record with separator key and heap record id from child_recbuf , and the child page number child_pid in recbuf . More... | |
int | BTreeTupleCompare (const IndexKey *key, const RecordId &recid, const char *tuplebuf, bool isleaf) const |
Compares a (key, recid) pair to a leaf or internal record in the B-tree. More... | |
SlotId | BinarySearchOnPage (char *buf, const IndexKey *key, const RecordId &recid) |
Uses binary search to find the last slot whose key-recid pair is smaller than or equal to the (key, recid) pair. More... | |
BufferId | FindLeafPage (const IndexKey *key, const RecordId &recid, std::vector< PathItem > *p_search_path) |
Finds a leaf page from the root such that: More... | |
BufferId | FindLeafPage (BufferId bufid, const IndexKey *key, const RecordId &recid, std::vector< PathItem > *p_search_path) |
Finds a leaf page from a page pinned in buffer frame bufid whose key range covers the (key, recid) pair such that. More... | |
SlotId | FindInsertionSlotIdOnLeaf (const IndexKey *key, const RecordId &recid, BufferId bufid) |
Finds the slot id where the inserting (key, recid) pair at that slot will keep the page's key and record id pairs unique and sorted. More... | |
void | InsertRecordOnPage (BufferId bufid, SlotId insertion_sid, maxaligned_char_buf &&recbuf, std::vector< PathItem > &&search_path) |
Inserts a record in recbuf onto a leaf page already pinned in the buffer frame bufid whose key range covers the (key, recid) pair, at slot insertion_sid . More... | |
void | CreateNewRoot (BufferId bufid, maxaligned_char_buf &&recbuf) |
Creates a new root page and updates the B-Tree meta page. More... | |
maxaligned_char_buf | SplitPage (BufferId bufid, SlotId insertion_sid, maxaligned_char_buf &&recbuf) |
Splits a page pinned in bufid that is too full to insert an additional record in recbuf at slot insertion_sid , into two sibling pages such that. More... | |
SlotId | FindDeletionSlotIdOnLeaf (const IndexKey *key, const RecordId &recid, BufferId &bufid, std::vector< PathItem > &search_path) |
Finds the slot to delete for the (key, recid) pair. More... | |
void | DeleteSlotOnPage (BufferId bufid, SlotId sid) |
Deletes a slot from a page. More... | |
void | HandleMinPageUsage (BufferId bufid, std::vector< PathItem > &&search_path) |
Checks if a page pinned in buffer frame bufid needs to be merged or rebalanced with a sibling page sharing the same parent. More... | |
bool | TryMergeOrRebalancePages (BufferId lbufid, BufferId rbufid, BufferId pbufid, SlotId lsid) |
Tries to merge or rebalance two sibling pages. More... | |
bool | MergePages (BufferId lbufid, BufferId rbufid, BufferId pbufid, SlotId lsid) |
Merges two sibling pages if there is enough space. More... | |
bool | RebalancePages (BufferId lbufid, BufferId rbufid, BufferId pbufid, SlotId lsid) |
Rebalances two sibling pages if it is possible and it should choose a way that minimizes the differences between the page usages of the two pages. More... | |
Private Attributes | |
std::unique_ptr< File > | m_file |
std::vector< FunctionInfo > | m_lt_funcs |
std::vector< FunctionInfo > | m_eq_funcs |
Friends | |
class | TestBTree |
class | Iterator |
Additional Inherited Members | |
Protected Member Functions inherited from taco::Index | |
Index (std::shared_ptr< const IndexDesc > idxdesc) | |
A B-tree stored in the persistent files.
Logically the tree is a map of key, record ID pairs that has one-to-one correspondence to the heap file records in some table. The key is a record payload that only consists of the key columns as defined by the key schema, extracted from a heap tuple, and the record ID is that of the heap tuple in the table (which you may think of as a pointer into the table).
Internally, the B-tree treats each (key, recid) pair as a whole as its search key. That is, two pairs with the same key but different record ids are considered different by the B-tree. Duplicate (key, recid) pairs must not appear in a B-tree index at the same time. This ensures that all the pairs are unique within the index, so that we can uniquely identify each indexed item in the tree (which makes updating and iteration meaningful).
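As an illustration of the pair-as-key idea, here is a minimal sketch that is not taken from the taco-db sources. `ToyRecordId`, `ToyEntry`, `ToyEntryLess`, `ToyIndex` and `ToyInsert` are hypothetical names, the key is reduced to a single `int` column, and a `std::set` stands in for the on-disk tree.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical stand-in for a heap record id: (page number, slot id).
struct ToyRecordId {
    uint32_t pid;
    uint16_t sid;
};

// Hypothetical index entry with a single integer key column.
struct ToyEntry {
    int key;
    ToyRecordId recid;
};

// Orders entries by key first and by record id second, so the pair as a
// whole acts as the search key.
struct ToyEntryLess {
    bool operator()(const ToyEntry &a, const ToyEntry &b) const {
        return std::tie(a.key, a.recid.pid, a.recid.sid) <
               std::tie(b.key, b.recid.pid, b.recid.sid);
    }
};

using ToyIndex = std::set<ToyEntry, ToyEntryLess>;

// Returns true if the entry was inserted, or false if the exact
// (key, recid) pair was already present.
inline bool ToyInsert(ToyIndex &idx, int key, ToyRecordId recid) {
    return idx.insert(ToyEntry{key, recid}).second;
}
```

With this ordering, two entries that share the key 5 but point to different heap records coexist, while inserting the same (key, recid) pair a second time is rejected.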
The tree consists of several levels of internal pages and a level of leaf pages. Each page is managed as a VarlenDataPage. The pages on the same level are linked as a doubly-linked list through their page headers BTreePageHeaderData. Note that this header is stored in the user data area of the VarlenDataPage rather than at the beginning of the page.
For the leaf pages, each (key, recid) pair is stored as a single record, which consists of a header BTreeLeafRecordHeaderData at the beginning and the actual key payload immediately following the header. We store the recid in the header instead of the payload. All the records are sorted by key and record id lexicographically (see BTreeTupleCompare() for details).
For internal pages, the tree stores the child page links and separator keys as internal records. Each internal record consists of a header BTreeInternalRecordHeaderData at the beginning, immediately followed by the separator key's payload, with the exception of the very first internal record on an internal page (see below). Note that we store the child page number and the heap record id in the header rather than in the payload. The first internal record on an internal page only has a header BTreeInternalRecordHeaderData without any payload following it, and its heap record id is undefined (meaning we should never use its value). The first internal record is assumed to be smaller than any other internal record and any (key, recid) pair (i.e., its key is negative infinity in all fields). All the remaining records are sorted by the key and heap record id lexicographically.
For an internal record with key k_i, heap record id recid_i, and child page number c_i, let k_{i+1}, recid_{i+1} be the key and the heap record id in the internal record after it (if any). The subtree c_i points to should contain, and may only contain, (key, recid) pairs such that (k_i, recid_i) <= (key, recid) < (k_{i+1}, recid_{i+1}). This (key, recid) pair range is said to be covered by the subtree.
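The covering rule can be sketched as a lookup over the records of one internal page. This is a simplified illustration rather than the BTree implementation: `ToyPair`, `ToyInternalRecord` and `ToyFindChild` are hypothetical names, keys are single integers, and a record id is reduced to one integer. The separator stored in the first record is never read, which models the "negative infinity" convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified (key, recid) pair; std::pair compares lexicographically.
using ToyPair = std::pair<int, uint32_t>;

// Hypothetical internal record: separator pair plus child page number.
struct ToyInternalRecord {
    ToyPair sep;         // ignored for the first record on the page
    uint32_t child_pid;
};

// Returns the child page whose subtree covers target, i.e. the child of
// the last record with sep <= target, where the first record counts as
// negative infinity.
uint32_t ToyFindChild(const std::vector<ToyInternalRecord> &recs,
                      const ToyPair &target) {
    assert(!recs.empty());
    // Find the first record (after the first one) whose separator is
    // strictly greater than target.
    auto it = std::upper_bound(
        recs.begin() + 1, recs.end(), target,
        [](const ToyPair &t, const ToyInternalRecord &r) {
            return t < r.sep;
        });
    // The record just before it covers target.
    return (it - 1)->child_pid;
}
```

For separators (-inf), (10, 5), (20, 0), the pair (10, 4) still belongs to the first child because it is smaller than (10, 5), while (10, 5) itself belongs to the second child.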
private |
Uses binary search to find the last slot whose key-recid pair is smaller than or equal to the (key, recid) pair.
The comparison of the key-recid pair is defined by BTreeTupleCompare(), except that the first slot on an internal page is treated as negative infinity (which is smaller than any key).
For an internal page, there must be such a slot, as the key in the first slot on the page is treated as negative infinity (i.e., it is smaller than any key). For a leaf page, there may not be such a slot, in which case it should return INVALID_SID (which is MinSlotId - 1).
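The search contract above can be sketched over an in-memory array instead of a real page. Everything here is an assumption made for illustration: `ToySlotId`, `kToyMinSlotId` (assumed to be 1), `kToyInvalidSid`, `ToyPair` and `ToyBinarySearch` are hypothetical, and the comparison is a plain pair comparison rather than BTreeTupleCompare().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using ToySlotId = uint16_t;
constexpr ToySlotId kToyMinSlotId = 1;                   // assumed value
constexpr ToySlotId kToyInvalidSid = kToyMinSlotId - 1;  // as in the text

// Simplified (key, recid) pair; std::pair compares lexicographically.
using ToyPair = std::pair<int, uint32_t>;

// Returns the id of the last slot whose pair is <= target, or
// kToyInvalidSid if there is none. Slot s lives at entries[s - kToyMinSlotId].
// On an internal page the first slot is treated as negative infinity, so
// its stored pair is never inspected.
ToySlotId ToyBinarySearch(const std::vector<ToyPair> &entries,
                          const ToyPair &target, bool is_internal) {
    assert(!is_internal || !entries.empty());
    // Invariant: entries[0, lo) are <= target, entries[hi, n) are > target.
    std::size_t lo = is_internal ? 1 : 0;
    std::size_t hi = entries.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (entries[mid] <= target) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == 0) {
        return kToyInvalidSid;  // only possible on a leaf page
    }
    return static_cast<ToySlotId>(lo - 1 + kToyMinSlotId);
}
```

On a leaf page holding (10, 1) and (20, 1), searching for (5, 0) yields `kToyInvalidSid`. On an internal page the same search always lands on at least the first slot.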
|
private |
Compares a (key, recid) pair to a leaf or internal record in the B-tree.
The key is first compared against the key stored in tuplebuf lexicographically by fields. A NULL field is considered smaller than any non-NULL value. If the key compares equal to the key stored in tuplebuf, recid is then compared against the heap record id stored in tuplebuf.
Returns -1 if (key, recid) is smaller than the key and record id pair stored in tuplebuf; 0 if they are equal; or 1 if the former is larger than the latter.
Note that TupleCompare() treats a key that is exactly a prefix of the fields stored in a record payload as equal, whereas BTreeTupleCompare() should consider that key is smaller than the key stored in the record payload. Hence, BTreeTupleCompare() needs to consider this case after calling TupleCompare().
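The comparison rules, including the prefix case, can be sketched as follows. This is not the actual BTreeTupleCompare(): `ToyField`, `ToyVal`, `ToyNull`, `ToyKey` and `ToyCompare` are hypothetical, fields are plain integers with a NULL flag, and treating two NULL fields as equal for ordering purposes is an assumption of this sketch that the text above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical nullable integer field.
struct ToyField {
    bool is_null;
    int value;
};
inline ToyField ToyVal(int v) { return ToyField{false, v}; }
inline ToyField ToyNull() { return ToyField{true, 0}; }

using ToyKey = std::vector<ToyField>;

// Compares (key, recid) against (stored_key, stored_recid).
// Returns -1, 0 or 1. key may have fewer fields than stored_key.
int ToyCompare(const ToyKey &key, uint32_t recid,
               const ToyKey &stored_key, uint32_t stored_recid) {
    assert(key.size() <= stored_key.size());
    for (std::size_t i = 0; i < key.size(); ++i) {
        const ToyField &a = key[i];
        const ToyField &b = stored_key[i];
        if (a.is_null || b.is_null) {
            if (a.is_null && b.is_null) continue;  // assumption: tie
            return a.is_null ? -1 : 1;             // NULL sorts first
        }
        if (a.value != b.value) return a.value < b.value ? -1 : 1;
    }
    // A key that is a proper prefix of the stored key is smaller.
    if (key.size() < stored_key.size()) return -1;
    // Keys are equal: break the tie with the heap record id.
    if (recid != stored_recid) return recid < stored_recid ? -1 : 1;
    return 0;
}
```

The prefix check runs before the record id comparison, so a one-field key (1) compares smaller than a stored (1, 2) regardless of the record ids involved.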
|
overridevirtual |
Loads all the (key, record id) pairs provided by the iterator iter into an empty index.
The behavior is undefined if the index is not empty, and the specific index implementation is allowed to (but not required to) throw an error when called on a non-empty index.
Upon successful return, the iterator iter is ended by calling iter->EndScan().
The default implementation calls the InsertKey() function over all the pairs returned by iter. The specific index implementation should override it with a more efficient way if there is one.
Reimplemented from taco::Index.
|
private |
Creates a new internal record with separator key and heap record id from child_recbuf, and the child page number child_pid, in recbuf.
child_isleaf must be true if child_recbuf is a leaf record. Conversely, it must be false if child_recbuf is an internal record. The record buffer recbuf is cleared upon entry and contains the newly created record upon return.
|
private |
Creates a new leaf record with the given key and heap recid in recbuf.
The record buffer is cleared upon entry and contains the newly created record upon return.
|
private |
Allocates and initializes a new BTree internal or leaf page.
Returns the buffer frame id where the new page is pinned.
|
private |
Creates a new root page and updates the B-Tree meta page.
The old root page (which has already been split) is initially pinned in bufid, and the record buffer recbuf contains an internal record that points to the right sibling of the old root page. Upon return, the old root page is unpinned.
Deletes an arbitrary data entry with matching key if rid is invalid.
Deletes the data entry with matching (key, rid) pair if rid is valid.
Returns true if the deletion succeeds. Upon successful return, rid is updated to the deleted indexed item's record id. Returns false if the key (if rid is invalid) or the (key, rid) pair (if rid is valid) is not found in the index, in which case rid should be set to invalid.
Implements taco::Index.
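The two deletion modes can be sketched against an ordered set of pairs. This is an illustration under assumptions, not the B-tree code path: `ToyRid`, `kToyInvalidRid` (a made-up sentinel for an invalid record id), `ToyPairSet` and `ToyDeleteKey` are hypothetical, and keys are single integers.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using ToyRid = uint32_t;
constexpr ToyRid kToyInvalidRid = UINT32_MAX;  // hypothetical sentinel

// Ordered set of (key, rid) pairs standing in for the leaf level.
using ToyPairSet = std::set<std::pair<int, ToyRid>>;

// Deletes one entry as described above. If rid is invalid on entry, any
// entry with a matching key may be deleted (this sketch picks the first).
// Otherwise only the exact (key, rid) pair is deleted.
bool ToyDeleteKey(ToyPairSet &idx, int key, ToyRid &rid) {
    ToyPairSet::iterator it;
    if (rid == kToyInvalidRid) {
        // First entry whose pair is >= (key, smallest rid).
        it = idx.lower_bound(std::make_pair(key, ToyRid{0}));
        if (it != idx.end() && it->first != key) {
            it = idx.end();
        }
    } else {
        it = idx.find(std::make_pair(key, rid));
    }
    if (it == idx.end()) {
        rid = kToyInvalidRid;  // not found: report an invalid rid
        return false;
    }
    rid = it->second;  // report which record id was removed
    idx.erase(it);
    return true;
}
```

Starting from {(5, 1), (5, 2), (7, 1)}, a call with key 5 and an invalid rid removes (5, 1) and writes 1 back into rid, whereas a call with key 5 and rid 9 fails and resets rid to the invalid value.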
Deletes a slot from a page.
If slot sid is the last slot on the root page, it should update the header flags to indicate this becomes a leaf page as well.
|
private |
Finds the slot to delete for the (key, recid) pair.
Upon entry, the page pinned in buffer frame bufid should be either 1) the page whose key range covers (key, recid) if recid is valid; or 2) the left-most page on the leaf level whose key range possibly overlaps with some pair with the given key if recid is invalid.
This function may need to move to the right page of the original page to find the slot to delete in some cases, in which case the search path at the lowest internal page level may also need to be updated. However, it should simply add 1 to the slot id. CheckMinPageUsage() is responsible for fetching the sibling parent page if necessary. The bufid should be updated if this function moves to the right page, and it should properly unpin the previous leaf page. It is also responsible for unpinning the page should there be any uncaught error.
Returns the slot id of the leaf record to be deleted if it has a matching key when recid is invalid, or (key, recid) pair when recid is valid. Otherwise, returns INVALID_SID.
|
private |
Finds the slot id where the inserting (key, recid) pair at that slot will keep the page's key and record id pairs unique and sorted.
If this is a unique index, it should further ensure that no key (instead of pairs) on this page is equal to the insertion key. For this purpose, NULL values do not compare equal.
|
private |
Finds a leaf page from a page pinned in buffer frame bufid whose key range covers the (key, recid) pair such that:
1) for a null key, the found leaf page is the left-most page on the leaf level;
2) for a non-null key, the found leaf page is the one whose key range covers the (key, recid) pair (note that there's only one such page on the leaf level).
Returns the buffer id of the buffer frame where the found leaf page is pinned, with its parent page unpinned.
If p_search_path is not null, this function stores the record ids of all the internal records in *p_search_path. Otherwise, it should not dereference p_search_path.
It is undefined if the initial page pinned in bufid does not have a key range that covers the (key, recid) pair.
|
private |
Finds a leaf page from the root such that:
1) for a null key, the found leaf page is the left-most page on the leaf level;
2) for a non-null key, the found leaf page is the one whose key range covers the (key, recid) pair (note that there's only one such page on the leaf level).
Returns the buffer id of the buffer frame where the found leaf page is pinned. Note that the returned leaf page is the only page that should be pinned by FindLeafPage() upon return.
If p_search_path is not null, this function stores the record ids of all the internal records in *p_search_path. Otherwise, it should not dereference p_search_path.
|
private |
Pins the BTree meta page and returns the buffer ID of the buffer frame where it is pinned.
uint32_t taco::BTree::GetTreeHeight()
Returns the tree height.
It is 1 + the level of the root page.
|
private |
Checks if a page pinned in buffer frame bufid needs to be merged or rebalanced with a sibling page sharing the same parent.
search_path contains the record ids of the internal records that lead to the leaf page where we deleted a tuple (with the exception of the last record id in the search path, which can have a slot id that is 1 larger than the maximum on the page, see FindDeletionSlotIdOnLeaf()).
The buffer and any buffer additionally obtained in this function should be unpinned upon return.
|
static |
Inserts the (key, rid) pair into the index.
Returns true if the insertion succeeds. Returns false if the index was declared as a unique index and a duplicate key is found, or for a non-unique index, the (key, rid) pair already exists in the index.
Implements taco::Index.
|
private |
Inserts a record in recbuf onto a leaf page already pinned in the buffer frame bufid whose key range covers the (key, recid) pair, at slot insertion_sid.
search_path contains a stack of record ids of the internal records visited when we initially searched for the insertion point.
Note that, InsertRecordOnPage() may need to split the page that it is inserting into, which may cause split of its parent or ancestor pages recursively. Upon return, the page it is inserting into should be unpinned and any additional pin obtained during the recursive split must also be dropped.
bool taco::BTree::IsEmpty()
Returns whether this tree is empty.
|
private |
Merges two sibling pages if there is enough space.
Returns true if it successfully merged the two pages, or false if it cannot merge the two pages. Upon successful return, it should free the right page and unpin the left page. Otherwise, it should leave all three pages pinned.
Note that if MergePages() determines that it cannot merge the pages, it should not physically modify any of the three pages (even if the page still has the same set of records logically after such modification) and should not mark these buffered pages as dirty.
|
private |
Rebalances two sibling pages if it is possible and it should choose a way that minimizes the differences between the page usages of the two pages.
Returns true if it is successful, or false if it cannot rebalance the two pages. Upon successful return, it should unpin both the left and the right pages. Otherwise, it should leave all three pages pinned.
Note that if RebalancePages() determines that it cannot rebalance the pages, it should not physically modify any of the three pages (even if the page still has the same set of records logically after such modification) and should not mark these buffered pages as dirty.
|
private |
Splits a page pinned in bufid that is too full to insert an additional record in recbuf at slot insertion_sid, into two sibling pages such that:
1) the two pages together have exactly all the records on the original page plus the new record in recbuf (for an internal page split, the payload of the first record on the new right sibling may be purged);
2) all the (key, recid) pairs are still sorted and the old page is the left sibling page;
3) and the choice of split point minimizes the difference between the page usages of the two pages.
Returns an internal record that points to the new (right) page and its key and heap record id are correctly set such that all key and record id pairs on the old (left) page are strictly smaller than the internal record, and all key and record id pairs on the new (right) page are larger than or equal to the internal record.
Note that the buffer pin on the left page should be kept, while the pin on the right page should be dropped, upon successful return.
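Condition 3) can be sketched as a single pass over the record sizes. This is a simplified model and not the SplitPage() implementation: `ToyChooseSplitPoint` is a hypothetical helper, `sizes` is assumed to already contain the new record's size at its insertion position, and page capacity, slot overhead, alignment and the purged payload of the first record on an internal right sibling are all ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index s (1 <= s < sizes.size()) such that records [0, s)
// go to the left page and records [s, n) go to the right page, with the
// smallest possible difference between the total sizes of the two sides.
// Ties are resolved in favor of the smallest s.
std::size_t ToyChooseSplitPoint(const std::vector<uint32_t> &sizes) {
    assert(sizes.size() >= 2);
    uint64_t total = 0;
    for (uint32_t s : sizes) {
        total += s;
    }
    uint64_t left = 0;
    std::size_t best = 1;
    uint64_t best_diff = UINT64_MAX;
    for (std::size_t i = 1; i < sizes.size(); ++i) {
        left += sizes[i - 1];  // total size of records [0, i)
        uint64_t right = total - left;
        uint64_t diff = left > right ? left - right : right - left;
        if (diff < best_diff) {
            best_diff = diff;
            best = i;
        }
    }
    return best;
}
```

For four records of equal size the split lands in the middle. A single large record at the front pushes the split point to directly after it.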
|
overridevirtual |
Returns a forward-iterator for all the indexed items in the specified range defined by the arguments.
A non-null lower defines the lower bound, and the iterator should return the indexed items with keys that are >= lower (lower_isstrict == false) or > lower (lower_isstrict == true). A non-null upper defines the upper bound, and the iterator should not return the indexed items whose keys are >= upper (upper_isstrict == true) or > upper (upper_isstrict == false). If lower (and/or upper) is null, there is no lower (upper resp.) bound for the returned indexed items, in which case lower_isstrict and upper_isstrict are ignored.
lower and upper are allowed to have fewer keys than the index schema does. The comparison between the key of an indexed item and lower or upper should be done only on the prefix of lower->GetNumKeys() or upper->GetNumKeys() keys.
Not all combinations of arguments can be supported by the underlying index and it should throw an error if the underlying index cannot reasonably support that. Tree indexes (e.g., B-tree) should support range search and prefix searches, while hash indexes may just support equality searches.
Implements taco::Index.
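The bound semantics can be sketched as a predicate over single integer keys. This is an illustration only: `ToyBound`, `ToyInRange` and `ToyScan` are hypothetical, a missing bound is modeled with a `present` flag instead of a null pointer, and prefix comparison on multi-column keys is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scan bound: present == false models a null IndexKey.
struct ToyBound {
    bool present;
    int key;
    bool is_strict;  // ignored when present is false
};

// Returns true if key satisfies both bounds.
bool ToyInRange(int key, const ToyBound &lower, const ToyBound &upper) {
    if (lower.present) {
        bool ok = lower.is_strict ? (key > lower.key) : (key >= lower.key);
        if (!ok) return false;
    }
    if (upper.present) {
        bool ok = upper.is_strict ? (key < upper.key) : (key <= upper.key);
        if (!ok) return false;
    }
    return true;
}

// Filters a sorted list of keys. A real scan would instead seek to the
// first qualifying leaf record and stop at the first key past the upper
// bound.
std::vector<int> ToyScan(const std::vector<int> &sorted_keys,
                         const ToyBound &lower, const ToyBound &upper) {
    std::vector<int> out;
    for (int k : sorted_keys) {
        if (ToyInRange(k, lower, upper)) {
            out.push_back(k);
        }
    }
    return out;
}
```

With keys 1 through 6, a non-strict lower bound of 3 and a strict upper bound of 5 select 3 and 4. Flipping both strictness flags selects 4 and 5.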
|
private |
Tries to merge or rebalance two sibling pages.
Returns true if any of these succeed. Upon successful return, all but the parent buffer pin are dropped. If the function returns false, it does not unpin any page.