ExternalSort
is a general-purpose utility class for external sorting of an arbitrary number of items, each treated as a raw byte buffer.
#include <extsort/ExternalSort.h>
ExternalSort takes a user-defined comparison function, which compares the contents of two binary buffers, and a memory budget for the external sort operator. Calling Sort() returns an iterator over the sorted items in ascending order.
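As an illustration of a comparison function over two binary buffers, here is a minimal sketch that orders items storing a 64-bit signed integer. The function name and its `memcmp`-style signature are assumptions made for this example; the actual `SortCompareFunction` type is not shown on this page and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical comparator shape (an assumption): returns <0, 0, or >0,
// in the style of memcmp. Each buffer is assumed to hold one int64_t.
int CompareInt64Items(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    std::int64_t x, y;
    // memcpy avoids unaligned reads from raw byte buffers.
    std::memcpy(&x, a, sizeof(x));
    std::memcpy(&y, b, sizeof(y));
    return (x < y) ? -1 : (x > y) ? 1 : 0;
}
```

Decoding the bytes before comparing matters here: a plain byte-wise comparison would not order negative or little-endian integers correctly.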
◆ ~ExternalSort()
taco::ExternalSort::~ExternalSort ( )
◆ ExternalSort()
◆ Create()
Creates an ExternalSort instance, where comp is the function used to compare two items and merge_ways determines the memory budget of this external sorting operator. Specifically, the memory budget for this operator should be roughly (merge_ways + 1) * PAGE_SIZE bytes.
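The budget formula can be sketched as a small helper. The 4096-byte page size is an assumption for illustration only (the real PAGE_SIZE is defined elsewhere in the code base), and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed page size, for illustration only.
constexpr std::size_t kAssumedPageSize = 4096;

// Approximate budget in bytes: (merge_ways + 1) pages. Presumably this is
// one input page per merged run plus one output page.
std::size_t ApproxMemoryBudget(std::size_t merge_ways,
                               std::size_t page_size = kAssumedPageSize) {
    // A merge needs at least two inputs to reduce the number of runs.
    assert(merge_ways >= 2);
    return (merge_ways + 1) * page_size;
}
```

Under the assumed page size, a 3-way merge would need roughly 4 pages, that is 16384 bytes.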
◆ GenerateNewPass()
void taco::ExternalSort::GenerateNewPass ( std::vector< PageNumber > & new_run_boundaries )
private
Generates a new sorted pass through an m_merge_ways-way merge from m_file[1 - m_current_pass].
Persists the new sorted runs in m_file[m_current_pass] and saves the run boundaries produced by the merge in new_run_boundaries.
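One way to picture a pass is as a loop that splits the previous pass's runs into consecutive groups of at most m_merge_ways runs and merges each group into one new run. The sketch below only computes those groups as inclusive (low_run, high_run) pairs; the function name is hypothetical, and whether the real implementation groups runs exactly this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits runs [0, num_runs) into consecutive groups of at most merge_ways
// runs. Each pair is (low_run, high_run), both inclusive.
std::vector<std::pair<std::size_t, std::size_t>>
GroupRunsForPass(std::size_t num_runs, std::size_t merge_ways) {
    assert(merge_ways >= 2);
    std::vector<std::pair<std::size_t, std::size_t>> groups;
    for (std::size_t low = 0; low < num_runs; low += merge_ways) {
        // The last group may contain fewer than merge_ways runs.
        std::size_t high = std::min(low + merge_ways, num_runs) - 1;
        groups.emplace_back(low, high);
    }
    return groups;
}
```

With 7 runs and a 3-way merge this yields the groups (0, 2), (3, 5), and (6, 6), so the next pass would contain 3 runs.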
◆ MergeInternalRuns()
PageNumber taco::ExternalSort::MergeInternalRuns ( size_t low_run, size_t high_run )
private
Merges the runs low_run (inclusive) to high_run (inclusive) from the last pass, whose boundaries are stored in m_run_boundaries.
Afterwards, the newly merged run should be persisted at the end of m_file[m_current_pass]. Returns the page boundary of the newly generated sorted run.
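The merge of runs low_run through high_run can be sketched in memory with a min-heap that always holds the next unread item of each run. This is a simplified illustration over `int` vectors, not the page-based implementation: the function name is hypothetical, and the real code reads pages from a file and compares items with the user-supplied comparison function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merges runs[low_run..high_run] (both inclusive); each run is already sorted.
std::vector<int> MergeSortedRuns(const std::vector<std::vector<int>>& runs,
                                 std::size_t low_run, std::size_t high_run) {
    assert(low_run <= high_run && high_run < runs.size());
    // (value, run index, offset within run); std::greater makes a min-heap.
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t r = low_run; r <= high_run; ++r) {
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, std::size_t{0});
    }
    std::vector<int> merged;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        merged.push_back(std::get<0>(top));
        std::size_t r = std::get<1>(top);
        std::size_t next = std::get<2>(top) + 1;
        // Refill the heap from the run the smallest item came from.
        if (next < runs[r].size()) heap.emplace(runs[r][next], r, next);
    }
    return merged;
}
```

The heap never holds more than one entry per input run, which mirrors the idea of keeping one input buffer per merged run.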
◆ Sort()
Given an input_iter, iterates over it in a single pass and sorts all returned items using the SortCompareFunction provided during construction.
Returns an ItemIterator over the sorted results.
The input_iter is not required to support rewinding, but the returned ItemIterator must support it.
◆ SortInitialRuns()
void taco::ExternalSort::SortInitialRuns ( ItemIterator * item_iter )
private
Iterates through item_iter and creates the initial runs based on the memory budget.
You can use merge_ways * PAGE_SIZE bytes for storing the actual record payloads, plus a separate array of Record structures that point to the payloads. Persist the runs in m_file[0]. After this call, the page boundaries of all sorted runs should be saved in m_run_boundaries.
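Initial run generation can be sketched as follows: buffer items until the payload budget would be exceeded, sort the buffer, emit it as one run, and start over. This in-memory version uses `std::string` items and the default ordering as stand-ins; the function name is hypothetical, and the real operator writes runs to m_file[0] and sorts Record structures with the user-supplied comparison function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Cuts the input into sorted runs whose payload bytes fit in budget_bytes.
// A single item larger than the budget still forms a run of its own.
std::vector<std::vector<std::string>>
MakeInitialRuns(const std::vector<std::string>& items, std::size_t budget_bytes) {
    assert(budget_bytes > 0);
    std::vector<std::vector<std::string>> runs;
    std::vector<std::string> current;
    std::size_t used = 0;
    for (const std::string& item : items) {
        // Flush the buffered run when the next payload would not fit.
        if (!current.empty() && used + item.size() > budget_bytes) {
            std::sort(current.begin(), current.end());
            runs.push_back(std::move(current));
            current.clear();
            used = 0;
        }
        current.push_back(item);
        used += item.size();
    }
    if (!current.empty()) {  // flush the last, possibly partial, run
        std::sort(current.begin(), current.end());
        runs.push_back(std::move(current));
    }
    return runs;
}
```

Sorting an array of lightweight handles rather than moving the payload bytes is the reason the description above mentions a separate array of Record structures.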
◆ WriteOutInitialPass()
void taco::ExternalSort::WriteOutInitialPass ( std::vector< Record > & pass )
private
◆ WriteOutRecord()
void taco::ExternalSort::WriteOutRecord ( Record & rec )
private
◆ m_cmp
Comparison function used for sorting items.
◆ m_current_pass
uint8_t taco::ExternalSort::m_current_pass
private
Either 0 or 1; indicates which file the current pass is using.
◆ m_file
std::unique_ptr<File> taco::ExternalSort::m_file[2]
private
Temporary files used in external sorting.
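With two temporary files, each merge pass can read from one file and write to the other, swapping roles after every pass. The sketch below counts merge passes and tracks which file index would hold the final result, assuming the initial runs live in file 0 and that every merge pass flips the index as `1 - current`. The struct and function names are hypothetical, and the exact flipping discipline of the real class is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct MergePlan {
    std::size_t merge_passes;  // merge passes needed after the initial runs
    std::uint8_t final_file;   // index (0 or 1) of the file with the final run
};

MergePlan PlanMergePasses(std::size_t initial_runs, std::size_t merge_ways) {
    assert(merge_ways >= 2);
    MergePlan plan{0, 0};  // initial runs are assumed to be in file 0
    std::size_t runs = initial_runs;
    while (runs > 1) {
        runs = (runs + merge_ways - 1) / merge_ways;  // ceil(runs / merge_ways)
        // Swap the roles of the two files for this pass.
        plan.final_file = static_cast<std::uint8_t>(1 - plan.final_file);
        ++plan.merge_passes;
    }
    return plan;
}
```

For example, 7 initial runs with a 3-way merge shrink to 3 runs and then to 1, so two merge passes are needed and the result ends up back in file 0 under these assumptions.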
◆ m_inputbuf
char* taco::ExternalSort::m_inputbuf
private
◆ m_merge_ways
size_t taco::ExternalSort::m_merge_ways
private
The number of runs this operator merges at a time; implicitly defines the memory budget as (m_merge_ways + 1) pages.
◆ m_outbuf
char* taco::ExternalSort::m_outbuf
private
◆ m_outpg
◆ m_output_pos
◆ m_run_boundaries
std::vector<PageNumber> taco::ExternalSort::m_run_boundaries
private
Stores the page boundaries of the sorted runs from the last pass.