taco-db  0.1.0
taco::ExternalSort Class Reference

ExternalSort is a general-purpose utility class that externally sorts any number of items, each treated as an opaque sequence of bytes.

#include <extsort/ExternalSort.h>

Classes

class  OutputIterator
 The output iterator that can be used to scan through the output of this ExternalSort instance in ascending order as defined by comp.
 

Public Member Functions

 ~ExternalSort ()
 Destructor of ExternalSort.
 
std::unique_ptr< ItemIterator > Sort (ItemIterator *input_iter)
 Iterates over input_iter in a single pass and sorts all the returned items using the SortCompareFunction provided during construction.
 

Static Public Member Functions

static std::unique_ptr< ExternalSort > Create (const SortCompareFunction &comp, size_t merge_ways)
 Create an ExternalSort instance where comp provides the comparison function for two items.
 

Private Member Functions

 ExternalSort (const SortCompareFunction &comp, size_t merge_ways)
 
void SortInitialRuns (ItemIterator *item_iter)
 Iterate through item_iter and create the initial runs based on the memory budget.
 
void GenerateNewPass (std::vector< PageNumber > &new_run_boundaries)
 Generate a new sorted pass through an m_merge_ways-way merge from m_file[1 - m_current_pass].
 
PageNumber MergeInternalRuns (size_t low_run, size_t high_run)
 Merge the runs low_run (inclusive) to high_run (inclusive) from the last pass, whose boundaries are stored in m_run_boundaries.
 
void WriteOutInitialPass (std::vector< Record > &pass)
 
void WriteOutRecord (Record &rec)
 

Private Attributes

uint8_t m_current_pass
 0 or 1; indicates which file the current pass is using.
 
std::unique_ptr< File > m_file [2]
 Temporary files used in external sorting.
 
size_t m_merge_ways
 Number of merge ways this operator allows; implicitly defines the memory budget as (m_merge_ways + 1) pages.
 
SortCompareFunction m_cmp
 Comparison function for sorting items.
 
std::vector< PageNumber > m_run_boundaries
 Saves the page boundaries of the sorted runs from the last pass.
 
char * m_inputbuf
 
char * m_outbuf
 
VarlenDataPage m_outpg
 
PageNumber m_output_pos
 

Detailed Description

ExternalSort is a general-purpose utility class that externally sorts any number of items, each treated as an opaque sequence of bytes.

It takes a user-defined comparison function that compares the contents of two binary buffers, and a memory budget for this external sort operator. Calling Sort() returns an iterator over the sorted items in ascending order.
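As an illustration of the kind of comparison function described above, here is a minimal sketch. The alias below is an assumed shape for SortCompareFunction (a memcmp-style callable over two buffers); the actual declaration in taco-db may differ. CompareInt64Key is a hypothetical comparator, not part of the library.

```cpp
#include <cstdint>
#include <cstring>
#include <functional>

// Assumed shape of the comparator: returns a negative, zero, or positive
// value, like memcmp. The real taco::SortCompareFunction may differ.
using SortCompareFunction = std::function<int(const char *, const char *)>;

// Hypothetical comparator for items whose first 8 bytes hold an int64_t key.
int CompareInt64Key(const char *a, const char *b) {
    int64_t x, y;
    std::memcpy(&x, a, sizeof(x));  // memcpy avoids unaligned reads
    std::memcpy(&y, b, sizeof(y));
    return (x < y) ? -1 : (x > y) ? 1 : 0;
}
```

A comparator of this kind would be passed as comp to Create(), assuming the alias matches the real type.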

Constructor & Destructor Documentation

◆ ~ExternalSort()

taco::ExternalSort::~ExternalSort ( )

Destructor of ExternalSort.

◆ ExternalSort()

taco::ExternalSort::ExternalSort ( const SortCompareFunction &  comp,
size_t  merge_ways 
)
private

Member Function Documentation

◆ Create()

std::unique_ptr< ExternalSort > taco::ExternalSort::Create ( const SortCompareFunction &  comp,
size_t  merge_ways 
)
static

Create an ExternalSort instance where comp provides the comparison function for two items.

merge_ways indicates the memory budget of this external sorting operator. Specifically, the memory budget for this operator should be roughly (merge_ways + 1) * PAGE_SIZE bytes.
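The budget formula can be worked through as below. This is only a sketch: kAssumedPageSize is a placeholder value of 4096 bytes, since the real PAGE_SIZE constant is defined elsewhere in taco-db, and MemoryBudgetBytes is a hypothetical helper. The split into merge_ways input pages plus one output page is an inference from the m_inputbuf and m_outbuf members, not something this page states.

```cpp
#include <cstddef>

// Placeholder page size (assumption); use the real PAGE_SIZE in taco-db code.
constexpr std::size_t kAssumedPageSize = 4096;

// Hypothetical helper: (merge_ways + 1) pages, presumably one page per
// input run being merged plus one page for output.
constexpr std::size_t MemoryBudgetBytes(std::size_t merge_ways,
                                        std::size_t page_size = kAssumedPageSize) {
    return (merge_ways + 1) * page_size;
}
```

With these assumptions, an 8-way merge needs 9 pages, or 36864 bytes.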

◆ GenerateNewPass()

void taco::ExternalSort::GenerateNewPass ( std::vector< PageNumber > &  new_run_boundaries)
private

Generate a new sorted pass through an m_merge_ways-way merge from m_file[1 - m_current_pass].

Persist the new sorted runs in m_file[m_current_pass] and save the new run boundaries generated by the merge in new_run_boundaries.
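To show how repeated passes alternate between the two temporary files, the sketch below counts merge passes and tracks which file would hold the final run. MergePlan and PlanMergePasses are hypothetical names for illustration only. The sketch assumes the initial runs sit in file 0 (as the SortInitialRuns() documentation says) and that each pass merges groups of up to merge_ways consecutive runs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a merge schedule (not a taco-db type).
struct MergePlan {
    std::size_t merge_passes;  // number of merge passes after the initial runs
    int final_file;            // 0 or 1: index of the file holding the result
};

MergePlan PlanMergePasses(std::size_t initial_runs, std::size_t merge_ways) {
    assert(merge_ways >= 2);   // a 1-way "merge" would never reduce the run count
    MergePlan plan{0, 0};      // assumption: initial runs are stored in file 0
    std::size_t runs = initial_runs;
    while (runs > 1) {
        // Each pass merges groups of merge_ways runs: ceil(runs / merge_ways).
        runs = (runs + merge_ways - 1) / merge_ways;
        plan.final_file = 1 - plan.final_file;  // output goes to the other file
        ++plan.merge_passes;
    }
    return plan;
}
```

For example, 100 initial runs with an 8-way merge shrink to 13, then 2, then 1 run, which is three passes ending in file 1.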

◆ MergeInternalRuns()

PageNumber taco::ExternalSort::MergeInternalRuns ( size_t  low_run,
size_t  high_run 
)
private

Merge the runs low_run (inclusive) to high_run (inclusive) from the last pass, whose boundaries are stored in m_run_boundaries.

After this call, the newly merged run should be persisted at the end of m_file[m_current_pass]; the function returns the page boundary of the newly generated sorted run.
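The merge step can be illustrated with an in-memory analogue. The sketch below is not the taco-db implementation: it merges sorted vectors of int with a min-heap instead of reading pages from a file, and MergeRuns and HeapEntry are hypothetical names. It only shows the k-way merge idea over an inclusive range of runs.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// One heap entry per run that still has unread values.
struct HeapEntry {
    int value;
    std::size_t run;  // index of the run the value came from
    std::size_t pos;  // position of the value within that run
    bool operator>(const HeapEntry &o) const { return value > o.value; }
};

// Merges runs[low_run..high_run] (both inclusive); each run is sorted ascending.
std::vector<int> MergeRuns(const std::vector<std::vector<int>> &runs,
                           std::size_t low_run, std::size_t high_run) {
    std::priority_queue<HeapEntry, std::vector<HeapEntry>,
                        std::greater<HeapEntry>> heap;  // min-heap on value
    for (std::size_t r = low_run; r <= high_run && r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.push(HeapEntry{runs[r][0], r, 0});
    }

    std::vector<int> merged;
    while (!heap.empty()) {
        HeapEntry top = heap.top();
        heap.pop();
        merged.push_back(top.value);
        std::size_t next = top.pos + 1;
        // Refill the heap from the run that just supplied the smallest value.
        if (next < runs[top.run].size()) {
            heap.push(HeapEntry{runs[top.run][next], top.run, next});
        }
    }
    return merged;
}
```

In the real operator, each run would presumably be read one page at a time through an input buffer, and the merged output written page by page rather than collected in a vector.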

◆ Sort()

std::unique_ptr< ItemIterator > taco::ExternalSort::Sort ( ItemIterator *  input_iter)

Given an input_iter, it iterates over input_iter in a single pass and sorts all the returned items using the SortCompareFunction provided during construction.

Returns an ItemIterator over the sorted results.

The input_iter is allowed to not support rewinding, but the returned output ItemIterator must support rewinding.

◆ SortInitialRuns()

void taco::ExternalSort::SortInitialRuns ( ItemIterator *  item_iter)
private

Iterate through item_iter and create the initial runs based on the memory budget.

You can use merge_ways * PAGE_SIZE bytes for storing the actual record payloads and another array for storing the Record structures that point to the payloads. Persist the sorted runs in m_file[0]; after this call, the page boundaries of all sorted runs should be saved in m_run_boundaries.
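The run-creation idea can be sketched in memory as follows. This is an illustration under simplifying assumptions, not the taco-db implementation: items are std::string values compared with the default string ordering instead of a SortCompareFunction, runs are kept in vectors instead of being written to m_file[0], and MakeInitialRuns is a hypothetical name. budget_bytes plays the role of merge_ways * PAGE_SIZE.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits items into sorted runs whose payload bytes fit within budget_bytes.
// An item larger than the budget ends up alone in its own run.
std::vector<std::vector<std::string>> MakeInitialRuns(
        const std::vector<std::string> &items, std::size_t budget_bytes) {
    std::vector<std::vector<std::string>> runs;
    std::vector<std::string> current;  // payloads buffered for the current run
    std::size_t used = 0;              // payload bytes buffered so far

    auto flush = [&]() {
        if (current.empty()) return;
        std::sort(current.begin(), current.end());  // sort the buffered run
        runs.push_back(std::move(current));
        current.clear();  // leave the moved-from vector in a known empty state
        used = 0;
    };

    for (const std::string &item : items) {
        // Start a new run when the next payload would overflow the budget.
        if (used + item.size() > budget_bytes) flush();
        current.push_back(item);
        used += item.size();
    }
    flush();  // emit the final, possibly partial, run
    return runs;
}
```

For instance, the items "dd", "a", "ccc", "bb", "e" with a 4-byte budget produce three runs: {"a", "dd"}, {"ccc"}, and {"bb", "e"}.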

◆ WriteOutInitialPass()

void taco::ExternalSort::WriteOutInitialPass ( std::vector< Record > &  pass)
private

◆ WriteOutRecord()

void taco::ExternalSort::WriteOutRecord ( Record &  rec)
private

Member Data Documentation

◆ m_cmp

SortCompareFunction taco::ExternalSort::m_cmp
private

Comparison function for sorting items.

◆ m_current_pass

uint8_t taco::ExternalSort::m_current_pass
private

0 or 1; indicates which file the current pass is using.

◆ m_file

std::unique_ptr<File> taco::ExternalSort::m_file[2]
private

Temporary files used in external sorting.

◆ m_inputbuf

char* taco::ExternalSort::m_inputbuf
private

◆ m_merge_ways

size_t taco::ExternalSort::m_merge_ways
private

Number of merge ways this operator allows; implicitly defines the memory budget as (m_merge_ways + 1) pages.

◆ m_outbuf

char* taco::ExternalSort::m_outbuf
private

◆ m_outpg

VarlenDataPage taco::ExternalSort::m_outpg
private

◆ m_output_pos

PageNumber taco::ExternalSort::m_output_pos
private

◆ m_run_boundaries

std::vector<PageNumber> taco::ExternalSort::m_run_boundaries
private

Saves the page boundaries of the sorted runs from the last pass.

