taco-db  0.1.0
taco::Schema Class Reference

A Schema object stores the information for accessing an ordered set of typed fields either from a disk-based record payload, or from an in-memory Datum/DatumRef array.

#include <catalog/Schema.h>

Classes

struct  FieldInfo
 

Public Member Functions

void ComputeLayout ()
 Computes the layout of the record payload with this schema, using the global catalog cache g_db->catcache().
 
void ComputeLayout (BootstrapCatCache *catcache)
 Computes the layout of the record payload with this schema, using the provided bootstrap catalog cache.
 
void CollectTypeInfo ()
 Collects the type info only without computing the record payload layout.
 
bool IsTypeInfoCollected () const
 
bool IsLayoutComputed () const
 
Oid GetFieldTypeId (FieldId field_id) const
 Returns the type Oid of a field.
 
uint64_t GetFieldTypeParam (FieldId field_id) const
 Returns the type parameter of a field.
 
absl::string_view GetFieldName (FieldId field_id) const
 Returns the field name of field field_id.
 
bool FieldIsNullable (FieldId field_id) const
 Returns whether this field is nullable.
 
bool FieldPassByRef (FieldId field_id) const
 Returns whether this field is passed by reference or by value in memory.
 
FieldOffset GetFieldLength (size_t field_id) const
 Returns the cached size of a field.
 
FieldId GetNumFields () const
 Returns the total number of fields.
 
FieldId GetFieldIdFromFieldName (absl::string_view field_name) const
 Returns the field ID of the field named field_name.
 
std::pair< FieldOffset, FieldOffset > GetOffsetAndLength (FieldId field_id, const char *payload) const
 Returns the offset and the length of a field in this schema as a pair.
 
FieldOffset WritePayloadToBuffer (const std::vector< Datum > &data, maxaligned_char_buf &buf) const
 See Schema::WritePayloadToBufferImpl().
 
FieldOffset WritePayloadToBuffer (const std::vector< DatumRef > &data, maxaligned_char_buf &buf) const
 See Schema::WritePayloadToBufferImpl().
 
FieldOffset WritePayloadToBuffer (const std::vector< NullableDatumRef > &data, maxaligned_char_buf &buf) const
 See Schema::WritePayloadToBufferImpl().
 
bool FieldIsNull (FieldId field_id, const char *payload) const
 Returns whether a field is null or not in a record payload.
 
Datum GetField (FieldId field_id, const char *payload) const
 Returns a field in the payload as a Datum.
 
std::vector< Datum > DissemblePayload (const char *payload) const
 Disassembles the payload into a vector of Datums in field order as defined in the schema.
 
 

Static Public Member Functions

static Schema * Create (const std::vector< Oid > &typid, const std::vector< uint64_t > &typparam, const std::vector< bool > &nullable)
 Creates a new schema with the given types without field names.
 
static Schema * Create (const std::vector< Oid > &typid, const std::vector< uint64_t > &typparam, const std::vector< bool > &nullable, std::vector< std::string > field_names)
 Creates a new schema with the given types and field names.
 
static Schema * Combine (const Schema *left, const Schema *right)
 Combines two schemas into one.
 
static bool Identical (const Schema *left, const Schema *right)
 Returns whether the two schemas are identical (i.e., they have the same number of fields, the same type for each field ID, and the same nullability for each field).
 
static bool Compatible (const Schema *left, const Schema *right)
 Returns whether the schema on the right is union compatible with the one on the left (i.e., they have the same number of fields and the same type for each field ID, and a field on the right is not nullable if the corresponding field on the left does not allow nulls).
 

Private Member Functions

 Schema (const std::vector< Oid > &typid, const std::vector< uint64_t > &typparam, const std::vector< bool > &nullable, std::vector< std::string > field_names)
 Constructs a Schema object without initializing its payload.
 
 Schema (const std::vector< FieldInfo > &fields, const std::vector< std::string > &field_names)
 
 Schema ()=default
 The default constructor for constructing a fake schema.
 
template<class CCache >
void ComputeLayoutImpl (CCache *catcache, bool cache_typinfo_only)
 The generic implementation of schema layout computation with some catalog cache class CCache.
 
void EnsureLayoutComputed () const
 
void EnsureTypeInfoCollected () const
 
template<class SomeDatum >
FieldOffset WritePayloadToBufferImpl (const std::vector< SomeDatum > &data, maxaligned_char_buf &buf) const
 Converts the data to bytes in the storage layout and appends them to buf without clearing it first.
 

Private Attributes

bool m_type_info_collected
 whether the type info has been collected
 
bool m_layout_computed
 
bool m_has_only_nonnullable_fixedlen_fields
 
FieldId m_num_nonnullable_fixedlen_fields
 
FieldId m_num_nullable_fixedlen_fields
 
FieldId m_num_varlen_fields
 
FieldOffset m_null_bitmap_begin
 The offset to the null bitmap.
 
FieldOffset m_varlen_end_array_begin
 The offset to the beginning of the variable-length field end array.
 
FieldOffset m_varlen_payload_begin
 The offset to the beginning of the varlen payload.
 
std::vector< FieldId > m_field_reorder_idx
 The order of the fields to be placed in the actual payload.
 
std::vector< FieldInfo > m_field
 information about the individual fields
 
std::vector< std::string > m_field_names
 optional field names (may be empty or of the same length as m_field)
 

Detailed Description

A Schema object stores the information for accessing an ordered set of typed fields either from a disk-based record payload, or from an in-memory Datum/DatumRef array.

A field value is always treated as an array of fixed-length or variable-length bytes with certain alignment requirements, and one needs to use the type-dependent functions to interpret the field value (which is not handled by Schema). A successfully computed Schema object supports queries of the field nullness, offset and length in a max-aligned record payload and guarantees the offset returned is suitable for read or write for the specified type.

The fields may be optionally named, which is usually the case for a schema object created for a record read from or derived from a file, or for the result of query processing.

The layout of the fields is:

| non-nullable fixed-len fields |
| null bitmap |
| varlen field end array + alignment padding |
| varlen fields |
| nullable fixed-len fields |

The total payload length is always padded at the end to the maximum alignment (an 8-byte boundary).
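The 8-byte end alignment can be illustrated with a small rounding helper. This is a minimal sketch and not taco-db code: kMaxAlign, MaxAlignUp and EndPadding are hypothetical names, and the value 8 is taken from the sentence above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constant: the maximum alignment described above (8 bytes).
constexpr std::size_t kMaxAlign = 8;

// Rounds len up to the next multiple of kMaxAlign.
// The bit trick is valid because kMaxAlign is a power of two.
inline std::size_t MaxAlignUp(std::size_t len) {
    assert(len <= SIZE_MAX - (kMaxAlign - 1));  // guard against overflow
    return (len + kMaxAlign - 1) & ~(kMaxAlign - 1);
}

// Number of padding bytes needed after a payload of len bytes.
inline std::size_t EndPadding(std::size_t len) {
    return MaxAlignUp(len) - len;
}
```

Under these assumptions, a 13-byte payload would be followed by 3 padding bytes, and a 16-byte payload by none.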

Constructor & Destructor Documentation

◆ Schema() [1/3]

taco::Schema::Schema ( const std::vector< Oid > &  typid,
const std::vector< uint64_t > &  typparam,
const std::vector< bool > &  nullable,
std::vector< std::string >  field_names 
)
private

Constructs a Schema object without initializing its payload.

The vectors typid, typparam and nullable must be of the same length and may not be empty. The vector field_names may be either empty or of the same length as typid.

Parameters
typid  a vector of the type Oids of the fields in this schema
typparam  a vector of the type parameters of the fields
nullable  a vector of the nullability of the fields
field_names  a vector of field names

◆ Schema() [2/3]

taco::Schema::Schema ( const std::vector< FieldInfo > &  fields,
const std::vector< std::string > &  field_names 
)
private

◆ Schema() [3/3]

taco::Schema::Schema ( )
private default

The default constructor for constructing a fake schema.

Member Function Documentation

◆ CollectTypeInfo()

void taco::Schema::CollectTypeInfo ( )

Collects the type info only without computing the record payload layout.

This may be used when the Schema is in-memory only, i.e., when the caller does not need to call the field extraction and serialization functions (e.g., DissemblePayload, GetField, WritePayloadToBuffer).

◆ Combine()

Schema * taco::Schema::Combine ( const Schema *  left,
const Schema *  right 
)
static

Combines two schemas into one.

◆ Compatible()

bool taco::Schema::Compatible ( const Schema *  left,
const Schema *  right 
)
static

Returns whether the schema on the right is union compatible with the one on the left (i.e., they have the same number of fields and the same type for each field ID, and a field on the right is not nullable if the corresponding field on the left does not allow nulls).

◆ ComputeLayout() [1/2]

void taco::Schema::ComputeLayout ( )

Computes the layout of the record payload with this schema, using the global catalog cache g_db->catcache().

This also collects the type info.

◆ ComputeLayout() [2/2]

void taco::Schema::ComputeLayout ( BootstrapCatCache *  catcache)

Computes the layout of the record payload with this schema, using the provided bootstrap catalog cache.

This also collects the type info.

This should only be used at DB startup and testing.

◆ ComputeLayoutImpl()

template<class CCache >
void taco::Schema::ComputeLayoutImpl ( CCache *  catcache,
bool  cache_typinfo_only 
)
private

The generic implementation of schema layout computation with some catalog cache class CCache.

CCache is required to have a member function "const SysTable_Type *FindType(Oid)".
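The FindType requirement is a compile-time (duck-typed) contract: any class with a matching member works, and no common base class is needed. The sketch below illustrates that idea only. Oid and SysTable_Type are simplified stand-ins for the real taco-db types, and the typlen member, MapCache and LookupTypeLen are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Simplified stand-ins for the real types (assumptions for illustration).
using Oid = std::uint32_t;
struct SysTable_Type {
    std::int16_t typlen;  // hypothetical member
};

// A hypothetical cache that satisfies the requirement by exposing
// "const SysTable_Type *FindType(Oid)".
class MapCache {
public:
    void Add(Oid oid, SysTable_Type t) { m_types[oid] = t; }
    const SysTable_Type *FindType(Oid oid) {
        auto it = m_types.find(oid);
        return it == m_types.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<Oid, SysTable_Type> m_types;
};

// Generic lookup in the style of a CCache-templated function: it compiles
// for any CCache that has a compatible FindType member.
template <class CCache>
std::int16_t LookupTypeLen(CCache *catcache, Oid typid) {
    assert(catcache != nullptr);
    const SysTable_Type *t = catcache->FindType(typid);
    return t ? t->typlen : -1;  // -1 here only marks "type not found"
}
```

A second cache class with the same FindType signature could be passed to LookupTypeLen without any change to the template.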

◆ Create() [1/2]

Schema * taco::Schema::Create ( const std::vector< Oid > &  typid,
const std::vector< uint64_t > &  typparam,
const std::vector< bool > &  nullable 
)
static

Creates a new schema with the given types without field names.

If one calls GetFieldName() on the returned schema object, it will return an empty string. The returned schema object is not fully initialized. To finish initialization, the caller needs to call either Schema::ComputeLayout() (for a schema that will be used for reading/writing record payloads) or Schema::CollectTypeInfo() (for a schema that will be used for caching the type info only).
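The two-step initialization can be pictured as a pair of flags. The sketch below is a hypothetical stand-in (MiniSchema is not part of taco-db) that models only the documented relationship: a fresh object has neither flag set, CollectTypeInfo() sets one, and ComputeLayout() sets both. The RequireLayout() guard is an assumption for illustration and does not describe how EnsureLayoutComputed() actually reports errors.

```cpp
#include <cassert>

// Hypothetical stand-in that models only the two initialization flags.
class MiniSchema {
public:
    void CollectTypeInfo() { m_type_info_collected = true; }

    void ComputeLayout() {
        CollectTypeInfo();         // computing the layout also collects type info
        m_layout_computed = true;
    }

    bool IsTypeInfoCollected() const { return m_type_info_collected; }
    bool IsLayoutComputed() const { return m_layout_computed; }

    // Hypothetical guard for functions that read or write payloads.
    void RequireLayout() const {
        assert(m_layout_computed && "call ComputeLayout() first");
    }

private:
    bool m_type_info_collected = false;
    bool m_layout_computed = false;
};
```

With the real class, the corresponding sequence would be a call to Schema::Create() followed by ComputeLayout() or CollectTypeInfo() on the returned pointer.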

◆ Create() [2/2]

Schema * taco::Schema::Create ( const std::vector< Oid > &  typid,
const std::vector< uint64_t > &  typparam,
const std::vector< bool > &  nullable,
std::vector< std::string >  field_names 
)
static

Creates a new schema with the given types and field names.

The returned schema object is not fully initialized. To finish initialization, the caller needs to call either Schema::ComputeLayout() (for a schema that will be used for reading/writing record payloads) or Schema::CollectTypeInfo() (for a schema that will be used for caching the type info only).

◆ DissemblePayload()

std::vector< Datum > taco::Schema::DissemblePayload ( const char *  payload) const

Disassembles the payload into a vector of Datums in field order as defined in the schema.

The data in the returned vector reference the passed payload, so the payload must remain alive while the return value is in use. However, any in-place change to the payload may or may not be reflected in the returned data vector.

◆ EnsureLayoutComputed()

void taco::Schema::EnsureLayoutComputed ( ) const
inline private

◆ EnsureTypeInfoCollected()

void taco::Schema::EnsureTypeInfoCollected ( ) const
inline private

◆ FieldIsNull()

bool taco::Schema::FieldIsNull ( FieldId  field_id,
const char *  payload 
) const

Returns whether a field is null or not in a record payload.
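A null check of this kind usually reduces to testing one bit in the null bitmap. The sketch below shows only the bit test. Both the bit order (least-significant bit first) and the meaning of a set bit are assumptions made for illustration; this page does not document either, and TestBit is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Tests bit i of a byte-addressed bitmap, least-significant bit first.
// Assumption: the real bitmap's bit order and polarity may differ.
inline bool TestBit(const std::uint8_t *bitmap, std::size_t i) {
    assert(bitmap != nullptr);
    return ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}
```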

◆ FieldIsNullable()

bool taco::Schema::FieldIsNullable ( FieldId  field_id) const
inline

Returns whether this field is nullable.

◆ FieldPassByRef()

bool taco::Schema::FieldPassByRef ( FieldId  field_id) const
inline

Returns whether this field is passed by reference or by value in memory.

Note that both variable-length fields and some fixed-length fields (that are not 1,2,4,8 bytes long) are passed by reference.
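The rule in the note can be written as a small predicate. This is a sketch under two assumptions that the note implies but does not state: every 1-, 2-, 4- or 8-byte fixed-length type is passed by value, and a negative length marks a variable-length type. PassByRef is a hypothetical free function, not the member documented here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: typlen < 0 means variable-length.
inline bool PassByRef(std::int16_t typlen) {
    assert(typlen != 0);  // a zero-length type is not meaningful here
    if (typlen < 0) {
        return true;      // variable-length: always by reference
    }
    switch (typlen) {
    case 1: case 2: case 4: case 8:
        return false;     // assumed to fit in a by-value slot
    default:
        return true;      // e.g. a 3-byte or 16-byte fixed-length type
    }
}
```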

◆ GetField()

Datum taco::Schema::GetField ( FieldId  field_id,
const char *  payload 
) const

Returns a field in the payload as a Datum.

The returned datum references the payload, so the payload must remain alive while the return value is in use. However, any in-place change to the payload may or may not be reflected in the returned datum.

◆ GetFieldIdFromFieldName()

FieldId taco::Schema::GetFieldIdFromFieldName ( absl::string_view  field_name) const

Returns the field ID of the field named field_name, or InvalidFieldId if there is no such field.
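The lookup contract (a field ID on success, a sentinel otherwise) can be sketched as a linear search over the name vector. The FieldId width, the sentinel value -1 for kInvalidFieldId, and the function FindFieldId are all assumptions for this example. The real function takes an absl::string_view, while std::string is used here to keep the sketch free of external dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

using FieldId = std::int16_t;            // hypothetical width
constexpr FieldId kInvalidFieldId = -1;  // hypothetical sentinel value

// Returns the index of the first field called name, or kInvalidFieldId.
inline FieldId FindFieldId(const std::vector<std::string> &names,
                           const std::string &name) {
    assert(names.size() <=
           static_cast<std::size_t>(std::numeric_limits<FieldId>::max()));
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == name) {
            return static_cast<FieldId>(i);
        }
    }
    return kInvalidFieldId;
}
```

An unnamed schema corresponds to an empty names vector, for which every lookup returns the sentinel.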

◆ GetFieldLength()

FieldOffset taco::Schema::GetFieldLength ( size_t  field_id) const
inline

Returns the cached size of a field.

If this field is variable-length or is fixed-length with unknown type parameter, returns -1.

◆ GetFieldName()

absl::string_view taco::Schema::GetFieldName ( FieldId  field_id) const
inline

Returns the field name of field field_id.

If no field name was given when creating this schema object, it returns an empty string.

◆ GetFieldTypeId()

Oid taco::Schema::GetFieldTypeId ( FieldId  field_id) const
inline

Returns the type Oid of a field.

◆ GetFieldTypeParam()

uint64_t taco::Schema::GetFieldTypeParam ( FieldId  field_id) const
inline

Returns the type parameter of a field.

◆ GetNumFields()

FieldId taco::Schema::GetNumFields ( ) const
inline

Returns the total number of fields.

◆ GetOffsetAndLength()

std::pair< FieldOffset, FieldOffset > taco::Schema::GetOffsetAndLength ( FieldId  field_id,
const char *  payload 
) const

Returns the offset and the length of a field in this schema as a pair.

Parameters
field_id  the column ID of the field
payload  the record payload

◆ Identical()

bool taco::Schema::Identical ( const Schema *  left,
const Schema *  right 
)
static

Returns whether the two schemas are identical (i.e., they have the same number of fields, the same type for each field ID, and the same nullability for each field).
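Identical() and Compatible() differ only in how they treat nullability, which is easiest to see side by side. The sketch below works on a hypothetical FieldDesc struct rather than on Schema objects, and it compares type Oids only. Whether the real functions also compare type parameters is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-field description used only in this sketch.
struct FieldDesc {
    std::uint32_t typid;
    bool nullable;
};

// Same field count, same type and same nullability for every field ID.
inline bool IdenticalFields(const std::vector<FieldDesc> &l,
                            const std::vector<FieldDesc> &r) {
    if (l.size() != r.size()) return false;
    for (std::size_t i = 0; i < l.size(); ++i) {
        if (l[i].typid != r[i].typid || l[i].nullable != r[i].nullable) {
            return false;
        }
    }
    return true;
}

// Same field count and types; r may be nullable only where l is nullable.
inline bool CompatibleFields(const std::vector<FieldDesc> &l,
                             const std::vector<FieldDesc> &r) {
    if (l.size() != r.size()) return false;
    for (std::size_t i = 0; i < l.size(); ++i) {
        if (l[i].typid != r[i].typid) return false;
        if (!l[i].nullable && r[i].nullable) return false;
    }
    return true;
}

// Usage: compatibility is not symmetric.
inline void CompatibilityExample() {
    std::vector<FieldDesc> left = {{23, true}};
    std::vector<FieldDesc> right = {{23, false}};
    assert(!IdenticalFields(left, right));   // nullability differs
    assert(CompatibleFields(left, right));   // right is stricter than left
    assert(!CompatibleFields(right, left));  // a null could reach a non-nullable field
}
```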

◆ IsLayoutComputed()

bool taco::Schema::IsLayoutComputed ( ) const
inline

◆ IsTypeInfoCollected()

bool taco::Schema::IsTypeInfoCollected ( ) const
inline

◆ WritePayloadToBuffer() [1/3]

FieldOffset taco::Schema::WritePayloadToBuffer ( const std::vector< Datum > &  data,
maxaligned_char_buf &  buf 
) const

◆ WritePayloadToBuffer() [2/3]

FieldOffset taco::Schema::WritePayloadToBuffer ( const std::vector< DatumRef > &  data,
maxaligned_char_buf &  buf 
) const

◆ WritePayloadToBuffer() [3/3]

FieldOffset taco::Schema::WritePayloadToBuffer ( const std::vector< NullableDatumRef > &  data,
maxaligned_char_buf &  buf 
) const

◆ WritePayloadToBufferImpl()

template<class SomeDatum >
FieldOffset taco::Schema::WritePayloadToBufferImpl ( const std::vector< SomeDatum > &  data,
maxaligned_char_buf &  buf 
) const
private

Converts the data to bytes in the storage layout and appends them to buf without clearing it first.

This allows one to add an optional header before the payload. buf will be MAXALIGN'd before any data is appended into it.

buf is a std::vector of char with a different allocator that always uses aligned_alloc for allocating buffer spaces aligned to 8-byte boundaries. Always use the type alias maxaligned_char_buf to declare or define such a buffer (see base/tdb_base.h).

The behavior is undefined if the size of data is not the same as GetNumFields().

Returns
the length of the payload (not including the initial MAXALIGN padding), or -1 if the length of the buf exceeds the maximum limit (max FieldOffset)
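The append-after-padding behavior and the return convention can be sketched with a plain byte vector. This is an illustration under stated assumptions, not the taco-db implementation: AppendAligned is a hypothetical function, FieldOffset is assumed to be a 16-bit signed integer, the payload is passed as ready-made bytes rather than Datums, and a plain std::vector<char> does not give the storage alignment that maxaligned_char_buf provides. Leaving buf untouched on failure is also a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using FieldOffset = std::int16_t;  // hypothetical width

// Pads buf to an 8-byte boundary, then appends payload without clearing buf.
// Returns the payload length (padding excluded), or -1 if the resulting
// buffer length would exceed the largest FieldOffset.
inline FieldOffset AppendAligned(std::vector<char> &buf,
                                 const std::vector<char> &payload) {
    constexpr std::size_t kMaxAlign = 8;
    const std::size_t padded = (buf.size() + kMaxAlign - 1) & ~(kMaxAlign - 1);
    const std::size_t limit =
        static_cast<std::size_t>(std::numeric_limits<FieldOffset>::max());
    if (padded > limit || payload.size() > limit - padded) {
        return -1;  // buf is left unchanged in this sketch
    }
    buf.resize(padded, 0);  // zero-fill the alignment padding
    assert(buf.size() % kMaxAlign == 0);
    buf.insert(buf.end(), payload.begin(), payload.end());
    return static_cast<FieldOffset>(payload.size());
}
```

Because existing content is kept, a caller can place a header in buf first and have the payload start at the next 8-byte boundary after it.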

Member Data Documentation

◆ m_field

std::vector<FieldInfo> taco::Schema::m_field
private

information about the individual fields

◆ m_field_names

std::vector<std::string> taco::Schema::m_field_names
private

optional field names (may be empty or of the same length as m_field)

◆ m_field_reorder_idx

std::vector<FieldId> taco::Schema::m_field_reorder_idx
private

The order of the fields to be placed in the actual payload.

E.g., m_field_reorder_idx = { 0, 2, 3, 1 } means that field 0 is first in the payload, field 2 is second, field 3 is third, and field 1 is fourth.
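Read this way, the reorder index maps payload slots to field IDs, so finding where a given field lives requires the inverse permutation. The sketch below computes that inverse. PayloadPositions is a hypothetical helper, and the 16-bit FieldId is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using FieldId = std::int16_t;  // hypothetical width

// Given reorder_idx[slot] = field ID stored at that payload slot, returns
// pos[field ID] = payload slot (the inverse permutation).
inline std::vector<FieldId> PayloadPositions(
        const std::vector<FieldId> &reorder_idx) {
    std::vector<FieldId> pos(reorder_idx.size(), -1);
    for (std::size_t slot = 0; slot < reorder_idx.size(); ++slot) {
        const FieldId field = reorder_idx[slot];
        assert(field >= 0 && static_cast<std::size_t>(field) < pos.size());
        assert(pos[field] == -1);  // each field must appear exactly once
        pos[field] = static_cast<FieldId>(slot);
    }
    return pos;
}
```

For the example order { 0, 2, 3, 1 }, the result is { 0, 3, 1, 2 }: field 1 sits in the last slot, field 2 in the second, and field 3 in the third.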

◆ m_has_only_nonnullable_fixedlen_fields

bool taco::Schema::m_has_only_nonnullable_fixedlen_fields
private

◆ m_layout_computed

bool taco::Schema::m_layout_computed
private

◆ m_null_bitmap_begin

FieldOffset taco::Schema::m_null_bitmap_begin
private

The offset to the null bitmap.

◆ m_num_nonnullable_fixedlen_fields

FieldId taco::Schema::m_num_nonnullable_fixedlen_fields
private

◆ m_num_nullable_fixedlen_fields

FieldId taco::Schema::m_num_nullable_fixedlen_fields
private

◆ m_num_varlen_fields

FieldId taco::Schema::m_num_varlen_fields
private

◆ m_type_info_collected

bool taco::Schema::m_type_info_collected
private

whether the type info has been collected

◆ m_varlen_end_array_begin

FieldOffset taco::Schema::m_varlen_end_array_begin
private

The offset to the beginning of the variable-length field end array.

This is the length of the entire record if there's no variable-length field.

◆ m_varlen_payload_begin

FieldOffset taco::Schema::m_varlen_payload_begin
private

The offset to the beginning of the varlen payload.

This might not be properly aligned to the first field in the payload. So anyone using this offset needs to ensure the data are aligned per their alignment requirements.

However, in case this is a schema with only non-nullable fixed-length fields, or this is a build with fixed-length data page only, this is the record length.


The documentation for this class was generated from the following files: