taco-db
0.1.0
A Schema object stores the information for accessing an ordered set of typed fields either from a disk-based record payload, or from an in-memory Datum/DatumRef array.
#include <catalog/Schema.h>
Classes
struct FieldInfo
Public Member Functions
void ComputeLayout()
Computes the layout of the record payload with this schema, using the global catalog cache g_db->catcache().
void ComputeLayout(BootstrapCatCache *catcache)
Computes the layout of the record payload with this schema, using the provided bootstrap catalog cache.
void CollectTypeInfo()
Collects only the type info, without computing the record payload layout.
bool IsTypeInfoCollected() const
bool IsLayoutComputed() const
Oid GetFieldTypeId(FieldId field_id) const
Returns the type Oid of a field.
uint64_t GetFieldTypeParam(FieldId field_id) const
Returns the type parameter of a field.
absl::string_view GetFieldName(FieldId field_id) const
Returns the name of field field_id.
bool FieldIsNullable(FieldId field_id) const
Returns whether this field is nullable.
bool FieldPassByRef(FieldId field_id) const
Returns whether this field is passed by reference or by value in memory.
FieldOffset GetFieldLength(size_t field_id) const
Returns the cached size of a field.
FieldId GetNumFields() const
Returns the total number of fields.
FieldId GetFieldIdFromFieldName(absl::string_view field_name) const
Returns the ID of the field named field_name.
std::pair<FieldOffset, FieldOffset> GetOffsetAndLength(FieldId field_id, const char *payload) const
Returns the offset and the length of a field in this schema as a pair.
FieldOffset WritePayloadToBuffer(const std::vector<Datum> &data, maxaligned_char_buf &buf) const
See Schema::WritePayloadToBufferImpl().
FieldOffset WritePayloadToBuffer(const std::vector<DatumRef> &data, maxaligned_char_buf &buf) const
See Schema::WritePayloadToBufferImpl().
FieldOffset WritePayloadToBuffer(const std::vector<NullableDatumRef> &data, maxaligned_char_buf &buf) const
See Schema::WritePayloadToBufferImpl().
bool FieldIsNull(FieldId field_id, const char *payload) const
Returns whether a field is null in a record payload.
Datum GetField(FieldId field_id, const char *payload) const
Returns a field in the payload as a Datum.
std::vector<Datum> DissemblePayload(const char *payload) const
Disassembles the payload into a vector of Datums in the field order defined by the schema.
Static Public Member Functions
static Schema *Create(const std::vector<Oid> &typid, const std::vector<uint64_t> &typparam, const std::vector<bool> &nullable)
Creates a new schema with the given types, without field names.
static Schema *Create(const std::vector<Oid> &typid, const std::vector<uint64_t> &typparam, const std::vector<bool> &nullable, std::vector<std::string> field_names)
Creates a new schema with the given types and field names.
static Schema *Combine(const Schema *left, const Schema *right)
Combines two schemas into one.
static bool Identical(const Schema *left, const Schema *right)
Returns whether the two schemas are identical (i.e., they have the same number of fields, and each pair of fields with the same field ID has the same type and nullability).
static bool Compatible(const Schema *left, const Schema *right)
Returns whether the right schema is union-compatible with the left one (i.e., they have the same number of fields, each pair of fields with the same field ID has the same type, and no field in right is nullable where the corresponding field in left does not allow nulls).
Private Member Functions
Schema(const std::vector<Oid> &typid, const std::vector<uint64_t> &typparam, const std::vector<bool> &nullable, std::vector<std::string> field_names)
Constructs a Schema object without computing its payload layout.
Schema(const std::vector<FieldInfo> &fields, const std::vector<std::string> &field_names)
Schema() = default
The default constructor for constructing a fake schema.
template<class CCache> void ComputeLayoutImpl(CCache *catcache, bool cache_typinfo_only)
The generic implementation of schema layout computation with some catalog cache class CCache.
void EnsureLayoutComputed() const
void EnsureTypeInfoCollected() const
template<class SomeDatum> FieldOffset WritePayloadToBufferImpl(const std::vector<SomeDatum> &data, maxaligned_char_buf &buf) const
Converts the data to bytes in the storage layout and appends them to buf without clearing it first.
Private Attributes
bool m_type_info_collected
Whether the type info has been collected.
bool m_layout_computed
Whether the layout has been computed.
bool m_has_only_nonnullable_fixedlen_fields
FieldId m_num_nonnullable_fixedlen_fields
FieldId m_num_nullable_fixedlen_fields
FieldId m_num_varlen_fields
FieldOffset m_null_bitmap_begin
The offset to the null bitmap.
FieldOffset m_varlen_end_array_begin
The offset to the beginning of the variable-length field end array.
FieldOffset m_varlen_payload_begin
The offset to the beginning of the varlen payload.
std::vector<FieldId> m_field_reorder_idx
The order in which the fields are placed in the actual payload.
std::vector<FieldInfo> m_field
Information about the individual fields.
std::vector<std::string> m_field_names
Optional field names (either empty or of the same length as m_field).
Detailed Description
A field value is always treated as an array of fixed-length or variable-length bytes with certain alignment requirements, and one needs to use the type-dependent functions to interpret the field value (which is not handled by Schema). A Schema object whose layout has been successfully computed supports queries of field nullness, offset, and length in a max-aligned record payload, and guarantees that each returned offset is suitable for reading or writing a value of the field's type.
The fields may be optionally named, which is usually the case for a schema object created for a record read from or derived from a file, or for the result of query processing.
The layout of the fields is:
| non-nullable fixed-len fields | null bitmap | varlen field end array + alignment padding | varlen fields | nullable fixed-len fields |
The entire payload is always padded at the end to the maximum alignment (an 8-byte boundary).
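As a rough illustration of the per-type alignment guarantee and the trailing max-alignment, the sketch below lays out a record made only of non-nullable fixed-length fields, placing them in order. FixedField, AlignUp and LayoutFixedFields are hypothetical names, not taco-db API; the sketch ignores the null bitmap and the varlen regions, and does not claim to match the actual field ordering or padding rules of Schema::ComputeLayout().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a non-nullable fixed-length field: its byte
// length and its alignment requirement (a power of two, at most 8).
struct FixedField {
    std::size_t length;
    std::size_t alignment;
};

constexpr std::size_t kMaxAlign = 8;  // the 8-byte maximum alignment

// Rounds n up to the next multiple of alignment (a power of two).
constexpr std::size_t AlignUp(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

// Places each field at the first offset that satisfies its alignment, in
// order, and returns the total length padded to the maximum alignment.
std::size_t LayoutFixedFields(const std::vector<FixedField>& fields,
                              std::vector<std::size_t>& offsets) {
    offsets.clear();
    std::size_t cur = 0;
    for (const FixedField& f : fields) {
        assert(f.alignment != 0 && (f.alignment & (f.alignment - 1)) == 0);
        cur = AlignUp(cur, f.alignment);
        offsets.push_back(cur);
        cur += f.length;
    }
    return AlignUp(cur, kMaxAlign);
}
```

For fields of (length, alignment) {4, 4}, {1, 1}, {8, 8}, {2, 2}, this yields offsets 0, 4, 8, 16 and a record length of 24: the 8-byte field is pushed from offset 5 to 8, and the 18 bytes of data are padded to 24.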
taco::Schema::Schema(const std::vector<Oid> &typid, const std::vector<uint64_t> &typparam, const std::vector<bool> &nullable, std::vector<std::string> field_names) [private]
Constructs a Schema object without computing its payload layout.
The vectors typid, typparam, and nullable must have the same length and must not be empty. The vector field_names must be either empty or of the same length as typid.
Parameters
typid: a vector of the type Oids of the fields in this schema
typparam: a vector of the type parameters of the fields
nullable: a vector of the nullability of the fields
field_names: a vector of field names
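The argument rules above can be written as a simple predicate. This is a hypothetical helper that only restates the documented preconditions; it is not part of taco-db, and Oid is aliased to a 32-bit integer purely as an assumption for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Oid = std::uint32_t;  // assumption: stand-in for taco-db's Oid type

// Hypothetical check mirroring the documented constructor preconditions.
bool SchemaArgsAreValid(const std::vector<Oid>& typid,
                        const std::vector<std::uint64_t>& typparam,
                        const std::vector<bool>& nullable,
                        const std::vector<std::string>& field_names) {
    if (typid.empty()) return false;                    // must not be empty
    if (typparam.size() != typid.size()) return false;  // same length
    if (nullable.size() != typid.size()) return false;  // same length
    // field_names is optional: either empty or one name per field.
    return field_names.empty() || field_names.size() == typid.size();
}
```

An empty field_names vector is accepted, which corresponds to creating a schema without field names.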
taco::Schema::Schema() [private], [default]
The default constructor for constructing a fake schema.
void taco::Schema::CollectTypeInfo()
Collects only the type info, without computing the record payload layout.
This may be used when the Schema is in-memory only, i.e., the caller does not need to call the field extraction and serialization functions (e.g., DissemblePayload, GetField, WritePayloadToBuffer).
Schema *taco::Schema::Combine(const Schema *left, const Schema *right) [static]
Combines two schemas into one.
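bool taco::Schema::Compatible(const Schema *left, const Schema *right) [static]
Returns whether the right schema is union-compatible with the left one (i.e., they have the same number of fields, each pair of fields with the same field ID has the same type, and no field in right is nullable where the corresponding field in left does not allow nulls).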
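To make the difference between Identical() and Compatible() concrete, here is a sketch over a hypothetical FieldDesc record that holds only a type Oid and a nullability flag. It is not taco-db code, and it ignores type parameters because this reference does not say whether they take part in either comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Oid = std::uint32_t;  // assumption: stand-in for taco-db's Oid type

// Hypothetical per-field summary; the real Schema keeps more information.
struct FieldDesc {
    Oid typid;
    bool nullable;
};

// Same number of fields, and matching type and nullability per field ID.
bool IdenticalSketch(const std::vector<FieldDesc>& left,
                     const std::vector<FieldDesc>& right) {
    if (left.size() != right.size()) return false;
    for (std::size_t i = 0; i < left.size(); ++i) {
        if (left[i].typid != right[i].typid ||
            left[i].nullable != right[i].nullable) {
            return false;
        }
    }
    return true;
}

// Same number of fields and types; right must not be nullable where left
// does not allow nulls, so rows of right can safely go into left.
bool CompatibleSketch(const std::vector<FieldDesc>& left,
                      const std::vector<FieldDesc>& right) {
    if (left.size() != right.size()) return false;
    for (std::size_t i = 0; i < left.size(); ++i) {
        if (left[i].typid != right[i].typid) return false;
        if (!left[i].nullable && right[i].nullable) return false;
    }
    return true;
}
```

Note that compatibility is not symmetric: a schema whose second field is non-nullable is compatible with one where that field is nullable, but not the other way around.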
void taco::Schema::ComputeLayout()
Computes the layout of the record payload with this schema, using the global catalog cache g_db->catcache().
This also collects the type info.
void taco::Schema::ComputeLayout(BootstrapCatCache *catcache)
Computes the layout of the record payload with this schema, using the provided bootstrap catalog cache.
This also collects the type info.
This should only be used during DB startup and in tests.
template<class CCache> void taco::Schema::ComputeLayoutImpl(CCache *catcache, bool cache_typinfo_only) [private]
The generic implementation of schema layout computation with some catalog cache class CCache.
CCache is required to have a function const SysTable_Type *FindType(Oid).
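The FindType requirement on CCache is a duck-typed interface. The C++17 sketch below shows one way to express and check such a requirement at compile time; HasFindType, LookupType and FakeCatCache are hypothetical, SysTable_Type is a stub standing in for the real catalog type, and nothing here claims that taco-db performs this check.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

using Oid = std::uint32_t;            // assumption: stand-in for taco-db's Oid
struct SysTable_Type { Oid typid; };  // hypothetical stub of the catalog type

// Detects whether CCache has a FindType(Oid) member whose result converts
// to const SysTable_Type *.
template <class CCache, class = void>
struct HasFindType : std::false_type {};

template <class CCache>
struct HasFindType<CCache, std::void_t<decltype(
        std::declval<CCache&>().FindType(std::declval<Oid>()))>>
    : std::is_convertible<
          decltype(std::declval<CCache&>().FindType(std::declval<Oid>())),
          const SysTable_Type*> {};

// A generic routine in the spirit of ComputeLayoutImpl: it accepts any
// catalog cache class that satisfies the FindType requirement.
template <class CCache>
const SysTable_Type* LookupType(CCache* catcache, Oid typid) {
    static_assert(HasFindType<CCache>::value,
                  "CCache must provide const SysTable_Type *FindType(Oid)");
    return catcache->FindType(typid);
}

// Hypothetical in-memory cache used for testing.
struct FakeCatCache {
    SysTable_Type known{42};
    const SysTable_Type* FindType(Oid typid) {
        return typid == known.typid ? &known : nullptr;
    }
};

struct NotACache {};
```

With a template parameter instead of a virtual interface, both the global catalog cache and BootstrapCatCache can be passed without sharing a base class.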
Schema *taco::Schema::Create(const std::vector<Oid> &typid, const std::vector<uint64_t> &typparam, const std::vector<bool> &nullable) [static]
Creates a new schema with the given types, without field names.
If one calls GetFieldName() on the returned schema object, it returns an empty string. The returned schema object is not fully initialized. To finish initializing it, the caller needs to call either Schema::ComputeLayout() (for a schema that will be used for reading/writing record payloads) or Schema::CollectTypeInfo() (for a schema that will only be used for caching the type info).
Schema *taco::Schema::Create(const std::vector<Oid> &typid, const std::vector<uint64_t> &typparam, const std::vector<bool> &nullable, std::vector<std::string> field_names) [static]
Creates a new schema with the given types and field names.
The returned schema object is not fully initialized. To finish initializing it, the caller needs to call either Schema::ComputeLayout() (for a schema that will be used for reading/writing record payloads) or Schema::CollectTypeInfo() (for a schema that will only be used for caching the type info).
std::vector<Datum> taco::Schema::DissemblePayload(const char *payload) const
Disassembles the payload into a vector of Datums in the field order defined by the schema.
The data in the returned vector reference the passed payload, so the payload must remain alive while the return value is in use. However, in-place changes to payload may or may not be reflected in the returned data vector.
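One plausible reason why in-place changes "may or may not" show up (an assumption, not stated in this reference) is that pass-by-value fields are copied into the Datum while pass-by-reference fields only point into the payload. The toy model below, with a hypothetical MiniDatum and an assumed two-field layout, demonstrates that effect; it is not taco-db's Datum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical, simplified Datum: a pass-by-value field is copied out of the
// payload, while a pass-by-reference field only points into it.
struct MiniDatum {
    std::int32_t by_value = 0;  // used for the 4-byte field
    std::string_view by_ref;    // used for the 3-byte field
};

// Assumed toy layout: a 4-byte int at offset 0, then 3 bytes at offset 4.
// A 3-byte field is not 1, 2, 4, or 8 bytes long, so it is passed by reference.
MiniDatum DisassembleToy(const char* payload, std::size_t field) {
    MiniDatum d;
    if (field == 0) {
        std::memcpy(&d.by_value, payload, sizeof(d.by_value));  // copy
    } else {
        d.by_ref = std::string_view(payload + 4, 3);  // reference only
    }
    return d;
}
```

After both fields are extracted, overwriting the payload changes what the by-reference datum sees but not the copied integer; and if the payload is freed, the by-reference datum dangles.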
bool taco::Schema::FieldIsNull(FieldId field_id, const char *payload) const
Returns whether a field is null in a record payload.
bool taco::Schema::FieldIsNullable(FieldId field_id) const [inline]
Returns whether this field is nullable.
bool taco::Schema::FieldPassByRef(FieldId field_id) const [inline]
Returns whether this field is passed by reference or by value in memory.
Note that both variable-length fields and fixed-length fields whose length is not 1, 2, 4, or 8 bytes are passed by reference.
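The pass-by-reference rule can be restated as a small predicate. PassByRefSketch is a hypothetical name; it only encodes the rule stated above and does not consult any catalog information the real FieldPassByRef() may use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical restatement of the documented rule: variable-length fields,
// and fixed-length fields whose length is not 1, 2, 4, or 8 bytes, are
// passed by reference; the remaining fixed-length fields are passed by value.
constexpr bool PassByRefSketch(bool is_varlen, std::size_t fixed_length) {
    if (is_varlen) return true;
    switch (fixed_length) {
        case 1: case 2: case 4: case 8:
            return false;  // passed by value
        default:
            return true;   // e.g. a 3-byte or 16-byte fixed-length field
    }
}
```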
Datum taco::Schema::GetField(FieldId field_id, const char *payload) const
Returns a field in the payload as a Datum.
The returned datum references the payload, so the payload must remain alive while the return value is in use. However, in-place changes to payload may or may not be reflected in the returned datum.
FieldId taco::Schema::GetFieldIdFromFieldName(absl::string_view field_name) const
Returns the ID of the field named field_name, or InvalidFieldId if there is no such field.
FieldOffset taco::Schema::GetFieldLength(size_t field_id) const [inline]
Returns the cached size of a field.
If this field is variable-length, or is fixed-length with an unknown type parameter, returns -1.
absl::string_view taco::Schema::GetFieldName(FieldId field_id) const [inline]
Returns the name of field field_id.
If no field names were given when creating this schema object, it returns an empty string.
uint64_t taco::Schema::GetFieldTypeParam(FieldId field_id) const [inline]
Returns the type parameter of a field.
FieldId taco::Schema::GetNumFields() const [inline]
Returns the total number of fields.
std::pair<FieldOffset, FieldOffset> taco::Schema::GetOffsetAndLength(FieldId field_id, const char *payload) const
Returns the offset and the length of a field in this schema as a pair.
Parameters
field_id: the column ID of the field
payload: the record payload
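GetOffsetAndLength() needs the payload because the positions of variable-length fields differ from record to record. The sketch below shows how an offset and length could be recovered from a varlen end array under an assumed encoding: one 2-byte entry per varlen field, in payload order, holding the record-relative offset just past that field. The entry width, the encoding, and all names here are hypothetical; only the need to realign the start offset comes from the description of m_varlen_payload_begin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

constexpr std::size_t AlignUp(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

// Reads entry k of the (assumed) 2-byte varlen end array.
std::size_t ReadVarlenEnd(const char* payload, std::size_t end_array_begin,
                          std::size_t k) {
    std::uint16_t end;
    std::memcpy(&end, payload + end_array_begin + k * sizeof(end), sizeof(end));
    return end;
}

// Returns {offset, length} of the k-th varlen field. A field starts where the
// previous one ended (or at varlen_payload_begin), rounded up to its own
// alignment, because that start offset is not guaranteed to be aligned.
std::pair<std::size_t, std::size_t> VarlenOffsetAndLength(
        const char* payload, std::size_t end_array_begin,
        std::size_t varlen_payload_begin, std::size_t k,
        std::size_t alignment) {
    std::size_t begin = (k == 0)
        ? varlen_payload_begin
        : ReadVarlenEnd(payload, end_array_begin, k - 1);
    begin = AlignUp(begin, alignment);
    const std::size_t end = ReadVarlenEnd(payload, end_array_begin, k);
    assert(end >= begin);
    return {begin, end - begin};
}
```

For example, with the end array at offset 8 holding {15, 21} and the varlen payload starting at 12, a 1-byte-aligned first field spans [12, 15) and a 4-byte-aligned second field starts at 16 rather than 15, giving length 5.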
bool taco::Schema::Identical(const Schema *left, const Schema *right) [static]
Returns whether the two schemas are identical (i.e., they have the same number of fields, and each pair of fields with the same field ID has the same type and nullability).
FieldOffset taco::Schema::WritePayloadToBuffer(const std::vector<Datum> &data, maxaligned_char_buf &buf) const
FieldOffset taco::Schema::WritePayloadToBuffer(const std::vector<DatumRef> &data, maxaligned_char_buf &buf) const
FieldOffset taco::Schema::WritePayloadToBuffer(const std::vector<NullableDatumRef> &data, maxaligned_char_buf &buf) const
See Schema::WritePayloadToBufferImpl().
template<class SomeDatum> FieldOffset taco::Schema::WritePayloadToBufferImpl(const std::vector<SomeDatum> &data, maxaligned_char_buf &buf) const [private]
Converts the data to bytes in the storage layout and appends them to buf without clearing it first.
This allows one to add an optional header before the payload. buf is MAXALIGN'd before any data is appended to it.
buf is a std::vector of char with a custom allocator that always uses aligned_alloc to allocate buffer space aligned to 8-byte boundaries. Always use the type alias maxaligned_char_buf to declare or define such a buffer (see base/tdb_base.h).
The behavior is undefined if the size of data differs from GetNumFields().
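The append contract (keep existing bytes, MAXALIGN the buffer, then append) can be sketched with a plain std::vector<char> standing in for maxaligned_char_buf, which lacks only the aligned_alloc-backed storage. AppendPayloadSketch is hypothetical: it serializes fields by raw concatenation instead of the real storage layout, and its return value (the appended length) is an assumption, since this reference does not say what WritePayloadToBufferImpl() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxAlign = 8;

constexpr std::size_t AlignUp(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

// Stand-in for maxaligned_char_buf (same append semantics, ordinary storage).
using CharBuf = std::vector<char>;

// Keeps any existing bytes (e.g. a header), pads the buffer to the maximum
// alignment, appends the field bytes, and pads the end again.
// Returns the length of the appended payload, padding included.
std::size_t AppendPayloadSketch(const std::vector<std::string>& fields,
                                CharBuf& buf) {
    const std::size_t begin = AlignUp(buf.size(), kMaxAlign);
    buf.resize(begin, '\0');  // MAXALIGN before appending, without clearing
    for (const std::string& f : fields) {
        buf.insert(buf.end(), f.begin(), f.end());
    }
    buf.resize(AlignUp(buf.size(), kMaxAlign), '\0');
    return buf.size() - begin;
}
```

With a 3-byte header already in the buffer, the payload starts at offset 8, so a reader can locate it by max-aligning the header length.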
std::vector<FieldInfo> taco::Schema::m_field [private]
Information about the individual fields.
std::vector<std::string> taco::Schema::m_field_names [private]
Optional field names (either empty or of the same length as m_field).
std::vector<FieldId> taco::Schema::m_field_reorder_idx [private]
The order in which the fields are placed in the actual payload.
E.g., m_field_reorder_idx = {0, 2, 3, 1} means that field 0 is first in the payload, field 2 is second, field 3 is third, and field 1 is fourth.
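One way such a reorder index could arise is by grouping fields into the payload regions of the documented layout (non-nullable fixed-length, then varlen, then nullable fixed-length). The sketch below assumes fields keep their declaration order within a region; that tie-breaking rule, the Region enum, and ComputeReorderIdx are hypothetical, not taco-db's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

using FieldId = std::uint16_t;  // assumption: stand-in for taco-db's FieldId

// Data-carrying payload regions in the documented order (the null bitmap and
// the varlen end array sit between them but hold no field data).
enum class Region { kNonNullableFixed = 0, kVarlen = 1, kNullableFixed = 2 };

// Groups field IDs by region, keeping declaration order within a region.
std::vector<FieldId> ComputeReorderIdx(const std::vector<Region>& region) {
    std::vector<FieldId> idx(region.size());
    std::iota(idx.begin(), idx.end(), FieldId{0});
    std::stable_sort(idx.begin(), idx.end(), [&](FieldId a, FieldId b) {
        return static_cast<int>(region[a]) < static_cast<int>(region[b]);
    });
    return idx;
}
```

For instance, if field 0 is non-nullable fixed-length, field 1 is nullable fixed-length, and fields 2 and 3 are variable-length, this yields {0, 2, 3, 1}, matching the example above.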
bool taco::Schema::m_layout_computed [private]
Whether the layout has been computed.
FieldOffset taco::Schema::m_null_bitmap_begin [private]
The offset to the null bitmap.
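Given the bitmap's starting offset, a null check reduces to a bit test. This reference does not specify the bit order or which field maps to which bit, so the sketch assumes bit i lives in byte i / 8, least-significant bit first, with a set bit meaning null. BitmapIsNull and BitmapSetNull are hypothetical helpers, not the real FieldIsNull().

```cpp
#include <cassert>
#include <cstddef>

// Assumed convention: bit i is stored in byte i / 8 of the bitmap,
// least-significant bit first, and a set bit means "null".
bool BitmapIsNull(const char* payload, std::size_t null_bitmap_begin,
                  std::size_t bit) {
    const auto byte =
        static_cast<unsigned char>(payload[null_bitmap_begin + bit / 8]);
    return ((byte >> (bit % 8)) & 1u) != 0;
}

void BitmapSetNull(char* payload, std::size_t null_bitmap_begin,
                   std::size_t bit, bool is_null) {
    auto& byte = reinterpret_cast<unsigned char&>(
        payload[null_bitmap_begin + bit / 8]);
    const auto mask = static_cast<unsigned char>(1u << (bit % 8));
    byte = is_null ? static_cast<unsigned char>(byte | mask)
                   : static_cast<unsigned char>(byte & ~mask);
}
```

Under this convention, marking bit 9 of a bitmap that starts at offset 4 sets the value 2 in byte 5 of the record.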
bool taco::Schema::m_type_info_collected [private]
Whether the type info has been collected.
FieldOffset taco::Schema::m_varlen_end_array_begin [private]
The offset to the beginning of the variable-length field end array.
This is the length of the entire record if there is no variable-length field.
FieldOffset taco::Schema::m_varlen_payload_begin [private]
The offset to the beginning of the varlen payload.
This offset might not be properly aligned for the first field in the varlen payload, so anyone using it needs to ensure the data are aligned per their alignment requirements.
If this schema has only non-nullable fixed-length fields, or this is a build with fixed-length data pages only, this is the record length instead.