Record Types

Most Python Avro libraries represent records as Python dicts. This is convenient, but it has a performance cost at high volume.

By default, cavro represents record values as instances of a Record subclass.

When a schema is parsed, cavro dynamically creates a subclass of cavro.Record for each record type in the schema. These subclasses have the same name as the Avro record, and are efficiently populated with values on decode.

Reading Records

import cavro
avro_data = b'\x08JohnT' # Loaded from somewhere

schema = cavro.Schema({
    'type': 'record',
    'name': 'Example',
    'fields': [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'int'},
    ]
})
record = schema.binary_decode(avro_data)
print(record)
print(type(record))
<Record:Example {name: 'John' age: 42}>
<class '__main__.Example'>
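
To see why the six bytes in avro_data decode to John and 42, it can help to walk through the bytes by hand. The sketch below does not use cavro. It assumes the standard Avro binary encoding (zigzag varints for int and long, a length prefix followed by UTF-8 bytes for string), and the helper names read_long and read_string are made up for illustration:

```python
def read_long(buf, pos):
    """Read one zigzag-encoded varint starting at pos; return (value, next_pos)."""
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:            # high bit clear: this was the last byte
            break
        shift += 7
    # Undo the zigzag mapping (0, -1, 1, -2, ... are stored as 0, 1, 2, 3, ...)
    return (raw >> 1) ^ -(raw & 1), pos


def read_string(buf, pos):
    """Read a length-prefixed UTF-8 string; return (text, next_pos)."""
    length, pos = read_long(buf, pos)
    end = pos + length
    return buf[pos:end].decode('utf-8'), end


avro_data = b'\x08JohnT'

# A record is encoded as its fields in schema order, with no separators.
name, pos = read_string(avro_data, 0)  # 0x08 is the zigzag form of length 4
age, pos = read_long(avro_data, pos)   # 'T' is 0x54 (84), the zigzag form of 42
print(name, age)  # John 42
```

The field names and types are not stored in the bytes, which is why the schema is needed to decode them.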

Accessing Fields

Getting Values

Record fields can be accessed as attributes, as on any class instance, or by key, as on a dictionary:

print(f'Name: {record.name}')
print(f'Age: {record["age"]}')
Name: John
Age: 42

There is also an _asdict method that returns all fields as a Python dict:

record._asdict()
{'name': 'John', 'age': 42}

Setting Values

Fields can be assigned to:

record.name = 'Jane'
print(record)
<Record:Example {name: 'Jane' age: 42}>

Creating Records

Records can be created in several different ways:

# Get the record type from the schema
Example = schema.named_types['Example'].record

rec1 = Example(record) # From an existing record
rec2 = Example(name='Jane', age=42) # From keyword arguments
rec3 = Example({'name': 'Jane', 'age': 42}) # From a dict

assert record == rec1 == rec2 == rec3
print(rec3)
<Record:Example {name: 'Jane' age: 42}>

Getting the Schema

Record classes have a class attribute Type that is the AvroType for the record:

print(record.Type)
<cavro.RecordType object at 0x110d65d80>

The list of fields can be accessed via a record's Type:

print([field.name for field in record.Type.fields])
print({field.name: record[field.name] for field in record.Type.fields})
['name', 'age']
{'name': 'Jane', 'age': 42}

Record Options

Several cavro.Options settings control how record values are created and used:

record_decodes_to_dict

With this option, decoding a record value returns a dict object rather than a Record instance:

schema2 = cavro.Schema({
    'type': 'record',
    'name': 'Example',
    'fields': [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'int'},
    ]
}, record_decodes_to_dict=True)

# This is a dict, not a Record instance
record = schema2.binary_decode(avro_data)

print(record)
{'name': 'John', 'age': 42}

record_can_encode_dict

By default, cavro allows dicts to be passed wherever a record is expected. Setting this option to False disallows that:

schema.binary_encode({'name': 'Bob', 'age': 35})
print('Default allows dicts')
schema2 = cavro.Schema(schema.schema, record_can_encode_dict=False)
print("But we can disable it:")
try:
    schema2.binary_encode({'name': 'Bob', 'age': 35})
except Exception as e:
    print(e)
Default allows dicts
But we can disable it:
Invalid value {'name': 'Bob', 'age': 35} for type record at Example

record_values_type_hint

This is a non-standard option that allows passing an extra key '-type' in dicts when encoding them. The '-type' value must match the name of the record:

schema3 = cavro.Schema([
    {'type': 'record', 'name': 'A', 'fields': [{'name': 'value', 'type': 'int'}]},
    {'type': 'record', 'name': 'B', 'fields': [{'name': 'value', 'type': 'long'}]},
], record_values_type_hint=True)

def pp(val, msg):
    decoded = schema3.binary_decode(val)
    print(f'{msg}\t{decoded}')

# With plain dicts, there is no reliable way to tell cavro which record type to use.
# You can pass just the values and let cavro use any type that works:
pp(schema3.binary_encode({'value': 42}), 'Plain Dict:')
# Or you can construct the record type yourself:
pp(schema3.binary_encode(schema3.named_types['B'].record(value=42)), 'Typed:')
# Or using the type hint:
pp(schema3.binary_encode({'-type': 'A', 'value': 42}), 'Select A with -type:')
pp(schema3.binary_encode({'-type': 'B', 'value': 42}), 'Select B with -type:')

Plain Dict: <Record:A {value: 42}>
Typed: <Record:B {value: 42}>
Select A with -type: <Record:A {value: 42}>
Select B with -type: <Record:B {value: 42}>
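
The decoder can tell A from B in the output above because, in the Avro binary encoding, a union value is written as the zero-based index of the chosen branch followed by the value itself. The sketch below does not use cavro. It assumes the standard encoding and hard-codes the single int/long field of schema3; encode_long, BRANCHES and encode_union_record are hypothetical names used only for illustration:

```python
def encode_long(value):
    """Zigzag + varint encode an Avro int/long (64-bit range assumed)."""
    raw = (value << 1) ^ (value >> 63)  # zigzag: small magnitudes stay small
    out = bytearray()
    while raw & ~0x7F:                  # more than 7 bits left to write
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


# Order of the record types in the union schema.
BRANCHES = ['A', 'B']


def encode_union_record(type_name, value):
    """Write the branch index, then the record's only field."""
    index = BRANCHES.index(type_name)
    return encode_long(index) + encode_long(value)


print(encode_union_record('A', 42))  # b'\x00T'
print(encode_union_record('B', 42))  # b'\x02T'
```

The field bytes are identical for both branches here, so the leading index byte is the only thing that records which type was chosen. This is the choice that the '-type' hint, or an explicit record instance, makes for the encoder.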

record_allow_extra_fields

By default, if you encode a dict using cavro, extra fields in the dict are silently ignored.

Setting record_allow_extra_fields to False causes an exception to be raised instead:

schema_no_extra_fields = cavro.Schema(schema.schema, record_allow_extra_fields=False)

print(schema.binary_encode({'name': 'Bob', 'age': 35, 'height': 1.1}))
try:
    schema_no_extra_fields.binary_encode({'name': 'Bob', 'age': 35, 'height': 1.1})
except Exception as e:
    print(e)

b'\x06BobF'
Invalid value '...' for type record at height

record_encode_use_defaults

If a schema defines a default value for a field, and that value is not provided when creating a record, cavro supplies the default automatically. This option turns that behaviour off:

schema4 = cavro.Schema({
    'type': 'record',
    'name': 'Example',
    'fields': [
        {'name': 'name', 'type': 'string', 'default': 'JDoe'},
        {'name': 'age', 'type': 'int', 'default': 25},
    ]
})
schema4_no_default = cavro.Schema(schema4.schema, record_encode_use_defaults=False)

# Normally, the default values are used:
print(schema4.binary_decode(schema4.binary_encode({})))

# But when disabled...
try:
    schema4_no_default.binary_encode({})
except Exception as e:
    print(e)
<Record:Example {name: 'JDoe' age: 25}>
Invalid value '<missing>' for type record at age

Record Compatibility

When using dicts to encode values, cavro checks that each key/value in the dict is appropriate for the record, and encodes it.

When encoding Record instances, cavro by default checks that the value is an instance of the record class that belongs to the schema.

This normally works fine, but it can cause surprising errors when values are read with one schema and written with another.

To work around the issue, cavro will fall back to a record compatibility check when a value type does not match the exact type of the schema.

This behaviour can be controlled using the adapt_record_types option:

base_schema = cavro.Schema({'type': 'record', 'name': 'X', 'fields': [{'name': 'a', 'type': 'int'}]})

record = base_schema.named_types['X'].record(a=1)

The class of record is specific to the schema object that created it, so let's look at the class's ID:

print(f'Base ID: {id(type(record))}: {record}')
Base ID: 5199703920: <Record:X {a: 1}>

If we create a new schema object from an identical schema definition, we get a new type:

similar_schema = cavro.Schema(base_schema.schema)

similar_record = similar_schema.named_types['X'].record(a=1)
# The ID of the class will be different, even if they look the same:
print(f'Similar ID: {id(type(similar_record))}: {similar_record}')
print('Type classes are the same:', type(record) == type(similar_record))
Similar ID: 4572559760: <Record:X {a: 1}>
Type classes are the same: False
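
The same effect can be reproduced in plain Python, without cavro. The sketch below builds two classes with the same name and the same attributes using type(). make_record_class is a hypothetical stand-in, not how cavro actually creates its Record subclasses:

```python
def make_record_class(name, field_names):
    """Build a brand-new class on every call, loosely mimicking per-schema record classes."""
    return type(name, (), {'__slots__': tuple(field_names)})


X1 = make_record_class('X', ['a'])
X2 = make_record_class('X', ['a'])

obj = X1()
obj.a = 1

print(X1.__name__ == X2.__name__)  # True: the names match
print(X1 is X2)                    # False: they are distinct class objects
print(isinstance(obj, X2))         # False: an instance of one is not an instance of the other
```

This is why an isinstance check alone would reject a record created from an equal but separately parsed schema.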

But we can encode the record because the schemas match:

print(similar_schema.binary_encode(record))
b'\x02'

If either the type name or the fields don't match, an error is raised:

incompatible_schema = cavro.Schema({'type': 'record', 'name': 'Y', 'fields': [{'name': 'a', 'type': 'int'}]})
try:
    incompatible_schema.binary_encode(record)
except Exception as e:
    print(e)
Record <Record:X {a: 1}> cannot be adapted to <cavro.RecordType object at 0x117f3aef0>

The same error is raised if the adapt_record_types option is set to False:

strict_schema = cavro.Schema(base_schema.schema, adapt_record_types=False)
try:
    strict_schema.binary_encode(record)
except Exception as e:
    print(e)
Record <Record:X {a: 1}> cannot be adapted to <cavro.RecordType object at 0x117f3b130>
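
The decision logic described in this section can be summarised with a small model. This is only a sketch under stated assumptions: RecordShape and can_adapt are hypothetical names, the comparison of name and fields is inferred from the behaviour shown above, and cavro's real check may compare more (or less) than this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordShape:
    """Hypothetical stand-in for a record type: its name and (field, type) pairs."""
    name: str
    fields: tuple


def can_adapt(value_shape, target_shape, adapt_record_types=True):
    """Return True if a value of value_shape may be encoded as target_shape."""
    if value_shape is target_shape:
        return True   # exact type match: always accepted
    if not adapt_record_types:
        return False  # strict mode: no fallback check
    # Fallback: accept when the name and the fields both match
    return (value_shape.name == target_shape.name
            and value_shape.fields == target_shape.fields)


base = RecordShape('X', (('a', 'int'),))
similar = RecordShape('X', (('a', 'int'),))
incompatible = RecordShape('Y', (('a', 'int'),))

print(can_adapt(base, similar))                            # True
print(can_adapt(base, incompatible))                       # False
print(can_adapt(base, similar, adapt_record_types=False))  # False
```

The three calls mirror similar_schema, incompatible_schema and strict_schema in the examples above.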