Record Types
Most Python Avro libraries represent records as Python dicts. This is convenient, but has a performance impact at high volume.
By default, cavro represents record values as instances of a Record subclass.
When a schema is parsed, cavro dynamically creates a subclass of cavro.Record for each record type in the schema. These subclasses have the same name as the Avro record, and are efficiently populated with values on decode.
Reading Records
import cavro
avro_data = b'\x08JohnT' # Loaded from somewhere
schema = cavro.Schema({
    'type': 'record',
    'name': 'Example',
    'fields': [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'int'},
    ]
})
record = schema.binary_decode(avro_data)
print(record)
print(type(record))
<Record:Example {name: 'John' age: 42}>
<class '__main__.Example'>
Accessing Fields
Getting Values
Record fields can be accessed as attributes, like any class, or by key, like a dictionary:
print(f'Name: {record.name}')
print(f'Age: {record["age"]}')
Name: John
Age: 42
There is also an _asdict method that returns all fields as a python dict:
record._asdict()
{'name': 'John', 'age': 42}
Setting Values
Fields can be assigned to:
record.name = 'Jane'
print(record)
<Record:Example {name: 'Jane' age: 42}>
Creating Records
Records can be created in several different ways:
# Get the record type from the schema
Example = schema.named_types['Example'].record
rec1 = Example(record) # From an existing record
rec2 = Example(name='Jane', age=42) # From keyword arguments
rec3 = Example({'name': 'Jane', 'age': 42}) # From a dict
assert record == rec1 == rec2 == rec3
print(rec3)
<Record:Example {name: 'Jane' age: 42}>
Getting the Schema
Record classes have a class attribute Type that is the AvroType for the record:
print(record.Type)
<cavro.RecordType object at 0x110d65d80>
The list of fields can be accessed via a record's Type:
print([field.name for field in record.Type.fields])
print({field.name: record[field.name] for field in record.Type.fields})
['name', 'age']
{'name': 'Jane', 'age': 42}
Record Options
There are several cavro.Options that control how record values are created and used:
record_decodes_to_dict
With this option, decoding a record value returns a dict object rather than a Record instance:
schema2 = cavro.Schema({
    'type': 'record',
    'name': 'Example',
    'fields': [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'int'},
    ]
}, record_decodes_to_dict=True)
# This is a dict not a Record type
record = schema2.binary_decode(avro_data)
print(record)
{'name': 'John', 'age': 42}
record_can_encode_dict
By default, cavro allows dicts to be passed wherever a record is expected. If this option is set to False, then that is disallowed:
schema.binary_encode({'name': 'Bob', 'age': 35})
print('Default allows dicts')
schema2 = cavro.Schema(schema.schema, record_can_encode_dict=False)
print("But we can disable it:")
try:
    schema2.binary_encode({'name': 'Bob', 'age': 35})
except Exception as e:
    print(e)
Default allows dicts
But we can disable it:
Invalid value {'name': 'Bob', 'age': 35} for type record at Example
record_values_type_hint
This is a non-standard option that allows passing an extra key '-type' in dicts when encoding them. The '-type' value must match the name of the record:
schema3 = cavro.Schema([
    {'type': 'record', 'name': 'A', 'fields': [{'name': 'value', 'type': 'int'}]},
    {'type': 'record', 'name': 'B', 'fields': [{'name': 'value', 'type': 'long'}]},
], record_values_type_hint=True)

def pp(val, msg):
    decoded = schema3.binary_decode(val)
    print(f'{msg}\t{decoded}')
# Without a hint, there is no reliable way (using dicts) to tell cavro which
# record type to use; cavro uses the first type that can encode the values:
pp(schema3.binary_encode({'value': 42}), 'Plain Dict:')
# Or you can construct the record type yourself:
pp(schema3.binary_encode(schema3.named_types['B'].record(value=42)), 'Typed:')
# Or using the type hint:
pp(schema3.binary_encode({'-type': 'A', 'value': 42}), 'Select A with -type:')
pp(schema3.binary_encode({'-type': 'B', 'value': 42}), 'Select B with -type:')
Plain Dict: <Record:A {value: 42}>
Typed: <Record:B {value: 42}>
Select A with -type: <Record:A {value: 42}>
Select B with -type: <Record:B {value: 42}>
record_allow_extra_fields
By default, if you encode a dict using cavro, extra fields in the dict are silently ignored.
Disabling this behaviour causes an exception to be raised:
schema_no_extra_fields = cavro.Schema(schema.schema, record_allow_extra_fields=False)
print(schema.binary_encode({'name': 'Bob', 'age': 35, 'height': 1.1}))
try:
    schema_no_extra_fields.binary_encode({'name': 'Bob', 'age': 35, 'height': 1.1})
except Exception as e:
    print(e)
b'\x06BobF'
Invalid value '...' for type record at height
record_encode_use_defaults
If a schema defines a default value for a field, and that value is not provided when creating a record, then cavro supplies the default automatically. This option can turn that off:
schema4 = cavro.Schema({
    'type': 'record',
    'name': 'Example',
    'fields': [
        {'name': 'name', 'type': 'string', 'default': 'JDoe'},
        {'name': 'age', 'type': 'int', 'default': 25},
    ]
})
schema4_no_default = cavro.Schema(schema4.schema, record_encode_use_defaults=False)
# Normally, the default values are used:
print(schema4.binary_decode(schema4.binary_encode({})))
# But when disabled...
try:
    schema4_no_default.binary_encode({})
except Exception as e:
    print(e)
<Record:Example {name: 'JDoe' age: 25}>
Invalid value '<missing>' for type record at age
Record Compatibility
When using dicts to encode values, cavro checks that each key/value in the dict is appropriate for the record, and encodes it. Using record types, by default cavro will check that the value is an instance of the record class for the schema.
This normally works fine, but in some cases it can cause surprising errors when values are read with one schema and written with another.
To work around this, cavro falls back to a record compatibility check when a value's type does not exactly match the schema's record class.
This behaviour can be controlled using the adapt_record_types option:
base_schema = cavro.Schema({'type': 'record', 'name': 'X', 'fields': [{'name': 'a', 'type': 'int'}]})
record = base_schema.named_types['X'].record(a=1)
The actual class of record is specific to the schema object it came from, so let's look at its ID:
print(f'Base ID: {id(type(record))}: {record}')
Base ID: 5199703920: <Record:X {a: 1}>
If we create a new schema object with an identical cavro schema, we get a new type:
similar_schema = cavro.Schema(base_schema.schema)
similar_record = similar_schema.named_types['X'].record(a=1)
# The ID of the class will be different, even if they look the same:
print(f'Similar ID: {id(type(similar_record))}: {similar_record}')
print('Type classes are the same:', type(record) == type(similar_record))
Similar ID: 4572559760: <Record:X {a: 1}>
Type classes are the same: False
But we can encode the record because the schemas match:
print(similar_schema.binary_encode(record))
b'\x02'
If either the type name or the fields don't match, then there is an error:
incompatible_schema = cavro.Schema({'type': 'record', 'name': 'Y', 'fields': [{'name': 'a', 'type': 'int'}]})
try:
    incompatible_schema.binary_encode(record)
except Exception as e:
    print(e)
Record <Record:X {a: 1}> cannot be adapted to <cavro.RecordType object at 0x117f3aef0>
Or if the adapt_record_types option is set to False:
strict_schema = cavro.Schema(base_schema.schema, adapt_record_types=False)
try:
    strict_schema.binary_encode(record)
except Exception as e:
    print(e)
Record <Record:X {a: 1}> cannot be adapted to <cavro.RecordType object at 0x117f3b130>