Logical Types
cavro supports a number of standard Avro logical types by default:
import cavro
for logical_type in cavro.DEFAULT_OPTIONS.logical_types:
    name = logical_type.logical_name
    underlying_types = logical_type.underlying_types
    underlying_names = [ut.type_name for ut in underlying_types]
    print(f' * {name} ({", ".join(underlying_names)})')
* decimal (bytes, fixed)
* uuid (string)
* uuid (fixed)
* date (int)
* time-millis (int)
* time-micros (long)
* timestamp-millis (long)
* timestamp-micros (long)
Internally, these are implemented as value adapters on the schema type. Value adapters are used mainly for logical types but also play a part in schema promotion; they are effectively hooks that can change a value before it is encoded or after it is decoded.
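As a rough mental model only, a value adapter can be pictured as an object with one hook that runs before encoding and one that runs after decoding. The sketch below is plain Python and does not use cavro: the classes `ValueAdapter` and `AddOffset` and the helpers `prepare_for_encoding` / `finish_decoding` are hypothetical, and running the decode hooks in reverse order is an assumption made here so that stacked adapters undo each other cleanly.

```python
from typing import Any, Sequence


class ValueAdapter:
    """Hypothetical stand-in for a value adapter: two hooks around encoding."""

    def encode_value(self, value: Any) -> Any:
        # Runs before the value is handed to the binary encoder
        return value

    def decode_value(self, value: Any) -> Any:
        # Runs after the raw value has been read by the decoder
        return value


class AddOffset(ValueAdapter):
    """Illustrative adapter that shifts integers by a fixed offset."""

    def __init__(self, offset: int) -> None:
        self.offset = offset

    def encode_value(self, value: int) -> int:
        return value + self.offset

    def decode_value(self, value: int) -> int:
        return value - self.offset


def prepare_for_encoding(adapters: Sequence[ValueAdapter], value: Any) -> Any:
    # Apply every encode hook in order
    for adapter in adapters:
        value = adapter.encode_value(value)
    return value


def finish_decoding(adapters: Sequence[ValueAdapter], value: Any) -> Any:
    # Assumption: decode hooks run in reverse so each one undoes its own encode hook
    for adapter in reversed(adapters):
        value = adapter.decode_value(value)
    return value


adapters = (AddOffset(5), AddOffset(100))
stored = prepare_for_encoding(adapters, 1)
print(stored)                             # 106
print(finish_decoding(adapters, stored))  # 1
print(prepare_for_encoding((), 1))        # 1: no adapters, value passes through unchanged
```

With an empty adapter tuple the value passes through untouched, which mirrors a schema whose value_adapters is ().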
Normally you shouldn't need to worry about value adapters, but understanding them may make the examples below clearer. A plain schema with no logical types has no value adapters:
cavro.Schema({'type': 'int'}).type.value_adapters
()
If the schema contains a valid logical type specification, that logical type is attached as a value adapter:
cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}).type.value_adapters
(<cavro.TimeMillis at 0x1124175d0>,)
Disabling all logical types
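To disable logical types, set the logical_types option to an empty sequence (an empty tuple in these examples), either as a keyword argument to cavro.Schema or on an options object: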
cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}, logical_types=()).type.value_adapters
()
my_options = cavro.DEFAULT_OPTIONS.replace(logical_types=())
cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}, options=my_options).type.value_adapters
()
Selectively enabling logical types
To enable only some logical types, pass a tuple containing just those classes:
cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}, logical_types=(cavro.TimeMillis, )).type.value_adapters
(<cavro.TimeMillis at 0x112416910>,)
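How the logical_types option decides which adapters get attached can be modelled, roughly, as a lookup by logicalType name and underlying type. This is a plain-Python sketch, not cavro's code: `LogicalTypeSpec`, `find_adapters`, and the use of type-name strings are hypothetical. It follows the Avro specification's rule that an unrecognised logical type is ignored and the underlying type is used as-is.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LogicalTypeSpec:
    """Hypothetical description of a logical type: its name and allowed base types."""
    logical_name: str
    underlying_types: Tuple[str, ...]


TIME_MILLIS = LogicalTypeSpec('time-millis', ('int',))
DATE = LogicalTypeSpec('date', ('int',))


def find_adapters(schema: dict, enabled: Sequence[LogicalTypeSpec]) -> tuple:
    """Return the enabled specs matching this schema; () means 'treat as plain type'."""
    logical_name = schema.get('logicalType')
    if logical_name is None:
        return ()
    return tuple(
        spec for spec in enabled
        if spec.logical_name == logical_name and schema['type'] in spec.underlying_types
    )


schema = {'type': 'int', 'logicalType': 'time-millis'}
print(find_adapters(schema, [TIME_MILLIS, DATE]))  # one match: the time-millis spec
print(find_adapters(schema, []))                   # (): logical types disabled
print(find_adapters(schema, [DATE]))               # (): time-millis not enabled
print(find_adapters({'type': 'long', 'logicalType': 'time-millis'}, [TIME_MILLIS]))  # (): wrong base type
```

In this model, disabling logical types and selectively enabling them are the same operation: changing the sequence that is searched.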
Custom Logical Types
Custom logical types are implemented as subclasses of cavro.CustomLogicalType.
They can be added to the list of logical types in an options object by using Options.with_logical_types.
Subclasses should be defined like this:
class Times10Type(cavro.CustomLogicalType):
    logical_name = 'times-10'  # The name that is used in the avro schema
    underlying_types = (cavro.IntType, cavro.LongType)  # A tuple of classes of the avro types that this type can be attached to

    @classmethod
    def _for_type(cls, underlying):  # Returns an instance to attach to the schema type (or None if the spec is invalid)
        return cls()

    def custom_encode_value(self, value):  # This is called to prepare a value for avro encoding
        return value * 10

    def custom_decode_value(self, value):  # This is called after a value has been decoded
        return value // 10
Add the class to the options, and encode/decode a value:
my_options = cavro.DEFAULT_OPTIONS.with_logical_types(Times10Type)
schema = cavro.Schema({'type': 'int', 'logicalType': 'times-10'}, options=my_options)
encoded = schema.binary_encode(314)
print(encoded)
print(schema.binary_decode(encoded))
b'\x881'
314
Decoding the encoded value with a plain schema shows that the stored value is 10 times larger:
plain_schema = cavro.Schema({'type': 'int'})
plain_schema.binary_decode(encoded)
3140
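The bytes b'\x881' are simply the Avro binary encoding of the stored integer 3140. The Avro specification encodes int and long values by zigzag-mapping the sign and then writing a variable-length base-128 integer, low 7 bits first, with the high bit of each byte marking that more bytes follow. The sketch below reimplements that in plain Python to show where the bytes come from; the function names are hypothetical and this is not cavro's encoder.

```python
def zigzag_varint_encode(n: int) -> bytes:
    """Encode an Avro int/long: zigzag-map the sign, then write a base-128 varint."""
    zz = (n << 1) ^ (n >> 63)  # 64-bit zigzag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while True:
        byte = zz & 0x7F
        zz >>= 7
        if zz:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def zigzag_varint_decode(data: bytes) -> int:
    """Inverse of zigzag_varint_encode for a single encoded value."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (result >> 1) ^ -(result & 1)  # undo the zigzag mapping


print(zigzag_varint_encode(3140))         # b'\x881'  (0x88, then 0x31 which prints as '1')
print(zigzag_varint_encode(314))          # b'\xf4\x04'  (what a plain int schema would store)
print(zigzag_varint_decode(b'\x881'))     # 3140
```

The printed b'\x881' is two bytes, 0x88 and 0x31; Python shows 0x31 as the ASCII character '1'.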
Schema Parameters for Custom Types
The _for_type classmethod allows a logical type to be customized based on values in the schema.
Let's create a new version of the logical type where the stored value is multiplied by any integer given in the schema (not just 10):
class TimesNType(cavro.CustomLogicalType):
    logical_name = 'times-n'
    underlying_types = (cavro.IntType, cavro.LongType)

    def __init__(self, n):
        self.n = n  # Store the 'N' value (the number to multiply by)

    @classmethod
    def _for_type(cls, underlying: cavro.AvroType):
        # underlying.metadata is a dictionary of values in the schema that aren't part of the type definition
        n_value = underlying.metadata.get('n', 10)
        # The avro spec says that invalid logical types must be ignored, so return None here to signal that:
        if not isinstance(n_value, int):
            return None
        return cls(n_value)

    def custom_encode_value(self, value):  # This is called to prepare a value for avro encoding
        return value * self.n

    def custom_decode_value(self, value):  # This is called after a value has been decoded
        return value // self.n
my_options = cavro.DEFAULT_OPTIONS.with_logical_types(TimesNType)
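The _for_type method of TimesNType treats a non-integer n as an invalid spec. One gap worth noting: n = 0 passes that check, but custom_decode_value would then divide by zero. Below is a plain-Python sketch of a stricter check; parse_n is a hypothetical helper that mirrors the return-None convention and is not part of cavro.

```python
from typing import Optional


def parse_n(metadata: dict, default: int = 10) -> Optional[int]:
    """Hypothetical mirror of TimesNType._for_type's validation.

    Returns the multiplier, or None to signal an invalid (ignored) logical type.
    """
    n_value = metadata.get('n', default)
    # bool is a subclass of int, so exclude it explicitly; 0 would break decoding
    if isinstance(n_value, bool) or not isinstance(n_value, int) or n_value == 0:
        return None
    return n_value


print(parse_n({}))          # 10 (the default)
print(parse_n({'n': 2}))    # 2
print(parse_n({'n': '2'}))  # None: a string is rejected
print(parse_n({'n': 0}))    # None: would cause division by zero on decode
```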
Now we can specify how much to multiply values by:
n1_schema = cavro.Schema({'type': 'int', 'logicalType': 'times-n', 'n': 1}, options=my_options)
n2_schema = cavro.Schema({'type': 'int', 'logicalType': 'times-n', 'n': 2}, options=my_options)
n10_schema = cavro.Schema({'type': 'int', 'logicalType': 'times-n', 'n': 10}, options=my_options)
Encoding the same number with each of these schemas and then decoding the results with the plain schema from above shows that the stored values differ:
print(plain_schema.binary_decode(n1_schema.binary_encode(10)))
print(plain_schema.binary_decode(n2_schema.binary_encode(10)))
print(plain_schema.binary_decode(n10_schema.binary_encode(10)))
10
20
100
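Because custom_decode_value uses floor division, a value written and read with the same n always round-trips, since (v * n) // n == v for any nonzero n. Data written with a different n, or by a plain schema, is not guaranteed to. The sketch below reproduces the same arithmetic in plain Python; encode_times_n and decode_times_n are hypothetical stand-ins for the two methods, not cavro functions.

```python
def encode_times_n(value: int, n: int) -> int:
    # Same arithmetic as TimesNType.custom_encode_value
    return value * n


def decode_times_n(stored: int, n: int) -> int:
    # Same arithmetic as TimesNType.custom_decode_value (floor division)
    return stored // n


# Values written and read with the same n always round-trip
for n in (1, 2, 10):
    for value in (-7, 0, 10, 314):
        assert decode_times_n(encode_times_n(value, n), n) == value

print([encode_times_n(10, n) for n in (1, 2, 10)])  # [10, 20, 100]
print(decode_times_n(encode_times_n(10, 2), 10))    # 2: written with n=2, read with n=10
print(decode_times_n(25, 10))                       # 2: a plain-schema value of 25 is truncated
print(decode_times_n(-25, 10))                      # -3: floor division rounds toward -infinity
```

The mismatch cases are silent: nothing in the encoded bytes records which n was used, so reader and writer schemas need to agree.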