Logical Types

cavro supports a number of standard avro logical types by default:

for logical_type in cavro.DEFAULT_OPTIONS.logical_types:
    name = logical_type.logical_name
    underlying_types = logical_type.underlying_types
    underlying_names = [ut.type_name for ut in underlying_types]
    print(f' * {name} ({", ".join(underlying_names)})')

 * decimal (bytes, fixed)
 * uuid (string)
 * uuid (fixed)
 * date (int)
 * time-millis (int)
 * time-micros (long)
 * timestamp-millis (long)
 * timestamp-micros (long)
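
For reference, the Avro specification defines what the underlying value of several of these types means: date is days since 1970-01-01, time-millis is milliseconds after midnight, timestamp-micros is microseconds since the Unix epoch (UTC), and decimal is a big-endian two's-complement unscaled integer. The sketch below shows those conversions in plain Python. The helper names are hypothetical and not part of cavro, and cavro's own adapters may differ in details such as time zone handling or validation.

```python
import datetime
import decimal

# Hypothetical helpers illustrating the Avro spec's definitions (not cavro APIs)
EPOCH_DATE = datetime.date(1970, 1, 1)
EPOCH_UTC = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def date_to_int(d: datetime.date) -> int:
    # date (int): number of days since 1970-01-01
    return (d - EPOCH_DATE).days


def int_to_date(days: int) -> datetime.date:
    # The reverse direction, as applied after decoding
    return EPOCH_DATE + datetime.timedelta(days=days)


def time_to_millis(t: datetime.time) -> int:
    # time-millis (int): milliseconds after midnight; sub-millisecond precision is dropped
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000


def timestamp_to_micros(ts: datetime.datetime) -> int:
    # timestamp-micros (long): microseconds since the Unix epoch, UTC.
    # ts must be timezone-aware; subtracting a naive datetime from EPOCH_UTC raises TypeError.
    delta = ts - EPOCH_UTC
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def decimal_to_bytes(value: decimal.Decimal, scale: int) -> bytes:
    # decimal (bytes): the unscaled value (value * 10**scale) as big-endian two's complement.
    # int() truncates any digits beyond the given scale.
    unscaled = int(value.scaleb(scale))
    length = (unscaled.bit_length() + 8) // 8  # enough bytes to include a sign bit
    return unscaled.to_bytes(length, 'big', signed=True)
```

For example, date_to_int(datetime.date(1970, 1, 2)) is 1, and decimal_to_bytes(decimal.Decimal('3.14'), 2) stores the unscaled integer 314 as b'\x01:'.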

Internally, cavro implements logical types as value adapters on the schema type. Value adapters are mainly used for logical types, but they also play a part in schema promotion. In effect, they are hooks that can change a value before it is encoded or after it has been decoded.
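
As a rough mental model (a hypothetical sketch, not cavro's actual classes or method names), a value adapter is an object with one hook that runs before encoding and one that runs after decoding. An empty tuple of adapters leaves values untouched. Running the decode hooks in reverse order is an assumption made here so that stacked adapters undo each other correctly.

```python
import datetime
from typing import Any, Sequence


class DaysSinceEpochAdapter:
    """Hypothetical value adapter: datetime.date <-> Avro 'date' int."""

    epoch = datetime.date(1970, 1, 1)

    def encode_value(self, value: datetime.date) -> int:
        # Hook run before encoding: Python object -> underlying Avro value
        return (value - self.epoch).days

    def decode_value(self, value: int) -> datetime.date:
        # Hook run after decoding: underlying Avro value -> Python object
        return self.epoch + datetime.timedelta(days=value)


def before_encode(value: Any, adapters: Sequence[Any]) -> Any:
    # Apply each adapter's encode hook in order
    for adapter in adapters:
        value = adapter.encode_value(value)
    return value


def after_decode(value: Any, adapters: Sequence[Any]) -> Any:
    # Apply decode hooks in reverse order so stacked adapters undo each other
    for adapter in reversed(adapters):
        value = adapter.decode_value(value)
    return value
```

With adapters = (DaysSinceEpochAdapter(),), before_encode(datetime.date(1970, 1, 11), adapters) gives 10, and after_decode(10, adapters) turns it back into the date.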

Normally you shouldn't need to worry about value adapters, but understanding them may make the examples below clearer. A normal schema with no logical types has no value adapters:

cavro.Schema({'type': 'int'}).type.value_adapters

()

If the schema contains a valid logical type spec, that logical type is attached as a value adapter:

cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}).type.value_adapters

(<cavro.TimeMillis at 0x1124175d0>,)

Disabling all logical types

To disable all logical types, set the logical_types option to an empty sequence (such as an empty tuple). The option can be passed directly to cavro.Schema:

cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}, logical_types=()).type.value_adapters

()

Alternatively, the option can be set on a modified copy of the default options:

my_options = cavro.DEFAULT_OPTIONS.replace(logical_types=())
cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}, options=my_options).type.value_adapters

()
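
As the assignment to my_options suggests, replace gives back a modified options object instead of changing cavro.DEFAULT_OPTIONS in place. A minimal model of that copy-with-changes pattern, assuming a frozen dataclass (cavro's real options class may be implemented differently, and the field names and values here are hypothetical). The with_logical_types method used in the Custom Logical Types section is modelled the same way, as appending to the tuple, which is also an assumption.

```python
import dataclasses
from typing import Tuple


@dataclasses.dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for an options object; not cavro's real class."""

    logical_types: Tuple[str, ...] = ('date', 'time-millis')
    strict: bool = True  # hypothetical extra setting, to show other fields are kept

    def replace(self, **changes) -> 'Options':
        # Return a modified copy; the frozen original is never mutated
        return dataclasses.replace(self, **changes)

    def with_logical_types(self, *extra: str) -> 'Options':
        # Return a copy with extra logical types appended to the existing ones
        return self.replace(logical_types=self.logical_types + extra)


DEFAULT_OPTIONS = Options()
```

Because the dataclass is frozen, any attempt to assign to DEFAULT_OPTIONS.logical_types directly raises an error, so a shared default cannot be changed by accident.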

Selectively enabling logical types

To enable only specific logical types, pass just those classes as logical_types:

cavro.Schema({'type': 'int', 'logicalType': 'time-millis'}, logical_types=(cavro.TimeMillis, )).type.value_adapters

(<cavro.TimeMillis at 0x112416910>,)
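
A simplified model of how an adapter might be chosen: the schema's logicalType must match the name of an enabled logical type, and the schema's underlying type must be one that logical type allows. Logical types are modelled here as hypothetical (name, allowed underlying types) pairs rather than cavro classes. Anything that doesn't match is ignored, in line with the Avro spec's rule that unknown or invalid logical types fall back to the underlying type.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical model of a logical type: (logical name, allowed underlying Avro types)
LogicalType = Tuple[str, FrozenSet[str]]

DATE: LogicalType = ('date', frozenset({'int'}))
TIME_MILLIS: LogicalType = ('time-millis', frozenset({'int'}))
TIMESTAMP_MILLIS: LogicalType = ('timestamp-millis', frozenset({'long'}))


def matching_logical_types(schema: dict, enabled: Iterable[LogicalType]) -> Tuple[str, ...]:
    """Names of the enabled logical types that apply to a primitive schema dict."""
    wanted = schema.get('logicalType')
    # A logical type applies only if it is enabled, has the requested name, and
    # allows this underlying type; otherwise the plain underlying type is used.
    return tuple(
        name
        for name, underlying in enabled
        if name == wanted and schema['type'] in underlying
    )
```

With enabled=(TIME_MILLIS,) a time-millis int schema gets one match, while enabled=() or enabled=(DATE,) gives an empty tuple, mirroring the () outputs above.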

Custom Logical Types

Custom logical types are implemented as subclasses of cavro.CustomLogicalType.

They can be added to the list of logical types in an options object by using Options.with_logical_types.

Subclasses should be defined like this:

class Times10Type(cavro.CustomLogicalType):
    logical_name = 'times-10'                           # The name that is used in the avro schema
    underlying_types = (cavro.IntType, cavro.LongType)  # A tuple of classes of the avro types that this type can be attached to

    @classmethod
    def _for_type(cls, underlying):
        # Called with the underlying avro type; returns an instance of this logical type
        return cls()

    def custom_encode_value(self, value):
        # Called to prepare a value for avro encoding
        return value * 10

    def custom_decode_value(self, value):
        # Called after a value has been decoded
        return value // 10

Add the class to the options, and encode/decode a value:

my_options = cavro.DEFAULT_OPTIONS.with_logical_types(Times10Type)
schema = cavro.Schema({'type': 'int', 'logicalType': 'times-10'}, options=my_options)

encoded = schema.binary_encode(314)
print(encoded)
print(schema.binary_decode(encoded))

b'\x881'
314
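
The bytes b'\x881' come from Avro's binary encoding of int and long values: the number is zigzag-encoded (so small negative numbers also stay small) and then written as a variable-length integer, 7 bits per byte, low bits first, with the high bit set on every byte except the last. Because the logical type multiplied 314 by 10 before encoding, the value actually written is 3140. A standalone sketch of that encoding, independent of cavro:

```python
def zigzag_encode(n: int) -> int:
    # Interleave signs so small magnitudes give small numbers: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(z: int) -> int:
    # Inverse of zigzag_encode
    return (z >> 1) ^ -(z & 1)


def encode_long(n: int) -> bytes:
    # Avro int and long share this encoding: zigzag, then a little-endian base-128 varint
    z = zigzag_encode(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # 7 data bits, high bit set: more bytes follow
        z >>= 7
    out.append(z)  # last byte: high bit clear
    return bytes(out)


def decode_long(data: bytes) -> int:
    # Read 7 bits per byte until a byte without the continuation bit
    z = shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return zigzag_decode(z)
```

encode_long(3140) gives b'\x881', the bytes printed above, whereas encoding 314 directly would give b'\xf4\x04'.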

Decoding the same bytes with a plain schema (one without the logical type) shows that the stored value is 10 times larger:

plain_schema = cavro.Schema({'type': 'int'})
plain_schema.binary_decode(encoded)

3140

Schema Parameters for Custom Types

The _for_type classmethod allows a logical type to be customized based on other values in the schema.

Let's create a new version of the logical type where the stored value can be multiplied by any value (not just 10):

class TimesNType(cavro.CustomLogicalType):
    logical_name = 'times-n'
    underlying_types = (cavro.IntType, cavro.LongType)

    def __init__(self, n):
        self.n = n  # Store the 'N' value (the number to multiply by)

    @classmethod
    def _for_type(cls, underlying: cavro.AvroType):
        # underlying.metadata is a dictionary of values in the schema that aren't part of the type definition
        n_value = underlying.metadata.get('n', 10)
        # The avro spec says that invalid logical types must be ignored, so return None here to signal that.
        # bool is a subclass of int, so reject it explicitly; n == 0 would make decoding divide by zero.
        if isinstance(n_value, bool) or not isinstance(n_value, int) or n_value == 0:
            return None
        return cls(n_value)

    def custom_encode_value(self, value):
        # Called to prepare a value for avro encoding
        return value * self.n

    def custom_decode_value(self, value):
        # Called after a value has been decoded
        return value // self.n

my_options = cavro.DEFAULT_OPTIONS.with_logical_types(TimesNType)

Now we can specify how much to multiply values by:

n1_schema = cavro.Schema({'type': 'int', 'logicalType': 'times-n', 'n': 1}, options=my_options)
n2_schema = cavro.Schema({'type': 'int', 'logicalType': 'times-n', 'n': 2}, options=my_options)
n10_schema = cavro.Schema({'type': 'int', 'logicalType': 'times-n', 'n': 10}, options=my_options)

If we encode the same number with each of these schemas and then decode the results with the plain schema from above, we can see that the stored values differ:

print(plain_schema.binary_decode(n1_schema.binary_encode(10)))
print(plain_schema.binary_decode(n2_schema.binary_encode(10)))
print(plain_schema.binary_decode(n10_schema.binary_encode(10)))

10
20
100
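
Two consequences of this design are worth keeping in mind. custom_decode_value uses floor division, so a stored value that isn't a multiple of n (for example, one written with a plain schema) is rounded down rather than rejected. And the scaled value must still fit in the underlying type: an Avro int is a 32-bit signed integer (a long is 64-bit). The sketch below mirrors the encode and decode methods in plain Python for an int schema; the explicit range check is an addition for illustration, and this page doesn't say how cavro itself handles out-of-range values.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1  # range of an Avro int


def times_n_encode(value: int, n: int) -> int:
    # Mirrors TimesNType.custom_encode_value, plus a hypothetical range check
    scaled = value * n
    if not INT32_MIN <= scaled <= INT32_MAX:
        raise OverflowError(f'{value} * {n} does not fit in a 32-bit Avro int')
    return scaled


def times_n_decode(stored: int, n: int) -> int:
    # Mirrors TimesNType.custom_decode_value: floor division rounds toward -infinity
    return stored // n
```

For example, times_n_decode(25, 10) returns 2 and times_n_decode(-25, 10) returns -3, while times_n_encode(2**30, 10) raises OverflowError.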
