Schemas

The primary interface for cavro is the cavro.Schema class. A Schema is constructed from an Avro schema definition:

schema = cavro.Schema('{"type": "int"}')
print(schema)
<cavro.Schema object at 0x11585d990>

A Schema is a convenience wrapper: the underlying Avro types it wraps do the heavy lifting of encoding and decoding.

Constructing Schemas

Schemas can be created from a JSON string representing the schema, or from a Python object equivalent to the parsed JSON (for example, a dict).

A string argument can be ambiguous: it may be JSON text, or it may already be a bare type name. The parse_json argument disables JSON parsing for the second case. The following are all equivalent:

print(cavro.Schema('{"type": "int"}').schema_str)
print(cavro.Schema('"int"').schema_str)
print(cavro.Schema({'type': 'int'}).schema_str)
print(cavro.Schema('int', parse_json=False).schema_str)
"int"
"int"
"int"
"int"

Whereas this will not work, because cavro tries to parse 'int' as JSON, and the bare word int (without quotes) is not valid JSON:

try:
    cavro.Schema('int')
except Exception as e:
    print(e)
Expecting value: line 1 column 1 (char 0)
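The difference between the accepted and rejected forms comes down to what counts as valid JSON. The sketch below uses only the standard library; load_schema_source is a hypothetical helper written for illustration, not part of cavro and not a description of its internals. It only mirrors the documented effect of the parse_json argument.

```python
import json


def load_schema_source(source, parse_json=True):
    """Hypothetical helper: return the Python object a schema source denotes.

    Strings are parsed as JSON unless parse_json is False; any other
    object (dict, list, ...) is assumed to be already parsed.
    """
    if isinstance(source, str) and parse_json:
        return json.loads(source)
    return source


# '"int"' is a JSON string literal, so it parses to the Python str 'int'.
print(load_schema_source('"int"'))
# With parsing disabled, the bare name is passed through untouched.
print(load_schema_source('int', parse_json=False))

# The bare word int is not valid JSON, so parsing fails.
try:
    load_schema_source('int')
except json.JSONDecodeError as exc:
    print(exc)
```

The last call fails with the same json error message shown above, which suggests the message comes straight from the JSON parser rather than from any Avro-specific validation.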

Options

How a schema behaves can be controlled through Options. Options can be passed to a schema either with the options= argument, or as keyword arguments to Schema.__init__ that match the fields of Options.

If no options are provided, then the schema uses cavro.DEFAULT_OPTIONS which provides reasonable defaults.

schema0 = cavro.Schema('"int"')
print('0: ', schema0.options.coerce_values_to_int)

schema1 = cavro.Schema('"int"', coerce_values_to_int=True)
print('1: ', schema1.options.coerce_values_to_int)

schema2 = cavro.Schema('"int"', coerce_values_to_int=False)
print('2: ', schema2.options.coerce_values_to_int)

schema3 = cavro.Schema('"int"', options=cavro.DEFAULT_OPTIONS.replace(coerce_values_to_int=True))
print('3: ', schema3.options.coerce_values_to_int)
0:  False
1:  True
2:  False
3:  True

The available flags and their meanings are described in the API reference and in the options user guide.
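The pattern used above (an immutable options object with a replace method, plus keyword overrides) can be sketched with a frozen dataclass. Everything in this block is hypothetical and for illustration only: SketchOptions, SKETCH_DEFAULTS and resolve_options are not cavro names, and how cavro handles options= combined with keyword arguments is not stated here. The sketch simply layers any overrides on top of the base options.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class SketchOptions:
    """Hypothetical stand-in for an immutable options object."""

    coerce_values_to_int: bool = False

    def replace(self, **changes):
        # Return a modified copy; the original instance is never mutated.
        return dataclasses.replace(self, **changes)


SKETCH_DEFAULTS = SketchOptions()


def resolve_options(options=None, **overrides):
    """Pick the base options, then apply keyword overrides (an assumption)."""
    base = SKETCH_DEFAULTS if options is None else options
    return base.replace(**overrides) if overrides else base


print(resolve_options().coerce_values_to_int)
print(resolve_options(coerce_values_to_int=True).coerce_values_to_int)
custom = SKETCH_DEFAULTS.replace(coerce_values_to_int=True)
print(resolve_options(options=custom).coerce_values_to_int)
```

Because the object is frozen, replace always returns a new instance, so shared defaults cannot be changed by accident from one schema to another.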

Encoding / Decoding values

Encoding and decoding values is done using the binary_encode, binary_decode, json_encode, and json_decode methods:

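# The outputs below carry union branch information, so a union schema is used here
schema = cavro.Schema('["int", "long"]')
print(schema.binary_encode(3))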
print(schema.binary_decode(b'\x00\x06'))
print(schema.json_encode(3))
print(schema.json_decode('{"int": 3}'))
b'\x00\x06'
3
{"int": 3}
3
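To make the bytes above less opaque: Avro's binary encoding writes int and long values as zigzag-encoded variable-length integers, and a union value is prefixed with the zero-based index of the branch that was chosen. The two bytes b'\x00\x06' are consistent with branch index 0 followed by the value 3. The following is a minimal standard-library sketch of that encoding, not cavro's implementation; encode_long and decode_long are names chosen for this example.

```python
def encode_long(n: int) -> bytes:
    """Encode a 64-bit signed integer as an Avro zigzag varint."""
    value = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small codes
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_long(data: bytes, pos: int = 0):
    """Decode one zigzag varint from data; return (value, next position)."""
    shift = 0
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (value >> 1) ^ -(value & 1), pos


print(encode_long(3))                   # b'\x06'
print(encode_long(0) + encode_long(3))  # b'\x00\x06' (branch index 0, then 3)

branch, pos = decode_long(b'\x00\x06')
value, _ = decode_long(b'\x00\x06', pos)
print(branch, value)                    # 0 3
```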

Schema dict, Schema String & Canonical form

Unlike other libraries, cavro does not retain the original source used to construct a schema. Standard representations of the Avro schema definition can be retrieved from the Schema through several properties (none of these are guaranteed to be identical to the original source):

  • Schema.schema - A python object that represents the schema definition
  • Schema.schema_str - JSON encoded version of the above
  • Schema.canonical_form - The Parsing Canonical Form of the schema
schema = cavro.Schema('["int", {"type": "long"}, {"fields": [{"name": "a", "type": "A"}], "type": "record", "name": "A", "namespace": "x"}]')

schema.schema
['int',
 'long',
 {'namespace': 'x',
  'name': 'A',
  'fields': [{'name': 'a', 'type': 'x.A'}],
  'type': 'record'}]
print(schema.schema_str)
[
  "int",
  "long",
  {
    "namespace": "x",
    "name": "A",
    "fields": [
      {
        "name": "a",
        "type": "x.A"
      }
    ],
    "type": "record"
  }
]
schema.canonical_form
'["int","long",{"name":"x.A","type":"record","fields":[{"name":"a","type":"x.A"}]}]'
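The canonical form above follows the main steps the Avro specification gives for Parsing Canonical Form: reduce primitives to their simple names, replace names with fullnames, drop non-essential attributes such as namespace, order the remaining attributes (name, type, fields, symbols, items, values, size), and strip whitespace. The sketch below reproduces those steps with the standard library only. It is a simplified illustration, not cavro's implementation: canonical and canonical_form are names chosen here, and details such as string escaping rules are not handled.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_TYPES = {"record", "error", "enum", "fixed"}


def canonical(node, namespace=""):
    """Return a canonicalised copy of a parsed schema (simplified sketch)."""
    if isinstance(node, str):
        # Primitive names stay as-is; other names are resolved to fullnames.
        if node in PRIMITIVES or "." in node or not namespace:
            return node
        return f"{namespace}.{node}"
    if isinstance(node, list):  # a union
        return [canonical(branch, namespace) for branch in node]

    kind = node["type"]
    if not isinstance(kind, str):
        return canonical(kind, namespace)
    if kind in PRIMITIVES:
        return kind  # {"type": "long"} collapses to "long"

    out = {}  # insertion order gives the required attribute order
    if kind in NAMED_TYPES:
        name = node["name"]
        if "." in name:
            namespace = name.rpartition(".")[0]
        else:
            namespace = node.get("namespace", namespace)
            if namespace:
                name = f"{namespace}.{name}"
        out["name"] = name
    out["type"] = kind
    if kind in ("record", "error"):
        out["fields"] = [
            {"name": f["name"], "type": canonical(f["type"], namespace)}
            for f in node["fields"]
        ]
    elif kind == "enum":
        out["symbols"] = list(node["symbols"])
    elif kind == "array":
        out["items"] = canonical(node["items"], namespace)
    elif kind == "map":
        out["values"] = canonical(node["values"], namespace)
    elif kind == "fixed":
        out["size"] = int(node["size"])
    return out


def canonical_form(source: str) -> str:
    # Compact separators remove all insignificant whitespace.
    return json.dumps(canonical(json.loads(source)), separators=(",", ":"))


source = (
    '["int", {"type": "long"}, {"fields": [{"name": "a", "type": "A"}], '
    '"type": "record", "name": "A", "namespace": "x"}]'
)
print(canonical_form(source))
```

For the union schema used in this section, the sketch produces the same string as the canonical_form output shown above.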