Schemas
The primary interface for cavro is the cavro.Schema class. A Schema is constructed from an avro schema definition:
import cavro
schema = cavro.Schema('{"type": "int"}')
print(schema)
<cavro.Schema object at 0x11585d990>
Schemas are convenience wrappers around the underlying avro types, which do the heavy lifting of encoding and decoding.
Constructing Schemas
Schemas can be created from a JSON string representing the schema, or from a python object (such as a dict or list) equivalent to the parsed JSON.
A plain string is ambiguous: it could be JSON text or an already-parsed type name. For this reason there is a parse_json argument that can disable JSON parsing. The following are all equivalent:
print(cavro.Schema('{"type": "int"}').schema_str)
print(cavro.Schema('"int"').schema_str)
print(cavro.Schema({'type': 'int'}).schema_str)
print(cavro.Schema('int', parse_json=False).schema_str)
"int"
"int"
"int"
"int"
By contrast, this fails, because cavro tries to parse the bare string int as JSON, and int without quotes is not valid JSON:
try:
    cavro.Schema('int')
except Exception as e:
    print(e)
Expecting value: line 1 column 1 (char 0)
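The error message above matches what Python's standard json module reports for the same input. As a minimal sketch of the ambiguity, using only the standard library (try_parse is a hypothetical helper, and the assumption that cavro parses strings with a standard JSON parser is inferred from the error text rather than confirmed):

```python
import json


def try_parse(text):
    """Return the parsed JSON value, or a short description of the failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        return f"invalid JSON: {e}"


# A quoted name is a valid JSON string and parses to the python str 'int'
print(try_parse('"int"'))
# A JSON object parses to a python dict
print(try_parse('{"type": "int"}'))
# A bare word is not valid JSON, so parsing fails
print(try_parse('int'))
```

This is why a bare type name needs either JSON quoting ('"int"') or parse_json=False.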
Options
How a schema behaves can be controlled through Options. Options can be passed to a schema either with the options= argument, or as keyword arguments to Schema.__init__ that match the fields of Options.
If no options are provided, then the schema uses cavro.DEFAULT_OPTIONS, which provides reasonable defaults.
schema0 = cavro.Schema('"int"')
print('0: ', schema0.options.coerce_values_to_int)
schema1 = cavro.Schema('"int"', coerce_values_to_int=True)
print('1: ', schema1.options.coerce_values_to_int)
schema2 = cavro.Schema('"int"', coerce_values_to_int=False)
print('2: ', schema2.options.coerce_values_to_int)
schema3 = cavro.Schema('"int"', options=cavro.DEFAULT_OPTIONS.replace(coerce_values_to_int=True))
print('3: ', schema3.options.coerce_values_to_int)
0: False
1: True
2: False
3: True
The various flags and their meanings are described in the API reference and the options user guide.
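The "defaults plus overrides" pattern used above can be illustrated with a frozen dataclass. This is only a sketch of the general idea, not cavro's implementation: SketchOptions, SKETCH_DEFAULTS and resolve_options are hypothetical names, and the assumption that replace returns a new object without modifying the defaults is inferred from the usage shown above.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class SketchOptions:
    """Hypothetical stand-in for an options object with a single flag."""
    coerce_values_to_int: bool = False

    def replace(self, **changes):
        # Return a modified copy; the original instance is left untouched
        return dataclasses.replace(self, **changes)


SKETCH_DEFAULTS = SketchOptions()


def resolve_options(**kwargs):
    """Start from the defaults and apply any keyword overrides."""
    return SKETCH_DEFAULTS.replace(**kwargs) if kwargs else SKETCH_DEFAULTS


print(resolve_options().coerce_values_to_int)
print(resolve_options(coerce_values_to_int=True).coerce_values_to_int)
# The shared defaults are unchanged by the override
print(SKETCH_DEFAULTS.coerce_values_to_int)
```

Keeping the options object immutable means a shared default can be reused safely across many schemas.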
Encoding / Decoding values
Encoding and decoding values is done using the binary_encode, binary_decode, json_encode, and json_decode methods:
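# The outputs below carry a union branch, so `schema` is assumed here to be a union with "int" first
schema = cavro.Schema('["int", "string"]')
print(schema.binary_encode(3))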
print(schema.binary_decode(b'\x00\x06'))
print(schema.json_encode(3))
print(schema.json_decode('{"int": 3}'))
b'\x00\x06'
3
{"int": 3}
3
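In avro's binary encoding, int and long values are written as zigzag-encoded variable-length integers, and a union value is prefixed with the index of its branch, encoded the same way. The following pure-python sketch (zigzag_varint is a hypothetical helper, not part of cavro) shows how the bytes b'\x00' and b'\x06' arise:

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as an avro zigzag varint."""
    # Zigzag maps small negative and positive numbers to small unsigned ones
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    # Emit 7 bits at a time, setting the high bit while more bytes follow
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


branch_index = zigzag_varint(0)  # first branch of a union
value = zigzag_varint(3)         # the int value 3
print(branch_index + value)      # b'\x00\x06'
```

Zigzag encoding keeps values close to zero short regardless of sign: -1 encodes to a single byte b'\x01', while 64 needs two bytes.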
Schema dict, Schema String & Canonical form
Unlike other libraries, cavro does not retain the original source used to construct a schema. Standard representations of the avro schema definition can be retrieved from the Schema using several properties (none of these is guaranteed to be identical to the original source):
Schema.schema - A python object that represents the schema definition
Schema.schema_str - JSON encoded version of the above
Schema.canonical_form - The Parsing Canonical Form of the schema
schema = cavro.Schema('["int", {"type": "long"}, {"fields": [{"name": "a", "type": "A"}], "type": "record", "name": "A", "namespace": "x"}]')
schema.schema
['int',
'long',
{'namespace': 'x',
'name': 'A',
'fields': [{'name': 'a', 'type': 'x.A'}],
'type': 'record'}]
print(schema.schema_str)
[
"int",
"long",
{
"namespace": "x",
"name": "A",
"fields": [
{
"name": "a",
"type": "x.A"
}
],
"type": "record"
}
]
schema.canonical_form
'["int","long",{"name":"x.A","type":"record","fields":[{"name":"a","type":"x.A"}]}]'
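The Parsing Canonical Form defined by the avro specification strips whitespace, replaces name and namespace with a single full name, and orders the keys of each object in a fixed way (name, then type, then fields for a record). A short standard-library check on the canonical string shown above, as a sketch that inspects the output rather than reproducing cavro's own logic:

```python
import json

canonical = '["int","long",{"name":"x.A","type":"record","fields":[{"name":"a","type":"x.A"}]}]'

parsed = json.loads(canonical)
record = parsed[2]

# Re-serialising without whitespace reproduces the canonical string exactly
print(json.dumps(parsed, separators=(",", ":")) == canonical)  # True
# Keys appear in the canonical order
print(list(record))  # ['name', 'type', 'fields']
# The namespace has been folded into the full name "x.A"
print("namespace" in record, record["name"])  # False x.A
```

Because the canonical form is deterministic, it is the representation to compare or fingerprint when checking whether two schema definitions are equivalent for parsing purposes.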