Concepts
Avro IDL (.avdl)
- Plain text, written in Avro IDL syntax
Avro Schema (.avsc)
- JSON format
Example:
{
  "type": "record",
  "name": "User",
  "namespace": "my.types",
  "doc": "User record",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "email", "type": "string" }
  ]
}
Schema - Cheatsheet
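Schema - Reference a shared type
Use the type's full name, $namespace.$type, to reference a custom type defined in another schema. The referenced type must already be defined when the schema that uses it is parsed.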
{
  "type": "record",
  "namespace": "data.add",
  "name": "Address",
  "fields": [
    {
      "name": "student",
      "type": "data.add.Student"
    }
  ]
}
Schema - Specify order of compilation
- Avro Maven plugin
Use includes to specify the order of compilation, listing referenced types (such as data.add.Student) before the schemas that use them:
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${avro.version}</version>
  <configuration>
    <stringType>String</stringType>
    <sourceDirectory>${project.basedir}/src/main/avro/schema</sourceDirectory>
    <outputDirectory>${project.build.directory}/generated-sources/avro</outputDirectory>
    <includes>
      <include>**/Message.avsc</include>
      <include>**/UserDtoAvro.avsc</include>
      <include>**/Metadata.avsc</include>
      <include>**/Template.avsc</include>
      <include>**/smsRequested.avsc</include>
    </includes>
  </configuration>
  <executions>
    <execution>
      <id>schema-to-java</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
    </execution>
  </executions>
</plugin>
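Why the order matters can be sketched as a dependency sort: a type has to be parsed before any schema that references it by full name. The Java sketch below is a hypothetical helper (`SchemaCompileOrder` is not part of the Avro Maven plugin or the Avro library). It assumes you have already collected, for each schema, the full names of the other types it references, and it returns an order in which every referenced type comes first. Self-references inside a single schema (recursive types) are assumed to be left out of the map.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: orders schema types so that every referenced type
// comes before the types that reference it (depth-first topological sort).
final class SchemaCompileOrder {

    static List<String> order(Map<String, Set<String>> references) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String type : references.keySet()) {
            visit(type, references, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String type, Map<String, Set<String>> references,
                              Set<String> done, Set<String> inProgress, List<String> result) {
        if (done.contains(type)) {
            return; // already placed in the order
        }
        if (!inProgress.add(type)) {
            // A cycle across schema files cannot be ordered, so report it.
            throw new IllegalStateException("Cyclic reference involving " + type);
        }
        for (String dependency : references.getOrDefault(type, Set.of())) {
            visit(dependency, references, done, inProgress, result);
        }
        inProgress.remove(type);
        done.add(type);
        result.add(type); // all dependencies of `type` are already in `result`
    }
}
```

For the `Address` example, a map of `data.add.Address -> {data.add.Student}` and `data.add.Student -> {}` yields `data.add.Student` first, which is the order the `includes` list would need to follow.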
Avro Protocol (.avpr)
- JSON format
Avro Data (.avro)
- Binary
Avro Serialization
Avro data files store the writer's schema alongside the data, so a reader can always decode a file even if it did not know the schema ahead of time. This also makes it possible to serialize and deserialize data without code generation, using generic records driven by the schema at runtime.
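In an Avro object container file (.avro), the schema travels in the file header: the file starts with the four magic bytes `O`, `b`, `j`, `1`, followed by a metadata map whose `avro.schema` entry holds the writer's schema as JSON (this is what `avro-tools getschema` prints), and then a 16-byte sync marker. A minimal JDK-only Java sketch that checks the magic bytes is shown here; the class and method names are illustrative, and reading the metadata map itself is left to the Avro library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Illustrative check for the 4-byte magic that starts every Avro object
// container file: ASCII 'O', 'b', 'j' followed by the version byte 1.
final class AvroContainerMagic {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    static boolean hasMagic(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    static boolean isAvroContainer(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes returns fewer bytes if the file is shorter than the magic.
            return hasMagic(in.readNBytes(MAGIC.length));
        }
    }
}
```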
Avro Serialization - Encoding
- Binary
  - Default encoding
  - More compact, and faster to serialize and deserialize
  - Does not include field names, type information for individual bytes, or field and record separators, so readers rely entirely on the schema that was used when the data was encoded.
- JSON
  - Human-readable, easy to debug
  - Useful for web applications
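To make the binary encoding concrete, here is a hand-rolled Java sketch for the two-string `User` record from the schema example. It follows the Avro specification's rules: `long` values use zig-zag plus variable-length encoding, a `string` is its UTF-8 byte length (as a long) followed by the bytes, and record fields are written back to back in schema order. The class name is illustrative; real code should use the Avro library's encoders instead of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of Avro binary encoding for the User record
// {name: string, email: string}. Not a replacement for the Avro library.
final class AvroBinarySketch {

    // Zig-zag maps signed to unsigned so small magnitudes stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Variable-length encoding: 7 bits per byte, high bit set while more bytes follow.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = zigZag(n);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // A string is its UTF-8 byte length followed by the raw bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields go out in schema order with no names, tags or separators,
    // which is why a reader needs the writer's schema to decode them.
    static byte[] encodeUser(String name, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeString(out, email);
        return out.toByteArray();
    }
}
```

With this sketch, `encodeUser("Al", "a@b")` produces 7 bytes (`04 41 6C 06 61 40 62`), while the JSON object `{"name":"Al","email":"a@b"}` takes 27 characters.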
Cheatsheet
Avro IDL -> Avro schema
avro-tools idl2schemata ${protocol.avdl} .
Get Avro schema from Avro data
avro-tools getschema ${data.avro} > ${schema.avsc}
Avro schema + JSON data -> Avro data
avro-tools fromjson --schema-file ${schema.avsc} ${data.json} > ${data.avro}
Avro schema + JSON data -> Avro data (with compression)
avro-tools fromjson --codec deflate --schema-file ${schema.avsc} ${data.json} > ${data.avro}
Avro data -> JSON data
avro-tools tojson --pretty ${data.avro} > ${data.json}
Java class -> Avro schema
- Jackson Binary Dataformats (Avro module)
- avro-tools induce
Avro schema -> Java source code
avro-tools compile schema ${schema.avsc} ${output.dir}
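The `--codec deflate` option in the cheatsheet compresses each data block of the file. Per the Avro specification, the `deflate` codec uses raw deflate as defined in RFC 1951, without the zlib header and checksum of RFC 1950. In the JDK, that corresponds to `Deflater`/`Inflater` constructed with `nowrap = true`, as in this sketch. The class name is illustrative, and the sketch only handles one byte array; it ignores the record count, block size and sync marker that surround each block in a real data file.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch of raw RFC 1951 deflate (nowrap = true), the format the
// Avro "deflate" codec uses for each block. No zlib header or checksum.
final class RawDeflateSketch {

    static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(true);
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            // No progress and no more input means the stream was cut short.
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("Truncated or invalid deflate data");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```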
CLI
avro-tools
- Installation
# Homebrew
brew install avro-tools
- Usage
❯ avro-tools
Version 1.11.3 of Apache Avro
Copyright 2010-2015 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (https://www.apache.org/).
----------------
Available tools:
    canonical      Converts an Avro Schema to its canonical form
    cat            Extracts samples from files
    compile        Generates Java code for the given schema.
    concat         Concatenates avro files without re-compressing.
    count          Counts the records in avro files or folders
    fingerprint    Returns the fingerprint for the schemas.
    fragtojson     Renders a binary-encoded Avro datum as JSON.
    fromjson       Reads JSON records and writes an Avro data file.
    fromtext       Imports a text file into an avro data file.
    getmeta        Prints out the metadata of an Avro data file.
    getschema      Prints out schema of an Avro data file.
    idl            Generates a JSON schema from an Avro IDL file
    idl2schemata   Extract JSON schemata of the types from an Avro IDL file
    induce         Induce schema/protocol from Java class/interface via reflection.
    jsontofrag     Renders a JSON-encoded Avro datum as binary.
    random         Creates a file with randomly generated instances of a schema.
    recodec        Alters the codec of a data file.
    repair         Recovers data from a corrupt Avro Data file
    rpcprotocol    Output the protocol of a RPC service
    rpcreceive     Opens an RPC Server and listens for one message.
    rpcsend        Sends a single RPC message.
    tether         Run a tethered mapreduce job.
    tojson         Dumps an Avro data file as JSON, record per line or pretty.
    totext         Converts an Avro data file to a text file.
    totrevni       Converts an Avro data file to a Trevni file.
    trevni_meta    Dumps a Trevni file's metadata as JSON.
    trevni_random  Create a Trevni file filled with random instances of a schema.
    trevni_tojson  Dumps a Trevni file as JSON.
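The `canonical` and `fingerprint` tools relate to schema identity. The Avro specification defines a Parsing Canonical Form for schemas and describes fingerprinting that form, with a 64-bit Rabin fingerprint (CRC-64-AVRO) among the suggested algorithms. The sketch below transcribes that algorithm into a small JDK-only class (the class name is illustrative). It assumes the input is already in Parsing Canonical Form, for example the output of `avro-tools canonical`, and it is not claimed to match the `fingerprint` tool's default output format.

```java
import java.nio.charset.StandardCharsets;

// Illustrative transcription of the CRC-64-AVRO (64-bit Rabin) fingerprint
// from the Avro specification. Input should be a schema in Parsing Canonical Form.
final class SchemaFingerprint {
    static final long EMPTY = 0xc15d213aa4d7a795L;
    private static final long[] FP_TABLE = buildTable();

    private static long[] buildTable() {
        long[] table = new long[256];
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                // Shift right; XOR in the polynomial when the dropped bit was 1.
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            table[i] = fp;
        }
        return table;
    }

    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    static long fingerprint64(String canonicalForm) {
        return fingerprint64(canonicalForm.getBytes(StandardCharsets.UTF_8));
    }
}
```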
Resources
Article
Performance
Code
- avro/share/test/schemas at main · apache/avro: IDL (.avdl), Protocol (.avpr) and Schema (.avsc) examples
- GitHub - linkedin/avro-util: A collection of utilities and libraries to allow Java projects to better work with Avro.