Tests
Valentin Kuznetsov edited this page Mar 23, 2017
·
1 revision
Sometimes it is necessary to test an existing JSON document against the WMArchive schema. To do so, perform the following steps:
- Obtain a document in JSON format, e.g. file.json
- Obtain the WMArchive schema file. It can be fetched from HDFS
hadoop fs -get /cms/wmarchive/avro/fwjr_prod.avsc ./schema.avsc
or generated directly from the FWJRProduction.py file
# set up the WMArchive environment and run the following commands
# both fwjrschema and json2avsc are part of the WMArchive distribution
fwjrschema --fout=schema.json
json2avsc --fin=schema.json --fout=schema.avsc
The schema.avsc file is an Avro schema for WMArchive. It has the following format:
{
"namespace": "ns12",
"type": "record",
"name": "name12",
"fields": [
{
"type": {
"items": [
"string",
"null"
],
"type": "array"
},
"name": "PFNArrayRef"
},
...
}
The schema describes the valid keys and the data types of their associated values.
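As a quick pre-check before running the conversion, the field names listed in the schema can be compared with the top-level keys of a document. The following is a minimal sketch, not part of the WMArchive distribution: the shortened schema contains only the `PFNArrayRef` field shown above, and the sample document, the `extra` key, and the helper `top_level_key_report` are made up for illustration. It looks at top-level key names only and does not validate value types.

```python
import json

# Hypothetical, shortened schema in the same shape as schema.avsc;
# a real schema would be loaded with json.load(open("schema.avsc")).
SCHEMA = json.loads("""
{
  "namespace": "ns12",
  "type": "record",
  "name": "name12",
  "fields": [
    {"type": {"items": ["string", "null"], "type": "array"},
     "name": "PFNArrayRef"}
  ]
}
""")


def top_level_key_report(schema, doc):
    """Return (missing, unexpected) top-level keys of doc relative to schema."""
    expected = {field["name"] for field in schema["fields"]}
    present = set(doc)
    return sorted(expected - present), sorted(present - expected)


# Made-up document: one key known to the schema, one that is not.
doc = {"PFNArrayRef": ["file1.root", None], "extra": 1}
missing, unexpected = top_level_key_report(SCHEMA, doc)
print("missing:", missing)
print("unexpected:", unexpected)
```

A non-empty result in either list suggests the document and schema disagree, but the json2avro step below remains the actual test.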
- Generate an Avro file from the existing JSON document and the WMArchive schema file
json2avro --fin=file.json --schema=schema.avsc --fout=file.avro
- Read back the Avro file using the Java Avro library
# look for avro-tools-1.7.7.jar in your local Java installation area or download it from the internet
java -jar avro-tools-1.7.7.jar tojson file.avro > avro2file.json
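Once the round trip is done, the original document and the one read back from Avro can be compared. The sketch below is one possible way to do that in plain Python and is not a WMArchive tool. The helper `diff_paths` and the sample records, including the `steps` and `name` keys, are hypothetical. In practice the two inputs would be loaded from file.json and avro2file.json with the `json` module.

```python
def diff_paths(left, right, path=""):
    """Return the list of paths where two JSON-like values differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        diffs = []
        for key in sorted(set(left) | set(right)):
            sub = f"{path}/{key}"
            if key not in left or key not in right:
                # Key exists on one side only.
                diffs.append(sub)
            else:
                diffs.extend(diff_paths(left[key], right[key], sub))
        return diffs
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            return [path or "/"]
        diffs = []
        for index, (lhs, rhs) in enumerate(zip(left, right)):
            diffs.extend(diff_paths(lhs, rhs, f"{path}/{index}"))
        return diffs
    # Scalars or mismatched container types: compare directly.
    return [] if left == right else [path or "/"]


# Made-up records standing in for file.json and avro2file.json.
original = {"PFNArrayRef": ["a.root", "b.root"],
            "steps": [{"name": "cmsRun1"}]}
round_tripped = {"PFNArrayRef": ["a.root", "b.root"],
                 "steps": [{"name": "cmsRun2"}]}

differences = diff_paths(original, round_tripped)
print(differences)
```

An empty list means the two documents match. Note that the Avro JSON encoding may wrap values of union types (for example a nullable string) in a single-key object naming the type, so some reported differences can be encoding differences rather than data loss.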