Schema upgrade
WMArchive relies on a fixed schema to generate avro files from JSON data. A schema migration is done in two phases: first we generate the new schema, then we stop the service, put the schema on HDFS and restart the service. All steps are outlined below.
To upgrade the schema for WMArchive we need to perform the following steps:
- update the schema in the WMArchive tree
- create a new WMArchive release
- deploy the new WMArchive release; during the deployment procedure a python to avro schema conversion is run and a new schema file is produced:
- schema generation is done in wmarchive.spec using the following commands
mkdir -p %i/data/schemas
export PYTHONPATH=$PYTHONPATH:$PWD/src/python:$PWD/../WMCore_wmarchive/src/python
bin/fwjrschema --fout=%i/data/schemas/fwjr_prod.json
bin/json2avsc --fin=%i/data/schemas/fwjr_prod.json --fout=%i/data/schemas/fwjr_prod.avsc
bin/wmexceptions --fout=%i/data/wmexceptions.json
- the generated file is copied to the service area by the wmarchive/deploy script as follows
# copy release schema into state area
if [ -f /data/srv/state/wmarchive/avro/schemas/current.avsc ]; then
rm /data/srv/state/wmarchive/avro/schemas/current.avsc
fi
cp /data/srv/current/apps/wmarchive/data/schemas/fwjr_prod.avsc /data/srv/state/wmarchive/avro/schemas/current.avsc
At this stage the new schema file is available at /data/srv/state/wmarchive/avro/schemas/current.avsc.
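Before going further it may be worth confirming that the deploy script really left a usable copy in the state area. The sketch below is not part of the documented procedure; the function name `verify_deployed_schema` is hypothetical, and the default paths are simply the ones quoted in the deploy script above.

```shell
# Hypothetical helper, not part of WMArchive.
# Succeeds when the state-area schema exists, is non-empty and is
# byte-identical to the schema shipped with the deployed release.
verify_deployed_schema() {
    local release="${1:-/data/srv/current/apps/wmarchive/data/schemas/fwjr_prod.avsc}"
    local state="${2:-/data/srv/state/wmarchive/avro/schemas/current.avsc}"
    if [ ! -s "$state" ]; then
        echo "schema missing or empty: $state" >&2
        return 1
    fi
    if ! cmp -s "$release" "$state"; then
        echo "schema differs from release copy: $release" >&2
        return 1
    fi
    echo "schema OK: $state"
}
```

Calling `verify_deployed_schema` without arguments checks the two default paths; both can be overridden positionally.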
To put the schema on HDFS, please follow these steps:
- stop the WMArchive service:
ssh vocms082
sudo -u wma /bin/bash
/data/vm_manage.sh stop
- move any avro files from /data/srv/state/wmarchive/avro/data/{fwjr,crab} to the /data/srv/state/wmarchive/avro/migrate/{fwjr,crab} areas. This is necessary because the existing avro files were generated with the previous schema.
- put the schema into the HDFS area:
# put schema into schemas folder
hadoop fs -put /data/srv/state/wmarchive/avro/schemas/current.avsc hdfs:///cms/wmarchive/avro/schemas/current.avsc.NEW_TIMESTAMP
# save existing schema into schemas folder
hadoop fs -cp hdfs:///cms/wmarchive/avro/schema.avsc hdfs:///cms/wmarchive/avro/schemas/schema.avsc.OLD_TIMESTAMP
# remove existing schema.avsc file
hadoop fs -rm hdfs:///cms/wmarchive/avro/schema.avsc
# copy new schema to schema.avsc file
hadoop fs -cp hdfs:///cms/wmarchive/avro/schemas/current.avsc.NEW_TIMESTAMP hdfs:///cms/wmarchive/avro/schema.avsc
- start the WMArchive service again:
/data/vm_manage.sh start
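The file move and the HDFS commands that run while the service is stopped could be wrapped in a small script, roughly as sketched below. This is only an illustration under several assumptions: the names `AVRO_BASE`, `HDFS_BASE`, `DRY_RUN`, `run`, `move_old_avro_files` and `rotate_hdfs_schema` are hypothetical, the avro files are assumed to carry an `.avro` suffix, and the timestamp format is left to the caller because the procedure above does not specify it. By default the script only prints the hadoop commands.

```shell
#!/bin/bash
# Hypothetical sketch of the steps performed while WMArchive is stopped.
AVRO_BASE="${AVRO_BASE:-/data/srv/state/wmarchive/avro}"
HDFS_BASE="${HDFS_BASE:-hdfs:///cms/wmarchive/avro}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the hadoop commands instead of running them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Move avro files written with the previous schema into the migrate areas.
# Assumption: the files are named *.avro.
move_old_avro_files() {
    local area f
    for area in fwjr crab; do
        mkdir -p "$AVRO_BASE/migrate/$area"
        for f in "$AVRO_BASE/data/$area"/*.avro; do
            [ -e "$f" ] || continue   # nothing to move in this area
            mv "$f" "$AVRO_BASE/migrate/$area/"
        done
    done
}

# Rotate the schema on HDFS; $1 replaces NEW_TIMESTAMP, $2 replaces OLD_TIMESTAMP.
rotate_hdfs_schema() {
    local new_ts="$1" old_ts="$2"
    # put the new schema into the schemas folder
    run hadoop fs -put "$AVRO_BASE/schemas/current.avsc" "$HDFS_BASE/schemas/current.avsc.$new_ts"
    # save the existing schema into the schemas folder
    run hadoop fs -cp "$HDFS_BASE/schema.avsc" "$HDFS_BASE/schemas/schema.avsc.$old_ts"
    # remove the existing schema.avsc file
    run hadoop fs -rm "$HDFS_BASE/schema.avsc"
    # copy the new schema to schema.avsc
    run hadoop fs -cp "$HDFS_BASE/schemas/current.avsc.$new_ts" "$HDFS_BASE/schema.avsc"
}
```

With the defaults, `rotate_hdfs_schema NEW_TIMESTAMP OLD_TIMESTAMP` prints the four hadoop commands for review; setting `DRY_RUN=0` executes them instead.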
As a final note, please check which schema the WMArchive configuration file points to. Currently it has the following line:
data.long_storage_uri = 'hdfs:///cms/wmarchive/avro/schema.avsc'
which will be loaded as the current schema. The procedure shown above just saves the existing schema in the schemas folder and puts the new schema file at hdfs:///cms/wmarchive/avro/schema.avsc, which will be used by WMArchive.
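The configuration check can also be scripted. The helper below is a hypothetical sketch: it assumes the setting appears on a single line in the form shown above, and it takes the path of the configuration file as an argument because that path is not given here.

```shell
# Hypothetical helper: print the schema URI set via data.long_storage_uri
# in the configuration file given as $1 (prints nothing if it is not set).
configured_schema_uri() {
    sed -n "s/^[[:space:]]*data\.long_storage_uri[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1"
}
```

The printed value should match the location the new schema was copied to, hdfs:///cms/wmarchive/avro/schema.avsc.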