Schema upgrade
WMArchive relies on a fixed schema to generate Avro files from JSON data. The schema migration is done in two phases: first we generate the new schema, then we stop the service, place the schema on HDFS, and restart the service. All steps are outlined below.
To upgrade the schema for WMArchive we need to perform the following steps:
- update the schema in the WMArchive tree
- create a new WMArchive release
- deploy the new WMArchive release; during the deployment procedure a Python to Avro schema conversion is run and a new schema file is produced:
- schema generation is done in wmarchive.spec using the following commands
mkdir -p %i/data/schemas
export PYTHONPATH=$PYTHONPATH:$PWD/src/python:$PWD/../WMCore_wmarchive/src/python
bin/fwjrschema --fout=%i/data/schemas/fwjr_prod.json
bin/json2avsc --fin=%i/data/schemas/fwjr_prod.json --fout=%i/data/schemas/fwjr_prod.avsc
bin/wmexceptions --fout=%i/data/wmexceptions.json
- the generated file is copied to the service area by the wmarchive/deploy script as follows
# copy release schema into state area
if [ -f /data/srv/state/wmarchive/avro/schemas/current.avsc ]; then
rm /data/srv/state/wmarchive/avro/schemas/current.avsc
fi
cp /data/srv/current/apps/wmarchive/data/schemas/fwjr_prod.avsc /data/srv/state/wmarchive/avro/schemas/current.avsc
At this stage the new schema file is available at /data/srv/state/wmarchive/avro/schemas/current.avsc.
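Before going further, it can be worth confirming that the deploy script really copied the release schema into the state area. The following is a minimal sketch rather than part of the WMArchive tooling: the function name `schema_in_sync` is hypothetical, and the default paths are simply the ones used by the deploy snippet above.

```shell
# Sketch: check that the state-area schema matches the release schema.
# schema_in_sync is a hypothetical helper, not part of WMArchive.
schema_in_sync() {
    local release="${1:-/data/srv/current/apps/wmarchive/data/schemas/fwjr_prod.avsc}"
    local state="${2:-/data/srv/state/wmarchive/avro/schemas/current.avsc}"
    # Return 2 if either file is missing or empty
    [ -s "$release" ] && [ -s "$state" ] || return 2
    # cmp -s prints nothing and returns 0 only when the files are byte-identical
    cmp -s "$release" "$state"
}
```

Called without arguments on the WMArchive node, for example as `schema_in_sync && echo "schema is in sync"`, it compares the two default paths. A return code of 1 means the files differ and 2 means one of them is missing or empty.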
We are now ready to put the schema to HDFS. To do so, follow these steps:
- stop the WMArchive service
ssh vocms071
sudo -u wma /bin/bash
/data/vm_manage.sh stop
- move any avro files from /data/srv/state/wmarchive/avro/data/ to the /data/srv/state/wmarchive/avro/migrate area. This is necessary because the existing avro files were generated with the previous schema.
- put the schema to the HDFS area (here we use the 20161215 timestamp as an example):
# put schema into schemas folder
hadoop fs -put /data/srv/state/wmarchive/avro/schemas/current.avsc hdfs:///cms/wmarchive/avro/schemas/current.avsc.20161215
# save existing schema into schemas folder
hadoop fs -cp hdfs:///cms/wmarchive/avro/schema.avsc.20161214 hdfs:///cms/wmarchive/avro/schemas/schema.avsc.OLD_TIMESTAMP
# remove existing schema.avsc file
hadoop fs -rm hdfs:///cms/wmarchive/avro/schema.avsc
# copy new schema to schema.avsc file
hadoop fs -cp hdfs:///cms/wmarchive/avro/schemas/current.avsc.20161215 hdfs:///cms/wmarchive/avro/schema.avsc
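Since the timestamps appear in several paths, it may help to print the commands for a given date and review them before running anything. The sketch below only echoes the commands and never calls hadoop. The function name `print_schema_upload_cmds` is hypothetical, and the sketch makes two assumptions: the live schema is hdfs:///cms/wmarchive/avro/schema.avsc, and timestamped copies are kept under the schemas folder, as in the put command above.

```shell
# Sketch: print the HDFS commands for a given timestamp without running them.
# print_schema_upload_cmds is a hypothetical helper, not part of WMArchive.
print_schema_upload_cmds() {
    local new_ts="${1:-$(date +%Y%m%d)}"   # defaults to today's date as YYYYMMDD
    local old_ts="${2:-OLD_TIMESTAMP}"     # timestamp used to label the saved schema
    local local_schema=/data/srv/state/wmarchive/avro/schemas/current.avsc
    local hdfs_base=hdfs:///cms/wmarchive/avro
    # Assumption: the live schema is $hdfs_base/schema.avsc and
    # timestamped copies are kept under $hdfs_base/schemas
    echo "hadoop fs -put $local_schema $hdfs_base/schemas/current.avsc.$new_ts"
    echo "hadoop fs -cp $hdfs_base/schema.avsc $hdfs_base/schemas/schema.avsc.$old_ts"
    echo "hadoop fs -rm $hdfs_base/schema.avsc"
    echo "hadoop fs -cp $hdfs_base/schemas/current.avsc.$new_ts $hdfs_base/schema.avsc"
}
```

For example, `print_schema_upload_cmds 20161215 20161214` prints the four commands with both timestamps filled in, so they can be checked and then pasted into the shell one at a time.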
Now we are ready to start the WMArchive service again:
/data/vm_manage.sh start
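The instructions above describe moving the old avro files to the migrate area but give no command for it. The following sketch shows one way this could be done while the service is stopped. The function name `move_avro_files` is hypothetical, and the sketch assumes that the files sit directly in the data directory and carry an `.avro` extension, which should be checked against the actual file names first.

```shell
# Sketch: move leftover avro files out of the data area while the service is stopped.
# move_avro_files is a hypothetical helper; default paths come from the steps above.
move_avro_files() {
    local src="${1:-/data/srv/state/wmarchive/avro/data}"
    local dst="${2:-/data/srv/state/wmarchive/avro/migrate}"
    local moved=0 f
    mkdir -p "$dst" || return 1
    # Assumption: the files carry an .avro extension and are not in subdirectories
    for f in "$src"/*.avro; do
        [ -e "$f" ] || continue      # the glob matched nothing
        mv -- "$f" "$dst"/ || return 1
        moved=$((moved + 1))
    done
    echo "moved $moved file(s) to $dst"
}
```

Run without arguments as the wma user, `move_avro_files` uses the two default directories and reports how many files were moved. Files with other extensions are left where they are.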