Schema upgrade

Valentin Kuznetsov edited this page Oct 9, 2017 · 4 revisions

Schema upgrade procedure

WMArchive relies on a fixed schema to generate Avro files from JSON data. A schema migration is done in two phases: first we generate the new schema, then we stop the service, place the new schema on HDFS, and restart the service. The steps are outlined below.

Schema generation

To upgrade the schema for WMArchive we need to perform the following steps:

  1. update the schema in the WMArchive tree

  2. create a new WMArchive release

  3. deploy the new WMArchive release; during deployment a Python-to-Avro schema conversion is run and a new schema file is produced:

  • schema generation is done in wmarchive.spec using the following commands
mkdir -p %i/data/schemas
export PYTHONPATH=$PYTHONPATH:$PWD/src/python:$PWD/../WMCore_wmarchive/src/python
bin/fwjrschema --fout=%i/data/schemas/fwjr_prod.json
bin/json2avsc --fin=%i/data/schemas/fwjr_prod.json --fout=%i/data/schemas/fwjr_prod.avsc
bin/wmexceptions --fout=%i/data/wmexceptions.json
  • the generated file is copied to the service area by the wmarchive/deploy script as follows
  # copy release schema into state area
  if [ -f /data/srv/state/wmarchive/avro/schemas/current.avsc ]; then
     rm /data/srv/state/wmarchive/avro/schemas/current.avsc
  fi
  cp /data/srv/current/apps/wmarchive/data/schemas/fwjr_prod.avsc /data/srv/state/wmarchive/avro/schemas/current.avsc

At this stage the new schema file is available at /data/srv/state/wmarchive/avro/schemas/current.avsc. Now we're ready to put it on HDFS.
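
Before uploading, it may be worth confirming that the deployed schema file is present and well formed. The following is a minimal sketch and not part of the WMArchive deployment scripts: the `check_schema` helper is hypothetical, and the JSON check assumes a `python3` interpreter is available (it is skipped otherwise).

```shell
# Hypothetical helper (not part of WMArchive): sanity-check a schema file.
# Returns 0 if the file exists, is non-empty and, when python3 is available,
# parses as JSON (an .avsc schema file is a JSON document).
check_schema() {
    schema_file="$1"
    if [ ! -s "$schema_file" ]; then
        echo "schema file missing or empty: $schema_file" >&2
        return 1
    fi
    if command -v python3 >/dev/null 2>&1; then
        if ! python3 -m json.tool "$schema_file" >/dev/null 2>&1; then
            echo "schema file is not valid JSON: $schema_file" >&2
            return 1
        fi
    fi
    echo "schema file looks fine: $schema_file"
}
```

On the service node it could be called as `check_schema /data/srv/state/wmarchive/avro/schemas/current.avsc`.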

Schema migration

To put the new schema on HDFS, please follow these steps:

Stop the WMArchive service:

ssh vocms082
sudo -u wma /bin/bash
/data/vm_manage.sh stop
  1. move any avro files from /data/srv/state/wmarchive/avro/data/{fwjr,crab} to the corresponding /data/srv/state/wmarchive/avro/migrate/{fwjr,crab} areas. This is necessary because the existing avro files were generated with the previous schema

  2. put the schema into the HDFS area:

# put schema into schemas folder
hadoop fs -put /data/srv/state/wmarchive/avro/schemas/current.avsc hdfs:///cms/wmarchive/avro/schemas/current.avsc.NEW_TIMESTAMP
# save existing schema into schemas folder
hadoop fs -cp hdfs:///cms/wmarchive/avro/schema.avsc hdfs:///cms/wmarchive/avro/schemas/schema.avsc.OLD_TIMESTAMP
# remove existing schema.avsc file
hadoop fs -rm hdfs:///cms/wmarchive/avro/schema.avsc
# copy new schema to schema.avsc file
hadoop fs -cp hdfs:///cms/wmarchive/avro/schemas/current.avsc.NEW_TIMESTAMP hdfs:///cms/wmarchive/avro/schema.avsc
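
Steps 1 and 2 above can be sketched as two small helpers. This is only an illustration under stated assumptions: both function names are hypothetical, the files to move are assumed to carry an `.avro` extension, and the timestamp format is left to the operator since the procedure does not prescribe one. The second helper only prints the HDFS commands with the placeholders filled in, reading the new schema back from the `schemas` folder it was uploaded to, so they can be reviewed before being run by hand.

```shell
# Hypothetical helpers (not part of WMArchive) sketching steps 1 and 2.

# Step 1: move avro files written with the previous schema out of the data area.
# $1 is the avro state directory, e.g. /data/srv/state/wmarchive/avro
# Assumption: the files to move carry an .avro extension.
move_avro_files() {
    avro_dir="$1"
    for kind in fwjr crab; do
        src="$avro_dir/data/$kind"
        dst="$avro_dir/migrate/$kind"
        mkdir -p "$dst"
        for f in "$src"/*.avro; do
            # skip the unexpanded pattern when no files match
            [ -e "$f" ] || continue
            mv "$f" "$dst/"
        done
    done
}

# Step 2: print (do not run) the HDFS commands with concrete timestamps.
# $1 replaces NEW_TIMESTAMP, $2 replaces OLD_TIMESTAMP.
print_hdfs_commands() {
    new_ts="$1"
    old_ts="$2"
    local_schema=/data/srv/state/wmarchive/avro/schemas/current.avsc
    hdfs_dir=hdfs:///cms/wmarchive/avro
    echo "hadoop fs -put $local_schema $hdfs_dir/schemas/current.avsc.$new_ts"
    echo "hadoop fs -cp $hdfs_dir/schema.avsc $hdfs_dir/schemas/schema.avsc.$old_ts"
    echo "hadoop fs -rm $hdfs_dir/schema.avsc"
    echo "hadoop fs -cp $hdfs_dir/schemas/current.avsc.$new_ts $hdfs_dir/schema.avsc"
}
```

For example, `move_avro_files /data/srv/state/wmarchive/avro` followed by `print_hdfs_commands "$(date +%Y%m%d)" 20170101` would relocate the old files and print the four commands for review; the `20170101` value is purely illustrative.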

Now we're ready to start the WMArchive service again:

/data/vm_manage.sh start

As a final note, please check which schema the WMArchive configuration file points to. Currently it has the following line:

data.long_storage_uri = 'hdfs:///cms/wmarchive/avro/schema.avsc'

which will be loaded as the current schema. The procedure shown above simply saves the existing schema in the schemas folder and puts the new schema file at hdfs:///cms/wmarchive/avro/schema.avsc, which is the file WMArchive will use.
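
The configuration check can be sketched as follows. The helper name is hypothetical and the location of the configuration file is not given here, so it is passed as an argument; the sketch assumes the setting appears on a single line in the quoted form shown above.

```shell
# Hypothetical helper (not part of WMArchive): print the value of
# data.long_storage_uri found in a configuration file.
# $1 is the path to the WMArchive configuration file.
# Assumption: the setting sits on one line, with the value in quotes.
schema_uri_from_config() {
    sed -n "s/^[[:space:]]*data\.long_storage_uri[[:space:]]*=[[:space:]]*['\"]\(.*\)['\"].*/\1/p" "$1"
}
```

The printed value can then be compared with hdfs:///cms/wmarchive/avro/schema.avsc to confirm that the service will load the newly installed schema.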