Schema upgrade

Schema upgrade procedure

WMArchive relies on a fixed schema to generate avro files from JSON data. A schema migration is done in two phases: first we generate the new schema, then we stop the service, place the schema on HDFS and restart the service. All steps are outlined below.

Schema generation

To upgrade the schema for WMArchive we need to perform the following steps:

update the schema in the WMArchive tree
create a new WMArchive release
deploy the new WMArchive release; during the deployment procedure a python to avro schema conversion is run and a new schema file is produced:

schema generation is done in wmarchive.spec using the following commands

mkdir -p %i/data/schemas
export PYTHONPATH=$PYTHONPATH:$PWD/src/python:$PWD/../WMCore_wmarchive/src/python
bin/fwjrschema --fout=%i/data/schemas/fwjr_prod.json
bin/json2avsc --fin=%i/data/schemas/fwjr_prod.json --fout=%i/data/schemas/fwjr_prod.avsc
bin/wmexceptions --fout=%i/data/wmexceptions.json

the generated file is copied to the service area by the wmarchive/deploy script as follows

  # copy release schema into state area
  if [ -f /data/srv/state/wmarchive/avro/schemas/current.avsc ]; then
     rm /data/srv/state/wmarchive/avro/schemas/current.avsc
  fi
  cp /data/srv/current/apps/wmarchive/data/schemas/fwjr_prod.avsc /data/srv/state/wmarchive/avro/schemas/current.avsc

at this stage the new schema file is available at /data/srv/state/wmarchive/avro/schemas/current.avsc. Now we're ready to put it on HDFS.
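
As a quick sanity check after deployment, the copy in the state area can be compared against the release file. The sketch below assumes the two paths used by the deploy script above; the schema_in_sync helper is hypothetical and not part of WMArchive.

```shell
# Hypothetical helper (not part of WMArchive): succeed only if the
# state-area schema exists, is non-empty and is byte-identical to the
# schema shipped with the release.
schema_in_sync() {
    release="$1"
    state="$2"
    [ -s "$state" ] && cmp -s "$release" "$state"
}

# Example call with the paths used by the deploy script:
# schema_in_sync /data/srv/current/apps/wmarchive/data/schemas/fwjr_prod.avsc \
#                /data/srv/state/wmarchive/avro/schemas/current.avsc \
#     && echo "schema copied" || echo "schema mismatch"
```

A non-zero exit status here would suggest the deploy step did not copy the new schema, in which case nothing should be put on HDFS yet.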

Schema migration

At this step we're ready to put the schema on HDFS. To do that please follow these steps:

Stop WMArchive service

ssh vocms082
sudo -u wma /bin/bash
/data/vm_manage.sh stop

move any avro files from /data/srv/state/wmarchive/avro/data/{fwjr,crab} to the /data/srv/state/wmarchive/avro/migrate/{fwjr,crab} areas. This is necessary because the existing avro files were generated with the previous schema
put the schema into the HDFS area:

# put schema into schemas folder
hadoop fs -put /data/srv/state/wmarchive/avro/schemas/current.avsc hdfs:///cms/wmarchive/avro/schemas/current.avsc.NEW_TIMESTAMP
# save existing schema into schemas folder
hadoop fs -cp hdfs:///cms/wmarchive/avro/schema.avsc hdfs:///cms/wmarchive/avro/schemas/schema.avsc.OLD_TIMESTAMP
# remove existing schema.avsc file
hadoop fs -rm hdfs:///cms/wmarchive/avro/schema.avsc
# copy new schema to schema.avsc file
hadoop fs -cp hdfs:///cms/wmarchive/avro/schemas/current.avsc.NEW_TIMESTAMP hdfs:///cms/wmarchive/avro/schema.avsc

now we're ready to start the WMArchive service again:

/data/vm_manage.sh start
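
The step above that moves existing avro files into the migrate area is described only in words. A minimal sketch of it follows, to be run while the service is stopped, that is, before the start command. The migrate_avro_files helper is hypothetical, and the assumption that the files carry an .avro extension is not stated in this procedure.

```shell
# Hypothetical helper: move avro files written with the old schema
# from <avro_root>/data/{fwjr,crab} to <avro_root>/migrate/{fwjr,crab}.
# Assumption: the files to move end in .avro.
migrate_avro_files() {
    avro_root="$1"
    for kind in fwjr crab; do
        src="$avro_root/data/$kind"
        dst="$avro_root/migrate/$kind"
        # skip a data area that does not exist on this node
        [ -d "$src" ] || continue
        mkdir -p "$dst"
        # find copes with an empty directory, unlike a bare glob
        find "$src" -maxdepth 1 -type f -name '*.avro' -exec mv {} "$dst"/ \;
    done
}

# Example call with the state area used on this page:
# migrate_avro_files /data/srv/state/wmarchive/avro
```

Only top-level files are moved, so anything else in the data areas stays in place.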

As a final note, please check which schema the WMArchive configuration file points to. Currently it has the following line:

data.long_storage_uri = 'hdfs:///cms/wmarchive/avro/schema.avsc'

which will be loaded as the current schema. The procedure shown above just saves the existing schema in the schemas folder and puts the new schema file at hdfs:///cms/wmarchive/avro/schema.avsc, which will be used by WMArchive.
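
The NEW_TIMESTAMP and OLD_TIMESTAMP placeholders in the hadoop commands are not given a format on this page. A minimal sketch, assuming a date-based suffix is acceptable; the timestamped_name helper and the suffix format are both assumptions.

```shell
# Hypothetical helper: append a date-based suffix to a schema file name.
# The YYYYMMDD_HHMMSS format is an assumption, not a WMArchive convention.
timestamped_name() {
    echo "$1.$(date +%Y%m%d_%H%M%S)"
}

# Example: build the name once and reuse it for both the put and the copy,
# so the two commands refer to the same file.
# new_name=$(timestamped_name current.avsc)
# hadoop fs -put /data/srv/state/wmarchive/avro/schemas/current.avsc \
#     "hdfs:///cms/wmarchive/avro/schemas/$new_name"
```

Storing the generated name in a variable avoids typing the timestamp by hand in two places.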

CMS collaboration
