-
Notifications
You must be signed in to change notification settings - Fork 13
WMArchive queries
The queries in WMArchive should be placed via POST request in a form of Query Language (QL). The QL should specify condition keys and document fields to be retrieved. We end-up using JSON format as QL data-format similar to MongoDB QL. For example:
{"spec": {"lfn":"bla*123.root", "task": "bla", "run":[1,2,3]}, "fields":["logs"]}
here we define that our conditions spec
is a dictionary where we define actual conditions such as lfn pattern, task name and run list. The output fields
would be logs key in FWJR documents stored in WMArchive.
The WMArchive will support three attributes which user may provide:
-
spec
defines condition dictionary- each spec must have a timerange key-value pair which points to specific timerange user want to look-up the data, e.g.
"timerange":[20160101, 20160102]
, the values in timerange list should be in YYYYMMDD format
- each spec must have a timerange key-value pair which points to specific timerange user want to look-up the data, e.g.
-
fields
defines list of attributes to be retrieved from FWJR and returned back to user -
aggregate
define list of attributes we need to aggregate
The QL should be flexible enough to accommodate user queries, see next section. For that we can use JSON document to define our queries, see example in previous section. Since we're going to use nested structure in our FWJR documents we need to refer them via dot notations, e.g. output.inputDataset
, meta_data.agent_ver
, etc. Usage of JSON for queries can be easily translated into MongoDB QL. Here is simplest QL rule definitions (mostly aligned with MongoDB QL):
- input query should be in form of JSON
- the query JSON should contain only key-value pairs, no nested structures
- the supported data types for values would be int, float, string and list of those
- user can send multiple set of conditions which translates into AND set
- user can send alternative conditions via
{"$or": [JSON1, JSON2]}
structure. - user can specify
$gt, $lt
operators for greater than, less than use cases
The WMArchive code will translate user query either into Mongo DB one or the one suitable for querying documents on HDFS. Here are few examples of QL syntax
# use multiple keys
{"task":"abc", "run":1}
# use patterns
{"lfn":"/a/v/b*/123.root"}
# use or conditions
{"$or": [{"task":"abc", "lfn":"/a/v/b*/123.root"}, {"run":[1,2,3]}]}
# use array values, i.e. find docs with all given run numbers
{"run":[1,2,3]}
# usage of $gt, $lt operators
{"run":{"$gt":1})
Based on current schema structure we can foresee the following queries:
- Find documents based on provided LFN or its pattern, see here
- Find documents for given agent meta-data
- Find documents for chain attributes such as
input.runs
,output. acquisitionEra
oroutput.applicationVersion
,output.inputDataset
oroutput.outputDataset
- Find documents for given task name
Any combination of aforementioned conditions is also possible
We may need to simplify user life and give them restricted set of keys which will be used by QL. For example
lfn
will refer to LFNArray
, agent.version
can be translated into meta_data.agent_ver
, etc. I think the translation layer will come later when we'll settle down on schema and QL implementation.
Here we provide examples how input query JSON file may looks like and how WMArchive will provide the output:
input QL JSON (stored in query.json):
{"spec":{"task": "/AbcCde_Task_Data_test_6735909/RECO"}, "fields":["wmaid"]}
client call and output
wma_client.py --spec=query.json
{"result": [{"status": "ok",
"input": {"fields": ["wmaid"],
"spec": {"task": "/AbcCde_Task_Data_test_6735909/RECO"}},
"storage": "mongodb",
"results": [{"wmaid": "6b0bace20fc732563316198d1ed2b94e"}]}]}
intput QL JSON (stored in query.json):
{"spec":{"task": "/AbcCde_Task_Data_test_6735909/RECO"}, "fields":["meta_data", "logs"]}
client call and output
wma_client.py --spec=query.json
{"result": [{"status": "ok",
"input": {"fields": ["meta_data", "logs"],
"spec": {"task": "/AbcCde_Task_Data_test_6735909/RECO"}},
"storage": "mongodb",
"results": [{"meta_data": {"agent_ver": "1.0.13.pre8", "fwjr_id": "100-0", "host": "vocms008.cern.ch", "ts": 1454617125}}]}]}