WMArchive data look-up

Valentin Kuznetsov edited this page Mar 18, 2016 · 1 revision

WMArchive operates with two types of storage: the Short-Term Storage (STS) and the Long-Term Storage (LTS). The STS receives data injected by WMAgents and serves as a buffer from which data are propagated into the LTS. The LTS keeps all data for the lifetime of the CMS experiment. At present we use MongoDB for the STS and HDFS for the LTS; see the WMArchive architecture wiki.

The end user specifies a query as a JSON spec and a list of fields; see the WMArchive queries wiki. Each spec must contain a timerange key that defines the time range to look up. If the time range is shorter than a specific threshold (determined by the capacity of the STS), the request is routed to the STS; otherwise it is sent to the LTS.

The STS supports real-time queries, so the request is acknowledged immediately and the results are returned directly to the end user. The LTS cannot serve real-time queries and instead relies on an underlying batch system (e.g. Spark or MapReduce jobs). The job is submitted on the user's behalf, and the user receives the uid of that job. When the job finishes, its results are injected back into the STS, and the user can retrieve them later with the provided uid. In that case the user supplies the uid as the spec.
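The routing and uid-based retrieval described above can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not the WMArchive implementation or API: the timerange format (a pair of YYYYMMDD integers), the threshold value `STS_MAX_DAYS`, the record layout, and the names `route` and `LookupService` are all hypothetical. Python lists stand in for MongoDB (STS) and HDFS (LTS), and `run_pending_jobs` stands in for a Spark or MapReduce job. Only the timerange is used for matching; other spec keys are ignored.

```python
import uuid
from datetime import datetime

# Hypothetical threshold; in WMArchive it is determined by STS capacity.
STS_MAX_DAYS = 3


def parse_day(value):
    """Convert an (assumed) YYYYMMDD integer into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def route(spec, max_days=STS_MAX_DAYS):
    """Return 'sts' for short time ranges and 'lts' for everything else."""
    if "timerange" not in spec:
        raise ValueError("spec must contain a 'timerange' key")
    start, end = (parse_day(v) for v in spec["timerange"])
    if end < start:
        raise ValueError("timerange end precedes its start")
    return "sts" if (end - start).days < max_days else "lts"


class LookupService:
    """Toy model of the STS/LTS look-up flow (hypothetical, not the real API)."""

    def __init__(self, sts_records, lts_records, max_days=STS_MAX_DAYS):
        self.sts = list(sts_records)   # stands in for MongoDB (STS)
        self.lts = list(lts_records)   # stands in for HDFS (LTS)
        self.max_days = max_days
        self.pending = {}              # uid -> spec of submitted LTS jobs
        self.results = {}              # uid -> job results written back to STS

    def query(self, spec):
        # Follow-up request: the spec is just the uid of an earlier LTS job.
        if "uid" in spec:
            uid = spec["uid"]
            if uid in self.results:
                return self.results[uid]
            return {"status": "pending" if uid in self.pending else "unknown"}
        # Short time range: answer in real time from the STS.
        if route(spec, self.max_days) == "sts":
            return {"status": "ok", "data": self._match(self.sts, spec)}
        # Long time range: submit a batch job on the user's behalf.
        uid = uuid.uuid4().hex
        self.pending[uid] = spec
        return {"status": "submitted", "uid": uid}

    def run_pending_jobs(self):
        """Stand-in for a batch job: scan the LTS and inject results into STS."""
        for uid, spec in list(self.pending.items()):
            self.results[uid] = {"status": "ok", "data": self._match(self.lts, spec)}
            del self.pending[uid]

    @staticmethod
    def _match(records, spec):
        # Only the timerange is matched here; other spec keys are ignored.
        start, end = (parse_day(v) for v in spec["timerange"])
        return [r for r in records if start <= parse_day(r["day"]) <= end]


if __name__ == "__main__":
    svc = LookupService(
        sts_records=[{"day": 20160317, "task": "a"}],
        lts_records=[{"day": 20150601, "task": "b"}, {"day": 20160317, "task": "a"}],
    )
    print(svc.query({"timerange": [20160316, 20160318]}))  # routed to STS
    ticket = svc.query({"timerange": [20150101, 20160318]})  # routed to LTS
    print(svc.query({"uid": ticket["uid"]}))  # job not run yet
    svc.run_pending_jobs()
    print(svc.query({"uid": ticket["uid"]}))  # results now served from "STS"
```

Keeping the uid check ahead of the routing step mirrors the text: a uid spec carries no timerange, so it must bypass the threshold test and go straight to the results written back into the STS.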