Context
Much of the data in organisations is maintained in tables with multiple columns, typically as Excel or CSV files. Such data often has a dimension that describes time, sometimes further dimensions for classifying the data, and in most cases some actual observed values or counts.
Such tables hold the data in a structured form, but in most cases two things are missing: the information needed to understand the columns, and the metadata needed to create useful charts and visualisations.
By creating Cubes, you as data provider and domain specialist can augment and annotate your data with everything necessary to understand the input data, directly in the dataset to be published. Fully annotated Cubes can also be used to visualize your data with tools like https://visualize.admin.ch .
The Cube Creator allows us to transform data provided as clean CSV into a standardised RDF Cube format. In a second step in the Cube Creator, the Cube Designer, it is possible to annotate the Cube with the necessary descriptive and technical metadata. It is also possible to map common values to known concepts (e.g. Cantons, Municipalities, Companies, Departments), which further augments the data at hand. Finally, the Cube Creator manages the publishing of the Cube on LINDAS, where it can be consumed through https://visualize.admin.ch (for end-users) and queried on https://lindas.admin.ch/sparql/ (for developers).
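As a rough illustration of the CSV-to-RDF idea, the sketch below turns each CSV row into subject-predicate-object triples. The `EX` namespace, the `row_to_triples` helper, and the property names are hypothetical and chosen for illustration only; they are not the vocabulary or the output that the Cube Creator actually produces.

```python
import csv
import io

# Hypothetical namespace, for illustration only.
EX = "https://example.org/"


def row_to_triples(row_id, row):
    """Return (subject, predicate, object) triples for one CSV row."""
    subject = f"{EX}observation/{row_id}"
    # Mark the row as an observation (hypothetical type property).
    triples = [(subject, f"{EX}type", f"{EX}Observation")]
    for column, value in row.items():
        # Derive a predicate from the column name, e.g. "Average Temperature"
        # becomes ".../average-temperature".
        predicate = f"{EX}{column.strip().lower().replace(' ', '-')}"
        triples.append((subject, predicate, value.strip()))
    return triples


CSV_TEXT = """Year,Location,Season,Average Temperature
2019,Bern,Summer,22
2020,Bern,Summer,23
"""

reader = csv.DictReader(io.StringIO(CSV_TEXT))
triples = [
    triple
    for row_id, row in enumerate(reader, start=1)
    for triple in row_to_triples(row_id, row)
]

for s, p, o in triples:
    print(s, p, o)
```

Each row yields one triple for its type plus one triple per column, so every cell in the table ends up attached to an identifiable observation.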
Another way of thinking about Cubes is as multi-dimensional representations of your input tables.
In the image above we see a cube with 3 dimensions: Year, Location and Season. For each combination of these dimensions, the cube reports the average temperature.
The source of this cube might be provided through the following table:
Year | Location | Season | Average Temperature |
---|---|---|---|
2019 | Bern | Summer | 22 °C |
2020 | Bern | Summer | 23 °C |
2021 | Bern | Summer | 24 °C |
2019 | Zürich | Summer | 21 °C |
2020 | Zürich | Summer | 22 °C |
2021 | Zürich | Summer | 23 °C |
2019 | Bern | Winter | 12 °C |
2020 | Bern | Winter | 13 °C |
2021 | Bern | Winter | 14 °C |
2019 | Zürich | Winter | 11 °C |
2020 | Zürich | Winter | 12 °C |
2021 | Zürich | Winter | 13 °C |
For every combination of these three dimensions we have a value, which we normally call an observation. A cube can have more than three dimensions, with every combination providing an observation. It is also possible to have multiple observations per combination of dimensions.
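The relationship between the table and the cube can be sketched in a few lines of Python. This is a minimal illustration and not how the Cube Creator stores data; the `ROWS`, `cube` and `slice_cube` names are hypothetical.

```python
# Rows from the example table: (year, location, season, average temperature in °C).
ROWS = [
    (2019, "Bern", "Summer", 22),
    (2020, "Bern", "Summer", 23),
    (2021, "Bern", "Summer", 24),
    (2019, "Zürich", "Summer", 21),
    (2020, "Zürich", "Summer", 22),
    (2021, "Zürich", "Summer", 23),
    (2019, "Bern", "Winter", 12),
    (2020, "Bern", "Winter", 13),
    (2021, "Bern", "Winter", 14),
    (2019, "Zürich", "Winter", 11),
    (2020, "Zürich", "Winter", 12),
    (2021, "Zürich", "Winter", 13),
]

# The cube maps each combination of dimension values to its observation.
cube = {(year, location, season): temp for year, location, season, temp in ROWS}


def slice_cube(cube, year=None, location=None, season=None):
    """Return the observations that match the dimension values that are set."""
    return {
        key: value
        for key, value in cube.items()
        if (year is None or key[0] == year)
        and (location is None or key[1] == location)
        and (season is None or key[2] == season)
    }


# Look up a single observation by its three dimension values.
print(cube[(2020, "Zürich", "Winter")])

# Fix two dimensions and list the observations along the remaining one (Year).
print(slice_cube(cube, location="Bern", season="Summer"))
```

Keying the dictionary by the tuple of dimension values makes the "one observation per combination" rule explicit. Supporting multiple observations per combination would require storing a list or a record of values instead of a single number.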
Was any decision taken about this? Is there still a need to go further with the issue "Implement naming concept / consistent wording"?
Question: do we reuse the former glossary? If yes, it will need clean-up and clarification (it still mentions "pipeline", "rdf", etc.).