Context
A good amount of data in organisations is maintained in tables with multiple columns, typically a multitude of Excel or CSV files. This data often has a dimension describing time, sometimes other dimensions classifying the data, and in most cases some actual observed values or counts.
Such tables hold the data in a structured form, but most of the time the information needed to understand the columns is missing, as is the metadata needed to create useful representations in charts and visualisations.
By creating Cubes, you, as data provider and domain specialist, can augment and annotate your data with everything needed to understand it, directly in the dataset to be published. Finally, fully annotated Cubes can also be used to visualise your data with tools like https://visualize.admin.ch.
This is done with the cube-creator.
Another way of thinking about these datasets is as multi-dimensional cubes.
[Image: cube]
In the image above we have a cube with 3 dimensions.
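To make the table/cube analogy concrete, here is a minimal Python sketch, assuming a hypothetical CSV with three dimension columns (`year`, `canton`, `sex`) and one observed value (`count`). Each row becomes one cell of the cube, addressed by its coordinates on the three dimensions. The column names and figures are invented for illustration and are not taken from the cube-creator.

```python
import csv
import io

# Hypothetical input table: three dimension columns and one observed value.
CSV_TEXT = """year,canton,sex,count
2020,ZH,f,10
2020,ZH,m,12
2020,BE,f,7
2021,ZH,f,11
"""

DIMENSIONS = ("year", "canton", "sex")  # the three axes of the cube
MEASURE = "count"  # the value stored in each cell


def table_to_cube(text):
    """Index each row (observation) by its coordinates on the dimensions."""
    cube = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[d] for d in DIMENSIONS)
        if key in cube:
            raise ValueError(f"duplicate observation for {key}")
        cube[key] = int(row[MEASURE])
    return cube


cube = table_to_cube(CSV_TEXT)
print(cube[("2020", "ZH", "m")])  # 12

# A "slice" fixes some dimensions (here year and canton) and lets the rest vary.
zh_2020 = {k: v for k, v in cube.items() if k[0] == "2020" and k[1] == "ZH"}
```

The same rows can be read either as a flat table or as cells of a 3-dimensional cube; only the way they are addressed changes.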
What is missing
Fabian: I am no cube specialist, and no statistics specialist either, so I would need some help here, but I will start with some questions and discussion points that seem needed, IMHO. The goal would certainly be to describe a Cube as simply as possible, so that "Excel" users understand the tool, and statistical experts, who are familiar with complex cubes, understand it too.
I did not find a specific description of what a cube is on Zazuko's cube page, and the definition given for the W3C Data Cube seems too complex.
Would it be possible to simplify the description, for instance:
"A cube is a collection of observations.
Within this tool, a cube can be seen as a table or matrix (similar to a spreadsheet), where each line is an observation and each column is a dimension.
All observations of a cube have the same structure (i.e. same dimensions)"
That proposal would reduce the concept of "Cube" (a multi-dimensional representation) to the concept of a table (a two-dimensional representation).
Would that be OK? Maybe yes, as the main definition of the cube is the "Observation table".
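A minimal sketch of the proposed definition, assuming observations are plain Python dicts keyed by column name: under this definition a collection of observations forms a cube when every observation has exactly the same set of columns. The function name `is_cube` and the sample data are hypothetical.

```python
def is_cube(observations):
    """Return True if all observations share the same dimensions (columns).

    Follows the simplified definition proposed above: a cube is a
    collection of observations that all have the same structure.
    """
    if not observations:
        return True  # an empty collection trivially has a common structure
    expected = set(observations[0])
    return all(set(obs) == expected for obs in observations)


table = [
    {"year": 2020, "canton": "ZH", "count": 10},
    {"year": 2021, "canton": "ZH", "count": 11},
]
ragged = table + [{"year": 2022, "count": 9}]  # last row lacks "canton"

print(is_cube(table))   # True
print(is_cube(ragged))  # False
```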
And maybe here we should take into account the upcoming explanation of the "literal" vs. "link to another table" situation.
This representation of a main table with links to "secondary" tables should be clear to end users, as already discussed with Véronique (similar to databases, spreadsheets, etc.).
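A minimal sketch of the "main table with links to secondary tables" idea, with hypothetical data: in the observation table, `year` and `count` hold literal values, while `canton` holds a key pointing to a row of a secondary `cantons` table, much like a foreign key in a database. Resolving the link means looking up that row. All table and column names are invented for illustration.

```python
# Secondary table: one row per canton, addressed by its key.
cantons = {
    "ZH": {"name": "Zürich", "language": "de"},
    "GE": {"name": "Genève", "language": "fr"},
}

# Main (observation) table: "canton" is a link, the other columns are literals.
observations = [
    {"year": 2020, "canton": "ZH", "count": 10},
    {"year": 2020, "canton": "GE", "count": 4},
]

LINKS = {"canton": cantons}  # which columns link to which secondary table


def resolve(observation):
    """Replace linked keys with the referenced row; keep literals unchanged."""
    resolved = {}
    for column, value in observation.items():
        table = LINKS.get(column)
        resolved[column] = table[value] if table is not None else value
    return resolved


print(resolve(observations[1]))
# {'year': 2020, 'canton': {'name': 'Genève', 'language': 'fr'}, 'count': 4}
```

Keeping the canton details in a separate table avoids repeating them in every observation, which is the same reason databases and spreadsheets use lookup tables.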
This view makes sense if, whereas the W3C RDF Data Cube distinguishes 3 different components ("dimensions", "attributes" and "measures"), Zazuko designed its cube around a much "simpler" concept where those 3 components are all just "dimensions". Is that the case?
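For comparison, the W3C RDF Data Cube vocabulary classifies each component as a dimension (`qb:DimensionProperty`, which identifies the observation), a measure (`qb:MeasureProperty`, the observed value) or an attribute (`qb:AttributeProperty`, which qualifies the value, e.g. its unit). The sketch below, using hypothetical column names, contrasts that classification with the simplified view where every column is just a "dimension". Whether Zazuko's cube model actually works this way is the open question above, so this is not a description of the cube-creator.

```python
from enum import Enum


class Component(Enum):
    """Component kinds of the W3C RDF Data Cube vocabulary."""
    DIMENSION = "qb:DimensionProperty"  # identifies the observation
    MEASURE = "qb:MeasureProperty"      # the observed value
    ATTRIBUTE = "qb:AttributeProperty"  # qualifies the value (e.g. unit)


# Hypothetical columns of an observation table, classified the W3C way.
w3c_view = {
    "year": Component.DIMENSION,
    "canton": Component.DIMENSION,
    "count": Component.MEASURE,
    "unit": Component.ATTRIBUTE,
}

# Simplified view: every column is just a "dimension".
simple_view = {column: "dimension" for column in w3c_view}


def key_columns(view):
    """Columns that together identify an observation in the W3C view."""
    return [c for c, kind in view.items() if kind is Component.DIMENSION]


print(key_columns(w3c_view))  # ['year', 'canton']
```

In the simplified view, the information about which columns identify an observation is no longer explicit, which is one point to settle when deciding on the terminology.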
My current understanding is that "column" refers to the columns of the CSV files, and "dimension" can be used for each line of an output table (both for literal values and for links to another table). This understanding matches the wording in the Cube Designer, where each icon is labelled "edit dimension metadata" (both for literal values and for links to another table), and it also matches the current page for the Cube Designer.
"Mapping as Literal Property vs. Mapping as Dimension" makes a distinction between "literal" and "dimension", which is also not clear to me (nor to Véronique, see her comment there); all those terms now need clarification.
Has any decision been taken on this? Is there still a need to go further with the issue "Implement naming concept / consistent wording"?
Question: do we reuse the former glossary? If yes, it will need clean-up and clarification (it still mentions "pipeline", "rdf", etc.).