Observations as RDF Statements - (re)usability of observations/measurements #5

Open
pmaria opened this issue Oct 20, 2020 · 4 comments

pmaria commented Oct 20, 2020

What I've always thought was a missed opportunity in the RDF Data Cube recommendation is that RDF Data Cube observation statements are not easily transferable and directly (semantically) useful in a broader context. You need to "understand" data cube language to make sense of these statements.

One of the key reasons for this is that instances of qb:MeasureProperty are expected to be used on instances of qb:Observation. This leads to some semantically whacky statements like <some-observation> <has-life-expectancy> 80 ., and it also means that I can basically only use measurement statements in a data cube observation context.

For example, let's say we're measuring Alice's height. Consider the following RDF Data Cube based example:

ex:height a rdf:Property, qb:MeasureProperty ;
  rdfs:subPropertyOf sdmx-measure:obsValue;
  rdfs:range xsd:decimal ;
.

ex:person a rdf:Property, qb:DimensionProperty ;
  rdfs:range schema:Person ;
.

ex:obs1 a qb:Observation ;
  ex:person ex:Alice ;
  ex:height 143.5 ;
  # ... and other (meta)data
.

So here we've created a data cube with an observation of Alice's height. But suppose we also have some other data on Alice:

ex:Alice a schema:Person ;
  schema:name "Alice" ;
  schema:familyName "Liddell" ;
  schema:birthDate "1852-05-04"^^xsd:date ;
  schema:homeLocation ex:Wonderland ;
.

Now to find out Alice's height at some point in time, I have to find and interpret the observation, which is somehow connected to Alice.

IMO it would be much more natural to be able to state:

ex:Alice ex:height 143.5 .

And we could also have that be the core of an observation, if we define:

new:Observation a rdfs:Class ;
  rdfs:subClassOf rdf:Statement ;
.

new:measuredProperty a rdf:Property ;
  rdfs:domain new:Observation ;
  rdfs:range rdf:Property ;
. 
# etc.

We could then state something like:

ex:Alice ex:height 143.5 .

ex:obs1 a new:Observation, rdf:Statement ;
  rdf:subject ex:Alice ;
  
  rdf:predicate ex:height ;
  new:measuredProperty ex:height ; 
  
  rdf:object 143.5 ;
  new:measuredValue 143.5 ; # sub-property
  ex:heightValue 143.5 ; # sub-property

  ex:measuredDate "1864-08-12"^^xsd:date ;
.
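To illustrate how the plain statement and the reified observation relate, here is a minimal Python sketch. It is not an RDF library: triples are plain tuples, prefixed names are plain strings, and the helper name asserted_statements is hypothetical. It recovers the direct triple from each complete rdf:subject / rdf:predicate / rdf:object description.

```python
# Illustrative only: triples are (subject, predicate, object) tuples and
# prefixed names are kept as plain strings.
RDF_SUBJECT = "rdf:subject"
RDF_PREDICATE = "rdf:predicate"
RDF_OBJECT = "rdf:object"


def asserted_statements(triples):
    """Return the direct triple described by each complete reified statement."""
    parts = {}
    for s, p, o in triples:
        if p in (RDF_SUBJECT, RDF_PREDICATE, RDF_OBJECT):
            parts.setdefault(s, {})[p] = o
    # Keep only nodes that carry all three reification properties.
    return {
        (d[RDF_SUBJECT], d[RDF_PREDICATE], d[RDF_OBJECT])
        for d in parts.values()
        if len(d) == 3
    }


observation = [
    ("ex:obs1", "rdf:type", "new:Observation"),
    ("ex:obs1", "rdf:subject", "ex:Alice"),
    ("ex:obs1", "rdf:predicate", "ex:height"),
    ("ex:obs1", "rdf:object", 143.5),
    ("ex:obs1", "ex:measuredDate", "1864-08-12"),
]

direct = asserted_statements(observation)
print(direct)
```

The point of the sketch is only that the observation carries enough information to reproduce the statement about ex:Alice, so a consumer that knows nothing about observations can still use the plain triple.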

Note that, because we now use a statement that makes more semantic sense, we can easily make use of existing vocabularies for our observation. In this case we can replace ex:height with the already defined schema:height.

This also fits the RDF* approach (and possibly also property graphs). We can combine this like so:

ex:Alice a schema:Person ;
  schema:name "Alice" ;
  schema:familyName "Liddell" ;
  schema:birthDate "1852-05-04"^^xsd:date ;
  schema:homeLocation ex:Wonderland ;
.

<<ex:Alice schema:height 143.5>>
  a new:Observation ;
  new:measuredProperty schema:height ;
  new:measuredValue 143.5 ;
  ex:heightValue 143.5 ;
  ex:measuredDate "1864-08-12"^^xsd:date ;
.

This also plays nicely with the schema.org vocab for Observations:

<<ex:Alice schema:height 143.5>>
  a schema:Observation ;
  schema:observedNode ex:Alice ;
  schema:measuredProperty schema:height ;
  schema:measuredValue 143.5 ;
  schema:observationDate "1864-08-12T20:17:46.384Z"^^xsd:dateTime ;
.

One downside of this approach is that it doesn't fit multi-measurement observations nicely. I don't know how much of a dealbreaker that would be. Personally I'd place greater weight on having observation statements be semantically "reusable" in different contexts than on the ability to have multi-measurement observations.

I'm curious what your viewpoint on this is and what kind of issues you see with an approach like this @ktk


RickMoynihan commented Jan 14, 2021

Hey, I'm not the author of this repository, but I think the "semantically whacky statements" you're talking about are due to modelling errors rather than issues with the cube vocabulary.

A qb:Observation is essentially a statistical population or a measurement; which to me feels semantically disjoint from an instance of a particular person.

So <some-observation> <has-life-expectancy> 80 . is really more like:

<some-population-or-some-measurement> 
   <with-characteristic> x ; 
   <with-other-characteristic> y ; 
   <has-life-expectancy> 80 .

I understand the desire to reuse these things, and perhaps to associate a measurement with the person or river that was measured. However, you could always attach the measurement to the measured thing with a qb:Attribute, or additionally model these observations with a layer of prov: you model the succession of measurements of the river as immutable observations, each a prov:specializationOf the identity representing ex:Alice / ex:River over time. E.g. modelling Heraclitus's river:

ex:river-avon a prov:Entity ;
  rdfs:label "The river avon" .

obs:river-avon-1 a qb:Observation , prov:Entity ;
  rdfs:label "The river avon as measured on 2020-01-01" ;
  prov:specializationOf ex:river-avon ;
  ex:river-flow 123456789 ;
  ex:refPeriod "2020-01-01"^^xsd:date .


pmaria commented Jan 14, 2021

@RickMoynihan, thanks for your remarks.

A qb:Observation is essentially a statistical population or a measurement; which to me feels semantically disjoint from an instance of a particular person.

I agree, and I don't believe I stated otherwise anywhere.

What is key is to be clear about what the subject of the statistical population/observation is.

So <some-observation> <has-life-expectancy> 80 . is really more like:

<some-population-or-some-measurement> 
  <with-characteristic> x ; 
  <with-other-characteristic> y ; 
  <has-life-expectancy> 80 .

In this example, some observation is being stated to have a life expectancy. I maintain that this is whacky. I would like to state that the subject of this observation has a life expectancy.

I understand the desire to reuse these things, and perhaps to associate a measurement with the person or river that was measured. However, you could always attach the measurement to the measured thing with a qb:Attribute, or additionally model these observations with a layer of prov: you model the succession of measurements of the river as immutable observations, each a prov:specializationOf the identity representing ex:Alice / ex:River over time. E.g. modelling Heraclitus's river:

ex:river-avon a prov:Entity ;
  rdfs:label "The river avon" .

obs:river-avon-1 a qb:Observation , prov:Entity ;
  rdfs:label "The river avon as measured on 2020-01-01" ;
  prov:specializationOf ex:river-avon ;
  ex:river-flow 123456789 ;
  ex:refPeriod "2020-01-01"^^xsd:date .

I believe prov:specializationOf has a similar intent to rdfs:subClassOf. In that case this does not solve my problem. ex:river-avon is an observation and not a river.

I would like to model the example as follows:

ex:river-avon a ex:River ;
  rdfs:label "The river avon" ;
  ex:someOtherCoolRiverStuff "that has nothing to do with statistics" .

<<ex:river-avon ex:river-flow 123456789>> a qb:Observation , prov:Entity ;
  rdfs:label "River flow of the river avon as measured on 2020-01-01" ;
  ex:refPeriod "2020-01-01"^^xsd:date .

My problem with the data cube model is that IMO it creates a silo of statistical information (models), and I believe that with some minor tweaks this can be remedied.

Curious to hear your thoughts.


RickMoynihan commented Jan 15, 2021

In this example, some observation is being stated to have a life expectancy. I maintain that this is whacky. I would like to state that the subject of this observation has a life expectancy.

Whether it's whacky or not depends entirely on the rdfs:domain of the predicate <has-life-expectancy>. If observation and person are disjoint, then the domains of the predicates either need to be sufficiently general to support this reuse, i.e. "the life expectancy of this resource", or they need to be different predicates. I didn't mean to imply predicates would be shared; I think this just leads to imprecision in the model. Life expectancy in statistics doesn't mean everyone in that population will die after the same time T, it means on average they will.

Fair point about prov:specializationOf; I didn't double-check the description before suggesting this. However, I think you're just raising an httpRange-14 issue, i.e. the problem isn't really with the data cube at all, it's with the philosophy and metaphysics of modelling itself.

Essentially you're battling the distinction between the real-world river Avon and the RDF description of some aspects of it. See also the distinction between id and doc URIs.

All things in computer science are solved by another level of indirection :-). So doc:river-avon could be a representation suitable for prov:specializationOf, and id:river-avon can be used to represent the real river but "redirect" to your document of it.


pmaria commented Jan 22, 2021

Life expectancy in statistics doesn't mean everyone in that population will die after the same time T, it means on average they will.

Depends on the statistic. For life expectancy, agreed. But one could also have measurements for singular subjects, like height or weight.

Fair point about prov:specializationOf; I didn't double-check the description before suggesting this. However, I think you're just raising an httpRange-14 issue, i.e. the problem isn't really with the data cube at all, it's with the philosophy and metaphysics of modelling itself.

I don't see what this has to do with httpRange-14, which is about the subtle difference between an IRI identifying an object and an IRI identifying a web document about said object, and whether or not that distinction is important.

In this case the difference between an object and one or more observations made about that object is much less subtle. But yes, it is about modelling philosophy. But so is this RDF cube schema, right?

Essentially you're battling the distinction between the real-world river Avon and the RDF description of some aspects of it. See also the distinction between id and doc URIs.

What I'm trying to do is to make observations usable in a data graph outside of a cube context in a natural way.

In other words, in terms of SPARQL I would like to

DESCRIBE ex:river-avon

And find interesting statements, like

ex:river-avon ex:river-flow 123456789 .

And hopefully be able to use that in a generic way.
With the current data cube model, I cannot do this. And I think that's a missed opportunity.

It's about (IMO) reusable / non-siloed information.
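As a rough illustration of that gap, the following Python sketch (tuples as triples, no RDF library) shows that a naive describe over cube-style data finds nothing about the river until direct statements are derived from the observations. The dimension ex:river, the helper names direct_statements and describe, and the very simplified reading of DESCRIBE as "all triples with this subject" are assumptions made for the example.

```python
def direct_statements(triples, subject_dimension, measure):
    """For each observation, emit (observed thing, measure, value)."""
    by_obs = {}
    for s, p, o in triples:
        by_obs.setdefault(s, {})[p] = o
    return [
        (props[subject_dimension], measure, props[measure])
        for props in by_obs.values()
        if subject_dimension in props and measure in props
    ]


def describe(triples, resource):
    """Simplified stand-in for DESCRIBE: all triples with this subject."""
    return [t for t in triples if t[0] == resource]


# Cube-style data; ex:river is a hypothetical dimension pointing at the river.
cube = [
    ("obs:river-avon-1", "rdf:type", "qb:Observation"),
    ("obs:river-avon-1", "ex:river", "ex:river-avon"),
    ("obs:river-avon-1", "ex:river-flow", 123456789),
    ("obs:river-avon-1", "ex:refPeriod", "2020-01-01"),
]

before = describe(cube, "ex:river-avon")
graph = cube + direct_statements(cube, "ex:river", "ex:river-flow")
found = describe(graph, "ex:river-avon")
print(before, found)
```

The derivation step only works here because the caller says which dimension identifies the measured thing, which is exactly the cube-specific knowledge a generic consumer does not have.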
