Revision support - Possibly move to rdf_entity 2.x #83
Comments
Thank you for your thoroughness, @idimopoulos!
Very interesting read. I was considering an alternative approach (which is nowhere near as thought through as yours), involving the use of the Drupal database to store revisions. Apologies for my poor use of terminology, I'm still learning the language around RDF. My proposal makes a few assumptions:
With regard to implementation, a derived module of the triple storage module and a version of the RDF_draft module could allow for the current version of the RDF entity to be written to the triple store, and revisions to be written to the Drupal database (for example using the entity model and only using the triple store when reverting to a previous version). This solution would not allow for inspecting an old version of the RDF graph, and wouldn't track RDF relations (as far as I understand the implementation in Drupal), but would be relatively simple to implement.
Hi @Roensby, this PoC creates a new 'RDF entity' (a collection of triples with the same subject) for each revision. I had a similar idea to yours at some point, but the need to be able to query revisions made me take the other path. I was thinking of simply serializing the entities on disk as ttl files, and not even putting them in the database. Simply giving them incremental filenames should be enough. In the end, if you can't query the data, I didn't see much point in putting it in a db. You could come up with a way of attaching some metadata, so a db could make sense for some use cases, I guess. I'm very curious to learn about your use case though!
Hi @sandervd, thanks for engaging. I'm going to check out your PoC. A little background for my use case: traditionally, Drupal uses RDF on a per-field basis. For example, a node can have an author field. In my opinion, this way is old-fashioned because it requires constant changes to the data model (in this case a node), which creates extra work in a decoupled setup where Drupal is just the content layer and something else provides the presentation layer (for example). Instead, I use a metadata field. There's obviously more to it, but treating my metadata this way allows me to integrate Drupal with a triple store in a conceptually simple way. The rdf entity module seems like a perfect fit for this use case. However, because my nodes already take care of versioning the most important relation (which triples they reference through the metadata field), I don't particularly need the triple store to remember past relations. For my use case, it seems more prudent to track the actual content of the RDF triples (such as changes to the author name). For this, Drupal's built-in revision support using SQL tables may be sufficient. Btw, I commented to vocalise my idea with the intention of writing the necessary code myself. I'm still not convinced of the viability, but on the face of it, my proposal seems technically (if not conceptually) much, much simpler than implementing custom revision support on top of a triple store. I hope it makes sense, and again, thanks for engaging!
Hi @Roensby, you could also have a look at this project: http://d2rq.org/
After a discussion today with @sandervd and @brummbar we came up with the following template/suggestion so that remarks can be added and a complete solution can be created.
Purpose of the issue: Support revisions
Current state and problematic pieces
A bit of history
Rdf entity provides a layer to support storing entities directly in the triplestore. The idea behind it is that every property of every field has a predicate URI mapped to it and this is used as a storage identifier for the database. Properties without a mapped URI do not get stored in the database and are simply skipped.
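The mapping rule described above can be illustrated with a minimal Python sketch. This is not the module's actual code (rdf_entity is a Drupal/PHP module); the `FIELD_MAPPING` dictionary, the `entity_to_triples` helper, and the predicate choices are all hypothetical and only show how unmapped properties would be skipped.

```python
# Hypothetical mapping from "field.property" to a predicate URI.
# The keys and predicates are illustrative assumptions, not the module's mapping.
FIELD_MAPPING = {
    "label.value": "http://purl.org/dc/terms/title",
    "description.value": "http://purl.org/dc/terms/description",
    # "internal_note.value" is intentionally left unmapped.
}


def entity_to_triples(subject, values, mapping=FIELD_MAPPING):
    """Return (subject, predicate, object) tuples for every mapped property."""
    triples = []
    for key, items in values.items():
        predicate = mapping.get(key)
        if predicate is None:
            # Properties without a mapped URI never reach the triplestore.
            continue
        for item in items:
            triples.append((subject, predicate, item))
    return triples


entity_values = {
    "label.value": ["My dataset"],
    "description.value": ["A short description"],
    "internal_note.value": ["not stored"],
}
triples = entity_to_triples("http://example.com/rdf/1", entity_values)
```

In this sketch only the label and the description end up as triples; the unmapped note is dropped silently, which mirrors the behaviour described in the paragraph above.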
The other major factor of the module is that it uses graphs in order to store the entities separated by bundle. That means that each bundle has its own graph, rather than each entity type. This approach, while it seemed nice at first, is not ideal, as one cannot enforce that all objects (entities) that have a "specific semantic meaning" will live under graphs split by their type.
Where the problems start
The last major factor - Triplets
The last major factor in our decisions is the triplestore (or quadstore, as we use it) itself. A triplestore means that everything is described using triplets. This has advantages and disadvantages, but what really matters for us is that it is less flexible than SQL, in that you cannot have more than 3 "columns" to describe something.
For example, in SQL, for each field, you have a table where each entry stores the entity_id, the revision_id, the delta, the value and other properties required by each field. That means that a structure is created and you store one entry for each delta of each revision of each entity.
In SPARQL, however, you need to find a way to do this in triplets without breaking the structure of the entity, while still allowing proper querying (so no serialized cheats) and without breaking the ontology (keeping one predicate per property). That means that you would have to do something like
Just by looking at the above, the problem already exists: if you query for the field1 of entity1, you no longer know which property belongs to which delta. The sequence of the stored data cannot be used to distinguish them either, as a triplestore neither stores nor returns results in the order you provide them.
While the delta-specific problem belongs in another issue, it is kept here as context for the following sections.
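To make the delta problem concrete, here is a small Python sketch that models a graph as an unordered set of triples. The field, its two properties (`uri` and `title`), and the predicate URIs are hypothetical; the point is only that two different delta pairings collapse into the same set of triples.

```python
SUBJECT = "http://example.com/rdf/1"

# Hypothetical predicates, one per property of a two-property field.
PREDICATES = {
    "uri": "http://example.com/ns#field1_uri",
    "title": "http://example.com/ns#field1_title",
}


def to_graph(field_items):
    """Flatten field items into an unordered set of triples (no delta column)."""
    graph = set()
    for item in field_items:
        for prop, value in item.items():
            graph.add((SUBJECT, PREDICATES[prop], value))
    return graph


def values_for(graph, predicate):
    """Return all objects for SUBJECT and the given predicate, as a set."""
    return {o for s, p, o in graph if s == SUBJECT and p == predicate}


# Delta 0 pairs A with "First", delta 1 pairs B with "Second".
original = to_graph([
    {"uri": "http://a.example", "title": "First"},
    {"uri": "http://b.example", "title": "Second"},
])

# The same values with the titles swapped between the deltas.
swapped = to_graph([
    {"uri": "http://a.example", "title": "Second"},
    {"uri": "http://b.example", "title": "First"},
])

same_graph = original == swapped
titles = values_for(original, PREDICATES["title"])
```

Both inputs produce an identical set of four triples, so nothing in the stored data says which title belongs to which URI.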
The string ID
One of the things that makes the semantic web so appealing is the identification of its objects (or entities) by a unique URI. For us, that means that unlike nodes, we use URIs to identify the entities. This also brought up many issues in the past, as many modules did not yet support string IDs, but we got over that. Why is that important here though?
A year and a half ago, we also had to somehow support multiple versions in the Joinup project. Naturally, supporting revisions just like core does was one of the ideas. However, there were a few issues here:
http://example.com/rdf/1 cannot automatically have http://example.com/rdf/1/version/2, as the implications are multiple. That ID could already be the ID of another entity (not a revision), and the queries would be a nightmare if we had to concatenate IDs.
Rdf Draft
That is when the rdf_draft module came into play. The need in Joinup was that only up to two revisions can exist at any given time, a published and an unpublished one. Since the need for a history of changes was not a requirement, the solution came with the graphs themselves.
For each bundle, a second graph was created, separating the two entities and giving the option of a publication status on the entity. For us, those graphs took the form of http://joinup.eu/<bundle>/[published|draft].
This already covers many of the needs that might come up, but it also came with some limitations:
Revisions
The idea behind supporting revisions involves a few ideas, parameters and a couple of compromises:
Drop of the rdf_draft module
The rdf_draft module is a nice implementation but should become a thing of the past. Apart from the many issues that have come up due to the multiple graphs we need to support, it would directly conflict with the implementation of revisions.
Graph structure
Following the problems above, we are going to drop the support of a graph per bundle and follow the notion of the Drupal entities in version 8. Each entity type has a table which stores the base fields of the entity. Unlike Drupal, however, we are going to have everything within a specific graph.
Without addressing the revisions yet, that means that every entity type will only have one graph to look into for anything.
Revisions
Revisions, like rdf_draft, will reside in a separate module. That would require the storage class (or entity class) to be overridden by the new module in order to support new methods like ::allRevisions() (corresponding to the ::allRevisions() from the NodeStorage class).
However, since we are already trying to split the rdf_entity module, we can use this module to simply include an interface and a revision trait for each entity type that is defined and wants to use the revision system.
How issues will be addressed
For all the above issues, during the discussion we came up with the following structure details.
Revision graph
The revision graph will also belong to a specific entity type. It can be defined in the annotation of the entity type or in the mapping entity that we currently support. Each entity type's revision graph will be solely for internal use and should not be exposed if there is an exposed endpoint.
The name of the graph is user defined.
Identification of the entities
The revision graph will be a pool of data from all revisions of the entities. As with nodes, even the current revision will exist in the graph.
Since we define that revision graphs are solely for internal use, the IDs of the entities can be arbitrary and different from the original IDs. This gives us the ability to create IDs like http://<random alphanumeric string>.com/<entity type>/revision/<revision serial id>. The serial number can be global or per entity. If global, since the triplestore does not have serial numbering, it has to be stored in Drupal or be determined on the fly when a new revision is stored. The latter is a better solution solely because migrating data will not break the structure.
Connection with original entity
The idea is that the revision entities will use a property like dcat:isVersionOf to link to the original content. A possible implication here is that the original entity might already have a property mapped to the dcat:isVersionOf predicate, so another property should probably be used. Something like <base_url>/drupalIsVersionOf.
Additionally, the revisions sub module can define, for all entity types that have a revision graph defined, an additional base field mapped to something like <base_url>/drupalRevisionId, which also maps back to the current revision ID.
Additional properties
Every revision should include the following properties apart from the drupalIsVersionOf:
Drop support of states
RDF Draft enforces the idea of states within the rdf_entity module. However, a revision is not necessarily the entity in a different state, but rather a part of its history. Further support for states can be added, but this will be irrelevant to the rdf entity structure and only relevant to the corresponding status field.
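The revision structure proposed in the sections above can be sketched in Python as follows. This is only an illustration under stated assumptions: the revision graph is modelled as an in-memory set of triples, `BASE_URL` and `REVISION_PREFIX` are placeholder values, storing drupalRevisionId on the revision subject is one possible reading of the proposal, and the helpers `next_revision_id`, `add_revision` and `all_revisions` are hypothetical names rather than part of rdf_entity.

```python
# Placeholder values; <base_url> and the arbitrary revision host are site specific.
BASE_URL = "http://example.com"
REVISION_PREFIX = "http://revisions.example.com"
IS_VERSION_OF = BASE_URL + "/drupalIsVersionOf"
REVISION_ID = BASE_URL + "/drupalRevisionId"
TITLE = "http://purl.org/dc/terms/title"


def next_revision_id(revision_graph):
    """Determine the next global serial on the fly from the stored revisions."""
    serials = [int(o) for _, p, o in revision_graph if p == REVISION_ID]
    return max(serials, default=0) + 1


def add_revision(revision_graph, entity_type, entity_id, entity_triples):
    """Copy the entity's triples under a new revision subject and link it back."""
    serial = next_revision_id(revision_graph)
    subject = f"{REVISION_PREFIX}/{entity_type}/revision/{serial}"
    for s, p, o in entity_triples:
        if s == entity_id:
            revision_graph.add((subject, p, o))
    # Assumption: the link and the serial are stored on the revision itself.
    revision_graph.add((subject, IS_VERSION_OF, entity_id))
    revision_graph.add((subject, REVISION_ID, str(serial)))
    return subject


def all_revisions(revision_graph, entity_id):
    """Return the revision subjects of an entity, oldest first."""
    subjects = {s for s, p, o in revision_graph
                if p == IS_VERSION_OF and o == entity_id}
    serial_of = {s: int(o) for s, p, o in revision_graph if p == REVISION_ID}
    return sorted(subjects, key=serial_of.__getitem__)


entity_id = "http://example.com/rdf/1"
revisions = set()
first = add_revision(revisions, "rdf_entity", entity_id,
                     [(entity_id, TITLE, "Draft title")])
second = add_revision(revisions, "rdf_entity", entity_id,
                      [(entity_id, TITLE, "Final title")])
history = all_revisions(revisions, entity_id)
```

Because the serial is derived from the data already in the revision graph, nothing has to be kept in sync on the Drupal side, which is the migration argument made for the on-the-fly option.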
Conclusion and Compromises
** All queries can be supported by default.
** Entities existing in the published graph can be simply moved over to the main entity type graph and the update path is complete.
** Enabling revisions only requires copying the version in the published graph over to the revisions graph.
** Upgrading from rdf_draft only requires creating a new revision in the revisions graph.
** Support for federation is easily achievable by adding a new state in the status field, which is 'federation' (other statuses are published or unpublished; this is not about the state_machine state field we are using). Entities with the federation status are simply candidates for becoming the new revision of the entity.