Annotations' attributes get overwritten by CWRC-Writer #282

Open
lucaju opened this issue Aug 27, 2020 · 6 comments

@lucaju
Contributor

lucaju commented Aug 27, 2020

The attributes of RDF annotations get overwritten by CWRC-Writer every time it rebuilds the XML. More specifically, the annotation ID (which includes the URL) and the dates when the annotation was created and issued are updated to the current timestamp, and the creator's information is replaced with the current user's data. If the file is saved, the previous information is lost.

For instance:

  • dcterms:creator is replaced with the current user's @id, cwrc:hasName, and foaf:nick.
  • dcterms:created and dcterms:issued are replaced with the current timestamp.

Also, some of the metadata stored within the annotation is outdated or incorrect. For instance:

  • as:generator.id is hardcoded as "https://cwrc-writer.cwrc.ca/", which is problematic when using CWRC-Writer from within Islandora, or even from an external standalone installation.
  • as:generator.schema:softwareVersion is also hardcoded as "1.0". I am assuming this is meant to be CWRC-Writer's version, which is currently 6.0.0.

Expected Behaviour

Unless the user deliberately changes an annotation, CWRC-Writer should preserve the annotation's attributes, including the ID, the dates, and the creator.

Current Behaviour

Previous information gets replaced when the file is saved.

Possible Solution

Rework the process of generating annotations. Currently, it seems that some attributes (date, creator, etc.) are not stored as objects when the file is loaded. Storing these attributes and using them to check whether the annotation changed before regenerating the XML might solve the problem.

Since this is an important and sensitive feature, these changes should be discussed and agreed upon before being implemented.

Steps to Reproduce (for bugs)

  1. Load Sample Letter from the template.
  2. Save the document on your own repository.
  3. Check how the annotation got saved and compare it with the original file.
@lucaju lucaju assigned lucaju and ilovan and unassigned lucaju Aug 27, 2020
@lucaju lucaju added the bug label Aug 27, 2020
@lucaju
Contributor Author

lucaju commented Aug 28, 2020

The attribute appVersion will be set to read the current version from package.json.
The attribute as:generator.id will be set to read from window.location.origin, which is the host (and protocol) from which CWRC-Writer is being used, e.g., https://cwrc-writer.cwrc.ca.
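
A minimal sketch of that change, assuming a hypothetical `buildGenerator` helper that receives the origin and the version as arguments. In the browser these would come from `window.location.origin` and the `version` field of package.json; the property names mirror the `as:generator` block found in existing annotations.

```javascript
// Hypothetical helper: builds the as:generator node from runtime values
// instead of hardcoded strings.
function buildGenerator(origin, version) {
  return {
    '@id': origin,
    '@type': 'as:Application',
    'rdfs:label': 'CWRC Writer',
    'schema:url': origin,
    'schema:softwareVersion': version,
  };
}

// In the browser this would be called as
// buildGenerator(window.location.origin, pkg.version), where pkg is the
// parsed package.json (how it gets imported depends on the bundler).
const generator = buildGenerator('https://cwrc-writer.cwrc.ca', '6.0.0');
console.log(generator['schema:softwareVersion']); // 6.0.0
```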

@lucaju
Contributor Author

lucaju commented Aug 28, 2020

A few thoughts about the date attributes.
We might want to include dcterms:modified to store when the annotation was last updated.

We might want to rethink the use of dcterms:issued. It seems that it should describe when a document was formally issued. The Dublin Core documentation gives the following example:
"A government file, officially released in 1997, consisting of photographs taken in 1985 of hundreds of meteorite fragments collected in 1952 could be described with the following metadata:
DC.Date->Issued: 1997
DC.Date->Created: 1985
DC.Date->DataGathered: 1952
"
(https://www.dublincore.org/specifications/dublin-core/date-element/)

Since we are dealing with annotation metadata (as opposed to the content of the annotation), the issue date is the same as the created date, which can be either the moment when the user creates the annotation or when the file is saved (committed). Alternatively, we can use this pair of attributes (created & issued) to describe the above process (creation and save). Not sure if it would make any difference, though.

@lucaju
Contributor Author

lucaju commented Sep 1, 2020

Currently, when loading a file, CWRC-Writer parses (and removes) the RDF annotations from the XML and stores them in JS objects (Entities). These objects are used to manage, edit, and display entities. Later on, when the user decides to save (or to check the XML), these objects are used to regenerate the JSON-LD and put it back in the XML (inside XENODATA).

The problem is that when initially parsing the RDF, CWRC-Writer only stores part of the attributes, which does not include the creator, for instance. Then, when regenerating the annotation, the creator becomes the current user. The date of creation is also a problem, since the regeneration process updates the annotation's creation timestamp. In the end, the annotation gets overwritten with the information of the latest user who saved the file, and we lose the original date and creator attributes.

The solution to this issue might be as trivial as storing these crucial attributes and using them to regenerate the annotation later. While this works fine when the annotation doesn't change during the session, how will we handle updates? Should we update the creation timestamp? Or perhaps add a new attribute to store the latest update (dcterms:modified)? What if the annotation gets changed multiple times? And what should we do with the creator? Keep the original creator? Replace it? Add an array of contributors?
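
The "trivial" part, keeping the original attributes for annotations that do not change, could look roughly like the sketch below. The function names and entity fields (`parseAnnotation`, `regenerateAnnotation`, `created`, `creator`, `didUpdate`) are assumptions for illustration, not CWRC-Writer's actual API, and the open questions about updates are left unanswered here.

```javascript
// Hypothetical sketch: keep the metadata that is currently dropped on load...
function parseAnnotation(jsonld) {
  return {
    id: jsonld['@id'],
    created: jsonld['dcterms:created'],
    creator: jsonld['dcterms:creator'],
    didUpdate: false, // would be flipped only by a deliberate user edit
  };
}

// ...and reuse it when the annotation is regenerated.
function regenerateAnnotation(entity, currentUser, now) {
  return {
    '@id': entity.id,
    '@type': 'oa:Annotation',
    // Fall back to the current time and user only for annotations
    // that were created during this session.
    'dcterms:created': entity.created ?? now,
    'dcterms:creator': entity.creator ?? currentUser,
  };
}

const loaded = parseAnnotation({
  '@id': 'https://example.org/doc?correction_annotation_20190814144101',
  'dcterms:created': '2019-08-14T20:41:01.124Z',
  'dcterms:creator': { '@id': 'https://github.com/ilovan' },
});
const saved = regenerateAnnotation(
  loaded,
  { '@id': 'https://github.com/lucaju' },
  new Date().toISOString()
);
console.log(saved['dcterms:creator']['@id']); // https://github.com/ilovan
```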

Perhaps @SusanBrown can help with these questions.

@lucaju
Contributor Author

lucaju commented Sep 1, 2020

I'm not well versed in JSON-LD, nor in how people use it, but perhaps there is more to this. A bigger question (at least for me at this point) is how CWRC-Writer supports and handles JSON-LD. Which attributes (and from which namespaces) are we using?

Anatomy of a CWRC-Writer JSON-LD.
Here is an example of one annotation saved by CWRC-Writer:

{
	"@context": {
		"as": "http://www.w3.org/ns/activitystreams#",
		"cwrc": "http://sparql.cwrc.ca/ontologies/cwrc#",
		"dc": "http://purl.org/dc/elements/1.1/",
		"dcterms": "http://purl.org/dc/terms/",
		"foaf": "http://xmlns.com/foaf/0.1/",
		"geo": "http://www.geonames.org/ontology#",
		"oa": "http://www.w3.org/ns/oa#",
		"schema": "http://schema.org/",
		"xsd": "http://www.w3.org/2001/XMLSchema#",
		"dcterms:created": {
			"@type": "xsd:dateTime",
			"@id": "dcterms:created"
		},
		"dcterms:issued": {
			"@type": "xsd:dateTime",
			"@id": "dcterms:issued"
		},
		"oa:motivatedBy": {
			"@type": "oa:Motivation"
		},
		"@language": "en"
	},
	"@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter?correction_annotation_20190814144101",
	"@type": "oa:Annotation",
	"dcterms:created": "2019-08-14T20:41:01.124Z",
	"dcterms:issued": "2019-08-14T20:44:01.985Z",
	"dcterms:creator": {
		"@id": "https://github.com/ilovan",
		"@type": [
			"cwrc:NaturalPerson",
			"schema:Person"
		],
		"cwrc:hasName": "Mihaela Ilovan",
		"foaf:nick": "ilovan"
	},
	"oa:motivatedBy": "oa:editing",
	"oa:hasTarget": {
		"@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter?correction_annotation_20190814144101#Target",
		"@type": "oa:SpecificResource",
		"oa:hasSource": {
			"@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter",
			"@type": "dctypes:Text",
			"dc:format": "text/xml"
		},
		"oa:renderedVia": {
			"@id": "https://cwrc-writer.cwrc.ca/",
			"@type": "as:Application",
			"rdfs:label": "CWRC Writer",
			"schema:softwareVersion": "1.0"
		},
		"oa:hasSelector": {
			"@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter?correction_annotation_20190814144101#Selector",
			"@type": "oa:XPathSelector",
			"rdf:value": "TEI/text/body/div/p[2]/choice"
		}
	},
	"oa:hasBody": {
		"@type": "fabio:Correction",
		"dc:format": "text/xml",
		"rdf:value": "when"
	},
	"as:generator": {
		"@id": "https://cwrc-writer.cwrc.ca/",
		"@type": "as:Application",
		"rdfs:label": "CWRC Writer",
		"schema:url": "https://cwrc-writer.cwrc.ca",
		"schema:softwareVersion": "1.0"
	}
}
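
One way to start answering the namespace question is to list the prefixes an annotation uses and compare them with those declared in its `@context`. The sketch below is a hypothetical audit helper (`undeclaredPrefixes` is not part of CWRC-Writer); it only inspects property names and `@type` values. Reading the annotation above by hand, `dctypes`, `fabio`, `rdf`, and `rdfs` appear to be used without a declaration.

```javascript
// Returns the prefixes used in property names and @type values that are
// not declared in the annotation's @context.
function undeclaredPrefixes(annotation) {
  const context = annotation['@context'] ?? {};
  const declared = new Set(
    Object.keys(context).filter((key) => !key.startsWith('@') && !key.includes(':'))
  );
  const used = new Set();

  const notePrefix = (term) => {
    // Skip keywords (@id, @type), absolute IRIs, and terms without a prefix.
    if (typeof term !== 'string' || term.startsWith('@') || term.includes('://')) return;
    const colon = term.indexOf(':');
    if (colon > 0) used.add(term.slice(0, colon));
  };

  const walk = (node) => {
    if (Array.isArray(node)) return node.forEach((item) => walk(item));
    if (node === null || typeof node !== 'object') return;
    for (const [key, value] of Object.entries(node)) {
      if (key === '@context') continue; // declarations, not usage
      notePrefix(key);
      if (key === '@type') [].concat(value).forEach(notePrefix);
      walk(value);
    }
  };

  walk(annotation);
  return [...used].filter((prefix) => !declared.has(prefix)).sort();
}

const sample = {
  '@context': { oa: 'http://www.w3.org/ns/oa#', dc: 'http://purl.org/dc/elements/1.1/' },
  '@type': 'oa:Annotation',
  'oa:hasBody': { '@type': 'fabio:Correction', 'dc:format': 'text/xml', 'rdf:value': 'when' },
};
console.log(undeclaredPrefixes(sample)); // [ 'fabio', 'rdf' ]
```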

lucaju added a commit that referenced this issue Sep 4, 2020
Store original annotation into entity object. Add modifiedDate, didUpdate, creator and creators as
property to entity. Update modifiedDate when user edit annotation. Update xpath of context around
changes (without modify the entity). Update annotationId if a new file is created. Add list of
contributors when annotation gets modified.

#282
@lucaju
Contributor Author

lucaju commented Sep 4, 2020

Made some more adjustments and updates (9e3c0cf):

  • (FIX) Annotations now keep their metadata across load/save when they have not changed, with two exceptions:
    • The attribute dcterms:issued is removed (there is no point in keeping it; dcterms:created already covers the date of creation).
    • The attribute dcterms:modified is added with the same date as dcterms:created.
  • If the user deliberately edits the annotation (adds properties, changes entities, etc.), the dcterms:modified timestamp is updated and the user is added to the annotation's list of contributors (dcterms:contributor).
  • The annotation's XPath is updated if it changes during the session. This doesn't modify the annotation, i.e., it doesn't update the date or add contributors.
  • When the file is duplicated and saved elsewhere, the annotation's ID gets updated. This doesn't modify the annotation, i.e., it doesn't update the date or add contributors.

@lucaju
Contributor Author

lucaju commented Sep 4, 2020

A few questions for consideration (@SusanBrown @ilovan )

1. What constitutes, or what should trigger an update to an annotation?

Perhaps a better question is: what counts as modifying an annotation (as opposed to a contextual update)?

  • If the user deliberately edits attributes or properties of the annotation or entity.
    • It should trigger a modification. The modified date is updated, and the user is added to the list of contributors.
  • If the XPath changes (due to changes in the document context, like splitting a paragraph or adding a tag around the entity).
    • It does NOT modify the annotation, but the XPath metadata will be updated. No changes to the modified date or list of contributors.
  • If the file is duplicated.
    • It does NOT modify the annotation, but the annotation's ID changes to reflect the new path (the ID is a concatenation of the source of the file (URL), the type of annotation, and the date of creation). No changes to the modified date or list of contributors.
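
The three cases above can be sketched as separate handlers, which makes it explicit that only a deliberate edit touches the modified date and the contributors. The function and field names are hypothetical, and the ID handling assumes the `source?type_annotation_timestamp` shape seen in the sample annotation.

```javascript
// Deliberate edit: counts as a modification.
function applyEdit(entity, userId, now) {
  const alreadyCredited =
    userId === entity.creator || entity.contributors.includes(userId);
  const contributors = alreadyCredited
    ? entity.contributors
    : [...entity.contributors, userId];
  return { ...entity, modified: now, contributors };
}

// Contextual update: the selector changes, the modified date and
// contributors do not.
function applyXPathChange(entity, xpath) {
  return { ...entity, xpath };
}

// Duplication: only the source part of the ID (before "?") changes.
function applyDuplicate(entity, newSource) {
  const at = entity.id.indexOf('?');
  const suffix = at === -1 ? '' : entity.id.slice(at);
  return { ...entity, id: newSource + suffix };
}

const entity = {
  id: 'https://example.org/templates/sample_letter?correction_annotation_20190814144101',
  creator: 'https://github.com/ilovan',
  contributors: [],
  modified: '2019-08-14T20:41:01.124Z',
  xpath: 'TEI/text/body/div/p[2]/choice',
};
const copy = applyDuplicate(entity, 'https://example.org/my-repo/letter');
console.log(copy.id);
// https://example.org/my-repo/letter?correction_annotation_20190814144101
```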

2. Is it ok to have an array of contributors?

DCMI defines the term contributor as "an entity responsible for making contributions to the resource. The guidelines for using names of persons or organizations as creators apply to contributors."

I decided to use it to add users that modify the annotation. Every time a user deliberately modifies an annotation, the modified date gets updated, and the user is added to the list of contributors (if not already, and if not the creator).

But it is not clear in the documentation if there can be more than one contributor. They say, though, that contributor (term) is a subproperty of contributor (element) (both in singular).

Example:

"dcterms:contributor": [
	{
		"dcterms:contributor": {
			"@id": "https://github.com/lucaju",
			"@type": [
				"cwrc:NaturalPerson",
				"schema:Person"
			],
			"cwrc:hasName": "Luciano Frizzera",
			"foaf:nick": "lucaju"
		}
	},
	{
		"dcterms:contributor": {
			"@id": "https://github.com/sbrown",
			"@type": [
				"cwrc:NaturalPerson",
				"schema:Person"
			],
			"cwrc:hasName": "Susan Brown",
			"foaf:nick": "sbrown"
		}
	}
]
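
For comparison, JSON-LD accepts an array as the value of a property, so another option would be a flat array of person nodes rather than repeating `dcterms:contributor` inside each element. The sketch below assumes that flat shape and a hypothetical `addContributor` helper; it applies the rule described above (skip the creator and anyone already listed).

```javascript
// Returns the annotation with the user appended to dcterms:contributor,
// unless the user is the creator or is already listed.
function addContributor(annotation, user) {
  const current = [].concat(annotation['dcterms:contributor'] ?? []);
  const isCreator = annotation['dcterms:creator']?.['@id'] === user['@id'];
  const isListed = current.some((person) => person['@id'] === user['@id']);
  if (isCreator || isListed) return annotation;
  return { ...annotation, 'dcterms:contributor': [...current, user] };
}

let annotation = { 'dcterms:creator': { '@id': 'https://github.com/ilovan' } };
const lucaju = { '@id': 'https://github.com/lucaju', 'foaf:nick': 'lucaju' };
annotation = addContributor(annotation, lucaju);
annotation = addContributor(annotation, lucaju); // ignored: already listed
annotation = addContributor(annotation, { '@id': 'https://github.com/ilovan' }); // ignored: creator
console.log(annotation['dcterms:contributor'].length); // 1
```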

3. On the as:generator attribute, what is the difference between @id and schema:url?

Both point to the same place: the URL from which CWRC-Writer is being used (e.g., cwrc-writer.cwrc.ca).

"as:generator": {
	"@id": "https://cwrc-writer.cwrc.ca/",
	"@type": "as:Application",
	"rdfs:label": "CWRC Writer",
	"schema:url": "https://cwrc-writer.cwrc.ca",
	"schema:softwareVersion": "1.0"
}

4. Is there any particular order in which JSON-LD should be built?

I understand that @context should be put at the top of a JSON-LD document. What about the other attributes? Is there a standard for that, or should we come up with our own order? Alphabetically? Logically (e.g., created and modified next to each other)?
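
As far as I know, the order of keys in a JSON object carries no meaning for JSON-LD processors, so any ordering would be a readability convention rather than a requirement. One possible convention, sketched with a hypothetical `orderKeys` helper: keywords first, then the remaining properties alphabetically (which also happens to keep the `dcterms:` dates together).

```javascript
const LEADING_KEYS = ['@context', '@id', '@type'];

// Returns a shallow copy of the node with keys in a predictable order.
function orderKeys(node) {
  const leading = LEADING_KEYS.filter((key) => key in node);
  const rest = Object.keys(node)
    .filter((key) => !LEADING_KEYS.includes(key))
    .sort();
  const ordered = {};
  for (const key of [...leading, ...rest]) {
    ordered[key] = node[key];
  }
  return ordered;
}

const ordered = orderKeys({
  'oa:hasBody': {},
  '@type': 'oa:Annotation',
  'dcterms:created': '2019-08-14T20:41:01.124Z',
  '@id': 'https://example.org/doc?correction_annotation_20190814144101',
});
console.log(Object.keys(ordered));
// [ '@id', '@type', 'dcterms:created', 'oa:hasBody' ]
```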

lucaju pushed a commit that referenced this issue Dec 4, 2020
# [7.0.0](v6.0.0...v7.0.0) (2020-12-04)

### Bug Fixes

* **annotation:** fix issues with annotations been overwritten when saved ([9e3c0cf](9e3c0cf)), closes [#282](#282)
* **annotation:** get app version from package.json and app id from the current CWRC-Writer instance ([cc52a20](cc52a20))
* **annotation:** improve test to add contributors ([e9841d4](e9841d4))
* **nerve:** use data from the form elements to update the entity ([01a0c37](01a0c37))
* **nssi[nerve]:** skip teiheader when sending document to nssi[nerve] ([f797df5](f797df5)), closes [#285](#285)
* **schematags:** filter tags using uppercase input ([7b8ce13](7b8ce13)), closes [#286](#286)
* assure single attribute dropdown is an array instead of a string ([d69e8a9](d69e8a9)), closes [#283](#283)

### Code Refactoring

* **gitdialog:** provide writer instance to logout component ([9bc1ae6](9bc1ae6))

### Features

* 🎸 config commitzen. Bump up version due to CI changes ([d7119e5](d7119e5))
* 🎸 travis-ci tweak ([7e0f4ad](7e0f4ad))
* 🎸 travis-ci tweaks ([448f75f](448f75f))

### BREAKING CHANGES

* **gitdialog:** provide writer instance to logout component