similarity matching edge cases - same artwork across multiple institutions #21

Open
lklic opened this issue Jul 23, 2021 · 2 comments
Labels
validation needs verification before closing

Comments

@lklic
Contributor

lklic commented Jul 23, 2021

I was going through the updated pages and had some thoughts on the way we create sameAs links through the matching process, and I wanted to share them here before I forget. I just want to make sure we are able to account for edge cases where there is a greater level of complexity to the matching process.

The scenario I was thinking about was related to our conversation about choosing which Pharos URI would be deprecated after creating a match. While the choice is less relevant when we only have one match between two artworks, the situation becomes more complex when we have multiple pairs of artworks that match across institutions.

In our similarity model let's say we have the following:

AA = Artwork A Pharos URI
A = Artwork A local URI

BB = Artwork B Pharos URI
B = Artwork B local URI

CC = Artwork C Pharos URI
C = Artwork C local URI

In a scenario where the local URIs A, B, and C all refer to the same artwork X, a user first marks Artworks A and B as being the same. In this case, we have the following:

AA = Canonical URI for artwork X
A sameAs AA (no change)

BB = Deprecated URI that redirects to AA
B sameAs AA (updated sameAs link, B sameAs link to BB is dropped)

So far we are OK.

Next, the user reviews Artwork C with similarity data to Artwork B.
In this case, the model has no way to account for the change that has previously happened with the dropped link between the Pharos URI BB and local URI B. If the user marks these as being the same the following will happen according to our model:

CC = this becomes the canonical URI for artwork X
C sameAs CC (no change)

BB = Deprecated URI that redirects to CC
B sameAs CC (updated sameAs link, B sameAs link to BB is dropped)

OR

BB = this becomes the canonical URI for artwork X
B sameAs BB (no change)

CC = Deprecated URI that redirects to BB
C sameAs BB (updated sameAs link, C sameAs link to CC is dropped)
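To make the first of these two outcomes concrete, here is a minimal sketch of the model as described above. The dictionaries and the `naive_mark_same` function are hypothetical stand-ins for illustration only; they are not the project's actual data store or insert query.

```python
def naive_mark_same(keep, drop, same_as, redirects, minted):
    """Merge two local records using only the Pharos URIs first minted for them."""
    canonical, deprecated = minted[keep], minted[drop]
    redirects[deprecated] = canonical   # deprecated URI now redirects to canonical
    same_as[keep] = canonical
    same_as[drop] = canonical           # any earlier link of `drop` is overwritten


minted = {"A": "AA", "B": "BB", "C": "CC"}   # local URI -> Pharos URI at ingest
same_as = dict(minted)                        # current sameAs links
redirects = {}                                # deprecated Pharos URI -> target

naive_mark_same("A", "B", same_as, redirects, minted)   # first review: A + B
naive_mark_same("C", "B", same_as, redirects, minted)   # second review: C + B

print(same_as)     # {'A': 'AA', 'B': 'CC', 'C': 'CC'}
print(redirects)   # {'BB': 'CC'}
```

Artwork X ends up with two live Pharos URIs (AA and CC), and the earlier redirect from BB to AA is lost.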

Either way what we really want is the following:

AA = Canonical URI for artwork X
A sameAs AA (no change)

BB = Deprecated URI that redirects to AA
B sameAs AA (updated sameAs link, B sameAs link to BB is dropped)

CC = Deprecated URI that redirects to AA
C sameAs AA (updated sameAs link, C sameAs link to CC is dropped)

I think that the way to resolve this is to update the insert query for the "Same" button to check whether either of the Pharos URIs has been marked as deprecated, and if it has, to link to the new canonical URI, which may not belong to either of the two artworks that the user is reviewing.

I think we should run some tests and find cases where a single artwork has multiple matchings across institutions, and then test out these edge cases.
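The check described here could look roughly like the sketch below. It assumes deprecations are available as a simple mapping from a deprecated Pharos URI to its redirect target; `resolve`, `plan_merge`, and the `redirects` dictionary are hypothetical names, not part of the existing insert query.

```python
def resolve(uri, redirects):
    """Follow deprecation redirects until a non-deprecated URI is reached."""
    seen = set()
    while uri in redirects:
        if uri in seen:
            raise ValueError(f"redirect cycle at {uri}")
        seen.add(uri)
        uri = redirects[uri]
    return uri


def plan_merge(pharos_a, pharos_b, redirects):
    """Return (canonical, to_deprecate) for a "Same" click.

    Both inputs are resolved first, so a URI deprecated by an earlier
    match is replaced by its current canonical URI. `to_deprecate` is
    None when both records already share a canonical URI.
    """
    a = resolve(pharos_a, redirects)
    b = resolve(pharos_b, redirects)
    if a == b:
        return a, None
    return a, b


# State after A + B were marked the same: BB redirects to AA.
redirects = {"BB": "AA"}
print(plan_merge("BB", "CC", redirects))   # ('AA', 'CC')
```

Reviewing B against C then yields AA as the canonical URI, even though AA belongs to neither of the two records on screen.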

@MinadakisNikos

"I think that the way to resolve this is to update the insert query for the "Same" button to check whether either of the Pharos URIs has been marked as deprecated, and if it has, to link to the new canonical URI, which may not belong to either of the two artworks that the user is reviewing." Yes, this will work and is easily implemented.

"I think we should run some tests and find cases where a single artwork has multiple matchings across institutions, and then test out these edge cases."
@mafragias a nice example is the "Adoration of the Magi" that we used for testing the IM algorithm, which is present in Zeri, VIT and Hertziana. It is the example shown in the presentations that we gave to Pharos members last year.

@mafragias
Contributor

mafragias commented Jul 28, 2021

I have already implemented the field to check and retrieve only the non-deprecated URIs.

Furthermore, I was able to modify the INSERT query to take the edge cases into account.

My test case was:

Similarity 1 : https://vision.artresearch.net/resource/?providera=https%3A%2F%2Fpharos.artresearch.net%2Fresource%2Fhertziana%2Fsource%2FHertziana&providerb=https%3A%2F%2Fpharos.artresearch.net%2Fresource%2Fitatti%2Fsource%2FITatti&uri=https%3A%2F%2Fvision.artresearch.net%2Fresource%2FSimilarity%2F201988995487132681643471545

where originally the sameAs properties were:

<https://pharos.artresearch.net/resource/hertziana/work/08084657> owl:sameAs <https://pharos.artresearch.net/resource/pharos/artwork/6e68b3764108d40b008c50a95a2e31e1bc2ec81f>.

<https://pharos.artresearch.net/resource/itatti/work/8000843846> owl:sameAs <https://pharos.artresearch.net/resource/pharos/artwork/932d16024c7ca67c3f544a94ab45fbdf23af94e3>.

After the match, it created the following sameAs properties:

<https://pharos.artresearch.net/resource/hertziana/work/08084657> owl:sameAs <https://pharos.artresearch.net/resource/pharos/artwork/6e68b3764108d40b008c50a95a2e31e1bc2ec81f>.

<https://pharos.artresearch.net/resource/itatti/work/8000843846> owl:sameAs <https://pharos.artresearch.net/resource/pharos/artwork/6e68b3764108d40b008c50a95a2e31e1bc2ec81f>.

Similarity 2 : https://vision.artresearch.net/resource/?providera=https%3A%2F%2Fpharos.artresearch.net%2Fresource%2Fhertziana%2Fsource%2FHertziana&providerb=https%3A%2F%2Fpharos.artresearch.net%2Fresource%2Fitatti%2Fsource%2FITatti&uri=https%3A%2F%2Fvision.artresearch.net%2Fresource%2FSimilarity%2F54159493083106447977102971

In the second similarity case we had:

<https://pharos.artresearch.net/resource/hertziana/work/08084657> owl:sameAs <https://pharos.artresearch.net/resource/pharos/artwork/6e68b3764108d40b008c50a95a2e31e1bc2ec81f>.

<https://pharos.artresearch.net/resource/itatti/work/8000851667> owl:sameAs <https://pharos.artresearch.net/resource/pharos/artwork/1d080280d262adc3bf85c868608b91dab5522c7b> .

I set it up so that it would take the second URI, so that I could test what would happen, and it produced this result:
image

where all the non-Pharos URIs have an owl:sameAs property pointing to the same Pharos URI, while the rest of the Pharos URIs are of course set as deprecated.

The basic idea was to check the sameAs connections to the Pharos URI that would be set to deprecated and repoint those to the new Pharos URI at hand.
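As a rough illustration of that idea (not the actual INSERT query), the sketch below rewrites sameAs triples held in a plain Python set, using the URIs from the second similarity case. It assumes that "the second URI" means the Pharos URI of the second record is the one kept; `repoint_same_as` and the tuple representation are hypothetical.

```python
OWL_SAME_AS = "owl:sameAs"
BASE = "https://pharos.artresearch.net/resource/"


def repoint_same_as(triples, deprecated, canonical):
    """Return a new triple set in which every sameAs link that pointed
    at `deprecated` points at `canonical` instead."""
    return {
        (s, p, canonical if p == OWL_SAME_AS and o == deprecated else o)
        for (s, p, o) in triples
    }


old = BASE + "pharos/artwork/6e68b3764108d40b008c50a95a2e31e1bc2ec81f"
new = BASE + "pharos/artwork/1d080280d262adc3bf85c868608b91dab5522c7b"

# State before the second match (after Similarity 1 was marked "Same").
triples = {
    (BASE + "hertziana/work/08084657", OWL_SAME_AS, old),
    (BASE + "itatti/work/8000843846", OWL_SAME_AS, old),
    (BASE + "itatti/work/8000851667", OWL_SAME_AS, new),
}

merged = repoint_same_as(triples, deprecated=old, canonical=new)
targets = {o for (_, p, o) in merged if p == OWL_SAME_AS}
print(targets == {new})   # True: all three local URIs share one Pharos URI
```

Marking the old Pharos URI as deprecated is left out here, since the thread does not show how deprecation is stored.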

@mafragias mafragias added the validation needs verification before closing label Sep 10, 2021