In #522 I talked about a way we might use S3 as a backend. One point I made is that versioning would require pulling the entire object down from S3, which would be terrible for very large objects (TB-sized objects, though even a few GB would make updates slow).
In his reply, @pwinckles stated that the most recent inventory is likely the only thing needed.
This ticket is about thinking through how to detect that a file has been deleted in the next version without needing the whole object (which I can't see a way to do, hence why I'm asking for help!).
Consider the following:
v1                    v2
|- File A - hash X    |- File A - hash X
No change; do not create a new version.
v1                    v2
|- File A - hash X    |- File A - hash X
                      |- File B - hash Y
New file; create a new version referencing File A -> v1 and File B -> v2.
v1                    v2
|- File A - hash X    |- File A - hash Z
|- File B - hash Y    |- File B - hash Y
File changed (File A); create a new version referencing File B -> v1 and File A -> v2.
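The three cases above boil down to comparing two maps of logical path to digest. Below is a minimal Python sketch of that comparison, assuming both the previous version's state and the freshly digested tree are plain `{path: digest}` dicts. `diff_states` and `StateDiff` are hypothetical names for illustration, not the ocfl-js API, and a real OCFL inventory's `state` block is keyed the other way round (digest to a list of logical paths).

```python
from typing import Dict, NamedTuple, Set


class StateDiff(NamedTuple):
    """Classification of logical paths between two versions (hypothetical)."""
    added: Set[str]
    changed: Set[str]
    deleted: Set[str]
    unchanged: Set[str]

    def is_empty(self) -> bool:
        # No additions, changes or deletions means no new version is needed.
        return not (self.added or self.changed or self.deleted)


def diff_states(previous: Dict[str, str], current: Dict[str, str]) -> StateDiff:
    """Compare two {logical path: digest} maps."""
    prev_paths, curr_paths = set(previous), set(current)
    common = prev_paths & curr_paths
    return StateDiff(
        added=curr_paths - prev_paths,
        changed={p for p in common if previous[p] != current[p]},
        deleted=prev_paths - curr_paths,
        unchanged={p for p in common if previous[p] == current[p]},
    )


# Case 1: nothing changed, so no new version.
case1 = diff_states({"File A": "X"}, {"File A": "X"})
# Case 2: File B was added.
case2 = diff_states({"File A": "X"}, {"File A": "X", "File B": "Y"})
# Case 3: File A was changed.
case3 = diff_states({"File A": "X", "File B": "Y"}, {"File A": "Z", "File B": "Y"})

print(case1.is_empty())       # True
print(sorted(case2.added))    # ['File B']
print(sorted(case3.changed))  # ['File A']
```

Unchanged paths would keep referencing their existing content (File A -> v1 in the second case), while added and changed paths need new content written into the new version.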
Our library works by digesting the path (walking the object tree and producing pairs of file paths and hashes) and then comparing the new tree to the existing one. In all of the cases above we get the expected behaviour. However, if the whole dataset were not available when creating the new version, changing a single file would remove all of the other data from the next version.
v1                    v2
|- File A - hash X    |- File A - hash Z
|- File B - hash Y
File changed (File A); create a new version referencing File A -> v2, but File B ends up removed from the new version.
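To make the failure concrete, here is a minimal Python sketch, assuming the same hypothetical `{path: digest}` shape as before and full-tree semantics, where whatever was digested becomes the next version's state. `classify_missing` is an illustrative name, not part of ocfl-js.

```python
from typing import Dict, Set


def classify_missing(previous: Dict[str, str], digested: Dict[str, str]) -> Set[str]:
    """Under full-tree semantics, any path in the previous state that is
    missing from the digested tree is treated as deleted."""
    return set(previous) - set(digested)


v1_state = {"File A": "X", "File B": "Y"}
# Only the changed file is present locally; File B exists only in the
# stored object (e.g. in S3) and was never digested.
digested = {"File A": "Z"}

# Full-tree semantics: the digested tree becomes the next version's state.
v2_state = dict(digested)

print(classify_missing(v1_state, digested))  # {'File B'}
print(v2_state)                              # {'File A': 'Z'}
```

File B is reported as deleted even though nobody deleted it; it simply was not part of the local tree.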
So, if I've thought this through correctly, comparing against the latest inventory rather than a full digest means we will pick up file changes and additions, but we won't be able to remove a file from one version to the next, because we would always need to carry forward everything referenced in the latest version.
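That limitation can be shown with a minimal Python sketch, assuming the latest inventory's state has been flattened into a hypothetical `{path: digest}` dict (a real OCFL inventory's `state` block maps each digest to a list of logical paths). `merge_into_latest` is an illustrative name, not part of ocfl-js.

```python
from typing import Dict


def merge_into_latest(latest: Dict[str, str], digested: Dict[str, str]) -> Dict[str, str]:
    """Overlay the digested files on the latest version's state.

    Added and changed paths take their new digests; every other path is
    carried forward unchanged, so nothing can ever be removed this way.
    """
    next_state = dict(latest)
    next_state.update(digested)
    return next_state


latest = {"File A": "X", "File B": "Y"}

# A change and an addition are picked up...
print(merge_into_latest(latest, {"File A": "Z", "File C": "W"}))
# {'File A': 'Z', 'File B': 'Y', 'File C': 'W'}

# ...but "File B was deleted" and "File B was not uploaded" produce the
# same input, so File B is always carried forward.
print(merge_into_latest(latest, {"File A": "Z"}))
# {'File A': 'Z', 'File B': 'Y'}
```

The input alone cannot distinguish an absent file from a deleted one, which is exactly the gap the question below is about.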
Is there another way to detect file deletions across versions without needing all of the object data up to that point?
After a good long discussion with my colleague @ptsefton, it has become clear to me that what I'm discussing is an implementation detail of my library. Accordingly, I'm moving this ticket to my library, but I'll link it here so that anyone who wants to follow along can.
Discussion and ideas about handling object operations in a sensible manner: CoEDL/ocfl-js#3
That said, this ticket can be closed if necessary.
Thanks for closing the loop on this, @marcolarosa. It is very exciting to see and follow how you are implementing OCFL for your use case. Please keep everyone posted, and feel free to join the community meetings whenever it makes sense.