Explore options for recovering deleted draft submissions #749

Open
ktuite opened this issue Oct 9, 2024 · 2 comments · May be fixed by getodk/central-backend#1299
Labels
backend (Requires a change to the API server) · enhancement (New feature or behavior)

Comments

@ktuite
Member

ktuite commented Oct 9, 2024

If a draft form gets shared with data collectors and collects real submissions, it can lead to a number of problems:

  • If the draft form gets updated (or deleted, or even published), all submissions associated with that draft version are automatically purged. The only recovery option is to check whether they are still on the devices and resend them.
  • There is no easy way to preserve the draft submissions or re-associate them with a published version of the form, i.e., publish the form and have the draft submissions carry over to the published form.

We are redesigning the draft form page and the draft QR code to help people avoid this situation, but is there anything else we can do on the backend to give these submissions a chance to be recovered?

Some ideas:

  • mark the form draft def as soft-deleted when the draft is published, replaced, or abandoned, instead of purging it immediately
  • if the draft submissions are soft-deleted, can they be fetched through the OData feed?
  • can we eventually have a concept of a result set / submission set so you could partition results by year, location, draft/real and have tools to move between those sets?

This issue is about investigating these ideas to see if there is a quick way to use existing deletion infrastructure to keep deleted draft submissions around for a little while.

ktuite added the backend and enhancement labels Oct 9, 2024
ktuite self-assigned this Oct 9, 2024
ktuite moved this to 🕒 backlog in ODK Central Oct 9, 2024
@ktuite
Member Author

ktuite commented Nov 8, 2024

Here's what I've learned so far:

We have these query module functions:

  • Forms.clearUnneededDrafts (by form id or project id)
  • Submissions.clearDraftSubmissions(formId)
  • Submissions.clearDraftSubmissionsForProject(projectId)

These are used in these scenarios:

  1. Setting managed encryption on a project

    • Purges all draft submissions in that project, purges all unattached draft form defs in the project
    • I think this behavior doesn't need to change? If you're encrypting your project and your draft form and submissions get removed in the process, do you really need to be able to recover them? I.e. has anyone run into this flavor of the issue?
  2. Abandon/delete a form draft

    • Purges unattached draft form defs for that form and clears draft submissions for that form
    • This seems kind of like the desired behavior, too? If you intentionally delete your draft, you want the submission to go away, too.
    • Although, it could be ok to switch this to a soft delete?
  3. Publish a form

    • Possibly makes a new form_def (if the version needs to change)
    • Otherwise sets the currentDefId on the Form to the id of this def (and removes its draft token)
    • This can leave an orphan draft def behind if the version changed...
    • Clears the draft submissions for that form
    • Don't want to lose data here
  4. Update a form draft

    • Makes a new draft form def
    • Clears draft submissions for that form
    • Clears unattached/orphan draft form defs for that form
    • Definitely don't want to lose data here

Database things

  • Submissions (the top-level records) know whether they are drafts, but they don't link back to a form def, because different submission defs can come from different form def versions. This is why we clear unneeded drafts by deleting every draft submission in a form.
  • The CASCADE DELETE flow...
    • forms deletion cascades to form_defs, submissions and other things
    • submissions deletion cascades to submission_defs
    • form_defs deletion also cascades to submission_defs
    • If you deleted the form_def, you could get rid of the descendant submission_defs but still have the top-level submission
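To make the cascade flow concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres. The schema is a simplified, hypothetical one (only the id, formId, submissionId, formDefId, and draft columns), not Central's actual migrations; it just wires up the cascades listed above and reports what survives each kind of delete.

```python
import sqlite3

# Simplified, hypothetical schema: just enough columns to show the cascades.
# SQLite stands in for Postgres here; ON DELETE CASCADE behaves the same way.
SCHEMA = """
CREATE TABLE forms (id INTEGER PRIMARY KEY);
CREATE TABLE form_defs (
  id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE
);
CREATE TABLE submissions (
  id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE,
  draft INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE submission_defs (
  id INTEGER PRIMARY KEY,
  submissionId INTEGER NOT NULL REFERENCES submissions(id) ON DELETE CASCADE,
  formDefId INTEGER NOT NULL REFERENCES form_defs(id) ON DELETE CASCADE
);
"""


def setup() -> sqlite3.Connection:
    """One form, one draft form def, one draft submission with one submission def."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    db.executescript(SCHEMA)
    db.execute("INSERT INTO forms (id) VALUES (1)")
    db.execute("INSERT INTO form_defs (id, formId) VALUES (10, 1)")
    db.execute("INSERT INTO submissions (id, formId, draft) VALUES (100, 1, 1)")
    db.execute("INSERT INTO submission_defs (id, submissionId, formDefId) VALUES (1000, 100, 10)")
    return db


def remaining_after_delete(table: str, row_id: int) -> dict:
    """Delete one row from `table` on a fresh database and count what is left."""
    db = setup()
    db.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))
    counts = {t: db.execute(f"SELECT count(*) FROM {t}").fetchone()[0]
              for t in ("form_defs", "submissions", "submission_defs")}
    # A top-level submission with no submission_defs left is the problematic state.
    counts["submissions_without_defs"] = db.execute(
        "SELECT count(*) FROM submissions s WHERE NOT EXISTS "
        "(SELECT 1 FROM submission_defs sd WHERE sd.submissionId = s.id)"
    ).fetchone()[0]
    return counts


for table, row_id in (("forms", 1), ("submissions", 100), ("form_defs", 10)):
    print(table, remaining_after_delete(table, row_id))
```

Deleting the form empties everything, deleting the submission takes its submission_defs with it, and deleting the form_def leaves one submission with no submission_defs at all.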

The code was originally set up to NOT delete these things, so we could NOT call clearDraftSubmissions and clearUnneededDrafts in these most problematic cases. But then there would be no path to cleaning up this stale data later.

How would we go about cleaning up this stale data later? The form def / draft submission hierarchy doesn't make this easy.

idea?: We could possibly soft-delete the draft submissions themselves for a given form (and then the submission purge task would come and clean them up in thirty days) and we could purge any draft form_defs that 1) aren't linked to a form as its current draft and 2) don't have any remaining submissions? Instead of calling clearDraftSubmissions(formId) in certain places, we could call softDeleteDraftSubmissions(formId) and remove the call to clearUnneededDrafts(formId)?
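A minimal sketch of that idea, again with sqlite3 standing in for Postgres and a simplified, hypothetical schema. softDeleteDraftSubmissions is the hypothetical name proposed above (written here as soft_delete_draft_submissions), and update_draft is an illustrative stand-in for the "update a form draft" flow, not Central's code. Nothing is purged: the draft submissions just get a deletedAt timestamp and the old draft def is left for a later purge.

```python
import sqlite3

# Simplified, hypothetical schema (not Central's real migrations); SQLite stands in for Postgres.
SCHEMA = """
CREATE TABLE forms (id INTEGER PRIMARY KEY, currentDefId INTEGER, draftDefId INTEGER);
CREATE TABLE form_defs (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE);
CREATE TABLE submissions (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE,
  draft INTEGER NOT NULL DEFAULT 0, deletedAt TEXT);
CREATE TABLE submission_defs (id INTEGER PRIMARY KEY,
  submissionId INTEGER NOT NULL REFERENCES submissions(id) ON DELETE CASCADE,
  formDefId INTEGER NOT NULL REFERENCES form_defs(id) ON DELETE CASCADE);
"""


def soft_delete_draft_submissions(db, form_id: int, now: str) -> int:
    """Hypothetical softDeleteDraftSubmissions(formId): mark drafts instead of purging them.

    Rows that are already soft-deleted are skipped, so their original deletedAt
    (and therefore their place in the purge window) is preserved.
    """
    cur = db.execute(
        "UPDATE submissions SET deletedAt = ? "
        "WHERE formId = ? AND draft = 1 AND deletedAt IS NULL",
        (now, form_id),
    )
    return cur.rowcount


def update_draft(db, form_id: int, new_def_id: int, now: str) -> None:
    """Illustrative 'update a form draft' under the proposal: nothing is purged."""
    db.execute("INSERT INTO form_defs (id, formId) VALUES (?, ?)", (new_def_id, form_id))
    db.execute("UPDATE forms SET draftDefId = ? WHERE id = ?", (new_def_id, form_id))
    soft_delete_draft_submissions(db, form_id, now)
    # No clearUnneededDrafts-style call: the old draft def is left for the purge task.


db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript(SCHEMA)
db.execute("INSERT INTO forms VALUES (1, 9, 10)")  # published def 9, draft def 10
db.executemany("INSERT INTO form_defs (id, formId) VALUES (?, 1)", [(9,), (10,)])
db.executemany("INSERT INTO submissions (id, formId, draft) VALUES (?, 1, ?)",
               [(100, 1), (101, 1), (200, 0)])
db.executemany("INSERT INTO submission_defs VALUES (?, ?, ?)",
               [(1000, 100, 10), (1001, 101, 10), (2000, 200, 9)])

update_draft(db, 1, 11, "2024-11-08 00:00:00")
print(db.execute("SELECT id, draft, deletedAt FROM submissions ORDER BY id").fetchall())
```

Skipping rows that are already soft-deleted means a second call doesn't reset deletedAt, so a submission's 30-day window starts from the first time it was discarded.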

A possible benefit (I think) is that if you did this, you could poke at the database (before the submissions got purged) to set an old draft def as the active draft def, undelete the submissions, and see them again.
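Roughly what that manual recovery could look like, as a hypothetical restore_draft helper over the same kind of simplified schema (sqlite3 as a stand-in; all names are illustrative, not Central's). It only works while the soft-deleted rows still exist, i.e., before the purge task runs.

```python
import sqlite3

# Simplified, hypothetical schema standing in for Central's tables.
SCHEMA = """
CREATE TABLE forms (id INTEGER PRIMARY KEY, currentDefId INTEGER, draftDefId INTEGER);
CREATE TABLE form_defs (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE);
CREATE TABLE submissions (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE,
  draft INTEGER NOT NULL DEFAULT 0, deletedAt TEXT);
CREATE TABLE submission_defs (id INTEGER PRIMARY KEY,
  submissionId INTEGER NOT NULL REFERENCES submissions(id) ON DELETE CASCADE,
  formDefId INTEGER NOT NULL REFERENCES form_defs(id) ON DELETE CASCADE);
"""


def restore_draft(db, form_id: int, old_def_id: int) -> int:
    """Hypothetical recovery: reactivate an old draft def and undelete its submissions."""
    db.execute("UPDATE forms SET draftDefId = ? WHERE id = ?", (old_def_id, form_id))
    # Submissions don't point at a form def directly, so go through submission_defs.
    cur = db.execute(
        "UPDATE submissions SET deletedAt = NULL "
        "WHERE formId = ? AND draft = 1 AND deletedAt IS NOT NULL "
        "AND id IN (SELECT submissionId FROM submission_defs WHERE formDefId = ?)",
        (form_id, old_def_id),
    )
    return cur.rowcount


# State after the draft was replaced: draft def 11 is active, def 10 is orphaned,
# and the two draft submissions made against def 10 are soft-deleted.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript(SCHEMA)
db.execute("INSERT INTO forms VALUES (1, NULL, 11)")
db.executemany("INSERT INTO form_defs (id, formId) VALUES (?, 1)", [(10,), (11,)])
db.executemany("INSERT INTO submissions VALUES (?, 1, 1, ?)",
               [(100, "2024-11-08 00:00:00"), (101, "2024-11-08 00:00:00")])
db.executemany("INSERT INTO submission_defs VALUES (?, ?, 10)", [(1000, 100), (1001, 101)])

print(restore_draft(db, 1, 10))  # 2
```

A submission with defs from several form def versions would be undeleted if any of its defs point at the old def, so a real recovery would likely need a closer look at those cases.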

@matthew-white
Member

> idea?: We could possibly soft-delete the draft submissions themselves for a given form (and then the submission purge task would come and clean them up in thirty days) and we could purge any draft form_defs that 1) aren't linked to a form as its current draft and 2) don't have any remaining submissions?

We discussed this idea on a call. It makes a lot of sense to me.

> 1. Setting managed encryption on a project
>
>     • Purges all draft submissions in that project, purges all unattached draft form defs in the project
>     • I think this behavior doesn't need to change? If you're encrypting your project and your draft form and submissions get removed in the process, do you really need to be able to recover them? I.e. has anyone run into this flavor of the issue?
> 2. Abandon/delete a form draft
>
>     • Purges unattached draft form defs for that form and clears draft submissions for that form
>     • This seems kind of like the desired behavior, too? If you intentionally delete your draft, you want the submission to go away, too.
>     • Although, it could be ok to switch this to a soft delete?

As far as I know in these two cases, there's no technical reason why we need to purge the submissions immediately rather than soft-delete them. Instead, maybe things would be simpler as a whole if we discarded draft submissions in the same way in every case (i.e., via soft deletion). For example, if we continued to call Submissions.clearDraftSubmissions() in these two cases, that method would need to be modified to not delete draft submissions that are soft-deleted. (It would immediately purge a submission only if the submission isn't already soft-deleted.)
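A sketch of that modification, assuming a simplified, hypothetical schema in sqlite3 (clear_draft_submissions here is an illustrative stand-in for Submissions.clearDraftSubmissions(), not its real implementation). The only change is the extra deletedAt IS NULL condition, so draft submissions already waiting in the soft-delete window are left for the regular purge.

```python
import sqlite3

# Simplified, hypothetical schema (SQLite standing in for Postgres).
SCHEMA = """
CREATE TABLE forms (id INTEGER PRIMARY KEY, currentDefId INTEGER, draftDefId INTEGER);
CREATE TABLE form_defs (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE);
CREATE TABLE submissions (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE,
  draft INTEGER NOT NULL DEFAULT 0, deletedAt TEXT);
CREATE TABLE submission_defs (id INTEGER PRIMARY KEY,
  submissionId INTEGER NOT NULL REFERENCES submissions(id) ON DELETE CASCADE,
  formDefId INTEGER NOT NULL REFERENCES form_defs(id) ON DELETE CASCADE);
"""


def clear_draft_submissions(db, form_id: int) -> int:
    """Immediately purge draft submissions, but only those not already soft-deleted.

    Their submission_defs go with them via ON DELETE CASCADE.
    """
    cur = db.execute(
        "DELETE FROM submissions WHERE formId = ? AND draft = 1 AND deletedAt IS NULL",
        (form_id,),
    )
    return cur.rowcount


db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript(SCHEMA)
db.execute("INSERT INTO forms VALUES (1, 9, 11)")
db.executemany("INSERT INTO form_defs (id, formId) VALUES (?, 1)", [(9,), (10,), (11,)])
db.executemany("INSERT INTO submissions VALUES (?, 1, ?, ?)", [
    (100, 1, "2024-11-08 00:00:00"),  # draft, already soft-deleted
    (101, 1, None),                   # live draft
    (200, 0, None),                   # published submission
])
db.executemany("INSERT INTO submission_defs VALUES (?, ?, ?)",
               [(1000, 100, 10), (1001, 101, 11), (2000, 200, 9)])

print(clear_draft_submissions(db, 1))  # 1
print([r[0] for r in db.execute("SELECT id FROM submissions ORDER BY id")])  # [100, 200]
```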

That said, if it's simpler to handle these cases separately, I think that would also work. @lognaturel, let us know if you've heard about users losing submissions in either of these two cases.

> 3. Publish a form
>
>     • This can leave an orphan draft def behind if the version changed...

Oh interesting. So there can be orphaned form defs even today? It'd definitely be nice to clean those up. One thing I like about your idea above is that each scenario doesn't need to have its own logic for deleting orphaned form defs. Instead, orphan form defs will get purged on a regular basis by the centralized purge mechanism.

> If you deleted the form_def, you could get rid of the descendant submission_defs but still have the top-level submission

It looks like we previously encountered the case of a submission without a submission def as the root cause behind getodk/central-backend#911. It makes sense to me that your idea above will purge submissions first (including the logical submission) and only then go on to purge orphaned form defs that no longer have submissions.
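A sketch of that ordering, with sqlite3 and a simplified, hypothetical schema standing in for Central's (purge and its cutoff parameter are illustrative; in practice the cutoff would be 30 days before now). Submissions are purged first, so their submission_defs cascade away, and only then are form defs removed if they are neither a form's current nor draft def and no submission_defs still reference them.

```python
import sqlite3

# Simplified, hypothetical schema (SQLite standing in for Postgres).
SCHEMA = """
CREATE TABLE forms (id INTEGER PRIMARY KEY, currentDefId INTEGER, draftDefId INTEGER);
CREATE TABLE form_defs (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE);
CREATE TABLE submissions (id INTEGER PRIMARY KEY,
  formId INTEGER NOT NULL REFERENCES forms(id) ON DELETE CASCADE,
  draft INTEGER NOT NULL DEFAULT 0, deletedAt TEXT);
CREATE TABLE submission_defs (id INTEGER PRIMARY KEY,
  submissionId INTEGER NOT NULL REFERENCES submissions(id) ON DELETE CASCADE,
  formDefId INTEGER NOT NULL REFERENCES form_defs(id) ON DELETE CASCADE);
"""


def purge(db, cutoff: str) -> tuple:
    """Hypothetical centralized purge: submissions first, then orphaned form defs."""
    # 1. Purge submissions soft-deleted before the cutoff (submission_defs cascade).
    subs = db.execute(
        "DELETE FROM submissions WHERE deletedAt IS NOT NULL AND deletedAt < ?",
        (cutoff,),
    ).rowcount
    # 2. Purge form defs that no form uses and that no submission_defs reference.
    defs = db.execute("""
        DELETE FROM form_defs
        WHERE id NOT IN (SELECT currentDefId FROM forms WHERE currentDefId IS NOT NULL)
          AND id NOT IN (SELECT draftDefId FROM forms WHERE draftDefId IS NOT NULL)
          AND NOT EXISTS (SELECT 1 FROM submission_defs sd WHERE sd.formDefId = form_defs.id)
    """).rowcount
    return subs, defs


def submissions_without_defs(db) -> int:
    return db.execute(
        "SELECT count(*) FROM submissions s WHERE NOT EXISTS "
        "(SELECT 1 FROM submission_defs sd WHERE sd.submissionId = s.id)"
    ).fetchone()[0]


# Form 1 was just published as def 9; old draft def 10 is orphaned but still
# has two soft-deleted draft submissions, discarded at different times.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript(SCHEMA)
db.execute("INSERT INTO forms VALUES (1, 9, NULL)")
db.executemany("INSERT INTO form_defs (id, formId) VALUES (?, 1)", [(9,), (10,)])
db.executemany("INSERT INTO submissions VALUES (?, 1, ?, ?)", [
    (100, 1, "2024-10-01 00:00:00"),
    (101, 1, "2024-11-20 00:00:00"),
    (200, 0, None),
])
db.executemany("INSERT INTO submission_defs VALUES (?, ?, ?)",
               [(1000, 100, 10), (1001, 101, 10), (2000, 200, 9)])

print(purge(db, "2024-10-15 00:00:00"), submissions_without_defs(db))  # (1, 0) 0
print(purge(db, "2024-12-31 00:00:00"), submissions_without_defs(db))  # (1, 1) 0
```

In this order the purge never leaves a submission without submission defs: the orphaned draft def survives the first run because one of its submissions is still inside the window, and goes away on the second.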

> The code was originally set up to NOT delete these things, so we could NOT call clearDraftSubmissions and clearUnneededDrafts in these most problematic cases.

I think that's true at least of orphaned form defs: we used to allow orphaned form defs to persist in some (all?) cases. It sounds like you've identified a case even today where a form def can become orphaned. Given that, I bet things will continue working properly if we stop immediately purging orphaned form defs and allow them to persist for 30 days.

I'm less sure that there's ever been a time when we didn't immediately purge draft submissions (except by accident in #911). However, I also don't think there are many queries that have to do with draft submissions exclusively and not also non-draft submissions. Any query that can retrieve non-draft submissions should already know how to handle soft-deleted submissions. It might not be a bad idea to check queries that reference submissions.draft to make sure that they filter on submissions."deletedAt" as they should.
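As a starting point for that check, a rough sketch: the two queries below differ only in the deletedAt filter and run against a throwaway sqlite3 table, and needs_review is a hypothetical, grep-level heuristic (not an existing tool) that flags SQL mentioning draft but never deletedAt.

```python
import sqlite3


def needs_review(sql: str) -> bool:
    """Rough heuristic: flag SQL that mentions draft but never mentions deletedAt."""
    lowered = sql.lower()
    return "draft" in lowered and "deletedat" not in lowered


LEAKY = "SELECT id FROM submissions WHERE formId = ? AND draft = 1"
SAFE = LEAKY + ' AND "deletedAt" IS NULL'

# Throwaway table with one soft-deleted draft (100) and one live draft (101).
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE submissions (id INTEGER PRIMARY KEY, formId INTEGER, '
           'draft INTEGER, "deletedAt" TEXT)')
db.executemany("INSERT INTO submissions VALUES (?, 1, 1, ?)",
               [(100, "2024-11-08 00:00:00"), (101, None)])

print(sorted(r[0] for r in db.execute(LEAKY, (1,))))  # [100, 101]: soft-deleted draft leaks
print(sorted(r[0] for r in db.execute(SAFE, (1,))))   # [101]
print(needs_review(LEAKY), needs_review(SAFE))        # True False
```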

Projects
Status: ✏️ in progress
2 participants