global: clean up duplicate table DOIs in production instance #790

Open
GraemeWatt opened this issue May 2, 2024 · 0 comments
Labels: complexity: medium, priority: low, type: bug

When reindexing the QA instance after deploying PR #766, some of the records gave an exception:

```
sqlalchemy.exc.MultipleResultsFound: Multiple rows were found when exactly one was required
```

from the line:

```python
submission = DataSubmission.query.filter_by(doi=doc["doi"]).one()
```

I just changed this line in commit 319ff15 to make it tolerate multiple results; a sketch of that kind of change is shown below. However, it should be investigated in more detail why there are multiple DataSubmission objects with the same doi.
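For illustration, a minimal sketch of a tolerant lookup (the actual change in commit 319ff15 may differ):

```python
# Sketch only: .first() returns a single row or None instead of raising
# MultipleResultsFound when more than one DataSubmission shares the DOI.
# DataSubmission and doc come from the surrounding HEPData reindexing code.
submission = DataSubmission.query.filter_by(doi=doc["doi"]).first()
```

This silences the exception but picks an arbitrary row among the duplicates, which is why the underlying duplication still needs investigating.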

I found 6 examples. These all date from the early days of hepdata.net in 2017/2018, when the submission code was buggy and the procedure for replacing uploads was not carried out cleanly. It should be investigated how to clean up the database to remove the duplicate DOIs.
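As a starting point for that clean-up, a hedged sketch of a query that lists the duplicated DOIs, assuming the DataSubmission model and the db session object from the HEPData codebase:

```python
from sqlalchemy import func

# Sketch only: list DOIs that appear on more than one DataSubmission row.
# DataSubmission and db are assumed to be the model and database session
# used elsewhere in the HEPData codebase.
duplicates = (
    db.session.query(DataSubmission.doi, func.count(DataSubmission.id))
    .filter(DataSubmission.doi.isnot(None))
    .group_by(DataSubmission.doi)
    .having(func.count(DataSubmission.id) > 1)
    .all()
)
for doi, count in duplicates:
    print(f"{doi}: {count} submissions")
```

The extra rows found this way could then be inspected by hand before deciding which duplicates to delete.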
