Implement reindexer sweeper #149
… or as "keyword" if not present there
Manual tests against docker registry:
- logged counts are correct
- missing mappings are added
- types of missing mappings change according to the resolved typename
- metadata updates are written to db
- metadata updates are sufficient to trigger re-index, causing previously-unsearchable properties to become searchable
- presence of metadata attribute excludes document from document set on subsequent runs
…weep do not erroneously get flagged as processed
👍
Per @jordanpadams, review/live-test postponed until next week, after the current site demos.
…o the resolution logic
9339ede to 3090b5f
this should prevent unintended continuous looping
@alexdunnjpl see comments
@jordanpadams regarding the "needs to work with I suggest punting those. If any of your comments amount to "this should be free-text and not keyword", then I can make those updates.
…riate this is because the retry would pass the consumed iterator to subsequent calls
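The consumed-iterator problem mentioned in this commit message can be reproduced in isolation. The following is a minimal sketch, not the PR's code: `retry`, `make_flaky_writer` and `updates` are hypothetical names invented for illustration, and the fix shown (rebuilding the iterator inside the retried callable) is only one option.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def retry(func: Callable[[], int], attempts: int = 2) -> int:
    """Hypothetical minimal retry helper: call func until it stops raising."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(attempts):
        try:
            return func()
        except RuntimeError as err:
            last_error = err
    raise last_error


def make_flaky_writer() -> Tuple[Callable[[Iterable[int]], int], List[List[int]]]:
    """Build a writer that fails on its first call and records what each call received."""
    calls: List[List[int]] = []

    def write(updates: Iterable[int]) -> int:
        seen = list(updates)  # consumes the iterable, as a bulk write would
        calls.append(seen)
        if len(calls) == 1:
            raise RuntimeError("simulated transient failure")
        return len(seen)

    return write, calls


def updates() -> Iterator[int]:
    """Stand-in for a generator of document updates."""
    yield from (1, 2, 3)


# Problem: a single generator object is shared by every attempt, so the
# second attempt receives an already-exhausted iterator and writes nothing.
write_shared, calls_shared = make_flaky_writer()
shared = updates()
written_shared = retry(lambda: write_shared(shared))

# One possible fix: create a fresh iterator inside the retried callable.
write_fresh, calls_fresh = make_flaky_writer()
written_fresh = retry(lambda: write_fresh(updates()))
```

In the shared case the retry "succeeds" while silently writing zero updates, which is why a retry wrapper is unsafe around any call that takes a one-shot iterator.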
…pings protection against race condition is provided by harvest-time constraint to LT sweeper execution timestamp
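The harvest-time constraint described here could be expressed as a query body along the following lines. This is a sketch under assumptions: `build_unswept_query` is a hypothetical helper, and the harvest-time field name is an assumption rather than something stated in this PR. Only `ops:Provenance/ops:reindexed_at` is taken from the PR description.

```python
from datetime import datetime, timezone

# Assumed name of the harvest timestamp field; not confirmed by this PR.
HARVEST_TIME_FIELD = "ops:Harvest_Info/ops:harvest_date_time"
# Metadata attribute named in the PR description.
REINDEXED_AT_FIELD = "ops:Provenance/ops:reindexed_at"


def build_unswept_query(sweep_started_at: datetime) -> dict:
    """Select documents harvested strictly before the sweep began and not yet swept."""
    return {
        "query": {
            "bool": {
                # Documents harvested at or after the start of the sweep are
                # left for a later run, so the document set stays stable.
                "must": [
                    {"range": {HARVEST_TIME_FIELD: {"lt": sweep_started_at.isoformat()}}}
                ],
                # Documents already carrying the attribute were swept before.
                "must_not": [{"exists": {"field": REINDEXED_AT_FIELD}}],
            }
        }
    }


started = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
query = build_unswept_query(started)
```

Fixing the timestamp once at the start of the run means documents harvested mid-sweep can neither enter the result set partway through nor be marked as processed by mistake.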
5ca906b to 0e9d6b2
Status: final testing in progress
dateutil is the official third-party library for parsing
🗒️ Summary
Implements a sweeper with the following behaviour:
- … `registry` index which haven't been swept yet by this sweeper and have not been harvested since the time the sweeper began execution (to ensure consistency of queries)
- … `registry-dd` index
- … `registry-dd` index if available, else defaults to `keyword`
- … `_mappings` for all missing properties
- … `ops:Provenance/ops:reindexed_at`, which is used to identify when documents were checked/fixed, and which triggers a whole-document reindex operation, ensuring that any just-added mappings become searchable for that document.

@jordanpadams @tloubrieu-jpl could you please review these requirements to ensure that they accurately reflect the desired behaviour?
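The type-resolution and metadata-update behaviour in the summary can be sketched roughly as below. This is an illustrative sketch, not the sweeper's implementation: all function names are hypothetical, the data dictionary is modelled as a plain dict standing in for a `registry-dd` lookup, and the example property names are made up.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Mapping

DEFAULT_TYPE = "keyword"  # fallback named in the PR description
REINDEXED_AT_FIELD = "ops:Provenance/ops:reindexed_at"


def resolve_missing_mappings(
    doc_properties: Iterable[str],
    existing_mappings: Mapping[str, str],
    dd_types: Mapping[str, str],
) -> Dict[str, str]:
    """Return {property: type} for properties that have no mapping yet."""
    missing: Dict[str, str] = {}
    for prop in doc_properties:
        if prop in existing_mappings:
            continue  # already mapped, nothing to add
        # Prefer the data-dictionary type; otherwise fall back to keyword.
        missing[prop] = dd_types.get(prop, DEFAULT_TYPE)
    return missing


def mappings_update_body(missing: Mapping[str, str]) -> dict:
    """Body in the shape accepted by a put-mapping request."""
    return {"properties": {name: {"type": typ} for name, typ in missing.items()}}


def reindexed_at_update(timestamp: datetime) -> dict:
    """Partial-document update that marks a document as swept."""
    return {"doc": {REINDEXED_AT_FIELD: timestamp.isoformat()}}


existing = {"lid": "keyword"}
dd = {"pds:Time_Coordinates/pds:start_date_time": "date"}
props = ["lid", "pds:Time_Coordinates/pds:start_date_time", "custom:unknown_property"]

missing = resolve_missing_mappings(props, existing, dd)
mapping_body = mappings_update_body(missing)
update_body = reindexed_at_update(datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Because mappings only affect documents indexed after they are added, writing the metadata attribute does two jobs at once: it marks the document as swept and forces the whole document to be reindexed under the new mappings.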
⚙️ Test Data and/or Report
Manually tested:
@jordanpadams @tloubrieu-jpl having tested locally on the I&T db, would you like me to run the sweeper against one/some of the MCP indices, after the code has been reviewed?
♻️ Related Issues
resolves #148 once run/deployed
related to NASA-PDS/registry#230