Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronization of Legacy Solr Registry in the new #68

Merged
merged 7 commits into from
Sep 12, 2023
Merged

Conversation

tloubrieu-jpl
Copy link
Member

@tloubrieu-jpl tloubrieu-jpl commented Sep 1, 2023

🗒️ Summary

A new sweeper is added.
It can be optionally launched in addition to the default sweepers with option --legacy-sync

⚙️ Test Data and/or Report

Build the docker image.

Run on en-delta:

docker run -e CCS_CONN='atm-delta'  -e PROV_ENDPOINT='https://search-en-delta-cxbnln2tudbdrclt4nr7scdmpu.us-west-2.es.amazonaws.com' -e PROV_CREDENTIALS='{"*******": "*******"}' registry-sweepers /usr/local/bin/sweepers_driver.py --legacy-sync

Run in production:

docker run -e CCS_CONN='atm-prod-ccs,geo-prod-ccs,img-prod-ccs,naif-prod-ccs,ppi-prod-ccs,psa-prod,rms-prod,sbnpsi-prod-ccs,sbnumd-prod-ccs' -e PROV_ENDPOINT='https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com' -e PROV_CREDENTIALS='{"*****": "*********"}' registry-sweepers /usr/local/bin/sweepers_driver.py --legacy-sync

♻️ Related Issues

Fixes #58

@nutjob4life
Copy link
Member

nutjob4life commented Sep 1, 2023

@tloubrieu-jpl in the pull request description the password ****** appears. Is this the actual password?

If so, it should be changed as soon as possible!

Copy link
Member

@nutjob4life nutjob4life left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor fixes, but definitely a security vulnerability should be addressed

docker/README.md Outdated Show resolved Hide resolved
docker/README.md Outdated Show resolved Hide resolved
docker/README.md Outdated Show resolved Hide resolved
docker/README.md Outdated Show resolved Hide resolved
setup.cfg Outdated Show resolved Hide resolved
Copy link
Contributor

@alexdunnjpl alexdunnjpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mandatory: Branch integration tests pass

Other changes requested per granular comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request refactor - rather than defining sweepers and optional sweepers, then maintaining separate execution paths for each, it is preferable to

  1. define a collection of sweepers (either declaratively or by declaring the always-run sweepers then conditionally mutating that collection immediately to add the desired optional sweepers)
  2. run a common execution path for the set of all sweepers (always + conditional), treating them alike

if that's not clear, let me know and I'll modify it first-thing Tuesday when I'm back at work - it's a quick change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request: use descriptive variable names in preference to o, s

request: use a more idiomatic approach to the optional conditional check

  • argument should set a binary flag with `action='store_true'
  • conditional check should use something like if sweeper_name in args and arg[sweeper_name] == True (I forget the syntax for pulling variable-named args from argparser, but you get the idea)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alex, indeed I only tested with the option present.

@tloubrieu-jpl
Copy link
Member Author

@alexdunnjpl @nutjob4life I made updates to the PR following your comments.

I applied the linter on the code and had to add comments to make the linter ignore 3 lines.

I also noted that the unit tests are broken but I don't see how this pull request would have an impact on them.

@alexdunnjpl
Copy link
Contributor

alexdunnjpl commented Sep 6, 2023

@tloubrieu-jpl I haven't looked closely at the rest yet, but L520 through L531 show the from the test details/log show the issue

https://github.com/NASA-PDS/registry-sweepers/actions/runs/6103467062/job/16563918148?pr=68#step:5:521

There should not be any changes required to the two utilities files - I'd suggest reverting those (given that they happen to be the files referred to in the failing test stage)

@tloubrieu-jpl
Copy link
Member Author

Hi @alexdunnjpl , those changes were made to go through the linter (tox -e lint). which work but the unit tests (pytest) don't.

Have you had the precommit configuration deployed in your repository (see pre-coomit commands in https://github.com/NASA-PDS/template-repo-python#installation) ?

Another question, how do you run the unit tests, I mean with which command ?

@alexdunnjpl
Copy link
Contributor

@tloubrieu-jpl both the tests and the linter are/should-be run through tox, rather than pytest.

The tests may be run via the (somewhat unintuitively-named) py310 tox profile, so tox -e py310, and are not currently failing, as can be confirmed from examination of the github action logs.

The changes under registrysweepers.db.* are not required to pass linting and should be reverted (I have just tested this to confirm).

The outstanding (linting) errors are as follows, and will require either installation of the library stubs, or appropriate #ignores

src/pds/registrysweepers/legacy_registry_sync/opensearch_loaded_product.py:3: error: Cannot find implementation or library stub for module named "elasticsearch"  [import]
src/pds/registrysweepers/legacy_registry_sync/legacy_registry_sync.py:4: error: Cannot find implementation or library stub for module named "elasticsearch.helpers"  [import]
src/pds/registrysweepers/legacy_registry_sync/legacy_registry_sync.py:4: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
src/pds/registrysweepers/legacy_registry_sync/legacy_registry_sync.py:4: error: Cannot find implementation or library stub for module named "elasticsearch"  [import]

My precommit hook (slightly modified so that it re-runs on autofixable failure) is as follows:

#!/bin/sh
#
# An example hook script to verify what is about to be committed.
# Called by "git commit" with no arguments.  The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
#
# To enable this hook, rename this file to "pre-commit".

if git rev-parse --verify HEAD >/dev/null 2>&1
then
        against=HEAD
else
        # Initial commit: diff against an empty tree object
        against=$(git hash-object -t tree /dev/null)
fi

# If you want to allow non-ASCII filenames set this variable to true.
allownonascii=$(git config --bool hooks.allownonascii)

# Redirect output to stderr.
exec 1>&2

# Cross platform projects tend to avoid non-ASCII filenames; prevent
# them from being added to the repository. We exploit the fact that the
# printable range starts at the space character and ends with tilde.
if [ "$allownonascii" != "true" ] &&
        # Note that the use of brackets around a tr range is ok here, (it's
        # even required, for portability to Solaris 10's /usr/bin/tr), since
        # the square bracket bytes happen to fall in the designated range.
        test $(git diff --cached --name-only --diff-filter=A -z $against |
          LC_ALL=C tr -d '[ -~]\0' | wc -c) != 0
then
        cat <<\EOF
Error: Attempt to add a non-ASCII file name.

This can cause problems if you want to work with people on other platforms.

To be portable it is advisable to rename the file.

If you know what you are doing you can disable this check using:

  git config hooks.allownonascii true
EOF
        exit 1
fi

STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep ".py$")

echo "Linting files"
tox -e lint
git add $STAGED_FILES
if [ $? -ne 0 ]
then
    echo "Initial lint detected errors, re-linting to determine whether errors remain"
    tox -e lint
    if [ $? -ne 0 ]
    then
      exit 1
    fi
fi

exit 0

@tloubrieu-jpl
Copy link
Member Author

@alexdunnjpl @nutjob4life I made some changes to make tox work.

  1. namespace_packages is deprecated in setup.cfg: I renamed the package pds.registry-sweepers and removed the line in the setup.cfg.
  2. I changed the option to allow pytest as an external tool for tox.

Can you review these changes and validate if you are happy with them ?

Copy link
Contributor

@alexdunnjpl alexdunnjpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good after changes

@tloubrieu-jpl
Copy link
Member Author

Hi @nutjob4life , can you validate that your requested changes are done ? Thanks

@nutjob4life
Copy link
Member

Good morning @tloubrieu-jpl, this all checks out 🎉

@tloubrieu-jpl tloubrieu-jpl merged commit aa747c6 into main Sep 12, 2023
1 check passed
@tloubrieu-jpl tloubrieu-jpl deleted the solr_sync_58 branch September 12, 2023 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regularly synchronize the legacy registry collections on Solr in the registry in opensearch
3 participants