
Avoid dynamic-plugins writing conflicts #2285

Open · wants to merge 3 commits into main
Conversation

@gazarenkov gazarenkov (Member) commented Jan 29, 2025

Description

This PR introduces a quasi-locking mechanism to ensure that the install-dynamic-plugins script does not write to the dynamic-plugins-root directory on the shared persistent volume concurrently from multiple replicas.
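
For reference, the mechanism amounts to an exclusive-create lock file in the shared volume: the first init container to create install-dynamic-plugins.lock proceeds, and the others wait until the file is removed before continuing. A minimal sketch of the idea follows; the wait loop and helper bodies are illustrative, not the exact code in this PR.

```python
import atexit
import os
import sys
import time

def create_lock(lock_file_path):
    while True:
        try:
            # open(..., 'x') fails with FileExistsError if the file already exists,
            # so only one replica at a time can create the lock.
            with open(lock_file_path, 'x'):
                print(f"======= Created lock file: {lock_file_path}")
            return
        except FileExistsError:
            # Another replica holds the lock; poll until the file disappears.
            print("======= Waiting for lock release...")
            while os.path.exists(lock_file_path):
                time.sleep(1)
            print("======= Lock released.")

def remove_lock(lock_file_path):
    # Release the lock so the next replica can proceed.
    if os.path.exists(lock_file_path):
        os.remove(lock_file_path)

dynamicPluginsRoot = sys.argv[1]
lock_file_path = os.path.join(dynamicPluginsRoot, 'install-dynamic-plugins.lock')

create_lock(lock_file_path)
atexit.register(remove_lock, lock_file_path)
```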

Which issue(s) does this PR fix

https://issues.redhat.com/browse/RHIDP-5732

PR acceptance criteria

Please make sure that the following steps are complete:

  • GitHub Actions are completed and successful
  • Unit Tests are updated and passing
  • E2E Tests are updated and passing
  • Documentation is updated if necessary (requirement for new features)
  • Add a screenshot if the change is UX/UI related

How to test changes / Special notes to the reviewer

To test on Operator:

  1. Configure a PVC for the dynamic-plugins-root volume in profile/rhdh/default-config/deployment.yaml:

volumes:
  - name: dynamic-plugins-root
    persistentVolumeClaim:
      claimName: dynamic-plugins-root
#  - ephemeral:
#    name: dynamic-plugins-root

and deploy the Operator with it.
  2. Create a PVC like:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-plugins-root
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 2Gi
  3. Create a multi-replica Backstage CR like:
apiVersion: rhdh.redhat.com/v1alpha3
kind: Backstage
metadata:
  name: bs2
spec:
  deployment:
    patch:
      spec:
        replicas: 3
        template:
          spec:
            containers:
              - name: backstage-backend
                image: quay.io/gazarenk/rhdh:latest
            initContainers:
              - name: install-dynamic-plugins
                image: quay.io/gazarenk/rhdh:latest
            volumes:
              - persistentVolumeClaim:
                  claimName: dynamic-plugins-root
  4. Make sure it is deployed successfully. `kubectl logs backstage-bs2-XXXXXXXXX -c install-dynamic-plugins -n <namespace>` on one Pod should start with:

======= Created lock file: /dynamic-plugins-root/install-dynamic-plugins.lock

and on the 2 others:

======= Waiting for lock release...
======= Lock released.
======= Created lock file: /dynamic-plugins-root/install-dynamic-plugins.lock

openshift-ci bot commented Jan 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign coreydaley for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions bot (Contributor) commented Jan 29, 2025

Hey there and thank you for opening this pull request! 👋🏼

We require pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.

Details:

No release type found in pull request title "Avoid dynamic-plugins writing conflicts". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat
 - fix
 - chore
 - docs
 - style
 - refactor
 - perf
 - test
 - revert
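
For example, a conforming title for this change would look something like `fix: avoid dynamic-plugins writing conflicts`.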

@durandom durandom (Member) left a comment


the scenario works if all goes well. But in case of a graceful termination the lock would not be removed - see inline comments.

also, if the pod is killed via SIGKILL or OOM, then the lock remains. That means something is wrong anyway and has to be adjusted. But I think we should provide a command to remove the lock.

`oc exec deploy/backstage-rhdh -c install-dynamic-plugins -- rm /dynamic-plugins-root/install-dynamic-plugins.lock`

Maybe put that into the docs/dynamic-plugins/installing-plugins.md section and get it into our docs?

dynamicPluginsRoot = sys.argv[1]

lock_file_path = os.path.join(dynamicPluginsRoot, 'install-dynamic-plugins.lock')
atexit.register(remove_lock, lock_file_path)
Member

I wonder if that also handles a SIGTERM, which is sent by k8s to gracefully terminate the pod. To handle that scenario you could add this:

import signal
# Register signal handlers
signal.signal(signal.SIGTERM, remove_lock)  # Kubernetes graceful shutdown
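
A side note on that suggestion (illustrative only, not necessarily what the PR ended up with): a Python signal handler is invoked with (signum, frame), while remove_lock above is registered with the lock file path, so a small wrapper keeps both registrations consistent.

```python
import signal
import sys

# Hypothetical wrapper: signal handlers receive (signum, frame), so adapt the call
def handle_sigterm(signum, frame):
    remove_lock(lock_file_path)
    sys.exit(0)  # exit after cleanup so the init container actually stops

signal.signal(signal.SIGTERM, handle_sigterm)  # Kubernetes graceful shutdown
```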

Member Author

Good catch, thanks! Fixed

@kadel kadel (Member) commented Jan 30, 2025

> the scenario works if all goes well. But in case of a graceful termination the lock would not be removed - see inline comments.
>
> also, if the pod is killed via SIGKILL or OOM, then the lock remains. That means something is wrong anyway and has to be adjusted. But I think we should provide a command to remove the lock.

Would it make sense to have a kind of time-out on the lock? For example, if the lock is older than 1 hour, just delete the old lock and assume your script is good to go.
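
If that route were taken, one hypothetical way to do it (not part of this PR) is to compare the lock file's mtime against a threshold before waiting on it:

```python
import os
import time

LOCK_TIMEOUT_SECONDS = 60 * 60  # hypothetical 1-hour threshold from the suggestion above

def lock_is_stale(lock_file_path):
    # Treat a lock older than the threshold as left over from a killed pod.
    try:
        age = time.time() - os.path.getmtime(lock_file_path)
    except FileNotFoundError:
        return False  # lock already released
    return age > LOCK_TIMEOUT_SECONDS

# e.g. in the wait loop:
#     if lock_is_stale(lock_file_path):
#         os.remove(lock_file_path)
```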

@durandom durandom (Member) commented:

> Would it make sense to have a kind of time-out on the lock? For example, if the lock is older than 1 hour, just delete the old lock and assume your script is good to go.

Maybe, but isn't in that case something broken already and would require manual intervention?

@gazarenkov gazarenkov (Member, Author) commented:

> Would it make sense to have a kind of time-out on the lock? For example, if the lock is older than 1 hour, just delete the old lock and assume your script is good to go.

> Maybe, but isn't in that case something broken already and would require manual intervention?

Doc added. In principle, we could add time-out based releasing too... I thought about it as well but I'm not sure. @kadel Are you insisting on it? :)

@gazarenkov gazarenkov requested a review from durandom February 1, 2025 09:28
@rm3l rm3l (Member) left a comment

A few comments below..

@rm3l rm3l (Member) commented Feb 1, 2025

> the scenario works if all goes well. But in case of a graceful termination the lock would not be removed - see inline comments.
>
> also, if the pod is killed via SIGKILL or OOM, then the lock remains. That means something is wrong anyway and has to be adjusted. But I think we should provide a command to remove the lock.
>
> `oc exec deploy/backstage-rhdh -c install-dynamic-plugins -- rm /dynamic-plugins-root/install-dynamic-plugins.lock`
>
> Maybe put that into the docs/dynamic-plugins/installing-plugins.md section and get it into our docs?

Would manual intervention be acceptable to production users?

def create_lock(lock_file_path):
    while True:
        try:
            with open(lock_file_path, 'x'):
Member

There could be a risk that multiple replicas call this at the same time, no?
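
For context on that question: open(lock_file_path, 'x') maps to an exclusive create (O_CREAT | O_EXCL), so on a single filesystem only one process can win the create; the others raise FileExistsError and fall into the wait path. A tiny standalone illustration (not code from the PR):

```python
import os
import tempfile

lock_path = os.path.join(tempfile.gettempdir(), 'demo.lock')  # throwaway path for the demo

def try_acquire(path):
    try:
        with open(path, 'x'):  # exclusive create: fails if the file already exists
            return True
    except FileExistsError:
        return False

print(try_acquire(lock_path))   # True for the first caller
print(try_acquire(lock_path))   # False while the lock file still exists
os.remove(lock_path)
```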
