Avoid dynamic-plugins writing conflicts #2285

Open · wants to merge 3 commits into main
35 changes: 35 additions & 0 deletions docker/install-dynamic-plugins.py
@@ -25,6 +25,10 @@
import subprocess
import base64
import binascii
import atexit
import time
import signal

# This script is used to install dynamic plugins in the Backstage application,
# and is available in the container image to be called at container initialization,
# for example in an init container when using Kubernetes.
@@ -181,8 +185,39 @@ def verify_package_integrity(plugin: dict, archive: str, working_directory: str)
    if hash_digest != output.decode('utf-8').strip():
        raise InstallException(f'{package}: The hash of the downloaded package {output.decode("utf-8").strip()} does not match the provided integrity hash {hash_digest} provided in the configuration file')

# Create the lock file, so that other instances of the script will wait for this one to finish
def create_lock(lock_file_path):
    while True:
        try:
            with open(lock_file_path, 'x'):
**Member:** There could be a risk that multiple replicas call this at the same time, no?

print(f"======= Created lock file: {lock_file_path}")
return
except FileExistsError:
wait_for_lock_release(lock_file_path)

# Remove the lock file
def remove_lock(lock_file_path):
    os.remove(lock_file_path)
    print(f"======= Removed lock file: {lock_file_path}")

# Wait for the lock file to be released
def wait_for_lock_release(lock_file_path):
    print("======= Waiting for lock release...")
    while True:
        if not os.path.exists(lock_file_path):
            break
        time.sleep(1)
    print("======= Lock released.")

def main():

    dynamicPluginsRoot = sys.argv[1]

    lock_file_path = os.path.join(dynamicPluginsRoot, 'install-dynamic-plugins.lock')
    atexit.register(remove_lock, lock_file_path)
**Member:** I wonder if that also handles a SIGTERM, which is sent by k8s to gracefully terminate the pod. To handle that scenario you could add this:

import signal
# Register signal handlers
signal.signal(signal.SIGTERM, remove_lock)  # Kubernetes graceful shutdown

**Member (Author):** Good catch, thanks! Fixed.

    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    create_lock(lock_file_path)

    maxEntrySize = int(os.environ.get('MAX_ENTRY_SIZE', 20000000))
    skipIntegrityCheck = os.environ.get("SKIP_INTEGRITY_CHECK", "").lower() == "true"

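For reference, a minimal standalone sketch (not part of this PR; assumes POSIX signals and CPython) of why the fix routes SIGTERM through `sys.exit(0)`: `atexit` handlers run on a normal interpreter exit, which `sys.exit()` triggers, but not when the default SIGTERM action kills the process.

```python
import atexit
import os
import signal
import sys
import time

def cleanup():
    # In the real script, this is where remove_lock(lock_file_path) runs.
    print("atexit: cleanup executed")

atexit.register(cleanup)
# Convert SIGTERM into a normal interpreter exit so atexit handlers fire.
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))

print(f"Send SIGTERM to PID {os.getpid()} to see the cleanup run")
time.sleep(60)
```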
16 changes: 16 additions & 0 deletions docs/dynamic-plugins/installing-plugins.md
@@ -119,3 +119,19 @@ When using RHDH Helm Chart you can just name the Secret using following pattern
When using the Operator ....

//TODO

### Storage of Dynamic Plugins

The directory where dynamic plugins are located is mounted as a volume into the _install-dynamic-plugins_ init container and the _backstage-backend_ container. The _install-dynamic-plugins_ init container is responsible for downloading and extracting the plugins into this directory. Depending on the deployment method, the directory is mounted as an ephemeral or a persistent volume. In the latter case, the volume can be shared between several Pods, so the plugin installation script uses a lock file to ensure that only one instance downloads and extracts the plugins at a time, avoiding write conflicts.
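The serialization relies on the atomicity of creating the lock file. A minimal standalone sketch (not part of the script; it assumes the volume's filesystem honors `O_EXCL` semantics, which can vary on NFS-backed shared volumes) showing that `open(path, 'x')` lets exactly one of several concurrent processes win:

```python
import os
import tempfile
from multiprocessing import Pool

LOCK_PATH = os.path.join(tempfile.gettempdir(), "demo.lock")  # hypothetical path

def try_acquire(worker_id: int) -> bool:
    try:
        # Mode 'x' maps to O_CREAT | O_EXCL: creation fails if the file exists.
        with open(LOCK_PATH, "x"):
            return True   # this worker created the lock
    except FileExistsError:
        return False      # another worker got there first

if __name__ == "__main__":
    if os.path.exists(LOCK_PATH):
        os.remove(LOCK_PATH)
    with Pool(8) as pool:
        wins = pool.map(try_acquire, range(8))
    print(wins.count(True))  # prints 1: exactly one winner
```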

**Important Note:** If the _install-dynamic-plugins_ init container is killed with a SIGKILL signal (for example, due to OOM), the script cannot remove the lock file, and the next time the Pod starts it will wait indefinitely for the lock to be released. You will see the following message in the logs of all affected Pods:

```console
oc logs -n <namespace-name> -f backstage-<backstage-name>-<pod-suffix> -c install-dynamic-plugins
======= Waiting for lock release...
```
In such a case, you can delete the lock file manually from any of the Pods:

```console
oc exec -n <namespace-name> deploy/backstage-<backstage-name> -c install-dynamic-plugins -- rm -f /dynamic-plugins-root/install-dynamic-plugins.lock
```
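To avoid stale lock files altogether, an advisory `fcntl.flock()` lock could be used instead, since the kernel releases it automatically when the holding process dies, even on SIGKILL. A hedged sketch of that alternative (not part of the script; `flock` support on NFS-backed shared volumes varies):

```python
import fcntl

def install_plugins_once(lock_file_path: str):
    # The lock file may already exist; opening it does not take the lock.
    with open(lock_file_path, "w") as lock_file:
        print("======= Waiting for lock...")
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            print("======= Lock acquired, installing plugins...")
            # ... download and extract plugins here ...
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)  # also released if the process dies
```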