Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running the registry cache with a tmpfs volume #256

Open
diamondburned opened this issue Sep 17, 2024 · 4 comments
Open

Support running the registry cache with a tmpfs volume #256

diamondburned opened this issue Sep 17, 2024 · 4 comments
Labels
kind/enhancement Enhancement, improvement, extension lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@diamondburned
Copy link

diamondburned commented Sep 17, 2024

How to categorize this issue?

/kind enhancement

What would you like to be added:

The extension should permit configuring the registry cache to utilize a tmpfs (RAM-based) volume rather than needing an actual volume via persistent volume claims.

Why is this needed:

My original use case for using the registry cache extension is to allow users to pull images without needing to configure registry credentials every time they use the image. This extension solves this use case by running a local image registry that automatically utilizes the given registry credentials.

Unfortunately, the registry cache also requires a volume to be allocated to it for the actual image registry cache. After all, this is its intended purpose. However, this is not part of my use case, and I would rather avoid needing to allocate a new volume for each user for this reason.

Possible implementation:

A simple (but untested) way to do this would simply be to replace the PersistentVolumeClaim with an actual Volume whose VolumeSource is set to EmptyDir/StorageMediumHugePages.

Show (pseudo) diffs
--- a/pkg/component/registrycaches/registry_caches.go
+++ b/pkg/component/registrycaches/registry_caches.go
@@ @@ func (r *registryCaches) computeResourcesDataForRegistryCache(ctx context.Contex
                     },
                     Volumes: []corev1.Volume{
                         {
                             Name: registryConfigVolumeName,
                             VolumeSource: corev1.VolumeSource{
                                 Secret: &corev1.SecretVolumeSource{
                                     SecretName: configSecret.Name,
                                 },
                             },
                         },
+                        {
+                            Name: registryCacheVolumeName,
+                            VolumeSource: corev1.VolumeSource{
+                                EmptyDir: &corev1.EmptyDirVolumeSource{
+                                    Medium: corev1.StorageMediumHugePages,
+                                },
+                            },
+                        },
                     },
                 },
             },
         },
     }
--- a/pkg/component/registrycaches/registry_caches.go
+++ b/pkg/component/registrycaches/registry_caches.go
@@ @@ func (r *registryCaches) computeResourcesDataForRegistryCache(ctx context.Contex
                     },
                 },
             },
-            VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
-                {
-                    ObjectMeta: metav1.ObjectMeta{
-                        Name:   registryCacheVolumeName,
-                        Labels: getLabels(name, upstreamLabel),
-                    },
-                    Spec: corev1.PersistentVolumeClaimSpec{
-                        AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
-                        Resources: corev1.VolumeResourceRequirements{
-                            Requests: corev1.ResourceList{
-                                corev1.ResourceStorage: *cache.Volume.Size,
-                            },
-                        },
-
-                        StorageClassName: storageClassName,
-                    },
-                },
-            },
         },
     }

A Type flag could be introduced to gardener-extension-registry-cache/pkg/apis/registry/v1alpha3.Volume to allow for this:

--- a/pkg/apis/registry/v1alpha3/types.go
+++ b/pkg/apis/registry/v1alpha3/types.go
@@ @@ type RegistryCache struct {

 // Volume contains settings for the registry cache volume.
 type Volume struct {
-       // Size is the size of the registry cache volume.
+       // Type is the type of the registry cache volume.
+       // It can be either `persistentVolumeClaim` or `emptyDir`.
+       // Defaults to `persistentVolumeClaim`.
+       // This field is immutable.
+       // +optional
+       Type string `json:"type,omitempty"`
+       // Size is the size of the registry cache volume. Only applicable for `persistentVolumeClaim` type.
        // Defaults to 10Gi.
        // This field is immutable.
        // +optional
        Size *resource.Quantity `json:"size,omitempty"`
        // StorageClassName is the name of the StorageClass used by the registry cache volume.
+       // Only applicable for `persistentVolumeClaim` type.
        // This field is immutable.
        // +optional
        StorageClassName *string `json:"storageClassName,omitempty"`
@gardener-prow gardener-prow bot added the kind/enhancement Enhancement, improvement, extension label Sep 17, 2024
@ialidzhikov
Copy link
Member

Hi @diamondburned,

My original use case for using the registry cache extension is to allow users to pull images without needing to configure registry credentials every time they use the image. This extension solves this use case by running a local image registry that automatically utilizes the given registry credentials.

While this works in the usual case when the registry-cache is up and running, but this won't work in the case the registry-cache is not available and containerd has to fall back to the upstream.
The way it works today:

  1. containerd uses the image pull Secrets and forwards the authentication info to the registry-cache.
  2. The registry-cache does not respected the provided authentication info and does not use it when it talks to the upstream. Instead it uses the single set of credentials configured for it.

I don't recommend you to NOT specify image pull Secrets. If the image pull Secret is not present it won't be able to pull the image from the upstream when the registry-cache is not available.
I think there is a room for improvement in our docs to warn about this detail.

Unfortunately, the registry cache also requires a volume to be allocated to it for the actual image registry cache. After all, this is its intended purpose. However, this is not part of my use case, and I would rather avoid needing to allocate a new volume for each user for this reason.

I think we understand your use case. You are actually NOT interested in reduce networking traffic and costs to upstream registries but you are only interested in eliminating image pull Secrets.
First, as shared above, it is not recommended the image pull Secret to be eliminated so that the containerd's fallback to the upstream continues to work.
Second, I don't think a RAM-based approach will work. The registry-cache currently is being scaled by VPA. If images are stored in the memory, then VPA will frequently restart the Pod to updates its resource requests. Every restart leads the whole cache to be lost, I guess? If that's the case, I don't think this will work well. emptyDir will work better but still the emptyDir volume will be lost if the registry-cache Pod gets scheduled on another Node. What is the benefit of using emptyDir instead of a PVC?

@diamondburned
Copy link
Author

Thank you so much for the detailed and helpful response!

I think we understand your use case. You are actually NOT interested in reduce networking traffic and costs to upstream registries but you are only interested in eliminating image pull Secrets.

Yes, this is correct. We see the caching as a nice bonus that people can choose to enable, but we want to make it optional. If the user chooses to enable it, then we're fine with allocating an actual volume.

First, as shared above, it is not recommended the image pull Secret to be eliminated so that the containerd's fallback to the upstream continues to work.

I think this is a reasonable point. I know that registry-mirror actually is just a configuration for containerd; would it be possible to inject secrets this way?

emptyDir will work better but still the emptyDir volume will be lost if the registry-cache Pod gets scheduled on another Node. What is the benefit of using emptyDir instead of a PVC?

Using emptyDir instead of a PVC is just my workaround to (somewhat) disable the caching feature without needing to further touch Distribution: instead of needing a real volume with storage, just allocate a small emptyDir so that the caching is essentially useless.

@ialidzhikov
Copy link
Member

ialidzhikov commented Sep 19, 2024

I think this is a reasonable point. I know that registry-mirror actually is just a configuration for containerd; would it be possible to inject secrets this way?

The underlying API/contract of containerd we use is Registry Configuration. It is possible to specify headers and certificates. But I guess this won't help you with image pull Secrets.
Can you elaborate what is the issue with the image pull Secrets and why you want to eliminate them? Maybe we can find a better solution.
For example, there is kubelet credential provider mechanism, see Configure a kubelet image credential provider. This is supported for example for AWS in case you are using ECR as image registry. You don't need to provide any credentials, the ecr-credential-provider fetches them for you. You should be able to write a custom kubelet credential provider plugin if this can help you.

@gardener-ci-robot
Copy link

The Gardener project currently lacks enough active contributors to adequately respond to all issues.
This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Mark this issue as rotten with /lifecycle rotten
  • Close this issue with /close

/lifecycle stale

@gardener-prow gardener-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement Enhancement, improvement, extension lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

No branches or pull requests

3 participants