Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the type field to the Manifest model #1794

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

git-hyagi
Copy link
Contributor

closes: #1751

Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When should we deprecate is_flatpak and is_bootable in favour of type/nature? Is it fine to deprecate them now?

@@ -0,0 +1 @@
Added the `type` field to help differentiate Manifests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Added the `type` field to help differentiate Manifests.
Introduced the `type` field on the manifests endpoint to enable easier differentiation of image types.

pulp_container/app/models.py Show resolved Hide resolved
Comment on lines 225 to 227
if not json_manifest.get("config", None):
return False
return json_manifest.get("config").get("mediaType") == MEDIA_TYPE.HELM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see a pattern here? It is repeating in is_helm_image and is_cosign. We might want to look for a decorator.

def validate_json_field(field):
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            json_manifest = json.loads(self.data)
            if not json_manifest.get(field, None):
                return False
            return func(self, json_manifest, *args, **kwargs)
        return wrapper
    return decorator
    @validate_json_field('layers')  # Check for 'layers' field for cosign images
    def is_cosign(self, json_manifest):
        return any(
            layer.get("mediaType", None) == MEDIA_TYPE.COSIGN for layer in json_manifest["layers"]
        )
    @validate_json_field('config')  # Check for 'config' field for helm images
    def is_helm_image(self, json_manifest):
        return json_manifest.get("config").get("mediaType") == MEDIA_TYPE.HELM

pulp_container/app/models.py Outdated Show resolved Hide resolved
@@ -154,6 +156,11 @@ def init_image_nature(self):
return self.init_manifest_nature()

def init_manifest_list_nature(self):
updated_type = False
if not self.type:
self.type = self.manifest_list_type()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we should save MANIFEST_TYPE.MANIFEST_LIST or MANIFEST_TYPE.OCI_INDEX into the type field. How do we deal with this situation when it comes to is_flatpak and is_bootable? Do we backpropagate this value from a manifest to a manifest list?

@git-hyagi
Copy link
Contributor Author

When should we deprecate is_flatpak and is_bootable in favour of type/nature? Is it fine to deprecate them now?

Well remembered! I talked to the Katello's team, and they are fine with deprecating these fields in favor of the new type/nature.

@git-hyagi git-hyagi changed the title Add the type field to the Manifest model Add the nature field to the Manifest model Oct 22, 2024
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of suggestions here...

@@ -72,6 +73,7 @@ class Manifest(Content):
digest (models.TextField): The manifest digest.
schema_version (models.IntegerField): The manifest schema version.
media_type (models.TextField): The manifest media type.
nature (models.TextField): The manifest's type (flatpak, bootable, signature, etc.).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this related to the OCI specification?

Is it a pure invention on our side? Is the value easily derived from the "media_type"? Should it maybe be a choice field with MANIFEST_TYPE as the choices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this related to the OCI specification?
Is it a pure invention on our side?

This field is not part of OCI spec. The idea came not only from us, but also as a request from Katello's team: #1751 (comment)

Is the value easily derived from the "media_type"? Should it maybe be a choice field with MANIFEST_TYPE as the choices?

Unfortunately not for all types. To identify flatpak we need to check the manifest labels, whereas bootable images can be recognized through labels or annotations, helm by checking the media_type from configblob, cosigned images through the media_type from layers, and, so far, only "common" images are identified through the manifest media_type. This is why we have the different methods to identify each type.

Comment on lines 77 to 90
MANIFEST_TYPE = SimpleNamespace(
IMAGE="image",
BOOTABLE="bootable",
FLATPAK="flatpak",
HELM="helm",
SIGNATURE="signature",
UNKNOWN="unknown",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These types are mutually exclusive, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

@@ -648,4 +649,4 @@ def _post_save(self, batch):
# it is possible to initialize the nature of the corresponding manifest lists
for ml in manifest_lists:
if ml.init_manifest_list_nature():
ml.save(update_fields=["is_bootable", "is_flatpak"])
ml.save(update_fields=["is_bootable", "is_flatpak", "nature"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_post_save runs in the same transaction as the things are created, right?

I still wonder, can we not know all this before we even save the manifests the first time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the current PR to address this concern, but I didn't modify the current implementation ml.save(update_fields=["is_bootable", "is_flatpak"]) to avoid introducing bugs (and this block will be removed "soon" since these fields are now deprecated).

manifest.save(update_fields=["is_bootable", "is_flatpak"])
manifest.save(update_fields=["is_bootable", "is_flatpak", "nature"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault, apparently, but this appears to be unsafe. (It mutates content, which is both wrong in general and prone to creating deadlocks.)
However, i believe we need a transaction here anyway, because we need to save the manifest list before we can attach the listed manifests, but having an "empty" manifest list committed into the database is an invalid state.
I think the transaction needs to start before the manifest_list safe call.

We may think about this in a different PR though.

Comment on lines 219 to 240
def validate_json_field(field):
def decorator(func):
@wraps(func)
def wrapper(self, *args, **kwargs):
json_manifest = json.loads(self.data)
if not json_manifest.get(field, None):
return False
return func(self, json_manifest, *args, **kwargs)

return wrapper

return decorator

@validate_json_field("layers") # Check for 'layers' field for cosign images
def is_cosign(self, json_manifest):
return any(
layer.get("mediaType", None) == MEDIA_TYPE.COSIGN for layer in json_manifest["layers"]
)

@validate_json_field("config") # Check for 'config' field for helm images
def is_helm_image(self, json_manifest):
return json_manifest.get("config").get("mediaType") == MEDIA_TYPE.HELM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So manifest.data needs to be a string field, because different formatting would bust the manifest checksums, right? But we can safely assume it contains valid json.
Let me suggest a different (maybe more pythonic) approach here:
We can add a json_manifest property.

@property
def json_manifest(self):
    return json.loads(self.data)

Then we can do:

def is_helm_image(self):
    # should this actually be `is_helm_chart`?
    try:
        return self.json_manifest["config"]["mediaType"] == MEDIA_TYPE.HELM
    except KeyError:
        return False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just

def is_helm_image(self):
    # should this actually be `is_helm_chart`?
    try:
        return json.loads(self.data)["config"]["mediaType"] == MEDIA_TYPE.HELM
    except KeyError:
        return False

in the first place.

Comment on lines 203 to 204
# DEPRECATED: is_bootable is deprecated and will be removed in a future release.
self.is_bootable = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a weird side effect of a function that claims to only calculate whether a condition is met.

Comment on lines 182 to 196
for manifest_type, check_type_function in self.known_types().items():
if check_type_function():
self.nature = manifest_type
return True

return False

def known_types(self):
return {
MANIFEST_TYPE.BOOTABLE: self.is_bootable_image,
MANIFEST_TYPE.FLATPAK: self.is_flatpak_image,
MANIFEST_TYPE.HELM: self.is_helm_image,
MANIFEST_TYPE.SIGNATURE: self.is_cosign,
MANIFEST_TYPE.IMAGE: self.is_manifest_image,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the if elif structure, even for readability.

if not self.nature:
self.nature = MANIFEST_TYPE.UNKNOWN
updated_nature = True

for manifest in self.listed_manifests.all():
# it suffices just to have a single manifest of a specific nature;
# there is no case where the manifest is both bootable and flatpak-based
if manifest.is_bootable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, you are relying on the deprecated fields here.
If you are doing this for zero downtime upgrade reasons, please add a big comment explaining the line of thought. If not, I think it needs to be rewritten.

pulp_container/constants.py Show resolved Hide resolved
@@ -0,0 +1,2 @@
Introduced the `nature` field on the Manifests endpoint to enable easier differentiation of image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest going with simple 'type'

return False

def is_manifest_image(self):
return self.media_type in (MEDIA_TYPE.MANIFEST_OCI, MEDIA_TYPE.MANIFEST_V2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about schema1? we stopped conversion on the fly but still support its mirroring from remote registries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops... my bad, I thought schema1 was deprecated on our side as well.

@@ -19,6 +19,8 @@
FOREIGN_BLOB_OCI_TAR_GZIP="application/vnd.oci.image.layer.nondistributable.v1.tar+gzip",
FOREIGN_BLOB_OCI_TAR_ZSTD="application/vnd.oci.image.layer.nondistributable.v1.tar+zstd",
OCI_EMPTY_JSON="application/vnd.oci.empty.v1+json",
HELM="application/vnd.cncf.helm.config.v1+json",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather name it config_blob_helm

@@ -19,6 +19,8 @@
FOREIGN_BLOB_OCI_TAR_GZIP="application/vnd.oci.image.layer.nondistributable.v1.tar+gzip",
FOREIGN_BLOB_OCI_TAR_ZSTD="application/vnd.oci.image.layer.nondistributable.v1.tar+zstd",
OCI_EMPTY_JSON="application/vnd.oci.empty.v1+json",
HELM="application/vnd.cncf.helm.config.v1+json",
COSIGN="application/vnd.dev.cosign.simplesigning.v1+json",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cosign_blob to be more precise?

Copy link
Member

@ipanova ipanova Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about attestations and sboms?
these are also cosign produced artifacts how about you split these into cosign_sig_blob, cosign_att_blob and cosign_sbom_blob?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see more examples here https://quay.io/repository/sallyom/hello-go?tab=tags
for sboms one can have spdx type and cyclonedx but you should rather check what's the situation as of today, my info might alreayd be outdated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing the missing types! I also found this doc which was helpful to define the other cosign "derived types": https://github.com/sigstore/cosign/blob/main/specs/SBOM_SPEC.md

BOOTABLE="bootable",
FLATPAK="flatpak",
HELM="helm",
SIGNATURE="signature",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest to be more specific - cosign_signature

@lubosmj lubosmj changed the title Add the nature field to the Manifest model Add the type field to the Manifest model Oct 28, 2024
Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looking great! 🦆 I only left out a few clarifying questions.

New quest: Would you mind testing the migration command more deeply? For example, (1) sync content with pulp-contianer 2.19, (2) upgrade to 2.21 (without dropping the DB), (3) run the django-management command, (4) checkout back to this branch and upgrade, (5) run the django-management command again, (6) and manually verify the values of types and labels.

Comment on lines 172 to 181
for manifest in self.listed_manifests.all():
# it suffices just to have a single manifest of a specific nature;
# there is no case where the manifest is both bootable and flatpak-based
if manifest.is_bootable:
if manifest.type == MANIFEST_TYPE.BOOTABLE:
self.type = MANIFEST_TYPE.BOOTABLE
self.is_bootable = True
return True
elif manifest.is_flatpak:
elif manifest.type == MANIFEST_TYPE.FLATPAK:
self.type = MANIFEST_TYPE.FLATPAK
self.is_flatpak = True
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a manifest list be of type "helm"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manifest list should stay as list. Nowadays I've seen lists containing artifacts and images, so it's a mixture of things..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, good to know! I initially thought that if any of the listed manifests is of type "bla", then, we can treat the entire manifest list as being of type "bla". My goal was to reduce the number of API calls for our integrators, so they would not need to check each listed manifest individually to determine the type. By treating the manifest list as the "list" type, they would need to iterate through all listed manifests to get the right type and potentially present it in the UI (because users should see only the tagged manifests/manifest lists in the UI).

It might even make sense to allow filtering manifests by their referencing manifest list. Meaning that only one additional API call will be needed at most.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when it comes to different types of listed manifests, we might even consider prioritizing one over the other. The type that has the highest priority will be displayed in the view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we decide to set manifestlist as type: list, Pulp integrators would need to check and identify the corresponding manifests types by themselves. But if we set the type based on a single manifest from the list (for example, flatpak) there is a risk of providing the wrong type since we can now have a mixture of things. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless, we do not enforce these types on the DB level...

Copy link
Member

@ipanova ipanova Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially manifest list was introduced for the multi-arch builds of the same image, nowadays it can also be treated as a list of images collected together. There is no enforcement in the specs around this.
Here's a practical example skopeo inspect --raw docker://quay.io/nalind/testing:podman-machine-unified|jq

I would not do any heuristics and just leave type as list. Integrators can take their own risks and parse and prioritize the types they consider more important on their own.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ianballou FYI the rationale behind ^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see different architectures being provided in the example. But, I got your point. Thanks for the explanation.

    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:eb4980b6fa14d9356c94a605b3b11cffe35b8dc9f745924c440c869d29d8e933",
      "size": 11817,
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:93a5ef6e62b75db1fd6c95a4e164b02669d569b23aab20d1dccbb0c381111e35",
      "size": 507,
      "annotations": {
        "disktype": "qemu"
      },
      "platform": {
        "architecture": "x86_64",
        "os": "linux"
      }
    },

Copy link
Member

@ipanova ipanova Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lubosmj to understand what i wanted to explain you should have inspected each child manifest and you would have seen that 2 of them are artifacts and 2 are different arches for a regular image.

Comment on lines 237 to 246
def get_cosign_type(self, media_type):
if media_type in MEDIA_TYPE.COSIGN_SBOM:
return MANIFEST_TYPE.COSIGN_SBOM

cosign_types_mapping = {
MEDIA_TYPE.COSIGN_BLOB: MANIFEST_TYPE.COSIGN_SIGNATURE,
MEDIA_TYPE.COSIGN_ATTESTATION: MANIFEST_TYPE.COSIGN_ATTESTATION,
MEDIA_TYPE.COSIGN_ATTESTATION_BUNDLE: MANIFEST_TYPE.COSIGN_ATTESTATION_BUNDLE,
}
return cosign_types_mapping.get(media_type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should improve the readability aspect of this method. Is such a level of granularity even needed? If yes, let's move this mapping to constants.py.

Comment on lines 1219 to 1221
if manifest.type in [MANIFEST_TYPE.BOOTABLE, MANIFEST_TYPE.FLATPAK]:
manifest_list.type = manifest.type
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, can the manifest list be of type "helm"?

Comment on lines 291 to 298
manifest_list_type = MANIFEST_TYPE.UNKNOWN
for listed_manifest in manifest_list_dc.extra_data["listed_manifests"]:
# Just await here. They will be associated in the post_save hook.
await listed_manifest["manifest_dc"].resolution()

# if manifestlist type is not defined as "unknown" we already checked its nature,
# in this case, there is no need to verify the other manifests from list
if manifest_list_type != MANIFEST_TYPE.UNKNOWN and listed_manifest[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if condition is always evaluated to False, right? What is the rationale behind this procedure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if condition is always evaluated to False, right?

Right, but for the first iteration execution only. For the following iterations, the manifest_list_type may have been changed to flatpak or bootable.

What is the rationale behind this procedure?

This is to avoid having to define the manifest_list_type in every iteration of the loop. If we already identified it as flatpak or bootable, we can skip checking the listed_manifest["manifest_dc"].content.type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you are right! My bad! 🤦‍♂️
The condition should be if manifest_list_type == MANIFEST_TYPE.UNKNOWN and ...

Comment on lines +85 to +86
COSIGN_ATTESTATION="cosign_attestation",
COSIGN_ATTESTATION_BUNDLE="cosign_attestation_bundle",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure we should differentiate between these. What would be a typical use case for seeing a difference between the bundle and regular attestation? Please, elucidate further.

@@ -71,3 +75,34 @@
SIGNATURE_PAYLOAD_MAX_SIZE = 4 * MEGABYTE

SIGNATURE_API_EXTENSION_VERSION = 2

MANIFEST_TYPE = SimpleNamespace(
IMAGE="image",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a residue, but I see you asserting this value in the tests. Did not we agree on "unkown" for images of unknown types? What is the purpose of having the "image" type if all images are images by their nature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regular image that adheres to the specs is friendly to the container runtimes, meaning that it can be instantiated by podman run, flatpak image, for example cannot be instantiated(podman run will fail), hence you have a special cli to manage flatpak images and not podman.
So ideally if the image fully adheres to the specs, meaning that manifest is of the v2 type, config and layers are regular ones then it's a regular application image. For the rest, if there are deviations like in the config type or layer type that is not part or the constants we have then it's an 'unkown' type of image (for us).
On the thing that all images are images, you are right. The question whether we want to distinguish them between those 2 cases i've described.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking along the same lines as Ina. I thought the type image could be useful to differentiate "common" container images from other oci artifacts (like flatpak, or helm charts, etc).

Copy link
Member

@lubosmj lubosmj Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the image fully adheres to the specs, its media type is already shown in the Manifests endpoint as either "docker.manifest.v2" or "oci.manifest", correct? Clients do not read our API endpoints. So, this type thingy is irrelevant to them. However, the integrators can still benefit from the information we provide. For example, they can determine that if media_type is one of the above and the type is set to "unknown", it can be treated as a regular image. How does that sound to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not insisting on getting rid of the "image" type. I am just trying to provide the rationale behind not exposing it at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what would be best approach. Concept of a manifest is a more low level thingy, and the fact that it splits into an image/artifact/list. End users usually are familiar with container image concept. So if we keep list, types we parse like (flatpak, bootable) then maybe it makes sense to leave image for the rest instead of unknown

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speaking of.. i did not really see this PR handling the artifacttype, was it intentional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's leave the image and list in place. We can still remove it later as needed. @git-hyagi, please, consider adding the artifact type to the list.

Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's give it a shot! 🔫

@lubosmj lubosmj merged commit 23bf3a3 into pulp:main Oct 30, 2024
12 checks passed
@git-hyagi git-hyagi deleted the manifest-type-field branch October 30, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

As a user I want to differentiate cosign signatures when looking at the image manifests
4 participants