Add manifest arch, os, and compressed layers size fields #1782

Merged on Nov 1, 2024 (1 commit)

Conversation

git-hyagi
Contributor

closes: #1767

Member

@lubosmj left a comment

A couple of remarks:

  1. If we need to store the "architecture" and "os" fields on the Manifest model, we should try to fetch the values from a manifest list first. An OCI Index has the "architecture" and "os" fields required. There is no need to read the data from a config blob if a manifest is listed within an index (https://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions).
  2. Currently, there are three separate migrations. I would like to squash them. Every migration takes time to run some boilerplate which is transparent to us.
  3. Currently, there are three commits referencing the same issue. I would like to squash them. Such a separation is not necessary.
  4. If we decided to create another django-admin command, we should align with Katello to see if it makes sense for them to run a management command reading data from the storage.
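
Remark 1 can be sketched roughly as follows. This is only an illustration on plain dicts parsed from JSON, not pulp_container code: `platform_for_manifest` and its arguments are hypothetical names. It prefers the `platform` object of the index entry and falls back to the config blob only when the index does not provide one.

```python
from typing import Optional, Tuple


def platform_for_manifest(
    digest: str, index: Optional[dict], config_blob: Optional[dict]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (architecture, os) for the manifest identified by `digest`."""
    # Prefer the entry in the index: when "platform" is present,
    # "architecture" and "os" are required by the OCI image-index spec.
    for entry in (index or {}).get("manifests", []):
        platform = entry.get("platform")
        if entry.get("digest") == digest and platform:
            return platform["architecture"], platform["os"]
    # Fallback: the image config blob carries the same two properties.
    config_blob = config_blob or {}
    return config_blob.get("architecture"), config_blob.get("os")


# Placeholder digests and values, for illustration only.
index = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"architecture": "arm64", "os": "linux"}},
        {"digest": "sha256:bbb"},
    ],
}
config = {"architecture": "amd64", "os": "linux"}

listed = platform_for_manifest("sha256:aaa", index, None)
fallback = platform_for_manifest("sha256:bbb", index, config)
```

With this shape, the config blob only has to be read for manifests that are not listed in an index or whose entry has no `platform`.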

Comment on lines 1 to 5
The Manifest model has been enhanced with a new:
* `architecture` field, which specifies the CPU architecture for which the binaries in the
image are designed to run.
* `os` field, which specifies the operating system which the image is built to run on.
* `compressed_layers_size` field, which specifies the sum of the sizes of all compressed layers.

Member

This is quite wordy. We should stick to shorter changelog messages (see https://pulpproject.org/pulpcore/changes/ for reference). It is not necessary to describe every manifest field in detail.

@@ -103,6 +108,9 @@ class Manifest(Content):

     annotations = models.JSONField(default=dict)
     labels = models.JSONField(default=dict)
+    architecture = models.TextField(null=True)
+    os = models.TextField(null=True)
+    compressed_layers_size = models.TextField(null=True)

Member

Is it better to use IntegerField here?
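
To illustrate why an integer type fits here, the following is a minimal sketch, assuming a schema 2 or OCI image manifest parsed into a dict. `compressed_layers_size` is a hypothetical helper, not the function added in this PR. Each layer descriptor carries an integer `size` in bytes, so the sum is naturally an integer. Schema 1 manifests list `fsLayers` without sizes, so they are not covered by this sketch.

```python
def compressed_layers_size(manifest: dict) -> int:
    """Sum the `size` (in bytes) of every layer descriptor in an image manifest."""
    return sum(layer["size"] for layer in manifest.get("layers", []))


# Placeholder manifest; digests and sizes are made up for illustration.
manifest = {
    "schemaVersion": 2,
    "config": {"digest": "sha256:cfg", "size": 1469},
    "layers": [
        {"digest": "sha256:l1", "size": 2811478},
        {"digest": "sha256:l2", "size": 1024},
    ],
}

total = compressed_layers_size(manifest)
```

Stored as text, sizes would sort lexicographically ("10" before "9"); stored as an integer, they can be compared and aggregated numerically.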

@ianballou
Contributor

> 4. If we decided to create another django-admin command, we should align with Katello to see if it makes sense for them to run a management command reading data from the storage.

From the Katello side, we're okay with the command being the same as before or a new one. Choose whatever is most efficient from the Pulp side.

@ianballou
Contributor

ianballou commented Sep 30, 2024

Also I wanted to clarify to be super sure -- OCI Image Index manifests will always have a null arch, OS, and size? I'm not super clear since, according to https://specs.opencontainers.org/image-spec/image-index/?v=v1.0.1, the optional platform property can have an arch and an OS,

@git-hyagi
Contributor Author

> Also I wanted to clarify to be super sure -- OCI Image Index manifests will always have a null arch, OS, and size? I'm not super clear since, according to https://specs.opencontainers.org/image-spec/image-index/?v=v1.0.1, the optional platform property can have an arch and an OS,

Thank you for bringing this up!
We reviewed and discussed the os and architecture fields from the manifest list, and we will work on them in this PR.
If the manifest list (or OCI index) contains the optional platform field, we will populate the Pulp manifest with arch and os (whenever platform is defined, os and architecture are required).

@git-hyagi
Contributor Author

git-hyagi commented Oct 1, 2024

After re-reading the specs, I realized that manifestlist or oci-index do not have a platform field. Actually, platform is a field from manifests. In this link, we can see it better: https://github.com/opencontainers/image-spec/blob/main/image-index.md (platform is a bullet inside manifests).

So, please, ignore my last comment and yes, "OCI Image Index manifests will always have a null arch, OS, and size".
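
The nesting can be seen on a small example. This is a hypothetical, trimmed-down index with placeholder digests, parsed with the standard `json` module: the index object has no `platform` key of its own, while each entry under `manifests` may carry one.

```python
import json

# Hypothetical, trimmed-down OCI image index; digests are placeholders.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:aaa",
      "size": 7143,
      "platform": {"architecture": "amd64", "os": "linux"}
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:bbb",
      "size": 7682
    }
  ]
}
"""

index = json.loads(INDEX_JSON)

# The index itself has no platform, so its own arch/os stay null.
top_level_platform = index.get("platform")

# Each listed manifest may (optionally) declare a platform.
listed_platforms = {m["digest"]: m.get("platform") for m in index["manifests"]}
```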

@ianballou
Contributor

> After re-reading the specs, I realized that manifestlist or oci-index do not have a platform field. Actually, platform is a field from manifests. In this link, we can see it better: https://github.com/opencontainers/image-spec/blob/main/image-index.md (platform is a bullet inside manifests).
>
> So, please, ignore my last comment and yes, "OCI Image Index manifests will always have a null arch, OS, and size".

Nice catch! I didn't notice it was under manifests, so that's perfect.

@git-hyagi
Contributor Author

> If we need to store the "architecture" and "os" fields on the Manifest model, we should try to fetch the values from a manifest list first. An OCI Index has the "architecture" and "os" fields required. There is no need to read the data from a config blob if a manifest is listed within an index (https://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions).

I did some investigation in each workflow, and here are my findings:

  • for the sync and pull-through tasks, the first declarative content that we pass to the next pipeline stage is the blobs and config blobs:

    async def handle_blobs(self, manifest_dc, content_data):
        """
        Handle blobs.
        """
        manifest_dc.extra_data["blob_dcs"] = []
        for layer in content_data.get("layers") or content_data.get("fsLayers"):
            if not self._include_layer(layer):
                continue
            blob_dc = self.create_blob(layer)
            manifest_dc.extra_data["blob_dcs"].append(blob_dc)
            await self.put(blob_dc)
        layer = content_data.get("config", None)
        if layer:
            blob_dc = self.create_blob(layer, deferred_download=False)
            manifest_dc.extra_data["config_blob_dc"] = blob_dc
            await self.put(blob_dc)

  • after that, in the resolve_flush method, we then send the manifests and manifest lists in this order:

    async def resolve_flush(self):
        """Resolve pending contents dependencies and put in the pipeline."""
        # Order matters! Things depended on must be resolved first.
        for manifest_dc in self.manifest_dcs:
            config_blob_dc = manifest_dc.extra_data.get("config_blob_dc")
            if config_blob_dc:
                manifest_dc.content.config_blob = await config_blob_dc.resolution()
                await sync_to_async(manifest_dc.content.init_labels)()
                manifest_dc.content.init_image_nature()
            for blob_dc in manifest_dc.extra_data["blob_dcs"]:
                # Just await here. They will be associated in the post_save hook.
                await blob_dc.resolution()
            await self.put(manifest_dc)
        self.manifest_dcs.clear()
        for manifest_list_dc in self.manifest_list_dcs:
            for listed_manifest in manifest_list_dc.extra_data["listed_manifests"]:
                # Just await here. They will be associated in the post_save hook.
                await listed_manifest["manifest_dc"].resolution()
            await self.put(manifest_list_dc)
        self.manifest_list_dcs.clear()
        for tag_dc in self.tag_dcs:
            tagged_manifest_dc = tag_dc.extra_data["tagged_manifest_dc"]
            tag_dc.content.tagged_manifest = await tagged_manifest_dc.resolution()
            await self.put(tag_dc)
        self.tag_dcs.clear()
        for signature_dc in self.signature_dcs:
            signed_manifest_dc = signature_dc.extra_data["signed_manifest_dc"]
            signature_dc.content.signed_manifest = await signed_manifest_dc.resolution()
            await self.put(signature_dc)
        self.signature_dcs.clear()

  • for the push workflow, we first push the blobs and the corresponding manifest for that image, and, just after pushing all images, the manifest-list/oci-index is pushed

I couldn't find how to fetch these values from the manifest list in advance. I'm not sure if I overlooked something or misunderstood the code workflow.
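
The ordering constraint described in the findings above can be modelled with a toy example. This is only a sketch with plain `asyncio` futures, not pulpcore's declarative-content API, and every name in it is hypothetical. Even when the coroutines are started in the reverse order, a manifest is saved only after its blob, and a manifest list only after its listed manifest.

```python
import asyncio


async def main() -> list:
    loop = asyncio.get_running_loop()
    order = []
    # One future per unit of content; resolving it stands in for "saved".
    blob_saved = loop.create_future()
    manifest_saved = loop.create_future()

    async def save_blob():
        order.append("blob")
        blob_saved.set_result(True)

    async def save_manifest():
        await blob_saved  # a manifest waits for its blobs
        order.append("manifest")
        manifest_saved.set_result(True)

    async def save_manifest_list():
        await manifest_saved  # a list waits for its listed manifests
        order.append("manifest_list")

    # Start the coroutines in the "wrong" order on purpose.
    await asyncio.gather(save_manifest_list(), save_manifest(), save_blob())
    return order


order = asyncio.run(main())
```

In this model the manifest is already saved by the time the manifest list is handled, which matches the difficulty described above.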

@lubosmj
Member

lubosmj commented Oct 3, 2024

@git-hyagi force-pushed the add-manifest-size-arch-fields branch from 27cfe78 to c7816f8 on October 3, 2024 18:49
@git-hyagi
Contributor Author

Thank you for the help and the suggestions/optimizations!

@git-hyagi marked this pull request as draft on October 3, 2024 19:08
@git-hyagi marked this pull request as ready for review on October 3, 2024 19:37
@lubosmj marked this pull request as draft on October 11, 2024 13:14
@git-hyagi force-pushed the add-manifest-size-arch-fields branch from c7816f8 to 8b53a1e on October 22, 2024 18:54
@git-hyagi force-pushed the add-manifest-size-arch-fields branch 2 times, most recently from ef816f0 to c152cf4 on October 22, 2024 20:05
@ianballou
Contributor

cc @sjha4 @qcjames53 we should keep an eye on this and integrate with it in Katello sooner rather than later to avoid excess reindexing of container manifests.

@git-hyagi force-pushed the add-manifest-size-arch-fields branch from c152cf4 to 8f390fa on October 23, 2024 14:45
@git-hyagi marked this pull request as ready for review on October 23, 2024 15:32
@@ -103,6 +108,9 @@ class Manifest(Content):

     annotations = models.JSONField(default=dict)
     labels = models.JSONField(default=dict)
+    architecture = models.TextField(null=True)
+    os = models.TextField(null=True)
+    compressed_layers_size = models.IntegerField(null=True)

Member

How about `compressed_image_size`?

@lubosmj
Member

lubosmj commented Oct 30, 2024

@git-hyagi, please, rebase this PR against the main branch and prepare it for the final review.

@git-hyagi force-pushed the add-manifest-size-arch-fields branch from 8f390fa to 91147ca on October 30, 2024 11:15
@git-hyagi marked this pull request as draft on October 30, 2024 11:15
@git-hyagi force-pushed the add-manifest-size-arch-fields branch 2 times, most recently from 88bda0e to ec2d14e on October 30, 2024 15:29
@git-hyagi marked this pull request as ready for review on October 30, 2024 15:52

Member

@mdellweg left a comment

I believe the migration is zero-downtime safe. Also, I don't see any code failing before the fields get populated (the results may be incomplete, but you are supposed to run the command anyway).
None of the comments should be blocking here.

pulp_container/app/registry_api.py
Comment on lines 647 to 653
if platform := listed_manifest.get("platform"):
    manifest_list_manifest.architecture = platform["architecture"]
    manifest_list_manifest.os = platform["os"]
    manifest_list_manifest.features = platform.get("features")
    manifest_list_manifest.variant = platform.get("variant")
    manifest_list_manifest.os_version = platform.get("os.version")
    manifest_list_manifest.os_features = platform.get("os.features")

Member

This is at least the third time I'm seeing this pattern now.

Maybe a dataclass together with asdict can help here? (Maybe not in this PR.)
https://docs.python.org/3/library/dataclasses.html
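
A rough sketch of what the dataclass idea could look like, under stated assumptions: `Platform`, `from_dict`, `apply_platform`, and `FakeListedManifest` are hypothetical names introduced here, and the stand-in class is not the real Django model. Parsing the `platform` object in one place also gives every call site the same defaults for optional keys such as `os.version`.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Platform:
    """Hypothetical container for the `platform` object of an index entry."""

    architecture: str
    os: str
    features: Optional[list] = None
    variant: Optional[str] = None
    os_version: Optional[str] = None
    os_features: Optional[list] = None

    @classmethod
    def from_dict(cls, platform: dict) -> "Platform":
        # Single place that maps spec keys ("os.version") to attribute names,
        # so optional values always default to None.
        return cls(
            architecture=platform["architecture"],
            os=platform["os"],
            features=platform.get("features"),
            variant=platform.get("variant"),
            os_version=platform.get("os.version"),
            os_features=platform.get("os.features"),
        )


class FakeListedManifest:
    """Stand-in for a model instance; not the real ManifestListManifest."""


def apply_platform(obj, platform: Platform) -> None:
    """Copy every platform field onto `obj` as an attribute."""
    for name, value in asdict(platform).items():
        setattr(obj, name, value)


target = FakeListedManifest()
apply_platform(
    target,
    Platform.from_dict({"architecture": "amd64", "os": "linux", "os.version": "10.0"}),
)
```

Because the dataclass holds only the platform values, it stays independent of any foreign-key fields on the model it is applied to.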

Contributor Author

We discussed this and agreed on opening a task issue to investigate this later.

Member

I do not understand why this cannot be addressed in this PR. Also, notice how in some cases you are initializing os_version as p.get("os.version") and in some cases as p.get("os.version", ""). Not to mention that you even do (p.get("os.version", ""),). I am not a fan of such discrepancies. Please, resolve this.

Contributor Author

I tried to use the @dataclass decorator, but I had trouble making it work (probably because of the image_manifest and manifest_list foreign key fields).

mdellweg previously approved these changes Oct 31, 2024

@git-hyagi force-pushed the add-manifest-size-arch-fields branch from c182212 to aad0286 on November 1, 2024 20:09
@lubosmj merged commit b48cd28 into pulp:main on Nov 1, 2024
12 checks passed
@git-hyagi deleted the add-manifest-size-arch-fields branch on November 1, 2024 20:33
Successfully merging this pull request may close these issues.

Retrieve manifest image size and architecture from API
5 participants