-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for kernel builds that are not a single binary #58
Comments
Thank you for a nicely-worded and detailed issue! This is what I think we could do:
So, a subset of output files for this build could look something like this: {
"output_files": [
{
"name": "cki-kernel.repo",
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/cki-kernel.repo"
},
{
"name": "cki-kernel.dnf",
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/",
"resources": [
{
"name": "repodata",
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/repodata/",
"resources": [
{
"name": "repomd.xml",
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/repodata/repomd.xml"
}
]
},
{
"name": "kernel-5.14.0-201.mr1687_221124_0516.el9.src.rpm",
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/kernel-5.14.0-201.mr1687_221124_0516.el9.src.rpm"
},
{
"name": "kernel-headers-5.14.0-201.mr1687_221124_0516.el9.x86_64.rpm",
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/kernel-headers-5.14.0-201.mr1687_221124_0516.el9.x86_64.rpm"
}
]
}
]
} |
This is very much increasing the bar of entry for submitters 🙈 having to loop over the nested directories and list everything in them, when the simplest step would be to only link that top level and consider everything under it as it is. From a technical standpoint, it would not work either, as not all dnf repos are browsable and have an index page. It's not a requirement yum/dnf needs, and CKI does not have them either. We do provide a different location that is browsable (for human use), but that one is not usable by yum/dnf instead. |
Most submitters don't have to report nested directories. CKI is the only one so far. There's no additional complexity, if you don't. And, again, linking just to the human-targeted "index" page wouldn't work for archival. However, using this schema, if you don't really want to have the yum/dnf repo archived, you can just report the index page "resource" with an empty
We can make the |
Also, AFAIK, yum/dnf don't read that "index" page, but they just assume existence of various entry files under the given "repo" URL. |
I meant additional complexity for new submitters who'd like to start using this feature. IF the idea is to also provide the builds to be usable for reproducing results (e.g. a kernel maintainer trying to reproduce a failure), then we need a way to provide the repo (not the index page) through KCIDB. Now, I already mentioned that I don't see the necessary point of KCIDB triaging the build binaries and such and I still think that, so in this particular case we may be able to get away with providing a repo file and hosting the repos ourselves for a certain period of time, and that would avoid some trouble with the unfriendliness of having to list the full content of the repos. The optional URL fields for resources would definitely be a useful trick for our side 👍 That way we could still submit the repos as they are and people could use them, but they wouldn't pollute the triaging etc. from your side. Am I missing something else we'd need to solve here? |
Well, it's not reaally that hard. Took me maybe half an hour to write this command-line tool which takes a URL base and a directory and generates the JSON: #!/usr/bin/python3
import os
import sys
import json
from urllib.parse import urljoin
def urljoin_dirs(url_base, root_dirpath, dirpath, name):
return urljoin(
url_base,
os.path.join(os.path.relpath(dirpath, root_dirpath), name)
)
def dir2json(url_base, root_dirpath):
resources_map = {}
root = []
for dirpath, dirnames, filenames in os.walk(root_dirpath):
resources = root if dirpath == root_dirpath else resources_map[dirpath]
for dirname in dirnames:
dir_resources = []
resources_map[os.path.join(dirpath, dirname)] = dir_resources
resources.append(dict(
name=dirname,
url=urljoin_dirs(url_base, root_dirpath, dirpath, dirname),
resources=dir_resources
))
resources += [
dict(
name=filename,
url=urljoin_dirs(url_base, root_dirpath, dirpath, filename)
)
for filename in filenames
]
return root
json.dump(
dir2json(sys.argv[1], os.path.normpath(sys.argv[2])),
sys.stdout,
indent=4
) We could supply people with something like this as a sample and solve that problem.
Yes, that's the idea. However, there's no way to discover and download directory hierarchies over HTTP, as you probably know, only separate files. We could've done it over FTP, but people don't use thaaat these days. So we have to either express the complete hierarchy in JSON, or pack our dnf repos into archives, which defeats the convenience of just pointing directly to a "directory" hosted by KCIDB. And, no, we're not going to use something like
No, KCIDB is not likely to triage binaries, but archiving the artifacts would probably be among its major values (as long as LF can afford it 🙈). We wouldn't be able to offer meaningful
Great 😁 👍
Nooo, I think that's probably it. |
One other thing I'm thinking about, though, is redoing the schema to ensure that names are unique. The current schema was based on what BigQuery could do with "repeating records", but since we'd have to forgo that anyway for materialized views, and for other databases, we could as well make the schema better, so the sample above looks something like this: {
"output_files": {
"cki-kernel.repo": {
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/cki-kernel.repo"
},
"cki-kernel.dnf": {
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/",
"resources": {
"repodata": {
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/repodata/",
"resources": {
"repomd.xml": {
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/repodata/repomd.xml"
}
}
},
"kernel-5.14.0-201.mr1687_221124_0516.el9.src.rpm": {
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/kernel-5.14.0-201.mr1687_221124_0516.el9.src.rpm"
},
"kernel-headers-5.14.0-201.mr1687_221124_0516.el9.x86_64.rpm": {
"url": "https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/703863786/publish%20x86_64%20debug/3372589884/artifacts/repo/5.14.0-201.mr1687_221124_0516.el9.x86_64/kernel-headers-5.14.0-201.mr1687_221124_0516.el9.x86_64.rpm"
}
}
}
}
} With this approach we could preserve the original filenames in KCIDB archives more easily and avoid a few issues. |
From user (someone looking into the data/results), tester and CI developer perspective, this proposal is really counterproductive 😞 It exposes test system details to users who'd have to understand all those to figure out which file is the important one to check. This is already a problem brought up when interacting with developers and maintainers. Also, test systems can create multiple files which have the same standard name, but different full path (e.g. subresults of a single test in Beaker). Forcing the unique filenames would mean either the test system expectations are going to be broken, or the CI system would need to implement arbitrary renames and moves when submitting the results. It's not about how hard it is to write the code that does that, it's about every submitter having to keep this requirement in mind and doing the extra work, only for the result being users who refuse to check the random names. Being able to keep the paths as they are while being able to give them a user friendly name as a pointer is really useful for all involved parties, and it's a shame that this feature that would be really useful (seeing a friendly name that points to whatever test system picked) is the one thing that you're trying to get rid of. |
Hey, I'm not going to do it if it's a problem for you, no worries 😁 Let's try to work something out. First of all, before throwing more arguments at you, could you perhaps share what exact problem this (array->dict) change would prevent you from solving? |
Which array->dict change? 🙈 I don't think there was any place where that part would be a problem, but by this point of the discussion I may be lost |
Ha-ha, perfect 😂 Sorry! Could we try to talk it over in a small meeting? |
Sure! Pick some time tomorrow, maybe 1PM my time? Tomorrow is kinda busy with meetings otherwise and then we're on PTO, so this would probably work the best? |
OK, after the meeting we came to an agreement regarding the following:
|
In some distributions, such as RPM based distributions (e.g. Fedora, CentOS Stream), the result of the build usually isn't a single binary but multiple RPM packages. While these may be handled separately, such a thing is very inconvenient and doesn't allow easy system installation with how the packages depend on each other. For that, the resulting packages are distributed in a yum/dnf repository, which can be both browsable or a link that is consumable by yum/dnf.
If KCIDB wishes to properly support distribution kernels,
output_files
for a kernel build should support this use case where the link doesn't point to a single binary, but a directory that may by usable by either human or a machine.The text was updated successfully, but these errors were encountered: