-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(uv): parse the dist-manifest.json to not hardcode sha256 in rules_python #2578
base: main
Are you sure you want to change the base?
Conversation
Not fully sure why it failed. |
I'll do the moving of files around in a separate PR so that the functionality con stay in this PR. |
This PR starts establishing a structure that will eventually become a part of our API. This is a prerequisite for bazelbuild#2578 which removes the versions.bzl file in favour of a more dynamic configuration of the extension. We also remove the `defs.bzl` to establish a one symbol per file convention. Things that I wish we could change is `//python/uv:extensions.bzl` and the fact that we have `extensions` in the load path. I think it cannot be removed, because that may break the BCR test. On the other hand, maybe we could remove it and do an alpha release to verify this assumption. Work towards bazelbuild#1975
f94d74c
to
087fb57
Compare
This PR starts establishing a structure that will eventually become a part of our API. This is a prerequisite for #2578 which removes the versions.bzl file in favour of a more dynamic configuration of the extension. We also remove the `defs.bzl` to establish a one symbol per file convention. Things that I wish we could change is `//python/uv:extensions.bzl` and the fact that we have `extensions` in the load path. I think it cannot be removed, because that may break the BCR test. On the other hand, maybe we could remove it and do an alpha release to verify this assumption. Work towards #1975
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only had an extra minute free, so didn't get far in review.
I see some new tag classes added. Can you give a quick overview of the thinking?
sha256 values for each binary. | ||
""" | ||
dist_manifest = module_ctx.path(_DIST_MANIFEST_JSON) | ||
module_ctx.download(base_url + "/" + _DIST_MANIFEST_JSON, output = dist_manifest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want a sha for the dist manifest download?
Using untrusted shas to verify a download is equivalent to not using shas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the sha256
, I am downloading the manifest that only has links and I don't see how the extra sha256
value here could be beneficial.
This has been inspired by https://github.com/bazel-contrib/rules_go/blob/7f6a9bf5870f2b5ffbba1615658676dcabf9edc7/go/private/sdk.bzl#L84 where rules_go
is downloading things from the manifest if the user does not specify anything.
Maybe we should instead an optional sha256s
dict where we have platform labels as keys and the sha values as values? And we could print a warning with buildozer
command to add the said sha256s
values to their MODULE.bazel
file if they want things to be deterministic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I see your point now. Hm.
On the one hand, not using shas means, if someone MITM the manifest, then they can control what actual uv stuff is downloaded and then used.
On the other hand, using a sha means we are tied to a particular manifest and don't benefit from the automatic-ness of using the manifest. Which is pretty appealing behavior.
Hm, not sure how, or if, we can split the difference. These seem at odds.
Also:
- Does the sha arg of download() allow bazel to better cache it?
- Does the sha arg feed into MODULE.lock? I would expect so, since this is module-phase behavior.
- Without the sha arg set, does Bazel print a warning? (http_archive does, not sure if download does)
The main idea would be to leverage the The users can use the following tag_classes to interact:
After having written this, maybe we should just have:
My thinking was that any user may want to do the following to customize the So all of the above could be done by: uv.config(version="0.5.24") # override the version
uv.config(base_url="my_internal") # override the base URL
uv.platform(...) # add an extra platform to support locking on And if they wish, they can just ignore this and register their own |
This is using the `dist-manifest.json` on the GH releases page so that we can get the expected `sha256` value of each available file and download all of the usable archives. This means that `rules_python` no longer needs to be updated for `uv` version bumps. The remaining bits for closing the ticket: - [ ] Finalize the `lock` interface. - [ ] Add it to the `pip.parse` hub repo if `pyproject.toml` is passed in. - [ ] Add a rule/target for `venv` creation. Work towards bazelbuild#1975.
9144ce8
to
b8fa595
Compare
Marking as ready to be reviewed even though this is not yet ready to be merged, but I would like feedbakc on the APIs. |
+1 on uv.config. I get uv.toolchain. It's telling what version of uv we want to use. I don't fully grok uv.platform. Is the idea that a uv toolchain is registered for every platform() call? Ah -- it's carrying the constraint labels that get put onto the toolchain. Plus also telling the repo name to use. Is that it? Because putting those on uv.toolchain() is tricky; it would require passing e.g. I think I like it. The constraints for a platform are going to be the same, regardless of e.g. uv version. Is part of the idea here that rules_python calls uv.platform() for the common ones? And if a user has a special platform, e.g there's a musl build of uv. they would do: |
}, | ||
) | ||
|
||
platform = tag_class( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that isn't clear from the doc here is how the word "for" is operating.
Does this define the platforms uv will run on, or the platform that uv will generate a lockfile for? I'm assuming the former.
Maybe we should just have uv.config instead of the separate tag classes that have barely any api?
Correct. For uv we have to have some labels that we can put into tcw on the toolchain, so I am doing a very similar thing to what the
Yes that is the idea.
Also correct. |
See below -- thoughts?
Hm. Let see what some psuedo code looks like and what it might imply.
I just merged all 3 tag classes and threw it in a blender like a code fuzzer, for exploratory purposes. I'm also throwing in complications like we see from the pip/python extensions just for fun. Aesthetically, that doesn't look too bad. It seems fairly clear to understand? The docs for the config "function" will be a bit unwieldy, but that seems a minor loss. Three observations First: Recall that tag_class calls preserve their order. So, one way to interpret that pseudo code is:
It's kind of neat I was able to be so specific. An interesting implication of respecting the order of calls is I can easily insert toolchains earlier in the ordering. This allows me to define a toolchain with more specific criteria (linux, arm, musl) so that it matches before one with more broad criteria (linux, arm). Hm, though, things that affect "defaults" seem worthy of looking more distinct (different "function call"), e.g. L1,2,4,5. The order of those doesn't matter. With the sort-of exception of L1 ("first defined version is default"). (edit: inspired aside: maybe add e.g. Second: as I wrote the pseudo code, I was reminded of builder patterns, in particular Java protos. These are pretty good at giving fine-grained control in pretty clear and understandable ways:
My point being, were it available, we'd probably use a builder style API. The accumulative nature of tag_class calls seems akin to that. Third: we have multiple dimensions of interaction: version x platform constraints x url x arbitrary constraints. What we're trying to do give users enough control over those variables and their expansion that they get what they need without too much of what they don't need. This is a pretty tall ask of any API. Fourth: I'm reminded of the recent PR that reads the is_default setting from a file. Another approach would be to offload everything into a separate config file altogether. This seems like overkill, but it did pop into my head. |
I actually like the expressivity of the API you came up with for the platform definitions. Maybe it would be an interesting thing to have two things: uv.defaults() # used by `rules_python` to set the default platforms, version, etc
uv.config() # used by the users to customize things similar to what you suggested above I think that would be also interesting in the pip.parse(hub_name = "foo", requirements_lock = "default_requirements_lock.txt")
pip.parse(hub_name = "foo", requirements_lock = "requirements.linux_x86_64.txt", constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"])
pip.parse(hub_name = "foo", requirements_lock = "requirements.gpu.linux_x86_64.txt", constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"], flag_values = ["@//:my_gpu_flag": "enabled"]) This is related to #2548 and could potentially be a good way to allow users to modify things. However, the
Maybe we could solve the concerns above by adding another attribute called I am thinking that the following would be unambiguous: uv.config() # the builder pattern
...
uv.toolchain() # non-builder pattern
# likewise for pip
pip.parse() # non-builder pattern
pip.config() # builder pattern I'll think about this more and when I have time to finish implementation might go with the builder/non-builder patterns present in the same bzlmod extension. |
Yeah, +1 to something like this. That
I like |
Other ideas for naming: append_config, build_config, append_last_config, merge_last_config, add_or_merge_config. |
This is using the
dist-manifest.json
on the GH releases page so thatwe can get the expected
sha256
value of each available file anddownload all of the usable archives. This means that
rules_python
nolonger needs to be updated for
uv
version bumps.The remaining bits for closing the ticket:
lock
interface.pip.parse
hub repo ifpyproject.toml
is passed in.
venv
creation.Work towards #1975, stacked on #2580.