perf: Experimental environment activation cache #2367
Most of this code is copied from prefix-dev#1832, built by @borchero, but the merge conflicts made me copy it onto main by hand.
I think your profiling showed that a major bottleneck of We could relatively easily switch to a different format for the data (e.g. a binary format). I also think that reading the PrefixRecord is probably mostly slow because you have a lot of
Profiled, with and without cache and compared to old pixi (main)
This PR will add a hash of the lock file to the `EnvironmentFile` (`.pixi/envs/default/conda-meta/pixi`).

It will look like this:
```json
{
  "manifest_path": "/home/rarts/dev/pixi/pixi.toml",
  "environment_name": "default",
  "pixi_version": "0.34.0",
  "environment_lock_file_hash": "4f36ee620f10329d"
}
```
That hash will be compared with the current lock file on `pixi run`, `pixi shell`, and `pixi shell-hook`.

### Profile result
```shell
❯ hyperfine "pixi run echo" "old-pixi run echo"
Benchmark 1: pixi run echo
  Time (mean ± σ):     381.6 ms ±  22.1 ms    [User: 193.8 ms, System: 246.3 ms]
  Range (min … max):   344.5 ms … 414.3 ms    10 runs

Benchmark 2: old-pixi run echo
  Time (mean ± σ):     868.2 ms ±  58.2 ms    [User: 480.0 ms, System: 557.0 ms]
  Range (min … max):   791.1 ms … 950.8 ms    10 runs

Summary
  pixi run echo ran
    2.28 ± 0.20 times faster than old-pixi run echo
```
> [!NOTE]
> The remaining `381ms` is the activation which is fixed by #2367

### UX
- It's turned on by default
- You can request a re-validate on `pixi run/shell/shell-hook` with `--revalidate`
- All commands designed to update the lock file or `pixi install` will always re-validate.

### TODO:
- [x] : Add tests: chosen python integration tests as I was too fed up with the extreme amount of time spent on writing tests in Rust
- [x] : use Enum instead of booleans. Using `UpdateMode::QuickValidate` and `UpdateMode::Revalidate`
- [x] : Document behavior
- [x] : Extend logic to cli. `--revalidate`

---------

Co-authored-by: Hofer-Julian <[email protected]>
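The quick-validate step described in that commit message can be sketched roughly as follows. This is an illustration only, not pixi's Rust implementation: the helper names `lock_file_hash` and `environment_is_up_to_date` are hypothetical, and BLAKE2b with an 8-byte digest is a stand-in chosen only because it also yields 16 hex characters. The `environment_lock_file_hash` key is taken from the JSON example above.

```python
import hashlib
import json
from pathlib import Path


def lock_file_hash(lock_file: Path) -> str:
    """Hash the lock file contents (stand-in algorithm, not pixi's own)."""
    return hashlib.blake2b(lock_file.read_bytes(), digest_size=8).hexdigest()


def environment_is_up_to_date(
    environment_file: Path, lock_file: Path, revalidate: bool = False
) -> bool:
    """Return True only when the stored hash matches the current lock file.

    An explicit revalidate request, a missing environment file, or an
    unreadable one all fall back to full validation (return False).
    """
    if revalidate or not environment_file.is_file():
        return False
    try:
        stored = json.loads(environment_file.read_text())
    except json.JSONDecodeError:
        return False
    return (
        isinstance(stored, dict)
        and stored.get("environment_lock_file_hash") == lock_file_hash(lock_file)
    )
```

Returning `False` on any doubt keeps the fast path conservative: a stale or corrupt file costs one full validation and cannot cause a wrongly skipped one.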
Looks great, awesome improvement @ruben-arts!
I only had small comments.
I could also reproduce the speedup:
nu ❯ hyperfine "pixi run echo hello" "pixi-activation-cache run echo hello" "pixi-activation-cache run --force-activate echo hello" -r 20 --warmup 1
Benchmark 1: pixi run echo hello
Time (mean ± σ): 946.2 ms ± 6.4 ms [User: 458.8 ms, System: 800.2 ms]
Range (min … max): 934.6 ms … 961.9 ms 20 runs
Benchmark 2: pixi-activation-cache run echo hello
Time (mean ± σ): 538.6 ms ± 4.4 ms [User: 265.5 ms, System: 393.9 ms]
Range (min … max): 529.8 ms … 545.1 ms 20 runs
Benchmark 3: pixi-activation-cache run --force-activate echo hello
Time (mean ± σ): 906.2 ms ± 7.6 ms [User: 419.1 ms, System: 672.2 ms]
Range (min … max): 894.5 ms … 922.2 ms 20 runs
Summary
pixi-activation-cache run echo hello ran
1.68 ± 0.02 times faster than pixi-activation-cache run --force-activate echo hello
1.76 ± 0.02 times faster than pixi run echo hello
names
    .into_iter()
    .map(|name| {
        let value = std::env::var(name).unwrap_or_default();
Do you really need to cache environment variables that are not on the system?
If yes, a `None` would be nicer than an empty string.
/// If it can get the cache, it will validate it with the lock file and the current environment.
/// If the cache is valid, it will return the environment variables from the cache.
async fn try_get_valid_activation_cache(
    lock_file: Option<&LockFile>,
- lock_file: Option<&LockFile>,
+ lock_file: &LockFile,
This makes the expectation of the function clearer
if cache.hash == hash {
    return Some(cache.environment_variables);
}
None
- if cache.hash == hash {
-     return Some(cache.environment_variables);
- }
- None
+ if cache.hash == hash {
+     Some(cache.environment_variables)
+ } else {
+     None
+ }
with cache_path.open("r+") as f:
    contents = f.read()
    new_contents = contents.replace("test123", "test456")
    f.seek(0)  # Move pointer to start of the file
    f.write(new_contents)
    f.truncate()  # Remove any remaining original content
Deserializing that to a dict, doing the transformations there and serializing it again would make this more readable I think :)
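A minimal sketch of what that suggestion could look like. It assumes the cache file is JSON and that its keys mirror the `hash` and `environment_variables` fields seen in the Rust snippet above. Neither the on-disk layout nor the helper name `replace_cached_value` is confirmed by this thread.

```python
import json
from pathlib import Path


def replace_cached_value(cache_path: Path, old: str, new: str) -> None:
    """Rewrite cached environment variable values, replacing `old` with `new`."""
    # Assumed layout: {"hash": "...", "environment_variables": {name: value}}
    cache = json.loads(cache_path.read_text())
    cache["environment_variables"] = {
        name: value.replace(old, new)
        for name, value in cache["environment_variables"].items()
    }
    # write_text replaces the whole file, so no seek/truncate bookkeeping is needed
    cache_path.write_text(json.dumps(cache))
```

In the test this would be called as `replace_cached_value(cache_path, "test123", "test456")`. Unlike a raw string replace over the whole file, it leaves the stored hash untouched even if the hash happened to contain the search string.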
Disclaimer: Most of this code is copied from #1832, built by @borchero, but the merge conflicts made me copy it onto main by hand. I did make some small changes, so it's not one-to-one, but the "work" was done by @borchero.
This PR will add a `.pixi/activation-env-v0` folder after activation, in which it can store the environment activation cache.

### The cache
The cache consists of an `EnvironmentHash`; this is the same hash as the one used in the `task_cache`. It's built by hashing the activation scripts and the environment data from the lockfile, including all package.url's. Next to the hash it contains the full set of environment variables that are the result of the initial activation.
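As a rough illustration of the lookup described above (not the actual Rust code), the sketch below builds a hash from activation scripts and package URLs and returns the cached variables only on an exact match. SHA-256, the JSON layout, and the helper `environment_hash` are assumptions made for the example. The name `try_get_valid_activation_cache` is borrowed from the reviewed Rust function, but this signature is hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def environment_hash(activation_scripts: list[str], package_urls: list[str]) -> str:
    """Combine the hash inputs into one digest (stand-in for `EnvironmentHash`)."""
    hasher = hashlib.sha256()
    sections = (("scripts", activation_scripts), ("urls", sorted(package_urls)))
    for label, items in sections:
        # Label each section so moving an item between sections changes the hash
        hasher.update(label.encode() + b"\0")
        for item in items:
            hasher.update(item.encode() + b"\0")
    return hasher.hexdigest()


def try_get_valid_activation_cache(
    cache_file: Path, current_hash: str
) -> Optional[dict[str, str]]:
    """Return the cached environment variables, or None if the cache is unusable."""
    if not cache_file.is_file():
        return None
    try:
        cache = json.loads(cache_file.read_text())
    except json.JSONDecodeError:
        return None
    if not isinstance(cache, dict) or cache.get("hash") != current_hash:
        return None
    return cache.get("environment_variables")
```

A `None` result means the caller runs the full activation and can then write a fresh cache entry. Any change to a script path or a package URL produces a different hash and so invalidates the entry.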
### The invalidation
This cache is currently not used when:
- `lock_file` is given to the activator it will skip checking for a cache
- `EnvironmentHash` is not similar to the `EnvironmentHash` of the activation. (Generated on the go)
- `[activation]` table is hashed

### Enable the feature
This is an experimental feature (a first for pixi).
You can test it by:
To disable it you can force it to always activate, which only makes sense if you turn on `experimental`
### BREAKING task_cache
I've had to add `activation.env` to the `EnvironmentHash`, which will break the existing environment hashes in people's projects. To me this is the right thing to do. It will not update if you don't have `[activation.env]`, so not all projects are affected.

### TODO
- `pixi clean`ing only this cache