-
-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(vacuum): add vacuum management for cleaning expired packages #3442
Conversation
Thank you for your contribution! |
Thank you for your great work! According to the benchmark test, it looks like the overhead is really small. Command interfaceI don't think e.g. aqua vacuum show [--expired]
aqua vacuum run The command DB schema
{
"LastUsageTime": "2025 ...",
"PkgPath": [""]
} The key is not good because we can change package name freely. I think I think we can add metadata such as type, name, etc to the value. Code review
When we return an error or output error or warn log, definitely any operation fails.
CheckWe should confirm if the treat of locale has no problem. func (vc *Controller) isPackageExpired(pkg *PackageVacuumEntry) bool {
const secondsInADay = 24 * 60 * 60
threshold := int64(*vc.Param.VacuumDays) * secondsInADay
return time.Since(pkg.PackageEntry.LastUsageTime).Seconds() > float64(threshold)
} |
I’ll revisit my draft and get back to you, taking your feedback into account! |
Refactored code to exclusively use `StorePackage` as asynchronous functions. It is important to note that from now on, **you must ensure all tasks are finalized by calling the `Close` functions after invoking `StorePackage`.** Replaced the generated key with `pkgPath` to ensure uniqueness. Changed `pkgPath` type from an array of strings to a single string.
Rewrite is completed, take a look at this @suzuki-shunsuke. I think you'll prefer this implementation. Command interfaceCommands have been refund : aqua vacuum run
aqua vacuum list
aqua vacuum list --expired DB schemaPkgPath is know the key of the package in vacuum database. We can add/remove metadata of corresponding value if needed. type Package struct {
Type string // Type of package (e.g. "github_release")
Name string // Name of package (e.g. "cli/cli")
Version string // Version of package (e.g. "v1.0.0")
PkgPath string // Path to the install path without the rootDir/pkgs/ prefix
} Check
func (vc *Controller) isPackageExpired(pkg *PackageVacuumEntry) bool {
const secondsInADay = 24 * 60 * 60
threshold := int64(*vc.Param.VacuumDays) * secondsInADay
return time.Since(pkg.PackageEntry.LastUsageTime).Seconds() > float64(threshold)
} To : func (vc *Controller) isPackageExpired(pkg *PackageVacuumEntry) bool {
const secondsInADay = 24 * 60 * 60
threshold := vc.Param.VacuumDays * secondsInADay
lastUsageTime := pkg.PackageEntry.LastUsageTime
if lastUsageTime.Location() != time.UTC {
lastUsageTime = lastUsageTime.In(time.UTC)
}
timeSinceLastUsage := time.Since(lastUsageTime).Seconds()
return timeSinceLastUsage > float64(threshold)
} Added convertion to UTC trying to resolve location issue. Visualizing Managed PackagesCoverage
A high latence is observed for test du to tests of failling access to vacuum.db Performances :The rewrite of this implementation has a noticeably similar impact based on the hyperfine results. However, the benchmark highlights a significant improvement in the average operation duration and the number of allocations per operation.
|
pkg/cli/exec/command.go
Outdated
@@ -54,5 +56,6 @@ func (i *command) action(c *cli.Context) error { | |||
if err != nil { | |||
return fmt.Errorf("parse args: %w", err) | |||
} | |||
return ctrl.Exec(c.Context, i.r.LogE, param, exeName, args...) //nolint:wrapcheck | |||
defer ctrl.CloseVacuum(logE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I don't think this works here, because aqua exec can use: https://github.com/aquaproj/aqua/blob/main/pkg/controller/exec/exec.go#L126.
And this (on UNIX) uses the "Exec" function of syscall_unix, which, when executed, replaces the calling process with the called process.
This cuts short the rest of the code, and the functions called in defer are not played. Finally, asynchronous processes are not necessarily terminated either...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Thank you.
I've confirmed defer function isn't executed.
package main
import (
"log"
"os"
"golang.org/x/sys/unix"
)
func main() {
if err := core(); err != nil {
log.Fatal(err)
}
}
func core() error {
defer func() {
log.Println("defer")
if err := os.WriteFile("test.txt", []byte(`hello`), 0o644); err != nil {
log.Print(err)
}
}()
return exe()
}
func exe() error {
return unix.Exec("/usr/bin/curl", []string{"curl", "--version"}, os.Environ())
}
$ go run main.go
curl 8.7.1 (x86_64-apple-darwin24.0) libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.63.0
Release-Date: 2024-03-27
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL threadsafe UnixSockets
$ ls # test.txt isn't created.
go.mod go.sum main.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ AQUA_VACUUM_DAYS=5 hyperfine -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'
Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
Time (mean ± σ): 50.5 ms ± 1.4 ms [User: 4.2 ms, System: 2.0 ms]
Range (min … max): 47.7 ms … 56.3 ms 60 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
Time (mean ± σ): 40.4 ms ± 1.4 ms [User: 6.4 ms, System: 1.8 ms]
Range (min … max): 38.7 ms … 44.6 ms 73 runs
Summary
/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v ran
1.25 ± 0.05 times faster than /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v 1.25 is a bit slow. Without $ hyperfine -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'
Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
Time (mean ± σ): 38.7 ms ± 1.2 ms [User: 5.1 ms, System: 1.8 ms]
Range (min … max): 36.9 ms … 42.1 ms 76 runs
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
Time (mean ± σ): 40.7 ms ± 1.3 ms [User: 6.1 ms, System: 1.8 ms]
Range (min … max): 39.2 ms … 45.7 ms 72 runs
Summary
/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v ran
1.05 ± 0.05 times faster than /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v $ git checkout abd0bf400a42f9a313324c0024fb6dec92b6ddcc
$ AQUA_VACUUM_DAYS=5 hyperfine -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'
Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
Time (mean ± σ): 56.3 ms ± 8.6 ms [User: 5.1 ms, System: 2.3 ms]
Range (min … max): 49.5 ms … 101.7 ms 55 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
Time (mean ± σ): 44.3 ms ± 1.5 ms [User: 7.6 ms, System: 2.1 ms]
Range (min … max): 42.0 ms … 48.1 ms 66 runs
Summary
/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v ran
1.27 ± 0.20 times faster than /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v |
I've done the performance test using some revision on my M3 Pro, but the performance is worse than the result you shared. #3442 (comment) Idea of new architectureMaybe we need to change the architecture. I came up with an idea.
This is just an idea. |
Probably |
Oh, this approach doesn't need database.
|
About benchmarksI'm surprised by the big difference between the benchmark I've made and yours, 1.27 is substantial but I see that there's a "+/- 0.20", do you still get the same thing?
Between the different tests, it fluctuates quite a bit, but overall, when there's no Regarding the idea of using files instead of a dbThis may seem like a good idea, as it avoids having to manage an additional database and would necessarily reduce the impact in terms of time, as there wouldn't be the extra layer provided by bbolt. But it's actually going to generate a lot of extra files, and in trying to clean up, we're adding a layer of "crap"... To be testedThe real problem here is that asynchronous operation is not really feasible because of the way the syscall exec works, which rempals the current process. Could we try to launch another process to manage the addition to db? If this process is not killed by the exec, this would solve the problem. |
About benchmarks, I'll try it again later.
If we create a file per package version, I guess this is no problem.
Yeah, it's a problem that defer functions don't work...
Do you mean to run daemon process? |
I missed this warning, but I specified
But I didn't specify
📝
|
No i mean to launch another process during execution of aqua exec. Process:
|
Interesting. |
$ AQUA_VACUUM_DAYS=5 hyperfine -r 100 -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'
Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
Time (mean ± σ): 51.1 ms ± 1.5 ms [User: 4.7 ms, System: 2.1 ms]
Range (min … max): 48.8 ms … 59.4 ms 100 runs
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
Time (mean ± σ): 40.7 ms ± 1.3 ms [User: 5.8 ms, System: 1.8 ms]
Range (min … max): 39.2 ms … 44.8 ms 100 runs
Summary
/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v ran
1.25 ± 0.05 times faster than /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v git rev-parse HEAD
337188f60399da030748d36ef898f7e4f149bc8a $ hyperfine --version
hyperfine 1.19.0 |
I quickly tried out what it could do with a "simple" code (generated with claude.ai), which can be found here: https://github.com/NikitaCOEUR/isolation_process. In principle, it works! |
Oh, nice. I'll take a look. This is still work in progress, but I'm trying to implement #3442 (comment) . |
Closed in favor of #3467 |
Check List
Require signed commits
, so all commits must be signedIntroduce Vacuum Feature for Expired Packages
Addresses #2942 and Discussion #3086.
This pull request introduces a new feature for vacuuming expired packages.
Workflow
Environment Variable Requirement
The feature activates only when the environment variable
AQUA_VACUUM_DAYS
is set with a positive integer greater than 0.Implementation in
InstallPackage
exec
orinstall
(but notaqua update
oraqua cp
, which currently useprovideNilVacuumController
in their controller initialization), package-related information is added or updated in a BoltDB (vacuum.db
) located inROOT_DIR
.Expiration Criteria
AQUA_VACUUM_DAYS
. Expired packages become eligible for "vacuum."Execution of
aqua vacuum
Commandvacuum
command performs the following operations:ROOT_DIR/pkgs/type.../pkg/version
.Optional - Not Implemented: Automatic Vacuum Execution
aqua install
command to trigger vacuum expiration automatically, enabling "auto-cleaning."Visualizing Managed Packages
The
aqua vacuum
command includes two flags,-list
and-expired
:-list
lists all packages.-expired
lists only expired packages.These flags enable the use of a fuzzy finder in the console to explore the
vacuum.db
database and retrieve package-related information:Coverage
A high latence is observed for test du to tests of failling access to vacuum.db
Performance Considerations
Significant effort has been made to minimize the overhead introduced by this feature on Aqua commands. Benchmark tests in the
exec
controller attest to this.Three methods for adding packages to the database were implemented (storePackage, storesPackages, and asyncStorePackage). However, the asynchronous method (asyncStorePackage) shows the least overhead and is preferred. I think i'll remove all others.
Benchmark when adding 100 packages at the same time :
The asynchronous method is the best option in all cases, especially when adding one package at a time (the typical use case for Aqua).
Benchmark of exec command via hyperfine :
Main impact of storing packages is the method PackageInfo.PkgPaths, but i think the best choice to have all Paths concerned by each package. There are used to remove package from system during vacuum operation.