-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up after unexpectedly terminated build #25102
Conversation
596c6bb
to
c607887
Compare
libpod/runtime.go
Outdated
if err != nil { | ||
return stageContainersPruneReports, err | ||
} | ||
if _, err := os.Stat(filepath.Join(path, "buildah.json")); errors.Is(err, fs.ErrNotExist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use fileutils.Exists()
@nalind Is there another (better?) way to check if a storage container is from buildah?
libpod/runtime.go
Outdated
size, err := r.store.ContainerSize(container.ID) | ||
if err != nil { | ||
report.Err = err | ||
logrus.Warnf("Failed to get size of build stage container %s: %v", container.ID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not report the error to the caller and also print a warning.
Just reporting the errors back to the caller is good enough. The caller can then log once.
libpod/runtime.go
Outdated
report.Err = err | ||
logrus.Warnf("Failed to remove build stage container %s: %v", container.ID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
pkg/domain/infra/abi/system.go
Outdated
|
||
reclaimedSpace := (uint64)(0) | ||
found := true | ||
for found { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a loop? Running once should be enough.
pkg/domain/infra/abi/system.go
Outdated
stageContainersPruneReports, err := ic.Libpod.PruneStageContainers() | ||
if err != nil { | ||
return nil, err | ||
} | ||
if len(stageContainersPruneReports) > 0 { | ||
found = true | ||
} | ||
reclaimedSpace += reports.PruneReportsSize(stageContainersPruneReports) | ||
systemPruneReport.ContainerPruneReports = append(systemPruneReport.ContainerPruneReports, stageContainersPruneReports...) | ||
|
||
// Prune Images | ||
imagePruneOptions := entities.ImagePruneOptions{ | ||
External: true, | ||
BuildCache: true, | ||
} | ||
imageEngine := ImageEngine{Libpod: ic.Libpod} | ||
imagePruneReports, err := imageEngine.Prune(ctx, imagePruneOptions) | ||
if err != nil { | ||
return nil, err | ||
} | ||
if len(imagePruneReports) > 0 { | ||
found = true | ||
} | ||
reclaimedSpace += reports.PruneReportsSize(imagePruneReports) | ||
systemPruneReport.ImagePruneReports = append(systemPruneReport.ImagePruneReports, imagePruneReports...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems inconsistent with the documentation, from the docs you wrote it sounds like it only prune the build containers but this prunes everything other containers and images as well.
I don't think this is desirable.
If we want that behavior then the code should not cause a conflict with --build option and only do this in addition to the existing code in SystemPrune() instead of duplicating the image logic here.
test/e2e/prune_test.go
Outdated
hasNone, result := none.GrepString("<none>") | ||
Expect(result).To(HaveLen(1)) | ||
Expect(hasNone).To(BeTrue()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leads to horrible Expected false to be true errors.
Please use the proper matchers, something like Expect(none.OutputToString()).To(ContanSubstring("none"))
test/e2e/prune_test.go
Outdated
dirents, err := os.ReadDir(containerStorageDir) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(dirents).To(HaveLen(6)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this matter? This peaks at rather internal storage details which can change requiring the test to be updated often. Would it not be better to run buildah containers
instead? Then in the end ensure it is removed there?
test/e2e/prune_test.go
Outdated
after := podmanTest.Podman([]string{"images", "-a"}) | ||
after.WaitWithDefaultTimeout() | ||
Expect(after).Should(ExitCleanly()) | ||
Expect(len(after.OutputToStringArray())).To(BeNumerically(">", 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this should check, AFAIK in the test/e2e setup we will always have extra images from the additional store shown.
test/e2e/prune_test.go
Outdated
hasNoneAfter, result := after.GrepString("<none>") | ||
Expect(result).To(BeEmpty()) | ||
Expect(hasNoneAfter).To(BeFalse()) | ||
hasNotLeakerImager, _ := after.GrepString("notleaker") | ||
Expect(hasNotLeakerImager).To(BeTrue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also needs to use proper matchers
test/e2e/prune_test.go
Outdated
// still have: volatile-containers.json, containers.json, containers.lock and container dir | ||
dirents, err = os.ReadDir(containerStorageDir) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(dirents).To(HaveLen(4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment I would use buildah containers
to check
Also what this doesn't answer what happens if the build process is still running? And another thing that this does not cleanup is the networking, not sure if there is a good way for that but if you kill the build and run with bridge networking the interfaces/firewall rules and ipam db allocations are still leaked. They should be cleaned after reboot so not that bad like the storage leak and certainly not a blocker for this here. I just mention it as there are other leaks to consider too. |
@Luap99 I have incorporated your review. I've changed the approach and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, code wise this seems good to me. just a minor comment on the man page and we need to make sure the test does not run forever
|
||
Removes any build containers that were created during the build, but were not removed because the build was unexpectedly terminated. | ||
|
||
> **This is not safe operation and should be executed only when no builds are in progress. It can interfere with builds in progress.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the general style in the man pages was to write Note: ...
or something like that. I don't think we have used >
elsewhere. I don't mind it for the web view but I have not yet looked at how it looks in the rendered man page.
test/e2e/prune_test.go
Outdated
if build.LineInOutputContains("Please use signal 9") { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great but maybe add a small sleep for like 10ms or something to not busy poll which also eats a lot of resources.
Second this needs a timeout, if the build process fails and never prints the output this would loop forever causing hard to debug issues in CI.
test/e2e/prune_test.go
Outdated
done := make(chan struct{}) | ||
go func() { | ||
for { | ||
if build.LineInOutputContains("Please use signal 9") { | ||
break | ||
} | ||
time.Sleep(100 * time.Millisecond) | ||
} | ||
build.Signal(syscall.SIGKILL) | ||
close(done) | ||
}() | ||
Eventually(done, defaultWaitTimeout).Should(BeClosed()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will still hang the goroutine which means you leak the entire goroutine. This really needs a timeout in the loop so the loop actually ends.
matchedOutput := false
for range 900 {
if build.LineInOutputContains("Please use signal 9") {
matchedOutput = true
break
}
time.Sleep(100 * time.Millisecond)
}
if !matchedOutput {
Fail("Did not match special string in podman build")
}
Also when using goroutines in a ginkgo test one should use defer GinkgoRecover()
to ensure error reporting there works. In this case it may not strictly be needed as there is no actual matching done but in case of a panic it should still be recovered by ginkgo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I didn't realize that. Fixed.
The `podman system prune` command is able to remove build containers that were created during the build, but were not removed because the build terminated unexpectedly. By default, build containers are not removed to prevent interference with builds in progress. Use the **--build** flag when running the command to remove build containers as well. Fixes: https://issues.redhat.com/browse/RHEL-62009 Signed-off-by: Jan Rodák <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Honny1, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Removed the lgtm label again (forgot there wasn't a second review) |
I am hitting this all the time when playing with AI builds Thanks @Honny1 |
The
podman system prune
command can remove build containers that were created during the build but were not removed because the build terminated unexpectedly.By default, build containers are not removed to prevent interference with builds in progress. Use the --build flag when running the command to remove build containers as well.
Reproducer:
run.sh
:podman unshare du -sh ~/.local/share/containers/
podman unshare du -sh ~/.local/share/containers/
podman unshare du -sh ~/.local/share/containers/
Fixes: #14523
Fixes: #23683
Fixes: https://issues.redhat.com/browse/RHEL-62009
Does this PR introduce a user-facing change?