[podman] conmon should restart dead child processes #277

Open
tobwen opened this issue Jun 15, 2021 · 10 comments

@tobwen

tobwen commented Jun 15, 2021

What's the issue?

When slirp4netns is killed, the pod or container keeps running without any warning, but loses networking.

How to reproduce?

podman pod create --name systemd-pod
podman create --pod systemd-pod alpine top
podman create --pod systemd-pod alpine top
podman pod start systemd-pod
pkill -U tobwen 'slirp4netns'

What's expected?

  1. conmon (or podman) should take care of the child processes and restart them if they crash or die.
  2. A notification in the logs would also be nice.

What's the environment?

podman version 3.3.0-dev
conmon version 2.0.30-dev

@haircommander
Collaborator

@mheon @giuseppe is it even possible to restart slirp4netns in this case? I would imagine there'd be some runtime state that would be lost. I would expect slirp4netns dying to kill the container tbh

@mheon
Member

mheon commented Jun 15, 2021

I think you'd lose active connections, but you'd lose those on the container going down too. I don't really think you can do the restart straight from Conmon, though. We really need access to the full container definition from the Podman DB to proceed.

@tobwen
Author

tobwen commented Jun 15, 2021

> I think you'd lose active connections, but you'd lose those on the container going down too

Sure, but without a heartbeat or another check, a user wouldn't get informed about this. Can't we get a log entry at least?
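A minimal sketch of what such an external check could look like, assuming the helper is matched by process name the same way the `pkill -U` reproduction step does. The function names `helper_alive` and `check_helper` are hypothetical and are not part of podman or conmon; the warning goes to stderr only as a stand-in for a real log entry.

```shell
#!/bin/sh
# Hypothetical heartbeat check; nothing here is provided by podman or conmon.

# helper_alive USER NAME -> exit status 0 if USER owns a process named NAME
helper_alive() {
    pgrep -U "$1" -x "$2" >/dev/null 2>&1
}

# check_helper USER NAME -> prints a status line, warns on stderr if missing
check_helper() {
    if helper_alive "$1" "$2"; then
        echo "ok: $2 is running for user $1"
    else
        echo "warning: $2 is not running for user $1" >&2
        return 1
    fi
}

# Example: check the current user's slirp4netns
check_helper "$(id -un)" slirp4netns || true
```

Run periodically (for example from a timer), this would at least surface the missing helper, although it cannot tell which pod or container is affected.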

@mheon
Member

mheon commented Jun 15, 2021

Log entry is definitely viable. Container being killed is also viable. We could probably do slirp restart, but it'd require a fair bit of hacking - we'd need to be able to pass in a command for Conmon to run on slirp exit that is different and distinct from the exit command.

@tobwen
Author

tobwen commented Jun 15, 2021

With a normal systemd setup, a gracefully killed container would restart, and so would slirp. Sounds good :-)

@giuseppe
Member

> @mheon @giuseppe is it even possible to restart slirp4netns in this case? I would imagine there'd be some runtime state that would be lost. I would expect slirp4netns dying to kill the container tbh

I don't think conmon should know about slirp4netns.

IMO, slirp4netns should be seen as infrastructure for the container. Killing slirp4netns is equivalent to dropping the iptables rules for root containers or killing fuse-overlayfs when it is used for rootless.

@tobwen
Author

tobwen commented Jun 17, 2021

Oops, I forgot to add fuse-overlayfs in my post.

Killing was, of course, an edge case. I just wanted to simulate what happens if slirp4netns or fuse-overlayfs crashes on its own: will the container heal itself, will there be logs, etc.?

So it's even fine if the container gets stopped (or restarted). But an entry in the logs would be good, so the admin could react.

@giuseppe
Member

We could move slirp4netns to a separate cgroup (or at least make it configurable) so that systemd could report the failure. I wouldn't worry about fuse-overlayfs, since we are moving to the native overlay support for rootless as well.
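To see which cgroup the helper currently lives in, one can read `/proc/<pid>/cgroup` on Linux. The sketch below is illustrative only: `cgroup_of` and `show_cgroups` are hypothetical helper names, and matching by process name is an assumption carried over from the reproduction steps.

```shell
#!/bin/sh
# Illustrative only: print the cgroup membership of processes by name.

# cgroup_of PID -> contents of /proc/PID/cgroup (Linux only)
cgroup_of() {
    cat "/proc/$1/cgroup" 2>/dev/null
}

# show_cgroups USER NAME -> one block per process named NAME owned by USER
show_cgroups() {
    for pid in $(pgrep -U "$1" -x "$2" 2>/dev/null); do
        echo "== $2 (pid $pid)"
        cgroup_of "$pid"
    done
}

# Compare where the network helper and the monitor process are placed
show_cgroups "$(id -un)" slirp4netns
show_cgroups "$(id -un)" conmon
```

If both print the same path, systemd has no separate unit or scope for the helper in which it could report a failure.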

@rhatdan
Member

rhatdan commented Jun 18, 2021

I would love to see conmon kill the container if slirp4netns and/or fuse-overlayfs exited, and then exit with an error state. Then it would be up to podman or systemd to decide if the pod/container should restart.

Could we potentially do this by passing pidfds to conmon and having conmon wait on those pids? If they exit, conmon throws an error.
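The control flow of "wait on a helper PID, then fail" can be sketched in shell. This is not a pidfd implementation: the shell cannot hold pidfds, so the sketch polls with `kill -0` instead, which is subject to PID reuse races that a pidfd would avoid. The function names `wait_for_exit` and `on_helper_exit` are hypothetical, and the `echo` is a placeholder for whatever action would stop the container.

```shell
#!/bin/sh
# Sketch only: emulate "wait on a helper PID, then react" with polling.
# A pidfd-based wait would replace the loop in wait_for_exit.

# wait_for_exit PID [INTERVAL] -> returns once PID no longer exists
wait_for_exit() {
    pid=$1
    interval=${2:-1}
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
}

# on_helper_exit PID COMMAND... -> run COMMAND once PID is gone
on_helper_exit() {
    pid=$1
    shift
    wait_for_exit "$pid"
    echo "helper process $pid exited" >&2
    "$@"
}

# Demo with a short-lived stand-in for the helper process
sleep 1 &
helper=$!
on_helper_exit "$helper" echo "would stop the container here"
```

Keeping the reaction as a separate command mirrors the point raised earlier in the thread: the action on helper exit would need to be distinct from the regular exit command.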

@mheon
Member

mheon commented Jun 18, 2021

I like the pidfd idea a lot.
