[podman] conmon should restart dead child processes #277

tobwen · 2021-06-15T14:43:42Z

What's the issue?

After slirp4netns is killed, the pod or container keeps running without any warning, but it no longer has networking.

How to reproduce?

podman pod create --name systemd-pod
podman create --pod systemd-pod alpine top
podman create --pod systemd-pod alpine top
podman pod start systemd-pod
pkill -U tobwen 'slirp4netns'

What's expected?

conmon (or podman) should take care of the child processes and restart them if they crash or die.
A notification in the logs would also be nice.

What's the environment?

podman version 3.3.0-dev
conmon version 2.0.30-dev

haircommander · 2021-06-15T15:14:31Z

@mheon @giuseppe is it even possible to restart slirp4netns in this case? I would imagine there'd be some runtime state that would be lost. I would expect slirp4netns dying to kill the container tbh

mheon · 2021-06-15T16:25:32Z

I think you'd lose active connections, but you'd lose those on the container going down too. I don't really think you can do the restart straight from Conmon, though. We really need access to the full container definition from the Podman DB to proceed.

tobwen · 2021-06-15T16:50:30Z

> I think you'd lose active connections, but you'd lose those on the container going down too

Sure, but without a heartbeat or another check, a user wouldn't get informed about this. Can't we get a log entry at least?
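
To illustrate the kind of external check meant here, a minimal sketch in Python. It is not part of podman or conmon: `pids_by_name` and `check_helper` are hypothetical names, and the check matches any process with the given name rather than the helper belonging to one specific container, so treat it as a rough heartbeat only. It assumes a Linux `/proc` filesystem.

```python
import logging
import os

log = logging.getLogger("helper-check")


def pids_by_name(name):
    """Return the PIDs whose /proc/<pid>/comm matches `name` exactly."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/comm") as comm:
                if comm.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids


def check_helper(name="slirp4netns"):
    """Log a warning and return False if no process called `name` is running."""
    if pids_by_name(name):
        return True
    log.warning("%s is not running; container networking may be down", name)
    return False
```

Calling `check_helper()` from a timer or cron job would at least put a line in the logs when the helper has disappeared.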

mheon · 2021-06-15T17:09:55Z

Log entry is definitely viable. Container being killed is also viable. We could probably do slirp restart, but it'd require a fair bit of hacking - we'd need to be able to pass in a command for Conmon to run on slirp exit that is different and distinct from the exit command.

tobwen · 2021-06-15T17:12:13Z

With a normal systemd setup, a gracefully killed container would restart, and so would slirp. Sounds good :-)

giuseppe · 2021-06-17T07:50:46Z

> @mheon @giuseppe is it even possible to restart slirp4netns in this case? I would imagine there'd be some runtime state that would be lost. I would expect slirp4netns dying to kill the container tbh

I don't think conmon should know about slirp4netns.

IMO, slirp4netns should be seen as infrastructure for the container. Killing slirp4netns is equivalent to dropping the iptables rules for root containers or killing fuse-overlayfs when it is used for rootless.

tobwen · 2021-06-17T08:46:02Z

Oops, I forgot to add fuse-overlayfs in my post.

Killing was, of course, an edge case. I just wanted to simulate what happens if slirp4netns or fuse-overlayfs crashes on its own: will the container heal itself, will there be logs, etc.

So it's even fine if the container gets stopped (or restarted). But an entry in the logs would be good, so the admin could react.

giuseppe · 2021-06-17T11:08:07Z

We could move slirp4netns to a separate cgroup (or at least make that configurable) so that systemd could report the failure. I wouldn't worry about fuse-overlayfs, since we are moving to native overlay support for rootless as well.

rhatdan · 2021-06-18T14:54:49Z

I would love to see conmon kill the container if slirp4netns and/or fuse-overlayfs exited, and then exit with an error state. Then it would be up to podman or systemd to decide if the pod/container should restart.

Could we potentially do this by passing pidfds to conmon and having conmon wait on those pids? If they exit, conmon throws an error.

mheon · 2021-06-18T14:56:13Z

I like the pidfd idea a lot.
