runsv locks up after service's directory is removed and recreated (xbps-install -f) #25
Comments
More details on this, which I thought I had added when exploring the issue:
The trigger of this bug is now fixed in void-linux/xbps#561; however, the bug itself (if it really is one) can still be triggered in other ways. Is this issue worth keeping?
I think it could be reasonable to check if any
OK, I think the actual bug is that runsv ignores SIGTERM in this situation, because runsvdir should kill it when it detects that the svdir's inode has changed.
The manual page for runsvdir suggests that this would trigger only if the symlink in
runsvdir calls stat (so it follows symlinks), though?
Hmm, right.
The link's destination changing wouldn't change the link itself, nor the dir the link is in, right? If I understand correctly, the rescan is not even done if the services dir has not changed.
There are no signals; the mtime of the
Not sure what the bug is here, then; everything seems to work as expected,
Removing the dir of a service in the services dir should cause the corresponding runsv to be killed by runsvdir. But because of the symlink indirection, the actual trigger in runsvdir (the services dir changing) never fires. Now I wonder whether we could get it to actually kill the runsv process by making an unrelated change to the services dir and letting it rescan.
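The blind spot can be reproduced without runit at all. The C sketch below (Linux/POSIX assumed; the directory layout and the names blind_spot_demo and struct observation are hypothetical) builds a watched directory holding a symlink to a service directory kept elsewhere, replaces that target the way a reinstall would, and records two things: whether the watched directory's mtime moved, and whether stat(2) through the symlink now sees a different device/inode. It only models the rescan rule as described in this thread, not runsvdir's actual code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* What a scanner gated on the services dir's mtime would observe. */
struct observation {
    int svdir_mtime_changed;   /* would the "services dir changed" trigger fire? */
    int service_inode_changed; /* does stat() through the symlink see a new dir? */
};

static int same_mtime(const struct stat *a, const struct stat *b) {
    return a->st_mtim.tv_sec == b->st_mtim.tv_sec &&
           a->st_mtim.tv_nsec == b->st_mtim.tv_nsec;
}

/* Hypothetical layout:
     base/services/foo -> ../srv/foo   (symlink inside the watched dir)
     base/srv/foo/                     (the real service directory)   */
static int blind_spot_demo(struct observation *obs) {
    char base[] = "/tmp/svdemo.XXXXXX";
    char svdir[256], svlink[256], srv[256], target[256], fresh[256];
    struct stat s0, s1, f0, f1;
    int rc = -1;

    if (mkdtemp(base) == NULL) return -1;
    snprintf(svdir, sizeof svdir, "%s/services", base);
    snprintf(svlink, sizeof svlink, "%s/services/foo", base);
    snprintf(srv, sizeof srv, "%s/srv", base);
    snprintf(target, sizeof target, "%s/srv/foo", base);
    snprintf(fresh, sizeof fresh, "%s/srv/foo.new", base);

    if (mkdir(svdir, 0755) || mkdir(srv, 0755) || mkdir(target, 0755) ||
        symlink("../srv/foo", svlink))
        goto out;

    /* stat() follows the symlink, so f0 describes base/srv/foo itself. */
    if (stat(svdir, &s0) || stat(svlink, &f0)) goto out;

    /* Simulate the reinstall. The replacement exists before the old dir is
       removed, so the two directories cannot share an inode number. */
    if (mkdir(fresh, 0755) || rmdir(target) || rename(fresh, target)) goto out;

    if (stat(svdir, &s1) || stat(svlink, &f1)) goto out;
    obs->svdir_mtime_changed = !same_mtime(&s0, &s1);
    obs->service_inode_changed =
        f0.st_dev != f1.st_dev || f0.st_ino != f1.st_ino;
    rc = 0;
out:
    /* Best-effort cleanup; failures are ignored. */
    unlink(svlink);
    rmdir(fresh);
    rmdir(target);
    rmdir(srv);
    rmdir(svdir);
    rmdir(base);
    return rc;
}
```

On a typical Linux filesystem the first field stays 0 and the second becomes 1: a scanner that only rescans when the services dir's mtime changes never gets as far as comparing inodes. Touching the services dir afterwards would move its mtime, which is consistent with the touch workaround discussed in this thread.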
Yes, touching the runsvdir will trigger the rescan. It is a bit flaky, but works in the end. Since runsvdir starts new runsv processes before it terminates the old ones, the first attempt to start runsv on the "new" service fails because of the lock in supervise. Eventually, once the "old" runsv has exited, further runs of runsv start the service successfully.
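The "first attempt fails, later ones succeed" behaviour fits a supervisor that takes an exclusive, non-blocking lock in its supervise directory. The sketch below assumes runsv holds such a lock on supervise/lock and models it with flock(2); with the supervise -> ../supervise symlink, the old and the new runsv would open the very same lock file. Because flock locks belong to the open file description, a single process can play both roles here. The names try_lock, handover_demo and struct lock_trace are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Open (creating if needed) the lock file and try to take an exclusive,
   non-blocking lock. Returns the fd, or -1 with errno set (EWOULDBLOCK if
   someone else holds the lock). */
static int try_lock(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

struct lock_trace {
    int old_locked;     /* the "old" runsv holds the lock */
    int new_refused;    /* the "new" runsv is refused with EWOULDBLOCK */
    int new_after_exit; /* a later attempt succeeds once the old one is gone */
};

static int handover_demo(struct lock_trace *t) {
    char dir[] = "/tmp/lockdemo.XXXXXX";
    char path[256];
    if (mkdtemp(dir) == NULL) return -1;
    snprintf(path, sizeof path, "%s/lock", dir);

    int old_fd = try_lock(path);      /* old runsv, started before the reinstall */
    t->old_locked = old_fd >= 0;

    int new_fd = try_lock(path);      /* runsvdir spawns the new runsv first */
    t->new_refused = new_fd < 0 && errno == EWOULDBLOCK;
    if (new_fd >= 0) close(new_fd);

    if (old_fd >= 0) close(old_fd);   /* old runsv exits; its lock goes away */
    new_fd = try_lock(path);          /* the next respawn gets through */
    t->new_after_exit = new_fd >= 0;
    if (new_fd >= 0) close(new_fd);

    unlink(path);
    rmdir(dir);
    return 0;
}
```

On Linux all three fields of the trace come out as 1: the respawn is refused only while the old process still holds the lock.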
I think the proper solution would be for
Not by polling: right now runsv processes are suspended in the poll syscall and wake up on signals like SIGCHLD or on data arriving on the control fd. I don't think there is a POSIX mechanism that makes this poll wake up when the directory is deleted. Waking up at an interval to check the directory sucks; inotify would work, but is not portable.
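Whatever wakes runsv up (a timer, which the comment above would rather avoid, or the signals and control-fd data it already handles), the check itself can be done portably with stat(2): remember the service directory's device and inode at startup, and later compare them with what the same path names now. A missing path or a different inode means the directory was deleted or replaced. This is a minimal sketch with hypothetical names (struct dir_id, dir_id_of, dir_replaced), not something runsv is known to do.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Identity of the service directory as seen when the supervisor started. */
struct dir_id {
    dev_t dev;
    ino_t ino;
};

static int dir_id_of(const char *path, struct dir_id *id) {
    struct stat st;
    if (stat(path, &st) < 0) return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}

/*  0: path still names the same directory.
    1: path is gone, or now names a different directory.
   -1: stat() failed for some other reason. */
static int dir_replaced(const char *path, const struct dir_id *orig) {
    struct stat st;
    if (stat(path, &st) < 0)
        return errno == ENOENT ? 1 : -1;
    return st.st_dev != orig->dev || st.st_ino != orig->ino;
}
```

Since runsv's working directory keeps the old directory's inode alive, its number should not be recycled for the replacement while runsv is running, so the comparison is not fooled by inode reuse.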
When the service's directory is removed and recreated, as during an xbps package reinstall, runsv will enter a broken state. Once in that state, a command to down the service will cause runsv to terminate the supervised process, but runsv will fail to update the status file (and possibly more), soft-locking. The same bug happens for the log service if present, but a log service is not needed to trigger it.
Minimal reproducer
This can be done as a normal user.
1. Have a <service> directory containing solely a run file and a supervise -> ../supervise symlink.
2. From the parent directory, run runsv <service>
3. Copy, delete, and recreate <service>
4. From the same dir as step 2, run SVDIR=$(pwd) sv stop <service>
The supervised process will stop, but the sv command will fail on timeout, reporting the supervised process' PID as the one it had before being terminated. Repeating the stop command continues to fail and reports that same, now invalid, PID. Trying to start the service immediately 'succeeds', reporting the same PID.
I recommend using a program that outputs frequently as the supervised process, since that makes it easier to see when it is running or stopped. I'm using a simple binary that prints an incremented number each second.
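One possible explanation for the status file not being updated, assuming runsv chdir()s into the service directory and writes its supervise/status files through paths relative to it: after the reinstall, runsv's working directory is the old, unlinked directory, which no longer contains the supervise symlink, so those relative paths stop resolving. The C sketch below reproduces only that path-resolution effect on Linux, using a hypothetical layout that mirrors the reproducer (write_status_relative, reinstall_demo and struct reinstall_result are made-up names); it does not run runsv.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create supervise/status.new relative to the current directory, as a
   supervisor that chdir()ed into its service dir might. Returns 0 on
   success, otherwise the errno value. */
static int write_status_relative(void) {
    int fd = open("supervise/status.new", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return errno;
    close(fd);
    unlink("supervise/status.new");
    return 0;
}

struct reinstall_result {
    int before; /* errno-style result before the service dir is recreated */
    int after;  /* ... and after it */
};

/* Layout:  base/supervise/                        real supervise directory
            base/svc/supervise -> ../supervise    as in the reproducer */
static int reinstall_demo(struct reinstall_result *r) {
    char base[] = "/tmp/runsvdemo.XXXXXX";
    char sup[256], svc[256], svlink[256];
    int rc = -1;

    int home = open(".", O_RDONLY | O_DIRECTORY); /* to restore our cwd */
    if (home < 0) return -1;
    if (mkdtemp(base) == NULL) { close(home); return -1; }
    snprintf(sup, sizeof sup, "%s/supervise", base);
    snprintf(svc, sizeof svc, "%s/svc", base);
    snprintf(svlink, sizeof svlink, "%s/svc/supervise", base);

    if (mkdir(sup, 0755) || mkdir(svc, 0755) || symlink("../supervise", svlink))
        goto out;
    if (chdir(svc)) goto out;          /* like runsv entering its service dir */
    r->before = write_status_relative();

    /* "Reinstall": delete and recreate the service dir by absolute path. */
    if (unlink(svlink) || rmdir(svc) || mkdir(svc, 0755) ||
        symlink("../supervise", svlink))
        goto out;
    r->after = write_status_relative(); /* cwd is still the old, deleted dir */
    rc = 0;
out:
    if (fchdir(home) != 0) rc = -1;
    close(home);
    unlink(svlink);
    rmdir(svc);
    rmdir(sup);
    rmdir(base);
    return rc;
}
```

reinstall_demo reports success before the swap and ENOENT after it. Anything that goes through the recreated directory, as sv does, still reaches the shared ../supervise directory, which would fit the symptoms above: the stop request gets through, but the status file keeps the old PID.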