server: correctly adjudicate collision bind() of specific port #82
On Linux (at least), SO_REUSEADDR allows a new listener to bind while an existing socket is in FIN-WAIT. Apparently it also allows any number of sockets to bind() to the same port, but only one listen() to succeed. Further, on Linux there is a known, documented race condition which can result in every listen() failing. It isn't clear how to handle this case without a potentially infinite loop, so it is ignored. If this happens then, e.g., no PVA server will get port 5075. So when probing for another listener, it is necessary to enter the listening state. When listen() fails, the socket is no longer usable for another bind(), so a new socket must be allocated for the next attempt.
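The probing strategy described above could be sketched roughly as follows in Python. This is only an illustration, not the code in this PR: the helper names `listen_on` and `listen_preferred`, the backlog value, and the fallback to an OS-assigned port (port 0) are all assumptions made for the example.

```python
import errno
import socket


def listen_on(addr, port, backlog=4):
    """Return a listening TCP socket on (addr, port), or None if the port is taken.

    Hypothetical helper: a new socket is allocated for every attempt and
    closed again on failure, since a socket whose listen() failed is not
    reused for another bind().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((addr, port))
        # With SO_REUSEADDR set, a successful bind() does not prove the
        # port is free. Only listen() reveals a competing listener.
        sock.listen(backlog)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise


def listen_preferred(addr, preferred_port):
    """Try the preferred port first, then fall back to an OS-assigned port."""
    sock = listen_on(addr, preferred_port)
    if sock is None:
        # Fresh socket for the second attempt; port 0 lets the OS choose.
        sock = listen_on(addr, 0)
    if sock is None:
        raise OSError(errno.EADDRINUSE, "no listening port available")
    return sock
```

Note that the sketch makes a single fallback attempt and does not try to detect or retry the race in which every listen() fails.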
The OSX build CI failures will be resolved by epics-base/setuptools_dso#35
It looks like this is only for stream sockets and TCP, so if no server gets port 5075 that won't prevent UDP search packets from being received and distributed to other servers via the localhost loopback. Is that correct? Have you given any thought to how much work might be needed for a server to accept its sockets from inetd (stdin/stdout) or systemd via
Correct. As I understand it, this laziness of

S1=socket(AF_INET, SOCK_STREAM)
S2=socket(AF_INET, SOCK_STREAM)
S1.bind(('127.0.0.1', 5000))
S2.bind(('127.0.0.1', 5000)) # fails! (EADDRINUSE)

S1=socket(AF_INET, SOCK_STREAM)
S2=socket(AF_INET, SOCK_STREAM)
S1.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
S2.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
S1.bind(('127.0.0.1', 5000))
S2.bind(('127.0.0.1', 5000)) # succeeds!
S1.listen(4)
S2.listen(4) # fails! (EADDRINUSE)
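For anyone who wants to reproduce the two cases above, they can be wrapped into one self-contained function that reports which call rejects the second socket. This is a sketch only: the name `collision_step` is made up for the example, and it binds to an OS-assigned port rather than 5000 so that it does not depend on that port being free.

```python
import socket


def collision_step(reuse_addr, addr="127.0.0.1"):
    """Bind two TCP sockets to one port and report where the second is rejected.

    Returns "bind" or "listen" for the call that raised OSError on the
    second socket, or None if neither call failed.
    """
    s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            for s in (s1, s2):
                s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s1.bind((addr, 0))              # let the OS choose a free port
        port = s1.getsockname()[1]
        try:
            s2.bind((addr, port))
        except OSError:
            return "bind"
        s1.listen(4)
        try:
            s2.listen(4)
        except OSError:
            return "listen"
        return None
    finally:
        s1.close()
        s2.close()


if __name__ == "__main__":
    print("without SO_REUSEADDR:", collision_step(False))
    print("with SO_REUSEADDR:   ", collision_step(True))
```

Going by the behaviour described in this thread, Linux should report the collision at bind() without SO_REUSEADDR and at listen() with it. Other platforms may reject the second bind() in both cases.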
No, not really. It seems like a lot of work for not much benefit, with a high probability of mis-configured .socket files (plural!) causing chaos. What I have thought about is calling
... of course usage of
fyi. my attempt at provoking this race was not successful. I guess a shell loop is too slow with so many

cat > tick.db << 'EOF'
record(calc, "$(P=)cnt") {
field(INPA, "$(P=)cnt")
field(CALC, "A+1")
field(SCAN, "1 second")
}
EOF
for n in `seq 1 100`; do sh -c "softIocPVX -m P=$n: -d tick.db -S </dev/null &" ; done

Followed by

for n in `seq 1 100`; do echo $n:cnt; done | xargs pvxget

Will complete without timeout if all PVA servers started.

cleanup

killall softIocPVX
It can, sometimes, eventually. Looping through iocBomb.sh gets a timeout on one or two PVs within a couple of minutes on my laptop without this PR. With this PR applied, I eventually got bored.

while sh iocBomb.sh; do date; done

I am satisfied with this result.
Attempts to address #81.