
Unsock'ed NGINX reverse proxy server issue #2

Open
Bert-Proesmans opened this issue Sep 25, 2024 · 6 comments
Bert-Proesmans commented Sep 25, 2024

Hi, inspiring (great) tool!

I've been trying the library with nginx to redirect connections into VSOCK (starting from both AF_INET and AF_UNIX). It seems to work, but not entirely without issues.

Setup;

  • Configure the VSOCK redirect config file at the expected place for the AF_INET redirect
  • Run nginx with an HTTP reverse proxy for the configured IP+PORT, or point it directly at the config file using the unix:/ prefix
  • Unsock nginx via LD_PRELOAD

To clarify, I validated that my proxy and upstream VMs properly communicate with each other; socat VSOCK-LISTEN and VSOCK-CONNECT allow me to communicate bidirectionally.
My nginx configuration also validates without any issues (nginx -t).

Observed behaviour;
Nginx properly proxies the request to the upstream. The upstream receives it and responds with data, but nginx never forwards that data to the client.
Nginx then terminates the client connection after the configured proxy timeout (60s in my case).
The odd part is that nginx sometimes logs a special client-disconnect message (I can't reproduce it reliably) that includes how many bytes were received from/sent to the client and upstream, and it shows a fairly large number of bytes received from the upstream, corresponding to the upstream's actual response data.

I tested; 127.0.0.1:443 -> NGINX Server -> 127.175.0.0:8000 - [UNSOCK] -> VSOCK:10:8000
I tested; 127.0.0.1:443 -> NGINX Server -> unix:/run/nginx-vsock/upstream.vsock - [UNSOCK] -> VSOCK:10:8000
I tested; 127.0.0.1:443 -> NGINX Stream -> unix:/run/nginx/frontend.sock -> NGINX Server -> unix:/run/nginx-vsock/upstream.vsock - [UNSOCK] -> VSOCK:10:8000
Every scenario has the same symptoms.

Excerpt of logs when symptoms occur

Server 1 - Proxy

[bert-proesmans@1-test:~]$ journalctl -efu nginx
Sep 25 21:51:16 1-test systemd[1]: Starting Nginx Web Server...
Sep 25 21:51:16 1-test nginx-pre-start[556]: nginx: the configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf syntax is ok
Sep 25 21:51:16 1-test nginx-pre-start[556]: nginx: configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf test is successful
[...]
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_FILE: /run/nginx-unsock/photos-upstream.vsock
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_VSOCK_PORT: 10000
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_VSOCK_CID: 90000
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_VSOCK_CONNECT_SIBLING: 1
Sep 25 21:51:16 1-test systemd[1]: Started Nginx Web Server.
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: using the "epoll" event method
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: nginx/1.26.2
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: built by gcc 13.3.0 (GCC)
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: OS: Linux 6.6.52
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: getrlimit(RLIMIT_NOFILE): 1024:524288
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: start worker processes
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: start worker process 578
Sep 25 21:51:17 1-test systemd[1]: Reloading Nginx Web Server...
Sep 25 21:51:17 1-test nginx[585]: nginx: the configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf syntax is ok
Sep 25 21:51:17 1-test nginx[585]: nginx: configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf test is successful
[...]
Sep 25 21:54:32 1-test nginx[592]: 2024/09/25 21:54:32 [error] 592#592: *1 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 127.0.0.1, server: photos.alpha.proesmans.eu, request: "GET / HTTP/2.0", upstream: "http://unix:/run/nginx-unsock/photos-upstream.vsock/", host: "photos.alpha.proesmans.eu"

EDIT; I patched unsock with the patch below to add the VMADDR_FLAG_TO_HOST flag to the VSOCK socket struct's flags field; this instructs the driver to forward vsock data to the host (CID 2) even though the destination CID is another value.

https://github.com/Bert-Proesmans/nix/blob/cb77bf615af27c8413c419e57b46bbb802851834/packages/unsock/001-flag-to-host.patch

Server 1 - Local curl test

[bert-proesmans@1-test:~]$ curl -vv --resolve *:80:127.0.0.1 --resolve *:443:127.0.0.1 --insecure https://photos.alpha.proesmans.eu
* Added *:80:127.0.0.1 to DNS cache
* RESOLVE *:80 using wildcard
* Added *:443:127.0.0.1 to DNS cache
* RESOLVE *:443 using wildcard
* Hostname photos.alpha.proesmans.eu was found in DNS cache
*   Trying 127.0.0.1:443...
* Connected to photos.alpha.proesmans.eu (127.0.0.1) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / X25519 / id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=photos.alpha.proesmans.eu
*  start date: Sep 25 21:51:16 2024 GMT
*  expire date: Oct 25 21:51:16 2026 GMT
*  issuer: CN=minica root ca 3b3aca
*  SSL certificate verify result: self-signed certificate in certificate chain (19), continuing anyway.
*   Certificate level 0: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 1: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using ecdsa-with-SHA384
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://photos.alpha.proesmans.eu/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: photos.alpha.proesmans.eu]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.9.1]
* [HTTP/2] [1] [accept: */*]
> GET / HTTP/2
> Host: photos.alpha.proesmans.eu
> User-Agent: curl/8.9.1
> Accept: */*
> 
* Request completely sent off
< HTTP/2 504 
< server: nginx
< date: Wed, 25 Sep 2024 21:54:32 GMT
< content-type: text/html
< content-length: 160
< 
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host photos.alpha.proesmans.eu left intact

The log above records a 504 Gateway Time-out on the client side. The expected response is an empty HTTP directory listing.

Server 2 - Upstream

[bert-proesmans@2-test:~]$ python3 -m http.server -b 127.0.0.1 8080 &
socat -d -d VSOCK-LISTEN:10000,fork TCP4-CONNECT:127.0.0.1:8080
[1] 624
2024/09/25 21:52:20 socat[625] N VSOCK CID=90000
2024/09/25 21:52:20 socat[625] N listening on AF=40 cid:4294967295 port:10000
Serving HTTP on 127.0.0.1 port 8080 (http://127.0.0.1:8080/) ...
2024/09/25 21:53:32 socat[625] N accepting connection from AF=40 cid:10000 port:2672637753 on AF=40 cid:90000 port:10000
2024/09/25 21:53:32 socat[625] N forked off child process 626
2024/09/25 21:53:32 socat[625] N listening on AF=40 cid:4294967295 port:10000
2024/09/25 21:53:32 socat[626] N opening connection to AF=2 127.0.0.1:8080
2024/09/25 21:53:32 socat[626] N successfully connected from local address AF=2 127.0.0.1:53622
2024/09/25 21:53:32 socat[626] N starting data transfer loop with FDs [6,6] and [5,5]
127.0.0.1 - - [25/Sep/2024 21:53:32] "GET / HTTP/1.1" 200 -
2024/09/25 21:53:32 socat[626] N write(5, 0x55d650742000, 275) completed
2024/09/25 21:53:32 socat[626] N write(6, 0x55d650742000, 342) completed
2024/09/25 21:53:32 socat[626] N socket 2 (fd 5) is at EOF
2024/09/25 21:53:33 socat[626] N exiting with status 0
2024/09/25 21:53:33 socat[625] N childdied(): handling signal 17

EDIT; As you can see at the end, the HTTP upstream server replied instantly to the proxied request.

Nginx config
pid /run/nginx/nginx.pid;
error_log stderr debug;
daemon off;
events {
}
http {
	# Load mime types.
	include /nix/store/3zrkasqf3sqr9ff6sv5fddhlbf072a36-mailcap-2.1.54/etc/nginx/mime.types;
	# When recommendedOptimisation is disabled nginx fails to start because the mailcap mime.types database
	# contains 1026 entries and the default is only 1024. Setting a higher number removes the need to
	# overwrite it, because nginx does not allow duplicated settings.
	types_hash_max_size 4096;
	include /nix/store/lf89iyal2p7jj26kngfmmias80aia2sc-nginx-1.26.2/conf/fastcgi.conf;
	include /nix/store/lf89iyal2p7jj26kngfmmias80aia2sc-nginx-1.26.2/conf/uwsgi_params;
	default_type application/octet-stream;
	upstream photos-upstream {
		server unix:/run/nginx-unsock/photos-upstream.vsock ;
	}
	# optimisation
	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 65;
	ssl_protocols TLSv1.2 TLSv1.3;
	ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305;
	# Keep in sync with https://ssl-config.mozilla.org/#server=nginx&config=intermediate
	ssl_session_timeout 1d;
	ssl_session_cache shared:SSL:10m;
	# Breaks forward secrecy: https://github.com/mozilla/server-side-tls/issues/135
	ssl_session_tickets off;
	# We don't enable insecure ciphers by default, so this allows
	# clients to pick the most performant, per https://github.com/mozilla/server-side-tls/issues/260
	ssl_prefer_server_ciphers off;
	# OCSP stapling
	ssl_stapling on;
	ssl_stapling_verify on;
	brotli on;
	brotli_static on;
	brotli_comp_level 5;
	brotli_window 512k;
	brotli_min_length 256;
	brotli_types application/atom+xml application/geo+json application/javascript application/json application/ld+json application/manifest+json application/rdf+xml application/vnd.ms-fontobject application/wasm application/x-rss+xml application/x-web-app-manifest+json application/xhtml+xml application/xliff+xml application/xml font/collection font/otf font/ttf image/bmp image/svg+xml image/vnd.microsoft.icon text/cache-manifest text/calendar text/css text/csv text/javascript text/markdown text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/xml;
	gzip on;
	gzip_static on;
	gzip_vary on;
	gzip_comp_level 5;
	gzip_min_length 256;
	gzip_proxied expired no-cache no-store private auth;
	gzip_types application/atom+xml application/geo+json application/javascript application/json application/ld+json application/manifest+json application/rdf+xml application/vnd.ms-fontobject application/wasm application/x-rss+xml application/x-web-app-manifest+json application/xhtml+xml application/xliff+xml application/xml font/collection font/otf font/ttf image/bmp image/svg+xml image/vnd.microsoft.icon text/cache-manifest text/calendar text/css text/csv text/javascript text/markdown text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/xml;
	proxy_redirect          off;
	proxy_connect_timeout   60s;
	proxy_send_timeout      60s;
	proxy_read_timeout      60s;
	proxy_http_version      1.1;
	# don't let clients close the keep-alive connection to upstream. See the nginx blog for details:
	# https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#no-keepalives
	proxy_set_header        "Connection" "";
	include /nix/store/rmcl3ckxhx2q8sg32274y7rdf7y6ir61-nginx-recommended-proxy-headers.conf;
	# $connection_upgrade is used for websocket proxying
	map $http_upgrade $connection_upgrade {
		default upgrade;
		''      close;
	}
	client_max_body_size 10m;
	server_tokens off;
	[...]
	server {
		listen 0.0.0.0:443 ssl ;
		server_name photos.alpha.proesmans.eu ;
		http2 on;
		[...]
		location / {
			proxy_pass http://photos-upstream;
			include /nix/store/rmcl3ckxhx2q8sg32274y7rdf7y6ir61-nginx-recommended-proxy-headers.conf;
		}
	}
}

Expected behaviour;
Nginx proxies upstream data correctly to the client. Upstream connections are transparently proxied through the VSOCK driver.

It seems like some signaling is not happening correctly. I've scoured the internet for similar symptoms, but after a few days I'm tired of reading about causes like too-low file-size limits or too-low timeout values...
I'll have to pull out strace to debug this issue, but at this point I'm in a bit over my head. Hoping you have some ideas about which direction I should investigate.

As a null-hypothesis check: not unsocking nginx and using plain unix sockets works exactly as expected, so the cause of the symptoms lies somewhere in the unsock library.

@kohlschuetter
Member

Hi @Bert-Proesmans, thanks for reporting!

It looks like nginx clears all environment variables prior to launching its worker processes, see https://nginx.org/en/docs/ngx_core_module.html

Try setting "env UNSOCK_DIR=/tmp/unsockets;" or similar in nginx.conf.
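As a sketch of where that directive would go (top-level/main context of nginx.conf; the path is illustrative):

```nginx
# main (top-level) context of nginx.conf; value is illustrative
env UNSOCK_DIR=/tmp/unsockets;
```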

I haven't tried this, though. The nginx documentation mentions that setting variables this way may be too late (our constructor may be called before that).

Right now, we configure everything upon process initialization. Deferring that until UNSOCK_DIR etc. is available at an arbitrary time would be possible in theory but would come with a significant increase in complexity, timing problems, etc.

Please let me know if this gets you further. Also consider filing a bug against nginx.

@Bert-Proesmans
Author

Bert-Proesmans commented Sep 26, 2024

Thanks for the swift reply!

I'll test with the additional configuration, but I fail to see how whitelisting the environment variables would change anything: there is proof that the socket type is replaced correctly, since data goes out through VSOCK and is received by the upstream service.

Also consider filing a bug against nginx.

I was thinking about asking the nginx devs about this case too and started figuring out how to contact them!

@Bert-Proesmans
Author

Bert-Proesmans commented Sep 27, 2024

Me again!

Whitelisting/setting environment variables for the nginx workers did not have any effect.

I traced nginx, and after staring at the trace for a long time it's actually obvious what is happening:

  1. Nginx creates the AF_UNIX socket
  2. Nginx registers the socket FD with epoll (epoll is the default mechanism on Linux 2.6+)

https://github.com/nginx/nginx/blob/51857ce40400b48bc8900b9e3930cf7474fa0c41/src/event/modules/ngx_epoll_module.c#L611-L627

  3. Nginx calls connect() -> UNSOCK takes over

The missing piece is that the new AF_VSOCK FD is never registered with epoll, because nginx only did that for the old socket right after obtaining its file descriptor. That explains why the upstream response is never forwarded to the client and nginx times out waiting.

strace excerpt
[...]
# FD 7 is client
# FD 12 / 14 -> 12 is upstream
14841 read(7, 0x55bdfc197d63, 16709)    = -1 EAGAIN (Resource temporarily unavailable)
14841 socket(AF_UNIX, SOCK_STREAM, 0)   = 12
14841 ioctl(12, FIONBIO, [1])           = 0
14841 epoll_ctl(9, EPOLL_CTL_ADD, 12, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=4229308656, u64=94274466488560}}) = 0
14841 newfstatat(AT_FDCWD, "/home/bert-proesmans/nginx-vsock/photos-upstream.vsock", {st_mode=S_IFREG|0600, st_size=452, ...}, 0) = 0
14841 openat(AT_FDCWD, "/home/bert-proesmans/nginx-vsock/photos-upstream.vsock", O_RDONLY) = 13
14841 read(13, "UNSK\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\0\0\0\0"..., 452) = 452
14841 getsockopt(12, SOL_SOCKET, SO_DOMAIN, [1], [4]) = 0
14841 getsockopt(12, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
14841 getsockopt(12, SOL_SOCKET, SO_PROTOCOL, [0], [4]) = 0
14841 getsockopt(12, SOL_SOCKET, SO_RCVTIMEO_OLD, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
14841 getsockopt(12, SOL_SOCKET, SO_SNDTIMEO_OLD, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", [16]) = 0
14841 socket(AF_VSOCK, SOCK_STREAM, 0)  = 14
14841 setsockopt(14, SOL_SOCKET, SO_RCVTIMEO_OLD, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
14841 setsockopt(14, SOL_SOCKET, SO_SNDTIMEO_OLD, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
14841 dup3(14, 12, 0)                   = 12
14841 connect(12, {sa_family=AF_VSOCK, svm_cid=0x22b, svm_port=0x22b8, svm_flags=0}, 16) = 0
14841 getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
14841 writev(12, [{iov_base="GET / HTTP/1.1\r\nHost: photos.alp"..., iov_len=275}], 1) = 275
14841 write(7, "\27\3\3\0B7\213\235\326\360\251\354\252\264\262\203\220\276\325+B\372\246\314\216\354VrP\277'\266"..., 71) = 71
14841 epoll_wait(9, [{events=EPOLLIN, data={u32=4229308416, u64=94274466488320}}], 512, 60000) = 1
[...]

As for a quick fix: I'm currently looking at http://nginx.org/en/docs/events.html and checking against the code whether any of the other polling methods have a different ordering, so my approach works without code changes.
Otherwise I might just write an UNSOCK patch that checks whether the epoll registration must be copied within fixsocketFD(..).

EDIT; Yeah, the quick solution is to enable the select or poll event systems. See http://nginx.org/en/docs/events.html for more information; these are not included in the build if epoll is available.
I might still take a stab at a patch for UNSOCK though. Basically: catch the creation of the epoll file descriptor by symbol hooking; during fixfd, query the epoll structure for the old fd; after the dup, send a modify to epoll with the previously queried structure. If the mechanism does diff-checking, overwrite the file-descriptor value inside the queried structure before sending it through modify.

@kohlschuetter
Member

Nice find!

In hindsight, my suggestion was wrong anyway: a missing UNSOCK_DIR environment variable would have caused the child process to exit with an error...

epoll is an interesting case. So they poll on the old (and closed) file description, which — after our dup3 — is no longer associated with the old file descriptor.

In order to support that, unsock would have to add a wrapper for epoll, and keep track of these changes. That's not trivial, but doable.

Tragically, the kernel lacks support for querying the state of a file descriptor added to an epoll fd ("EPOLL_CTL_GET"), although this was already proposed a long time ago. [1][2] (and even then, it would be great to get all epoll fds associated with that fd, which is yet another problem).

So right now, if you wanted to support this in unsock, you would not only have to keep track of the open epoll fds, but also any registered/modified/deregistered file descriptors, and correctly re-register the new fd with all these epollfds after dup3.

[1] https://lists.linuxfoundation.org/pipermail/bugme-new/2004-August/010979.html
[2] mjt's comment in https://linux-kernel.vger.kernel.narkive.com/jjpFDND3/epoll-ctl-and-const-correctness

@kohlschuetter
Member

You could also try adding a wrapper around epoll_ctl that always returns -1 with errno=EPERM, that may cause nginx to fall back to regular poll/select (I'd have to check the source code)

@kohlschuetter
Member

Also see https://nginx.org/en/docs/events.html

You can try setting the following nginx config directive:
use poll;
