mod_tile appears to require a single-process (but can handle multiple threads) Apache server to work correctly #238

Open
alankila opened this issue Jun 4, 2021 · 3 comments

Comments

@alankila

alankila commented Jun 4, 2021

I struggled for some days with a problem where, under heavy load, mod_tile did not promptly return a tile to the waiting HTTP client connection. The tile was in fact rendered, but the HTTP connection appeared to be stuck waiting for an acknowledgement from renderd, I think.

I read the code a bit and came across this comment in mod_tile/includes/protocol.h:

A client may not bother waiting for a response if the render daemon is too slow
causing responses to get slightly out of step with requests.
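To make that comment concrete, here is a minimal sketch of how a response can end up out of step on a single connection. Everything in it is an assumption for illustration: `struct demo_msg`, `read_matching` and `demo_out_of_step` are hypothetical names, the message is a simplified stand-in rather than mod_tile's real `struct protocol`, and the "daemon" is just the other end of a `socketpair` echoing requests back.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified message; NOT mod_tile's real struct protocol. */
struct demo_msg {
    int x, y, z; /* tile coordinates that identify a request */
};

/* Read responses from fd until one matches *want.
 * Returns the number of stale responses skipped, or -1 on error/EOF. */
static int read_matching(int fd, const struct demo_msg *want)
{
    assert(want != NULL);
    int skipped = 0;
    for (;;) {
        struct demo_msg rsp;
        if (recv(fd, &rsp, sizeof rsp, MSG_WAITALL) != (ssize_t)sizeof rsp)
            return -1;
        if (rsp.x == want->x && rsp.y == want->y && rsp.z == want->z)
            return skipped;
        skipped++; /* answer to an earlier, abandoned request */
    }
}

/* The client sends A, gives up waiting, then sends B on the same connection.
 * The stand-in daemon answers both in order by echoing each request.
 * Returns how many stale responses the client skipped, or -1 on error. */
int demo_out_of_step(void)
{
    int sv[2]; /* sv[0]: client end, sv[1]: stand-in daemon end */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    struct demo_msg a = {1, 2, 3}, b = {4, 5, 6};
    int result = -1;

    if (write(sv[0], &a, sizeof a) == (ssize_t)sizeof a &&
        write(sv[0], &b, sizeof b) == (ssize_t)sizeof b) {
        int ok = 1;
        for (int i = 0; i < 2 && ok; i++) {
            struct demo_msg req;
            ok = recv(sv[1], &req, sizeof req, MSG_WAITALL) == (ssize_t)sizeof req &&
                 write(sv[1], &req, sizeof req) == (ssize_t)sizeof req;
        }
        if (ok)
            result = read_matching(sv[0], &b); /* client now only wants B */
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

In this sketch the first response the client reads while waiting for B is the late answer to A, so it has to compare coordinates and skip it. That is the out-of-step situation the comment describes.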

That comment led me to investigate the possibility that, in a multi-process Apache, multiple independent HTTP server processes have connected to renderd and are all writing render commands and reading responses. My hypothesis is that a single socket is shared by all connected renderd clients. In that case, if a process that did not submit a given render request reads its response, it simply throws the response away, since it cannot notify a thread in another process, and the Apache process whose thread did submit the request stays stuck waiting until something times out.

I have not conclusively proven that this is indeed what is happening, but I got rid of my slowly rendered tiles by enabling the event MPM (mpm_event) and configuring it to start only 1 process, with ThreadLimit, MaxRequests, etc. all set to 150 so that Apache will not fork more than 1 worker process. I presume that as long as there is only 1 Apache process serving multiple threads, it is able to work out which thread is waiting for the tile rendering acknowledgement.
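For anyone wanting to try the same thing, a single-process event MPM setup could look roughly like the fragment below. This is a sketch under assumptions, not my exact configuration: the directive names are standard Apache 2.4 MPM directives, but which ones to set and the value 150 for each are only illustrative.

```apache
# Assumed values for illustration; adjust to your own load.
<IfModule mpm_event_module>
    StartServers          1
    ServerLimit           1
    ThreadLimit         150
    ThreadsPerChild     150
    MaxRequestWorkers   150
</IfModule>
```

With ServerLimit 1 and MaxRequestWorkers equal to ThreadsPerChild, all requests should be served by threads of one child process.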

It is also worth noting that my mod_tile configuration is as follows:

ModTileTileDir /var/cache/renderd/tiles
LoadTileConfigFile /etc/renderd.conf
ModTileEnableStats On
ModTileRequestTimeout 60
ModTileMissingRequestTimeout 60
ModTileRenderdSocketName /run/renderd/renderd.sock

That is, I have set long RequestTimeout and MissingRequestTimeout values. My clients are willing to wait as long as it takes for the tile to arrive rather than give up sooner. It may be that the shorter 3 and 10 second timeout values somewhat mask the problem. I think the request does return with the tile's data if renderd renders the tile before the timeout; it just takes unnecessarily long to get the tile. In my case the timeout was very long, so I was motivated to figure out why it kept happening.

@alankila
Author

alankila commented Jun 4, 2021

Also happy to report that this setup fixes the oddly low CPU usage I had seen with renderd. If I set it to use 8 threads and scroll around in a previously unseen map region, I get 800% CPU usage from the renderd process.

@pantierra
Collaborator

It would be nice to document your findings.

@stephankn
Contributor

@alankila Do I understand correctly that the render socket lacks session handling? I would like to understand your problem analysis better.

Here the server opens a Unix domain socket:

fd = socket(PF_UNIX, SOCK_STREAM, 0);

That is PF_UNIX with SOCK_STREAM. To my understanding, each subsequent accept call then returns a new, unique file descriptor managed by the kernel, one per connection.

incoming = accept(listen_fd, (struct sockaddr *) &in_addr, &in_addrlen);

The man page confirms that SOCK_STREAM is connection-based, which means each side has a well-known communication partner.

https://man7.org/linux/man-pages/man2/socket.2.html

   SOCK_STREAM
          Provides sequenced, reliable, two-way, connection-based
          byte streams.

So at least at the socket/connection level, the rendering daemon should know which response belongs to which request.

The loop calls rx_request to process a command:
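To illustrate what I mean by per-connection descriptors, here is a small standalone sketch. It is not renderd code: `connect_client` and `demo_accept_isolation` are hypothetical helpers, and the socket path is a scratch path supplied by the caller. Two clients connect to one listening Unix socket, the server accepts both, and each reply is written to a different accepted descriptor.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect a new client socket to the Unix socket at addr; -1 on failure. */
static int connect_client(const struct sockaddr_un *addr)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd >= 0 && connect(fd, (const struct sockaddr *)addr, sizeof *addr) != 0) {
        close(fd);
        fd = -1;
    }
    return fd;
}

/* Returns 0 if each client reads only the byte written to its own
 * accepted descriptor, -1 otherwise. `path` is a scratch socket path. */
int demo_accept_isolation(const char *path)
{
    assert(path != NULL);
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    if (strlen(path) >= sizeof addr.sun_path)
        return -1;
    strcpy(addr.sun_path, path);
    unlink(path); /* remove a stale socket file, if any */

    int c1 = -1, c2 = -1, s1 = -1, s2 = -1, rc = -1;
    char r1 = 0, r2 = 0;
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 || listen(lfd, 2) != 0)
        goto out;

    /* Both connects complete against the listen backlog before accept(). */
    c1 = connect_client(&addr);
    c2 = connect_client(&addr);
    if (c1 < 0 || c2 < 0)
        goto out;
    s1 = accept(lfd, NULL, NULL); /* pairs with c1, the first queued client */
    s2 = accept(lfd, NULL, NULL); /* pairs with c2 */
    if (s1 < 0 || s2 < 0)
        goto out;

    /* One distinct reply per accepted descriptor. */
    if (write(s1, "1", 1) == 1 && write(s2, "2", 1) == 1 &&
        read(c1, &r1, 1) == 1 && read(c2, &r2, 1) == 1 &&
        r1 == '1' && r2 == '2')
        rc = 0;
out:
    if (c1 >= 0) close(c1);
    if (c2 >= 0) close(c2);
    if (s1 >= 0) close(s1);
    if (s2 >= 0) close(s2);
    close(lfd);
    unlink(path);
    return rc;
}
```

If replies could cross between connections at this level, the function would return -1. On Linux I would expect it to return 0.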

enum protoCmd rsp = rx_request(&cmd, fd);

This function keeps the connection details in the item data structure:

struct item {

So I do not yet understand where the connection details would get lost. A client sending a request should get exactly its own answer back, as the connection is tracked in the item structure.

Can you provide a reproduction case, perhaps a synthetic one, that shows the problem? You mentioned that the "missing" requests had actually completed, so can we exclude error cases? What environment are you running? Do you have AppArmor/SELinux rules in place?
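As a rough sketch of that expectation, assume each queued request carries the descriptor it arrived on. `struct demo_item`, `send_done` and `demo_item_routing` below are hypothetical and much simpler than renderd's real `struct item`; the two `socketpair` calls stand in for two accepted client connections.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical queue entry; NOT renderd's real struct item. */
struct demo_item {
    int fd;      /* connection the request arrived on */
    int x, y, z; /* requested tile */
};

/* Send the completion for one item back on the connection it came from. */
static int send_done(const struct demo_item *it)
{
    assert(it != NULL);
    int rsp[3] = { it->x, it->y, it->z };
    return write(it->fd, rsp, sizeof rsp) == (ssize_t)sizeof rsp ? 0 : -1;
}

/* Two clients each have one queued request; the requests complete in
 * reverse order. Returns 0 if each client still gets its own answer. */
int demo_item_routing(void)
{
    int a[2], b[2]; /* [0]: client end, [1]: daemon end */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) != 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    struct demo_item queue[2] = { { a[1], 1, 1, 1 }, { b[1], 2, 2, 2 } };
    int ok = 1;
    for (int i = 1; i >= 0; i--) /* finish the second request first */
        if (send_done(&queue[i]) != 0)
            ok = 0;

    int ra[3] = {0}, rb[3] = {0};
    if (ok && recv(a[0], ra, sizeof ra, MSG_WAITALL) != (ssize_t)sizeof ra)
        ok = 0;
    if (ok && recv(b[0], rb, sizeof rb, MSG_WAITALL) != (ssize_t)sizeof rb)
        ok = 0;
    ok = ok && ra[0] == 1 && rb[0] == 2;

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ok ? 0 : -1;
}
```

As long as the descriptor stored with the request is the one used for the reply, completion order should not matter, which is why I would like to see where that link breaks in practice.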
