Memory leak + seat reservation expired error #32

CookedApps opened this issue Jul 21, 2022 · 13 comments

@CookedApps commented Jul 21, 2022

Yesterday, we switched our live system to our new Kubernetes setup, using the Colyseus Proxy together with MongoDB and Redis for load balancing. We ran a public beta over the last month with about 800 players a day and everything worked fine. But after about 20k players played in a single day, we started seeing "seat reservation expired" errors more and more often, up to the point where nobody was able to join or create any lobby.

What we found:

Examining the resource consumption of the Colyseus Proxy over the last 24 hours suggests a memory leak:
[Graph: Colyseus Proxy resource consumption over the last 24 hours]
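
For anyone who wants to watch this trend without a metrics dashboard, here is a minimal sketch that logs the Node process's own memory periodically (standard process.memoryUsage(); the one-minute interval and MB formatting are arbitrary choices):

```ts
// Log RSS and heap usage once a minute so a slow leak shows up directly in
// the proxy logs. Interval and formatting are arbitrary; this only observes
// the leak, it does not fix anything.
const toMB = (bytes: number) => (bytes / 1024 / 1024).toFixed(1);

setInterval(() => {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  console.log(
    `[mem] rss=${toMB(rss)}MB heap=${toMB(heapUsed)}/${toMB(heapTotal)}MB external=${toMB(external)}MB`
  );
}, 60_000);
```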

Our logs repeatedly show these errors:

Using proxy 1 /NLK_aUr7s/HVCFC?sessionId=eGUwvAl7F
Error: seat reservation expired.
  at uWebSocketsTransport.onConnection (/usr/app/node_modules/@colyseus/uwebsockets-transport/build/uWebSocketsTransport.js:118:23)
  at open (/usr/app/node_modules/@colyseus/uwebsockets-transport/build/uWebSocketsTransport.js:59:28)
  at uWS.HttpResponse.upgrade (<anonymous>)
  at upgrade (/usr/app/node_modules/@colyseus/uwebsockets-transport/build/uWebSocketsTransport.js:47:21)
2022-07-21T06:01:15.208Z colyseus:errors Error: seat reservation expired.
  at uWebSocketsTransport.onConnection (/usr/app/node_modules/@colyseus/uwebsockets-transport/build/uWebSocketsTransport.js:118:23)
  at open (/usr/app/node_modules/@colyseus/uwebsockets-transport/build/uWebSocketsTransport.js:59:28)
  at uWS.HttpResponse.upgrade (<anonymous>)
  at upgrade (/usr/app/node_modules/@colyseus/uwebsockets-transport/build/uWebSocketsTransport.js:47:21)

Restarting the proxies fixes the problem temporarily.
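
For context, the error itself means a client failed to complete the WebSocket connection before its matchmaking seat reservation timed out, which is why a degraded proxy surfaces as "seat reservation expired". If I read the 0.14 docs correctly, the window can be widened per room with setSeatReservationTime() (or via the COLYSEUS_SEAT_RESERVATION_TIME environment variable); that only delays the symptom once the proxy is leaking, but it can buy time. A minimal sketch (the room class and the 15-second value are just examples):

```ts
import { Room } from "colyseus";

// Example room: give clients more time to finish the WebSocket upgrade
// through the proxy before their seat reservation expires. 15 seconds is an
// arbitrary example; this masks the symptom, it does not fix the leak.
export class ExampleLobbyRoom extends Room {
  onCreate() {
    this.setSeatReservationTime(15);
  }
}
```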

Setup:

  • Colyseus version: 0.14.29
  • Colyseus proxy version: 0.12.8
  • Node version: 16.15.1-alpine

Edit: We were running 2 proxies behind a load balancer and 5 gameserver instances. This might be related to #30.

We really need help with this issue, as I am at my wit's end.
Thank you in advance! 🙏

@damnedOperator

+1. I am also out of things to try.

@endel (Member) commented Jul 21, 2022

The memory leak is a known issue, unfortunately (OptimalBits/redbird#237), although the circumstances under which it happens are not clear. I suspect it's related to TLS termination at the Node/proxy level. I believe this problem doesn't exist in Arena because TLS termination happens at another level (haproxy or nginx).

The upcoming version (0.15, currently in @preview) introduces an alternative to the proxy: you put a regular load balancer in front of all Colyseus nodes and specify a public address for each node. You can see the preview (from colyseus/docs#90) here: https://deploy-preview-90--colyseus-docs.netlify.app/colyseus/scalability/#alternative-2-without-the-proxy

If your cluster is in an inconsistent state, I'd recommend checking the roomcount and colyseus:nodes contents on Redis; they should contain the same number of entries as you have Node processes.
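
A rough sketch of that check using ioredis is below; the exact key layout (colyseus:nodes as a set of "processId/address" entries, roomcount as a hash keyed by processId) is an assumption based on 0.14, so verify with TYPE before relying on it:

```ts
import Redis from "ioredis";

// Dump the matchmaking bookkeeping that Colyseus keeps in Redis. Both
// structures should have one entry per running Node process; leftover
// entries indicate an inconsistent cluster state.
async function checkClusterState() {
  const redis = new Redis(process.env.REDIS_URL ?? "redis://127.0.0.1:6379");

  const nodes = await redis.smembers("colyseus:nodes"); // assumed: SET
  const roomcount = await redis.hgetall("roomcount");   // assumed: HASH

  console.log("colyseus:nodes entries:", nodes.length, nodes);
  console.log("roomcount entries:", Object.keys(roomcount).length, roomcount);

  await redis.quit();
}

checkClusterState().catch(console.error);
```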

@damnedOperator

Well, we terminate HTTPS at the Ingress Controller, which puts us pretty close to how it's deployed in Arena (termination is done by a load balancer). So I doubt it has anything to do with TLS termination :/

@endel (Member) commented Jul 21, 2022

Apparently a user of http-proxy managed to reproduce the memory leak consistently here: http-party/node-http-proxy#1586

EDIT: I'm not sure it's the same leak we have, but it sounds plausible.

@damnedOperator

So http-proxy is a dependency of the coly-proxy?

@endel (Member) commented Jul 21, 2022

Yes, it is!

@damnedOperator

Sounds like we cannot do anything atm to mitigate this?

@CookedApps (Author)

If your cluster is in an inconsistent state, I'd recommend checking the roomcount and colyseus:nodes contents on Redis; they should contain the same number of entries as you have Node processes.

I am not sure I understand this. What do you mean by "inconsistent state"? @endel

@nzmax commented Jul 26, 2022

We have also hit this issue several times. Is the only solution to use the 0.15 preview that @endel mentions, or is there another workaround?

@nzmax commented Aug 2, 2022

ANY UPDATES?

@CookedApps (Author)

@nzmax It seems to me that the proxy will no longer be fixed. Endel has not commented on this. Looks like we'll have to work with the new architecture in version 0.15. We don't know how this is supposed to work in a Kubernetes environment and are still waiting for news...

@endel (Member) commented Aug 3, 2022

We are definitely interested in fixing this issue. We are still trying to reproduce the memory leak in a controlled environment. There are 2 things you can do to help:

@hunkydoryrepair

We were seeing consistent memory leaks, gradually growing over time.
We have replaced the version of http-proxy we use with @refactorjs/http-proxy. It was almost a drop-in replacement, but it seems to export things slightly differently, so I had to change the imports; I got it working in just a couple of minutes.

So far, it seems promising. I will update in a week or so if it resolves the issue; it tends to take about a week before our proxies crash.
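
For reference, a rough sketch of the kind of import change described above. The "before" side follows http-proxy's documented API; the "after" side is an assumption about how @refactorjs/http-proxy exposes its classes, so check that package's README for the actual export names before copying this:

```ts
// BEFORE (original dependency): classic http-proxy factory API.
// import httpProxy from "http-proxy";
// const proxy = httpProxy.createProxyServer({ ws: true });

// AFTER (drop-in replacement): assumed named export -- adjust to whatever
// @refactorjs/http-proxy actually exports according to its README.
import { ProxyServer } from "@refactorjs/http-proxy";

const proxy = new ProxyServer({ ws: true });

// Usage keeps the same shape either way (target is a placeholder):
// proxy.web(req, res, { target: "http://127.0.0.1:2567" });
// proxy.ws(req, socket, head, { target: "http://127.0.0.1:2567" });
```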
