Trigger GC after a set number of messages #3031
Rebased onto current master.
@ponylang/core I'm unlikely to be able to join sync anytime soon so let me know if there are any questions or concerns about this via comments on this PR.
My first worry is that this change will produce a GC trace of an actor that has not done any allocation at all, and that this will continue to happen as long as messages come in. My second worry is that a GC on block may exacerbate an already existing performance problem: reporting to the cycle detector is slow, and doing a GC trace along with that may be extra slow. One possible approach (off the top of my head) would be to keep a per-actor GC threshold that begins at the global default. As messages come in and GC does not occur, this per-actor threshold could be lowered, but never below some minimum (for example, 1.2). At that point, the block message wouldn't need to influence GC. The per-actor threshold would return to the global default after a GC trace. This would ensure an actor that had done no allocation would never do a GC trace. Thoughts?
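A minimal sketch of the decaying per-actor threshold described in the comment above, for illustration only. It assumes the threshold is a growth factor applied to the heap that survived the last GC. Every name here (`gc_state_t`, `gc_state_on_message`, `gc_state_reset`) and every constant except the 1.2 floor is hypothetical and not taken from the Pony runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names and constants below are hypothetical, not ponyrt identifiers. */
#define GLOBAL_GC_FACTOR 2.0  /* assumed global default growth factor */
#define MIN_GC_FACTOR    1.2  /* floor suggested in the comment above */
#define FACTOR_DECAY     0.1  /* assumed amount to lower per message */

typedef struct
{
  size_t live_after_gc; /* bytes that survived the last GC */
  size_t used;          /* bytes in use right now */
  double factor;        /* per-actor threshold, decays between GCs */
} gc_state_t;

/* Reset after a GC trace: the threshold returns to the global default. */
void gc_state_reset(gc_state_t* s, size_t live)
{
  assert(s != NULL);
  s->live_after_gc = live;
  s->used = live;
  s->factor = GLOBAL_GC_FACTOR;
}

/* Account for one processed message; returns true when a GC is due. */
bool gc_state_on_message(gc_state_t* s, size_t allocated)
{
  assert(s != NULL);
  s->used += allocated;

  /* Strict comparison: with no allocation, used == live_after_gc, and
   * live * factor >= live for any factor >= 1, so no GC is triggered. */
  if((double)s->used > ((double)s->live_after_gc * s->factor))
    return true;

  /* No GC this time: lower the threshold, but never below the floor. */
  s->factor -= FACTOR_DECAY;
  if(s->factor < MIN_GC_FACTOR)
    s->factor = MIN_GC_FACTOR;

  return false;
}
```

Under these assumptions, an actor that receives messages without allocating never crosses the threshold, while an actor that allocates slowly is collected once its heap exceeds 1.2 times what survived its last GC.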
@sylvanc Thanks for the feedback. I thought I had ensured that GC tracing wouldn't occur unless there was heap used ( Re: GC after sending a Re: your proposed alternative: I don't believe it alleviates the issue with actors that block needing to be GC'd as your proposal would only trigger GC if an actor is receiving messages (similar to the
The goal of the I don't believe either of the changes in this PR has a negative performance impact (once I fix the oversight regarding GC trace of actors with no heap allocations). However, it is possible that I am missing something with regard to the performance impact of these changes. If so, I would appreciate any pointers to the specifics so we can determine whether they can be worked around.
Rebased onto current master and added a commit to ensure actors with
@dipinhora this needs a rebase.
@SeanTAllen rebased
This is failing due to the SSL tests. #3179 fixes them. I'll rebase again once that PR is merged.
Rebased to pick up SSL fixes.
rebased to catch up with all the changes to master.
@dipinhora I keep looking at this and it seems like it basically replaces the existing gc strategy entirely. Whereas one would now expect the amount of memory that must be allocated to increase each time an actor gc's, so as to avoid doing it often, this puts it at a fixed amount. If an actor isn't allocating more memory, it will continue to gc based on number of messages. This seems flawed to me. I get the idea of what you are doing with this and I think it's an interesting approach, but I think we need something that takes a bit of the idea you have here and the current approach. I don't have a good algo, but I think we can come up with one. I'm going to spend more time thinking about what you've written and looking at the code and am going to come back with thoughts, questions, or ideas. I apologize for the delay. I think you are trying to address a very important problem and I'm stepping lightly and slowly here.
@SeanTAllen you said (emphasis added):
This PR was not intended to replace the gc strategy entirely but to be a middle solution (although it's possible that I misunderstood the true impact of the changes). The goal was to only force an actor to GC if it had processed 10000 (actor default batch size of 100 * 100) messages without GC'ing via the normal/existing GC mechanism based on heap usage, so that it can free any memory it is holding on to unnecessarily because it has been a long time since it last reached the normal GC heap threshold. The actor's heap is still allowed to grow via the current GC process. Only once its next GC threshold has grown so large that the normal GC would not be triggered for more than 10000 processed messages will a GC be forced by the message count. Note, this does limit an actor's memory usage, but the limit is a factor of how much memory an actor uses per 10000 messages. There will be some actors that will only use It seems to me that the concern is that the Btw, no rush on this. I did the rebase mainly to ensure the code didn't bitrot and start running into issues. Take your time and let me know your thoughts on how to solve the competing concerns of letting actors use more and more memory over time to ensure they don't GC too frequently vs ensuring actors don't hold onto memory forever. In the meantime, should I break off the
@dipinhora i think breaking them apart seems reasonable.
@SeanTAllen Broken apart. This PR has been rebased on master and the "GC after blocking" changes reverted. The "GC after blocking" change can be found in #3278
@dipinhora when you have time, i'd love to chat about how we can test this to determine the performance impact be it positive or negative.
Re: messages processed part of this PR @dipinhora. I think you are on to something, but I think it might be a little too coarse-grained. Question: We bump the heap used by some EQUIV when receiving items from other actors, so shouldn't that eventually do more or less what the "after X number of messages" part of this does? Unless those messages are only primitives rather than objects, in which case I believe we don't do a heap equiv bump. I can see that because equiv is "smallish", if next-gc is sufficiently large it would take a large number of messages to result in freeing. My thinking currently is that, rather than us trying to do something clever with gc related to number of messages etc, perhaps we should be considering exposing gc via the standard library in a way that allows the programmer to force a gc. Or even better, allow for actors to have a default gc strategy but be able to change it programmatically.
Prior to this commit, GC would only be triggered if an actor's heap grew to reach the next heap size cutoff for GC. This would potentially result in some actors not getting GC'd (and so holding on to memory for longer than necessary) because they happen to not allocate large amounts of memory even when processing lots of application messages and take a very long time to reach the next heap size for GC to occur. This commit adds an alternate way that GC for an actor can be triggered in order to force actors that don't allocate large amounts of memory to GC and free up memory more frequently. This is done by keeping track of the number of application messages processed since the last GC and forcing a GC if the number of messages handled passes a threshold (10x actor batch size).
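The trigger logic in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual diff: the struct, field, and function names are hypothetical, and the message threshold is a parameter because the thread quotes it both as 10x the actor batch size and as 10000 messages. The early return for an empty heap reflects the fix discussed earlier in the thread for actors with no heap allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-actor bookkeeping; field names are illustrative only. */
typedef struct
{
  size_t heap_used;     /* bytes currently allocated on the actor's heap */
  size_t next_gc;       /* heap size cutoff for the existing GC trigger */
  size_t msgs_since_gc; /* application messages handled since last GC */
} actor_gc_t;

/* Call once for every application message the actor processes. */
void actor_note_app_message(actor_gc_t* a)
{
  assert(a != NULL);
  a->msgs_since_gc++;
}

/* Decide whether the actor should GC now. */
bool actor_should_gc(const actor_gc_t* a, size_t msg_threshold)
{
  assert(a != NULL);

  /* Nothing on the heap: a GC trace would be wasted work. */
  if(a->heap_used == 0)
    return false;

  /* Existing trigger: the heap reached the next size cutoff. */
  if(a->heap_used >= a->next_gc)
    return true;

  /* Additional trigger: too many messages handled without a GC. */
  return a->msgs_since_gc >= msg_threshold;
}

/* Record the outcome of a GC and restart the message count. */
void actor_note_gc(actor_gc_t* a, size_t live, size_t next_gc)
{
  assert(a != NULL);
  a->heap_used = live;
  a->next_gc = next_gc;
  a->msgs_since_gc = 0;
}
```

In this sketch the heap-based trigger is checked first, so an actor that allocates quickly behaves exactly as before; the message count only matters for actors that stay below `next_gc` for a long time.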
Prior to this commit, if an actor blocked, it did not run GC to free any memory it no longer needed. This would result in blocked actors holding on to (potentially lots of) memory unnecessarily. This commit causes GC to be triggered when the cycle detector asks an actor if it is blocked and the actor responds telling the cycle detector that it is blocked. This should result in memory being held by blocked actors to be freed more quickly even if the cycle detector doesn't end up detecting a cycle and reaping the actors.
That doesn't do same as what the Lines 97 to 101 in 2c4f404
Lines 349 to 359 in 2c4f404
GC after X messages would.
I'm still not in favor of this and I think it adds unnecessary gc allocations. I understand the goal, but it can result in unneeded garbage collection. We have a gc strategy, I would like to work within that. An idea, @dipinhora, do you think that if on a per actor basis, you could set the gcinitial for that actor and gcfactor, it would accomplish what you are looking for in a way that is in the programmer's control? With such a mechanism you could, if you wanted to set an initial value of X with a factor of 1 and always GC at that level of memory but never only for a given actor. Additionally, I can already GC anytime (using GC after X messages or when blocked isn't something I would want on in just about any application that I write.
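To make the per-actor gcinitial/gcfactor idea concrete, here is a small sketch of how such a policy could compute the next GC cutoff. It is an assumption-laden illustration: `gc_policy_t` and `gc_policy_next` are hypothetical names, and the initial value is treated as a plain byte count for simplicity rather than in whatever form the runtime options use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-actor GC policy; not an existing ponyrt structure. */
typedef struct
{
  size_t initial; /* minimum heap size, in bytes, before a GC is triggered */
  double factor;  /* multiplier applied to the live heap after each GC */
} gc_policy_t;

/* Compute the heap size at which the next GC should run, given the
 * number of bytes that survived the GC that just finished. */
size_t gc_policy_next(const gc_policy_t* p, size_t live)
{
  assert(p != NULL && p->factor >= 1.0);
  size_t next = (size_t)((double)live * p->factor);
  return (next < p->initial) ? p->initial : next;
}
```

With a factor of 2 the cutoff keeps growing with the live heap. With a factor of 1 and an initial value of X, the cutoff stays at X for as long as the live heap is below X, which is the "always GC at that level of memory" behavior the comment describes.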
Fair enough. Closing. Might revisit via one of the alternatives suggested if it makes sense.
This PR includes two changes related to when actor GC gets triggered:
and
Trigger GC for actors when they tell the cycle detector they're blocked (broken out into #3278)
Prior to this commit, if an actor blocked, it did not run GC to free any memory it no longer needed. This would result in blocked actors holding on to (potentially lots of) memory unnecessarily.
This commit causes GC to be triggered when the cycle detector asks an actor if it is blocked and the actor responds telling the cycle detector that it is blocked. This should result in memory being held by blocked actors to be freed more quickly even if the cycle detector doesn't end up detecting a cycle and reaping the actors.