
Trigger GC after a set number of messages #3031

Closed
dipinhora wants to merge 5 commits

Conversation

@dipinhora (Contributor) commented Feb 27, 2019

This PR includes two changes related to when actor GC gets triggered:

Trigger GC for actors after they process a number of app msgs

Prior to this commit, GC would only be triggered if an actor's heap
grew to reach the next heap size cutoff for GC. This could
result in some actors not getting GC'd (and so holding
on to memory for longer than necessary) because they happen not to
allocate large amounts of memory even when processing lots of
application messages, and so take a very long time to reach the next
heap size at which GC occurs.

This commit adds an alternate way that GC for an actor can be
triggered in order to force actors that don't allocate large
amounts of memory to GC and free up memory more frequently. This
is done by keeping track of the number of application messages
processed since the last GC and forcing a GC if the number of
messages handled passes a threshold (10x actor batch size).
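
As a rough illustration of the mechanism described above, a standalone C sketch (the names and the threshold constant here are invented for illustration, not the PR's actual identifiers):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: some multiple of the scheduler batch size. */
#define FORCE_GC_MSG_THRESHOLD (100 * 100)

typedef struct actor_t {
  size_t heap_used;         /* bytes currently allocated on the actor heap */
  size_t heap_next_gc;      /* heap size at which GC normally triggers     */
  size_t app_msgs_since_gc; /* app messages handled since the last GC      */
} actor_t;

static void run_gc(actor_t* actor)
{
  /* trace + sweep would happen here; afterwards the counter resets */
  actor->app_msgs_since_gc = 0;
}

static void handle_app_message(actor_t* actor)
{
  /* ... dispatch the message to the actor's behaviour ... */
  actor->app_msgs_since_gc++;

  bool heap_grown = actor->heap_used >= actor->heap_next_gc;
  bool many_msgs = (actor->app_msgs_since_gc > FORCE_GC_MSG_THRESHOLD)
    && (actor->heap_used > 0); /* don't trace actors with empty heaps */

  if(heap_grown || many_msgs)
    run_gc(actor);
}
```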

and

Trigger GC for actors when they tell the cycle detector they're blocked

Prior to this commit, if an actor blocked, it did not run GC to free
any memory it no longer needed. This would result in blocked actors
holding on to (potentially lots of) memory unnecessarily.

This commit causes GC to be triggered when the cycle detector asks
an actor if it is blocked and the actor responds telling the cycle
detector that it is blocked. This should result in memory being
held by blocked actors to be freed more quickly even if the cycle
detector doesn't end up detecting a cycle and reaping the actors.
(broken out into #3278)
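
And a similar standalone sketch of the GC-on-block change (again with invented names; the real logic lives in libponyrt's actor and cycle detector code):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct actor_t {
  size_t heap_used;
  bool blocked;
} actor_t;

static void run_gc(actor_t* actor) { /* trace + sweep */ (void)actor; }
static void send_block_msg(actor_t* actor) { /* notify cycle detector */ (void)actor; }

/* Called when the cycle detector asks an actor whether it is blocked. */
static void on_isblocked_query(actor_t* actor)
{
  if(!actor->blocked)
    return;

  /* Free memory the blocked actor no longer needs before it sits idle,
   * even if no cycle is ever detected. */
  if(actor->heap_used > 0)
    run_gc(actor);

  send_block_msg(actor);
}
```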

@dipinhora (Contributor Author):

Rebased onto current master.

@dipinhora (Contributor Author):

@ponylang/core I'm unlikely to be able to join sync anytime soon so let me know if there are any questions or concerns about this via comments on this PR.

@SeanTAllen added the "do not merge" (This PR should not be merged at this time) label on Apr 2, 2019
@sylvanc (Contributor) commented May 14, 2019

My first worry is that this change will produce a GC trace of an actor that has not done any allocation at all, and that this will continue to happen as long as messages come in. My second worry is that a GC on block may exacerbate an already existing performance problem: reporting to the cycle detector is slow, and doing a GC trace along with that may be extra slow.

One possible approach (off the top of my head) would be to keep a per-actor GC threshold that begins at the global default. As messages come in and GC does not occur, this per-actor threshold could be lowered, but never below some minimum (for example, 1.2). At that point, the block message wouldn't need to influence GC. The per-actor threshold would return to the global default after a GC trace. This would ensure an actor that had done no allocation would never do a GC trace.
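
A standalone sketch of this per-actor threshold idea (the decay rate and all names are invented for illustration):

```c
#include <stddef.h>

#define DEFAULT_GC_FACTOR 2.0   /* global default (e.g. --ponygcfactor) */
#define MIN_GC_FACTOR     1.2   /* floor suggested above                */
#define FACTOR_DECAY      0.999 /* illustrative per-message decay       */

typedef struct actor_t {
  size_t heap_used;
  size_t heap_next_gc;
  double gc_factor; /* per-actor threshold, starts at DEFAULT_GC_FACTOR */
} actor_t;

/* For each message that arrives without a GC occurring, lower the
 * per-actor factor toward the floor, making the next GC come sooner. */
static void on_message_without_gc(actor_t* actor)
{
  actor->gc_factor *= FACTOR_DECAY;
  if(actor->gc_factor < MIN_GC_FACTOR)
    actor->gc_factor = MIN_GC_FACTOR;
}

/* After a GC trace, the factor returns to the global default and
 * next_gc is recomputed from the surviving heap. An actor that never
 * allocates has heap_used == 0, never reaches next_gc, and so never
 * does a GC trace under this scheme. */
static void after_gc(actor_t* actor)
{
  actor->gc_factor = DEFAULT_GC_FACTOR;
  actor->heap_next_gc = (size_t)((double)actor->heap_used * actor->gc_factor);
}
```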

Thoughts?

@dipinhora (Contributor Author):

@sylvanc Thanks for the feedback. I thought I had ensured that GC tracing wouldn't occur unless there was heap used (actor->heap->used > 0), but it seems I only handled that for the case where GC is triggered by sending a block message, and not for the trigger-GC-after-N-messages case. I'll fix that shortly.

Re: GC after sending a block message to the cycle detector: can you elaborate on the performance concerns with sending a block message to the cycle detector? From what I can tell, sending a block message to the cycle detector (via ponyint_cycle_block) is one small allocation plus setting an actor flag (FLAG_BLOCKED_SENT), and doesn't seem very expensive.

The volume of cycle detection related messages used to be a major performance concern, but my understanding is that #2709 greatly reduced this to the point of it being almost negligible (by proxy of the number of messages for message-ubench increasing by about 5M and being almost the same with and without --ponynoblock; we don't have a direct measure of cycle detection related messages before/after that PR, but I can measure it if that would be helpful).

Another potential concern is that actors might send block messages frequently and so GC frequently; however, this also does not occur very often since the merge of #2709, as that PR ensures actors only send block messages when the cycle detector asks via an isblocked message, as opposed to every time they block.

One last concern I can think of is that the overhead of GC'ing when an actor sends a block message would take resources away from other work the application could be doing (other actors running, etc.). I don't believe it is easy to quantify or evaluate whether it is better to GC or to let another actor run (please correct me if I'm misunderstanding); however, it is easy to see that if a blocked actor releases the memory it is holding on to, that is a net win, because the freed memory can then be re-used by other actors.

Re: your proposed alternative: I don't believe it alleviates the issue of actors that block needing to be GC'd, as your proposal would only trigger GC if an actor is receiving messages (similar to the trigger-GC-after-a-number-of-app-messages change in this PR), whereas blocked actors would potentially not be receiving messages at all. To give a contrived example (which may not be 100% accurate in every detail of how the runtime currently behaves, but should be reasonably close):

Main creates an actor Searcher to read data from a file that is 1 GB in size, do some processing on it, and then return the subset of the data that matches (a regex search, for example). Searcher is a naive implementation that loads all the data into memory prior to doing the search and returns only the subset that matches (let's assume it copies the subset, to ensure it's not holding a reference that would prevent GC from freeing the memory). Due to how Searcher is implemented, GC would get run while the search is proceeding but would be unable to reclaim any data, and actor->heap->nextgc would increase to 1 GB * heap_nextgc_factor. When Searcher is done with all of its work, it sends the data back to Main and stops being scheduled (because Main never sends it any more messages), and it does not GC because it has not hit the new nextgc threshold yet. The cycle detector would eventually send Searcher an isblocked message (causing Searcher to get scheduled), to which Searcher would respond with a block message, and Searcher would still not GC because it still has not hit the new nextgc threshold. Ideally, Searcher would free the memory trapped in its heap even if the cycle detector doesn't destroy it.
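
A quick worked version of the numbers in this scenario (assuming 2.0 as the default heap_nextgc_factor, i.e. --ponygcfactor):

```c
#include <stdio.h>

int main(void)
{
  /* Rough arithmetic for the Searcher scenario. */
  double gb = 1024.0 * 1024.0 * 1024.0;
  double peak_heap = 1.0 * gb; /* file data loaded by Searcher */
  double gc_factor = 2.0;      /* assumed --ponygcfactor default */
  double next_gc = peak_heap * gc_factor;

  /* After the search finishes, live data drops to near zero, but
   * heap_used never climbs back up to next_gc, so the heap-growth
   * trigger never fires and ~1 GB stays trapped in the heap. */
  printf("next_gc = %.1f GB\n", next_gc / gb); /* prints 2.0 GB */
  return 0;
}
```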

The "sending a block message should result in GC being run" change included in this PR is aimed at exactly this scenario.

I don't believe either of the changes in this PR has a negative performance impact (once I fix the oversight regarding GC traces of actors with no heap allocations). However, it is possible that I am missing something regarding the performance impact of these changes. If so, I would appreciate any pointers to the specifics so we can determine whether they can be worked around or not.

@dipinhora (Contributor Author):

Rebased onto current master and added a commit to ensure actors with heap.used == 0 don't do a GC trace even if they've processed enough messages to warrant a GC run.

@SeanTAllen (Member):

@dipinhora this needs a rebase.

@dipinhora (Contributor Author):

@SeanTAllen rebased

@dipinhora (Contributor Author):

This is failing due to the SSL tests. #3179 fixes them. I'll rebase again once that PR is merged.

@dipinhora (Contributor Author):

Rebased to pick up SSL fixes.

@dipinhora (Contributor Author):

rebased to catch up with all the changes to master.

@SeanTAllen (Member):

@dipinhora I keep looking at this, and it seems like it basically replaces the existing GC strategy entirely. Whereas one would currently expect the amount of memory needed to trigger a GC to increase each time an actor GCs (so as to avoid doing it often), this puts it at a fixed amount.

If an actor isn't allocating more memory, it will continue to GC based on the number of messages. This seems flawed to me.

I get the idea of what you are doing with this, and I think it's an interesting approach, but I think we need something that takes a bit of the idea you have here and a bit of the current approach.

I don't have a good algo yet, but I think we can come up with one.

I'm going to spend more time thinking about what you've written and looking at the code, and will come back with thoughts, questions, or ideas.

I apologize for the delay. I think you are trying to address a very important problem, and I'm stepping lightly and slowly here.

@dipinhora (Contributor Author):

@SeanTAllen you said (emphasis added):

I keep looking at this, and it seems like it basically replaces the existing GC strategy entirely. Whereas one would currently expect the amount of memory needed to trigger a GC to increase each time an actor GCs (so as to avoid doing it often), this puts it at a fixed amount.

This PR was not intended to replace the GC strategy entirely but to be a middle solution (although it's possible that I misunderstood the true impact of the changes). The goal was to only force an actor to GC if it had processed 10000 messages (actor default batch size of 100, times 100) without GC'ing via the normal/existing heap-usage-based GC mechanism, so that it can free any memory it is holding on to unnecessarily that it may not have released yet because it has been a long time since it reached the normal GC heap threshold.

The actor's heap is still allowed to grow via the current GC process; only once the heap has grown enough that the normal trigger would not fire within 10000 processed messages does the message count force a GC. Note that this does limit an actor's memory usage, but it is limited as a factor of how much memory the actor uses per 10000 messages. Some actors will only use 10000 bytes (and be below the --ponygcinitial threshold) before being forced to GC, and other actors will use 100 MB or 10 GB per 10000 messages before they are forced to GC because of the number of messages.

I don't know that 10000 is the right heuristic, but it seemed like a place to start. The main point was to ensure that just because an actor needs to use 100 MB or 10 GB before it would hit the next GC threshold shouldn't mean it is allowed to hold on to that memory forever if it never hits that threshold.

It seems to me that the concern is that 10000 is a static number and so caps an actor's memory at some point. Maybe an option would be to make the 10000 a number that grows slowly over time, so that an actor's memory usage can still increase, but after a certain point (10000 messages without a heap-usage-triggered GC) the memory growth switches from the 2.0x factor based on memory to a factor based on the number of messages, growing by something smaller like 1.2x or 1.5x?
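
A standalone sketch of that growing message threshold (all names and numbers illustrative):

```c
#include <stddef.h>

typedef struct actor_t {
  size_t heap_used;
  size_t msgs_since_gc;
  size_t msg_threshold; /* starts at 10000 */
} actor_t;

static void run_gc(actor_t* actor)
{
  /* trace + sweep; reset the message counter */
  actor->msgs_since_gc = 0;
}

/* Force a GC once the message count passes the per-actor threshold,
 * then grow the threshold so forced GCs become rarer over time. */
static void maybe_force_gc(actor_t* actor)
{
  if((actor->msgs_since_gc > actor->msg_threshold) && (actor->heap_used > 0))
  {
    run_gc(actor);
    actor->msg_threshold = (actor->msg_threshold * 12) / 10; /* ~1.2x */
  }
}
```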

Btw, no rush on this. I did the rebase mainly to ensure the code didn't bitrot and start running into issues. Take your time and let me know your thoughts on how to solve the competing concerns of letting actors use more and more memory over time to ensure they don't GC too frequently vs ensuring actors don't hold onto memory forever.

In the meantime, should I break off the GC on block into a separate PR if that change is less concerning/controversial?

@SeanTAllen (Member):

@dipinhora i think breaking them apart seems reasonable.

@dipinhora (Contributor Author):

@SeanTAllen Broken apart. This PR has been rebased on master and the "GC after blocking" changes reverted.

The "GC after blocking" change can be found in #3278

@SeanTAllen (Member):

@dipinhora when you have time, I'd love to chat about how we can test this to determine the performance impact, be it positive or negative.

@SeanTAllen (Member):

Re: the messages-processed part of this PR, @dipinhora: I think you are on to something, but I think it might be a little too coarse-grained.

Question:

We bump the heap used by some EQUIV when receiving items from other actors, so shouldn't that eventually do more or less what the "after X number of messages" part of this does? Unless those messages contain only primitives rather than objects, in which case I believe we don't do a heap equiv bump.
That's my quick read of things in gc.c, where the calls to ponyint_heap_used happen.

I can see that, because the equiv is "smallish" and next_gc is sufficiently large, it would take a large number of messages to result in freeing.

My current thinking is that, rather than trying to do something clever with GC related to the number of messages etc., perhaps we should consider exposing GC via the standard library in a way that allows the programmer to force a GC. Or, even better, allow actors to have a default GC strategy but be able to change it programmatically.

Base automatically changed from master to main February 8, 2021 23:02
@ponylang-main added the "discuss during sync" label on Jul 11, 2022
@dipinhora (Contributor Author):

We bump the heap used by some EQUIV when receiving items from other actors, so shouldn't that eventually do more or less what the "after X number of messages" part of this does?

That doesn't do the same as what the after-X-number-of-messages change in this PR does. We only bump by some EQUIV for an actor the first time we receive any item from that actor:

```c
if(aref->rc == 0)
{
  // Increase apparent used memory to provoke GC.
  ponyint_heap_used(ponyint_actor_heap(ctx->current), GC_ACTOR_HEAP_EQUIV);
}
```

and we bump by some EQUIV for an object the first time we receive it, and only if it is immutable (src/libponyrt/gc/gc.c, lines 349 to 359 at 2c4f404):

```c
if(obj->rc == 0)
{
  // Increase apparent used memory to provoke GC.
  ponyint_heap_used(ponyint_actor_heap(ctx->current),
    ponyint_heap_size(chunk));

  // Increase apparent used memory further if the object is immutable, to
  // account for memory that is reachable but not traced by this actor.
  if(mutability == PONY_TRACE_IMMUTABLE)
    ponyint_heap_used(ponyint_actor_heap(ctx->current), GC_IMMUT_HEAP_EQUIV);
}
```

As a result, if we're consistently receiving and processing messages from the same actor, the EQUIV logic wouldn't trigger a GC, while the GC-after-X-messages change would.

@SeanTAllen (Member):

I'm still not in favor of this, and I think it adds unnecessary GC work. I understand the goal, but it can result in unneeded garbage collection. We have a GC strategy; I would like to work within it.

An idea, @dipinhora: do you think that if, on a per-actor basis, you could set gcinitial and gcfactor for that actor, it would accomplish what you are looking for in a way that is under the programmer's control? With such a mechanism you could, if you wanted, set an initial value of X with a factor of 1, and always GC at that level of memory but never above it, for just that one actor.
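
A hypothetical shape for that per-actor control (nothing like set_gc_policy exists in the runtime today; the struct and function are invented for illustration):

```c
#include <stddef.h>

/* Hypothetical per-actor GC tuning knobs, mirroring the global
 * --ponygcinitial / --ponygcfactor options. */
typedef struct gc_policy_t {
  size_t initial; /* heap size at which the first GC triggers          */
  double factor;  /* growth factor applied to next_gc after each trace */
} gc_policy_t;

typedef struct actor_t {
  size_t heap_used;
  size_t heap_next_gc;
  gc_policy_t policy;
} actor_t;

/* With factor = 1.0, next_gc stays pinned at the surviving heap size,
 * so the actor always GCs at that level of memory and never above it. */
static void set_gc_policy(actor_t* actor, size_t initial, double factor)
{
  actor->policy.initial = initial;
  actor->policy.factor = factor;

  if(actor->heap_next_gc < initial)
    actor->heap_next_gc = initial;
}
```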

Additionally, I can already GC any time I want (using pony_triggergc), but I can't do it based on the number of messages (unless I track application messages myself).
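
For reference, a sketch of doing that tracking by hand (the counter and threshold here are the application's own; pony_triggergc is the existing libponyrt API, assumed here to take a pony_ctx_t* as in current pony.h, and Pony code would reach it via FFI):

```c
#include <pony.h>
#include <stddef.h>

#define MY_MSG_THRESHOLD 10000 /* application-chosen, not a runtime knob */

/* Illustrative only: real Pony code would keep this counter as actor
 * state and call @pony_triggergc(@pony_ctx()) via FFI. */
static size_t my_msg_count = 0;

static void on_behaviour(pony_ctx_t* ctx)
{
  if(++my_msg_count > MY_MSG_THRESHOLD)
  {
    my_msg_count = 0;

    /* Marks the current actor so it garbage collects at the next
     * opportunity after this behaviour completes. */
    pony_triggergc(ctx);
  }
}
```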

GC after X messages or when blocked isn't something I would want on in just about any application that I write.

@dipinhora changed the title from "Trigger GC after a set number of messages and when blocked" to "Trigger GC after a set number of messages" on Jul 12, 2022
@dipinhora (Contributor Author):

Fair enough. Closing. Might revisit via one of the alternatives suggested if it makes sense.

@dipinhora closed this on Jul 12, 2022
@ponylang-main added and then removed the "discuss during sync" label on Jul 12, 2022