
Trigger GC after a set number of messages #3031

Closed
dipinhora wants to merge 5 commits

Conversation

@dipinhora (Contributor) commented Feb 27, 2019

This PR includes two changes related to when actor GC gets triggered:

Trigger GC for actors after they process a number of app msgs

Prior to this commit, GC would only be triggered if an actor's heap
grew to reach the next heap size cutoff for GC. This could
result in some actors not getting GC'd (and so holding
on to memory for longer than necessary) because they happen not to
allocate large amounts of memory even when processing lots of
application messages, and so take a very long time to reach the next
heap size at which GC occurs.

This commit adds an alternate way that GC for an actor can be
triggered in order to force actors that don't allocate large
amounts of memory to GC and free up memory more frequently. This
is done by keeping track of the number of application messages
processed since the last GC and forcing a GC if the number of
messages handled passes a threshold (10x actor batch size).
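
As a rough illustration of the mechanism described above, a standalone C sketch (the names and the threshold constant here are invented for illustration, not the PR's actual identifiers):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: some multiple of the scheduler batch size. */
#define FORCE_GC_MSG_THRESHOLD (100 * 100)

typedef struct actor_t {
  size_t heap_used;         /* bytes currently allocated on the actor heap */
  size_t heap_next_gc;      /* heap size at which GC normally triggers     */
  size_t app_msgs_since_gc; /* app messages handled since the last GC      */
} actor_t;

static void run_gc(actor_t* actor)
{
  /* trace + sweep would happen here; afterwards the counter resets */
  actor->app_msgs_since_gc = 0;
}

static void handle_app_message(actor_t* actor)
{
  /* ... dispatch the message to the actor's behaviour ... */
  actor->app_msgs_since_gc++;

  bool heap_grown = actor->heap_used >= actor->heap_next_gc;
  bool many_msgs = (actor->app_msgs_since_gc > FORCE_GC_MSG_THRESHOLD)
    && (actor->heap_used > 0); /* don't trace actors with empty heaps */

  if(heap_grown || many_msgs)
    run_gc(actor);
}
```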

and

Trigger GC for actors when they tell the cycle detector they're blocked

Prior to this commit, if an actor blocked, it did not run GC to free
any memory it no longer needed. This would result in blocked actors
holding on to (potentially lots of) memory unnecessarily.

This commit causes GC to be triggered when the cycle detector asks
an actor if it is blocked and the actor responds telling the cycle
detector that it is blocked. This should result in memory being
held by blocked actors to be freed more quickly even if the cycle
detector doesn't end up detecting a cycle and reaping the actors.
(broken out into #3278)
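
And a similar standalone sketch of the GC-on-block change (again with invented names; the real logic lives in libponyrt's actor and cycle detector code):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct actor_t {
  size_t heap_used;
  bool blocked;
} actor_t;

static void run_gc(actor_t* actor) { /* trace + sweep */ (void)actor; }
static void send_block_msg(actor_t* actor) { /* notify cycle detector */ (void)actor; }

/* Called when the cycle detector asks an actor whether it is blocked. */
static void on_isblocked_query(actor_t* actor)
{
  if(!actor->blocked)
    return;

  /* Free memory the blocked actor no longer needs before it sits idle,
   * even if no cycle is ever detected. */
  if(actor->heap_used > 0)
    run_gc(actor);

  send_block_msg(actor);
}
```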

@dipinhora (Contributor Author):

Rebased onto current master.

@dipinhora (Contributor Author):

@ponylang/core I'm unlikely to be able to join sync anytime soon so let me know if there are any questions or concerns about this via comments on this PR.

@SeanTAllen added the "do not merge" (This PR should not be merged at this time) label on Apr 2, 2019
@sylvanc (Contributor) commented May 14, 2019

My first worry is that this change will produce a GC trace of an actor that has not done any allocation at all, and that this will continue to happen as long as messages come in. My second worry is that a GC on block may exacerbate an already existing performance problem: reporting to the cycle detector is slow, and doing a GC trace along with that may be extra slow.

One possible approach (off the top of my head) would be to keep a per-actor GC threshold that begins at the global default. As messages come in and GC does not occur, this per-actor threshold could be lowered, but never below some minimum (for example, 1.2). At that point, the block message wouldn't need to influence GC. The per-actor threshold would return to the global default after a GC trace. This would ensure an actor that had done no allocation would never do a GC trace.
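
A standalone sketch of this per-actor threshold idea (the decay rate and all names are invented for illustration):

```c
#include <stddef.h>

#define DEFAULT_GC_FACTOR 2.0   /* global default (e.g. --ponygcfactor) */
#define MIN_GC_FACTOR     1.2   /* floor suggested above                */
#define FACTOR_DECAY      0.999 /* illustrative per-message decay       */

typedef struct actor_t {
  size_t heap_used;
  size_t heap_next_gc;
  double gc_factor; /* per-actor threshold, starts at DEFAULT_GC_FACTOR */
} actor_t;

/* For each message that arrives without a GC occurring, lower the
 * per-actor factor toward the floor, making the next GC come sooner. */
static void on_message_without_gc(actor_t* actor)
{
  actor->gc_factor *= FACTOR_DECAY;
  if(actor->gc_factor < MIN_GC_FACTOR)
    actor->gc_factor = MIN_GC_FACTOR;
}

/* After a GC trace, the factor returns to the global default and
 * next_gc is recomputed from the surviving heap. An actor that never
 * allocates has heap_used == 0, never reaches next_gc, and so never
 * does a GC trace under this scheme. */
static void after_gc(actor_t* actor)
{
  actor->gc_factor = DEFAULT_GC_FACTOR;
  actor->heap_next_gc = (size_t)((double)actor->heap_used * actor->gc_factor);
}
```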

Thoughts?

@dipinhora (Contributor Author):

@sylvanc Thanks for the feedback. I thought I had ensured that GC tracing wouldn't occur unless there was heap used (actor->heap->used > 0), but it seems I only handled that for the case where GC is triggered by sending a block message, and not for the trigger-GC-after-N-messages case. I'll fix that shortly.

Re: GC after sending a block message to the cycle detector: can you elaborate on the performance concerns with sending a block message to the cycle detector? From what I can tell, sending a block message to the cycle detector (via ponyint_cycle_block) is one small allocation plus setting an actor flag (FLAG_BLOCKED_SENT), and doesn't seem very expensive.

The volume of cycle detection related messages used to be a major performance concern, but my understanding is that #2709 greatly reduced this to the point of it being almost negligible (by proxy of the number of messages for message-ubench increasing by about 5M and being almost the same with and without --ponynoblock; we don't have a direct measure of cycle detection related messages before/after that PR, but I can measure it if that would be helpful).

Another potential concern is that actors might send block messages frequently and so GC frequently; however, this also does not occur very often since the merge of #2709, as that PR ensures actors only send block messages when the cycle detector asks via an isblocked message, as opposed to every time they block.

One last concern I can think of is that the overhead of GC'ing when an actor sends a block message would take resources away from other work the application could be doing (other actors running, etc.). I don't believe it is easy to quantify or evaluate whether it is better to GC or to let another actor run (please correct me if I'm misunderstanding); however, it is easy to see that if a blocked actor releases the memory it is holding on to, that is a net win, because the freed memory can then be re-used by other actors.

Re: your proposed alternative: I don't believe it alleviates the issue of actors that block needing to be GC'd, as your proposal would only trigger GC if an actor is receiving messages (similar to the trigger-GC-after-a-number-of-app-messages change in this PR), whereas blocked actors would potentially not be receiving messages at all. To give a contrived example (which may not be 100% accurate in every detail of how the runtime currently behaves, but should be reasonably close):

Main creates an actor Searcher to read data from a file that is 1 GB in size, do some processing on it, and then return the subset of the data that matches (a regex search, for example). Searcher is a naive implementation that loads all the data into memory prior to doing the search and returns only the subset that matches (let's assume it copies the subset, to ensure it's not holding a reference that would prevent GC from freeing the memory). Due to how Searcher is implemented, GC would get run while the search is proceeding but would be unable to reclaim any data, and actor->heap->nextgc would increase to 1 GB * heap_nextgc_factor. When Searcher is done with all of its work, it sends the data back to Main and stops being scheduled (because Main never sends it any more messages), and it does not GC because it has not hit the new nextgc threshold yet. The cycle detector would eventually send Searcher an isblocked message (causing Searcher to get scheduled), to which Searcher would respond with a block message, and Searcher would still not GC because it still has not hit the new nextgc threshold. Ideally, Searcher would free the memory trapped in its heap even if the cycle detector doesn't destroy it.
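
A quick worked version of the numbers in this scenario (assuming 2.0 as the default heap_nextgc_factor, i.e. --ponygcfactor):

```c
#include <stdio.h>

int main(void)
{
  /* Rough arithmetic for the Searcher scenario. */
  double gb = 1024.0 * 1024.0 * 1024.0;
  double peak_heap = 1.0 * gb; /* file data loaded by Searcher */
  double gc_factor = 2.0;      /* assumed --ponygcfactor default */
  double next_gc = peak_heap * gc_factor;

  /* After the search finishes, live data drops to near zero, but
   * heap_used never climbs back up to next_gc, so the heap-growth
   * trigger never fires and ~1 GB stays trapped in the heap. */
  printf("next_gc = %.1f GB\n", next_gc / gb); /* prints 2.0 GB */
  return 0;
}
```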

The "sending a block message should result in GC being run" change included in this PR is aimed at exactly this scenario.

I don't believe either of the changes in this PR has a negative performance impact (once I fix the oversight regarding GC traces of actors with no heap allocations). However, it is possible that I am missing something regarding the performance impact of these changes. If so, I would appreciate any pointers to the specifics so we can determine whether they can be worked around or not.

@dipinhora (Contributor Author):

Rebased onto current master and added a commit to ensure actors with heap.used == 0 don't do a GC trace even if they've processed enough messages to warrant a GC run.

@SeanTAllen (Member):

@dipinhora this needs a rebase.

@dipinhora (Contributor Author):

@SeanTAllen rebased

@dipinhora (Contributor Author):

This is failing due to the SSL tests. #3179 fixes them. I'll rebase again once that PR is merged.

@dipinhora (Contributor Author):

Rebased to pick up SSL fixes.

@dipinhora (Contributor Author):

rebased to catch up with all the changes to master.

@SeanTAllen (Member):

@dipinhora I keep looking at this, and it seems like it basically replaces the existing GC strategy entirely. Whereas one would currently expect the amount of memory needed to trigger a GC to increase each time an actor GCs (so as to avoid doing it often), this puts it at a fixed amount.

If an actor isn't allocating more memory, it will continue to GC based on the number of messages. This seems flawed to me.

I get the idea of what you are doing with this, and I think it's an interesting approach, but I think we need something that takes a bit of the idea you have here and a bit of the current approach.

I don't have a good algo yet, but I think we can come up with one.

I'm going to spend more time thinking about what you've written and looking at the code, and will come back with thoughts, questions, or ideas.

I apologize for the delay. I think you are trying to address a very important problem, and I'm stepping lightly and slowly here.

@dipinhora (Contributor Author):

@SeanTAllen you said (emphasis added):

I keep looking at this, and it seems like it basically replaces the existing GC strategy entirely. Whereas one would currently expect the amount of memory needed to trigger a GC to increase each time an actor GCs (so as to avoid doing it often), this puts it at a fixed amount.

This PR was not intended to replace the GC strategy entirely but to be a middle solution (although it's possible that I misunderstood the true impact of the changes). The goal was to only force an actor to GC if it had processed 10000 messages (actor default batch size of 100, times 100) without GC'ing via the normal/existing heap-usage-based GC mechanism, so that it can free any memory it is holding on to unnecessarily that it may not have released yet because it has been a long time since it reached the normal GC heap threshold.

The actor's heap is still allowed to grow via the current GC process; only once the heap has grown enough that the normal trigger would not fire within 10000 processed messages does the message count force a GC. Note that this does limit an actor's memory usage, but it is limited as a factor of how much memory the actor uses per 10000 messages. Some actors will only use 10000 bytes (and be below the --ponygcinitial threshold) before being forced to GC, and other actors will use 100 MB or 10 GB per 10000 messages before they are forced to GC because of the number of messages.

I don't know that 10000 is the right heuristic, but it seemed like a place to start. The main point was to ensure that just because an actor needs to use 100 MB or 10 GB before it would hit the next GC threshold shouldn't mean it is allowed to hold on to that memory forever if it never hits that threshold.

It seems to me that the concern is that 10000 is a static number and so caps an actor's memory at some point. Maybe an option would be to make the 10000 a number that grows slowly over time, so that an actor's memory usage can still increase, but after a certain point (10000 messages without a heap-usage-triggered GC) the memory growth switches from the 2.0x factor based on memory to a factor based on the number of messages, growing by something smaller like 1.2x or 1.5x?
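
A standalone sketch of that growing message threshold (all names and numbers illustrative):

```c
#include <stddef.h>

typedef struct actor_t {
  size_t heap_used;
  size_t msgs_since_gc;
  size_t msg_threshold; /* starts at 10000 */
} actor_t;

static void run_gc(actor_t* actor)
{
  /* trace + sweep; reset the message counter */
  actor->msgs_since_gc = 0;
}

/* Force a GC once the message count passes the per-actor threshold,
 * then grow the threshold so forced GCs become rarer over time. */
static void maybe_force_gc(actor_t* actor)
{
  if((actor->msgs_since_gc > actor->msg_threshold) && (actor->heap_used > 0))
  {
    run_gc(actor);
    actor->msg_threshold = (actor->msg_threshold * 12) / 10; /* ~1.2x */
  }
}
```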

Btw, no rush on this. I did the rebase mainly to ensure the code didn't bitrot and start running into issues. Take your time and let me know your thoughts on how to solve the competing concerns of letting actors use more and more memory over time to ensure they don't GC too frequently vs ensuring actors don't hold onto memory forever.

In the meantime, should I break off the GC on block into a separate PR if that change is less concerning/controversial?

@SeanTAllen (Member):

@dipinhora i think breaking them apart seems reasonable.

@dipinhora (Contributor Author):

@SeanTAllen Broken apart. This PR has been rebased on master and the "GC after blocking" changes reverted.

The "GC after blocking" change can be found in #3278

@SeanTAllen (Member):

@dipinhora when you have time, I'd love to chat about how we can test this to determine the performance impact, be it positive or negative.

@SeanTAllen (Member):

Re: the messages-processed part of this PR, @dipinhora: I think you are on to something, but I think it might be a little too coarse-grained.

Question:

We bump the heap used by some EQUIV when receiving items from other actors, so shouldn't that eventually do more or less what the "after X number of messages" part of this does? Unless those messages contain only primitives rather than objects, in which case I believe we don't do a heap equiv bump.
That's my quick read of things in gc.c, where the calls to ponyint_heap_used happen.

I can see that, because the equiv is "smallish" and next_gc is sufficiently large, it would take a large number of messages to result in freeing.

My current thinking is that, rather than trying to do something clever with GC related to the number of messages etc., perhaps we should consider exposing GC via the standard library in a way that allows the programmer to force a GC. Or, even better, allow actors to have a default GC strategy but be able to change it programmatically.

Base automatically changed from master to main February 8, 2021 23:02
@ponylang-main added the "discuss during sync" label on Jul 11, 2022
@dipinhora (Contributor Author):

We bump the heap used by some EQUIV when receiving items from other actors, so shouldn't that eventually do more or less what the "after X number of messages" part of this does?

That doesn't do the same as what the after-X-number-of-messages change in this PR does. We only bump by some EQUIV for an actor the first time we receive any item from that actor:

```c
if(aref->rc == 0)
{
  // Increase apparent used memory to provoke GC.
  ponyint_heap_used(ponyint_actor_heap(ctx->current), GC_ACTOR_HEAP_EQUIV);
}
```

and we bump by some EQUIV for an object the first time we receive it, and only if it is immutable (src/libponyrt/gc/gc.c, lines 349 to 359 at 2c4f404):

```c
if(obj->rc == 0)
{
  // Increase apparent used memory to provoke GC.
  ponyint_heap_used(ponyint_actor_heap(ctx->current),
    ponyint_heap_size(chunk));

  // Increase apparent used memory further if the object is immutable, to
  // account for memory that is reachable but not traced by this actor.
  if(mutability == PONY_TRACE_IMMUTABLE)
    ponyint_heap_used(ponyint_actor_heap(ctx->current), GC_IMMUT_HEAP_EQUIV);
}
```

As a result, if we're consistently receiving and processing messages from the same actor, the EQUIV logic wouldn't trigger a GC, while the GC-after-X-messages change would.

@SeanTAllen (Member):

I'm still not in favor of this, and I think it adds unnecessary GC work. I understand the goal, but it can result in unneeded garbage collection. We have a GC strategy; I would like to work within it.

An idea, @dipinhora: do you think that if, on a per-actor basis, you could set gcinitial and gcfactor for that actor, it would accomplish what you are looking for in a way that is under the programmer's control? With such a mechanism you could, if you wanted, set an initial value of X with a factor of 1, and always GC at that level of memory but never above it, for just that one actor.
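
A hypothetical shape for that per-actor control (nothing like set_gc_policy exists in the runtime today; the struct and function are invented for illustration):

```c
#include <stddef.h>

/* Hypothetical per-actor GC tuning knobs, mirroring the global
 * --ponygcinitial / --ponygcfactor options. */
typedef struct gc_policy_t {
  size_t initial; /* heap size at which the first GC triggers          */
  double factor;  /* growth factor applied to next_gc after each trace */
} gc_policy_t;

typedef struct actor_t {
  size_t heap_used;
  size_t heap_next_gc;
  gc_policy_t policy;
} actor_t;

/* With factor = 1.0, next_gc stays pinned at the surviving heap size,
 * so the actor always GCs at that level of memory and never above it. */
static void set_gc_policy(actor_t* actor, size_t initial, double factor)
{
  actor->policy.initial = initial;
  actor->policy.factor = factor;

  if(actor->heap_next_gc < initial)
    actor->heap_next_gc = initial;
}
```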

Additionally, I can already GC any time I want (using pony_triggergc), but I can't do it based on the number of messages (unless I track application messages myself).
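
For reference, a sketch of doing that tracking by hand (the counter and threshold here are the application's own; pony_triggergc is the existing libponyrt API, assumed here to take a pony_ctx_t* as in current pony.h, and Pony code would reach it via FFI):

```c
#include <pony.h>
#include <stddef.h>

#define MY_MSG_THRESHOLD 10000 /* application-chosen, not a runtime knob */

/* Illustrative only: real Pony code would keep this counter as actor
 * state and call @pony_triggergc(@pony_ctx()) via FFI. */
static size_t my_msg_count = 0;

static void on_behaviour(pony_ctx_t* ctx)
{
  if(++my_msg_count > MY_MSG_THRESHOLD)
  {
    my_msg_count = 0;

    /* Marks the current actor so it garbage collects at the next
     * opportunity after this behaviour completes. */
    pony_triggergc(ctx);
  }
}
```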

GC after X messages or when blocked isn't something I would want on in just about any application that I write.

@dipinhora changed the title from "Trigger GC after a set number of messages and when blocked" to "Trigger GC after a set number of messages" on Jul 12, 2022
@dipinhora (Contributor Author):

Fair enough. Closing. Might revisit via one of the alternatives suggested if it makes sense.

@dipinhora closed this on Jul 12, 2022
@ponylang-main added and then removed the "discuss during sync" label on Jul 12, 2022