
fix(healthcheck): replace checker callback timer with ngx.thread to lower timer usage #155

Closed

Conversation

windmgc
Member

@windmgc windmgc commented May 6, 2024

This PR modifies the way checker_callback spawns the background checker for targets.

Previously the checker spawned timers with a 0 delay. When the user has defined a very large number of upstream targets with small check intervals, the number of running timers can hit the OpenResty limit. FTI-5847 describes a similar case where, due to the nature of timer-ng, this results in memory "leaking": an unbounded number of timers is spawned every interval, and timer jobs cannot be consumed as fast as they are created.

This PR changes the checker to spawn a light thread directly via ngx.thread.spawn, since there is no need to spawn timers (which are themselves light threads by nature). The goal is to keep the behaviour as close to the previous implementation as possible.

Alleviates FTI-5847
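Roughly, the change has this shape (a simplified sketch, not the exact diff; `active_check_targets`, `mode`, and `list` are placeholders for the checker's real callback and arguments, and the logging follows the module's style):

```lua
-- Before: every interval, a zero-delay timer is registered for the check round.
-- Each pending timer counts against OpenResty's timer limits until it has run.
local ok, err = ngx.timer.at(0, function(premature)
  if premature then return end
  self:active_check_targets(mode, list)
end)
if not ok then
  self:log(ERR, "failed to create timer to check ", mode, ": ", err)
end

-- After: spawn a light thread in the current context instead, so no extra
-- timer is registered; the check still runs concurrently with its parent.
local thread = ngx.thread.spawn(function(mode, list)
  self:active_check_targets(mode, list)
end, mode, list)
if not thread then
  self:log(ERR, "failed to spawn thread to check ", mode)
end
```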

@windmgc windmgc changed the base branch from master to release/1.6.x May 6, 2024 09:33

@windmgc windmgc changed the title from "Fix replace hc timer with thread" to "fix(healthcheck): replace checker callback timer with ngx.thread to lower timer usage" May 6, 2024
@windmgc windmgc marked this pull request as ready for review May 8, 2024 08:03
@windmgc
Member Author

windmgc commented May 8, 2024

@oowl @locao Could you please check whether this is a meaningful change to you? I will make the same change against master if it is.

(Although it seems there is little point in writing a test for it.)

@windmgc windmgc requested review from oowl and locao May 8, 2024 08:04
@Tieske
Member

Tieske commented May 8, 2024

iirc there is a limit of 100 threads per request-context. Do we risk hitting that in large installations?

@windmgc
Member Author

windmgc commented May 8, 2024

@Tieske Hmm, I haven't heard of such a limitation; timers also rely on light threads (coroutines), so they should behave similarly.

Which limitation are you referring to? Maybe this https://github.com/openresty/lua-nginx-module?tab=readme-ov-file#lua_worker_thread_vm_pool_size?

@Tieske
Member

Tieske commented May 8, 2024

Not sure it's that one, since it refers to VMs. Anyway, I cannot reproduce it, so it seems the limitation is no longer present.

The following review thread is on this excerpt of the diff:

    })
    if timer == nil then
      self:log(ERR, "failed to create timer to check ", health_mode)
    local thread = ngx.thread.spawn(function(mode, list)
oowl (Member) commented:

I do not agree with this change.

checker_callback() is called from the active_check_timer timer function; that function's for loop iterates over all active health check instances and submits each active health check task via a timer. That is the existing implementation. After this change the health check task moves from a timer to ngx.thread, which means going from an asynchronous task to a synchronous one: in async mode we submit the health check task and do not wait for its result, but with ngx.thread we have to wait for the task to finish. So the loop over active healthcheck instances ends up waiting for all health check tasks to complete, which is not acceptable for us when there are many upstream nodes to actively health-check. For example, if some upstream nodes cannot be connected to, the timeout logic is triggered and the whole loop's execution time becomes the sum of all those nodes' connection timeouts.
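To illustrate the difference being described (a hedged sketch, not the actual checker code; `instances` and `check_one_instance` are placeholders): a zero-delay timer is fire-and-forget, while light threads spawned inside a timer handler keep that handler alive until they have all finished.

```lua
-- Timer-based (existing behaviour): the loop returns immediately; each check
-- runs in its own timer context, so a slow target only delays its own check.
for _, instance in ipairs(instances) do
  ngx.timer.at(0, function(premature)
    if premature then return end
    check_one_instance(instance)  -- may block on connect timeouts
  end)
end

-- Thread-based (this PR): the loop itself still returns quickly, but the
-- enclosing timer handler cannot exit until every spawned light thread has
-- finished, so it has to wait for the whole round of checks to complete.
local threads = {}
for _, instance in ipairs(instances) do
  threads[#threads + 1] = ngx.thread.spawn(check_one_instance, instance)
end
for _, th in ipairs(threads) do
  ngx.thread.wait(th)  -- explicit wait; omitting it only moves the wait to handler exit
end
```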

oowl (Member) commented:

A possible solution to this problem: when we create an active health check task, check the completion status of the previous task on this instance, and if it is still in progress, skip submitting the current task. But that could be a breaking change in the worst case.

@windmgc
Member Author

windmgc commented May 9, 2024

@oowl I created an alternative solution for this: #156

To keep the behaviour as close to before as possible, I check the running status of the previous task on this instance, instead of its completion status. Checking the completion status would behave just like the current PR, which waits for the coroutine to finish.
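A minimal sketch of that approach (not the actual #156 code; the `checking` flag and other names are illustrative, and logging follows the module's style): record whether the previous round for this checker is still running and skip the new round rather than waiting for it.

```lua
local function checker_callback(self, mode, list)
  if self.checking then
    -- the previous round has not finished yet; skip this interval instead of
    -- blocking on it or stacking another task on top of it
    self:log(WARN, "previous ", mode, " check still running, skipping")
    return
  end

  self.checking = true
  local ok, err = ngx.timer.at(0, function(premature)
    if premature then
      self.checking = false
      return
    end
    self:active_check_targets(mode, list)  -- placeholder for the real check
    self.checking = false
  end)
  if not ok then
    self.checking = false
    self:log(ERR, "failed to create timer to check ", mode, ": ", err)
  end
end
```

A production version would also reset the flag if the check raises an error (for example by wrapping it in pcall).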

@windmgc
Member Author

windmgc commented May 10, 2024

Closing in favor of #156.

@windmgc windmgc closed this May 10, 2024