From e462456e5ad063ea193fbde1ae8cef6a312d13ff Mon Sep 17 00:00:00 2001
From: Thibault Charbonnier
Date: Sat, 17 Mar 2018 01:34:15 -0700
Subject: [PATCH] fix(*) use dedicated shm for rate-limiting plugins

Addresses the issue discussed in #3124 and #3241. This is part of a
series of fixes to address those errors.

Context
-------

In the `local` mode of the rate-limiting plugins, storing the
rate-limiting counters in the same shm used by Kong's database cache is
too invasive for the underlying shm, especially when the rate-limiting
plugins are used with a `seconds` precision.

On top of exhausting the database cache slots, this approach also
generates some form of fragmentation in the shm. This is due to the
side-by-side storage of values with sizes of different orders of
magnitude (JSON strings vs. an incremented double) and the LRU eviction
mechanism. When the shm is full and LRU kicks in, it is highly probable
that several rate-limiting counters will be evicted (due to their
proliferation), thus not freeing enough space to store the retrieved
data and causing a `no memory` error to be reported by the shm.

Solution
--------

Declaring shms that are only used by some plugins is not very elegant:
all users (even those not using the rate-limiting plugins) now have to
pay a memory cost, although a small one. Unfortunately, in the absence
of a more dynamic solution to shm configuration, such as a more dynamic
templating engine or a `configure_by_lua` phase, this is the safest
solution.

Size rationale
--------------

Running a script generating similar keys and storing similar values
(doubles) indicates that a 12MB shm should be able to store about
~48,000 of those values at once. It is important to remind ourselves
that one Consumer/IP address might use more than one key (in fact, one
per period configured on the plugin), and might use both the
rate-limiting and response-ratelimiting plugins at once; both plugins
use the same shm.

Even considering the above statements, ~48,000 keys per node seems
reasonable, considering that keys of `second` precision will most
likely fill up the shm and be candidates for LRU eviction. Our concern
lies instead with long-lived limits (and thus, keys) set by the user.

Additionally, a future improvement will be to set the `init_ttl`
argument for the rate-limiting keys, which will help **quite
considerably** in reducing the footprint of the plugins on the shm. As
of today, this feature has been contributed to ngx_lua but not released
yet:
https://github.com/openresty/lua-nginx-module/pull/1226

Again, this limit only applies when using the **local** strategy, which
also likely means that a load balancer is distributing traffic to a
pool of Kong nodes with some sort of consistent load-balancing
technique, thus considerably reducing the number of concurrent
Consumers a given node needs to handle at once.

See also
--------

Another piece of the fixes for the `no memory` errors resides in the
behavior of the database caching module upon a full shm. See:
https://github.com/thibaultcha/lua-resty-mlcache/pull/41

This patch reduces the likelihood of a full shm (by a lot!), but does
not remove it. The above patch ensures somewhat sane behavior should
the shm happen to be full again.
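
For illustration, the capacity estimate above can be reproduced with a
script along these lines (a minimal sketch, not the exact script used
for the measurement; the key shape is a hypothetical stand-in for the
plugins' counter keys):

    -- estimate.lua: rough capacity check for a 12MB shm (sketch).
    -- Run with: resty --shdict 'kong_rate_limiting_counters 12m' estimate.lua
    local dict = ngx.shared.kong_rate_limiting_counters

    local count = 0
    while true do
      -- keys shaped like the plugins' counters: identifier, period, period date
      local key = string.format("ratelimit:%d:%s:%d", count, "second", count)
      -- safe_add never evicts other keys; it fails with "no memory" when full
      local ok, err = dict:safe_add(key, 10.5)
      if not ok then
        ngx.say("shm full after ", count, " keys (", err, ")")
        break
      end
      count = count + 1
    end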
Fix #3124
Fix #3241
---
 kong/constants.lua                                   | 1 +
 kong/plugins/rate-limiting/policies/init.lua         | 2 +-
 kong/plugins/response-ratelimiting/policies/init.lua | 2 +-
 kong/templates/nginx_kong.lua                        | 1 +
 spec/fixtures/custom_nginx.template                  | 1 +
 5 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kong/constants.lua b/kong/constants.lua
index d538dca681b7..9947280f9067 100644
--- a/kong/constants.lua
+++ b/kong/constants.lua
@@ -73,5 +73,6 @@ return {
     "kong_process_events",
     "kong_cluster_events",
     "kong_healthchecks",
+    "kong_rate_limiting_counters",
   },
 }
diff --git a/kong/plugins/rate-limiting/policies/init.lua b/kong/plugins/rate-limiting/policies/init.lua
index 86bbd0dd79b2..2e996e6fbfaf 100644
--- a/kong/plugins/rate-limiting/policies/init.lua
+++ b/kong/plugins/rate-limiting/policies/init.lua
@@ -4,7 +4,7 @@ local redis = require "resty.redis"
 local policy_cluster = require "kong.plugins.rate-limiting.policies.cluster"
 local reports = require "kong.core.reports"
 local ngx_log = ngx.log
-local shm = ngx.shared.kong_cache
+local shm = ngx.shared.kong_rate_limiting_counters
 local pairs = pairs
 local fmt = string.format
 
diff --git a/kong/plugins/response-ratelimiting/policies/init.lua b/kong/plugins/response-ratelimiting/policies/init.lua
index d6ed13e71178..a896b1fef71e 100644
--- a/kong/plugins/response-ratelimiting/policies/init.lua
+++ b/kong/plugins/response-ratelimiting/policies/init.lua
@@ -4,7 +4,7 @@ local redis = require "resty.redis"
 local policy_cluster = require "kong.plugins.response-ratelimiting.policies.cluster"
 local reports = require "kong.core.reports"
 local ngx_log = ngx.log
-local shm = ngx.shared.kong_cache
+local shm = ngx.shared.kong_rate_limiting_counters
 local pairs = pairs
 local fmt = string.format
 
diff --git a/kong/templates/nginx_kong.lua b/kong/templates/nginx_kong.lua
index 5639f3195b44..07a3e35b2fe0 100644
--- a/kong/templates/nginx_kong.lua
+++ b/kong/templates/nginx_kong.lua
@@ -33,6 +33,7 @@ lua_shared_dict kong_cache ${{MEM_CACHE_SIZE}};
 lua_shared_dict kong_process_events 5m;
 lua_shared_dict kong_cluster_events 5m;
 lua_shared_dict kong_healthchecks 5m;
+lua_shared_dict kong_rate_limiting_counters 12m;
 > if database == "cassandra" then
 lua_shared_dict kong_cassandra 5m;
 > end
diff --git a/spec/fixtures/custom_nginx.template b/spec/fixtures/custom_nginx.template
index 24425eb8f306..5dd1c554907f 100644
--- a/spec/fixtures/custom_nginx.template
+++ b/spec/fixtures/custom_nginx.template
@@ -46,6 +46,7 @@ http {
     lua_shared_dict kong_process_events 5m;
     lua_shared_dict kong_cluster_events 5m;
     lua_shared_dict kong_healthchecks 5m;
+    lua_shared_dict kong_rate_limiting_counters 12m;
 > if database == "cassandra" then
     lua_shared_dict kong_cassandra 5m;
 > end
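
As a postscript on the `init_ttl` note in the commit message: once
openresty/lua-nginx-module#1226 is released, the counter increments
could set a TTL on first write, along these lines (a hedged sketch, not
part of this patch; `increment` is a hypothetical helper):

    local shm = ngx.shared.kong_rate_limiting_counters

    -- Hypothetical follow-up: incr(key, value, init?, init_ttl?)
    -- initializes a missing key to `init` and applies `init_ttl` on that
    -- first write, so short-lived counters free their slots on their own
    -- instead of relying on LRU eviction.
    local function increment(key, value, period_in_seconds)
      local newval, err = shm:incr(key, value, 0, period_in_seconds)
      if not newval then
        ngx.log(ngx.ERR, "could not increment counter: ", err)
        return nil, err
      end
      return newval
    end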