-
Kong 1.5.1 is EOL (end of life); would you mind upgrading to the latest version (3.7) and trying again?
-
Could you translate your question into English? Then we can help you more easily.
-
When a pod is scaled down at 7 o'clock and the deployment creates a new pod with a new IP at 9 o'clock, Kong may still route traffic to the IP that was removed at 7 o'clock. Sometimes it recovers within 10 seconds; sometimes it loops endlessly and the node has to be restarted.
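If it helps to narrow this down, the health state Kong's balancer currently assigns to each target can be inspected through the Admin API. A minimal sketch, assuming the Admin API listens on localhost:8001 and using my-upstream as a placeholder for your upstream name:

    # Show each target's health (e.g. HEALTHY / UNHEALTHY) as seen by this node
    curl -s http://localhost:8001/upstreams/my-upstream/health

Comparing this output against the live pod IPs during a rollout should show whether the balancer is holding on to targets that no longer exist.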
-
Hi all, we face an issue where, after new deployments (basically when new pods come in), Kong keeps trying to send requests to the old upstream IPs. On analysing further we see that the target table in Postgres has been updated with the new IPs and no longer contains the stale ones, so we suspect the cache in Kong is not being refreshed. As a temporary workaround we have been restarting the Kong pods. What could be a permanent fix?
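A less disruptive workaround than restarting might be to purge the per-node entity cache and compare what Kong reports with what is in Postgres. A minimal sketch, assuming the Admin API is on localhost:8001, that your Kong version exposes the /cache endpoint, and with my-upstream as a placeholder name:

    # Compare the targets this node reports with the rows in Postgres
    curl -s http://localhost:8001/upstreams/my-upstream/targets

    # Purge this node's entity cache; the cache is per node, so this has to be
    # run against every Kong instance behind the load balancer
    curl -i -X DELETE http://localhost:8001/cache

If the stale IPs disappear after the purge, that points at cache-invalidation events not propagating between nodes rather than at the database.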
-
Background:
1. The Kong version is 1.5.1.
2. The health check configured on the upstream is:
{
  "created_at": 1607580903,
  "hash_on": "none",
  "id": "02b1062d-448f-4037-9698-da83e7d790a1",
  "algorithm": "round-robin",
  "name": "client-shopping-cart",
  "tags": ["k8s-1701624154669388227"],
  "hash_fallback_header": null,
  "hash_fallback": "none",
  "hash_on_cookie": null,
  "host_header": null,
  "hash_on_cookie_path": "/",
  "healthchecks": {
    "active": {
      "unhealthy": {
        "http_statuses": [429, 404, 500, 501, 502, 503, 504, 505],
        "tcp_failures": 0,
        "timeouts": 0,
        "http_failures": 0,
        "interval": 2
      },
      "type": "http",
      "http_path": "/health",
      "timeout": 1,
      "healthy": {
        "successes": 1,
        "interval": 0,
        "http_statuses": [200, 302]
      },
      "https_sni": null,
      "https_verify_certificate": true,
      "concurrency": 10
    },
    "passive": {
      "unhealthy": {
        "http_failures": 0,
        "http_statuses": [429, 500, 503],
        "tcp_failures": 1,
        "timeouts": 5
      },
      "healthy": {
        "http_statuses": [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 300, 301, 302, 303, 304, 305, 306, 307, 308],
        "successes": 0
      },
      "type": "tcp"
    }
  },
  "hash_on_header": null,
  "slots": 10000
}
3. The Kong error log contains:
[lua] events.lua:273: post(): worker-events: failed posting event "healthy" by "lua-resty-healthcheck [client-shopping-cart]"; no memory, context: ngx.timer
[lua] events.lua:273: post(): worker-events: failed posting event "healthy" by "lua-resty-healthcheck [client-shopping-cart]"; no memory, context: ngx.timer
[lua] healthcheck.lua:1068: log(): [healthcheck] (client-shopping-cart) event: trying to remove an unknown target '...(...:7083)', context: ngx.timer
[lua] targets.lua:65: clean_history(): [Target DAO] Starting cleanup of target table for upstream -448f-4037-9698-da83e7d790a1, client: 127.0.0.1, server: kong_admin, request: "DELETE /upstreams/client-shopping-cart/targets/-1f04-424f-8c91-b5afb800297e HTTP/1.1", host: "localhost:8001"
[lua] events.lua:194: do_handlerlist(): worker-events: event callback failed; source=lua-resty-healthcheck [client-shopping-cart], event=healthy, pid=38 error='/usr/local/share/lua/5.1/resty/healthcheck.lua:247: attempt to index field 'targets' (a nil value)
[lua] balancer.lua:810: do_upstream_event(): failed recreating balancer for client-shopping-cart: timeout waiting for balancer for ***-448f-4037-9698-da83e7d790a1, context: ngx.timer
[lua] events.lua:155: post_event(): worker-events: could not write to shm after 6 tries (no memory), it is either fragmented or cannot allocate more memory, consider increasing 'opts.shm_retries' or increasing the shm size, context: ngx.timer
[lua] events.lua:364: poll(): worker-events: dropping event; waiting for event data timed out, id: 10639087, context: ngx.timer
[lua] events.lua:364: poll(): worker-events: dropping event; waiting for event data timed out, id: 10200660, context: ngx.timer
4. Kong configuration
5. Symptom: k8s pods restart and drift to new IPs, dozens of nodes update within a short time, and health checking is enabled on the corresponding Kong upstream; some Kong nodes then end up with the upstream still pointing at pod IPs that no longer exist.
6. Questions
What triggers this problem, and what are the possible fixes? Is the shm memory reclaimed automatically? Do we need to add monitoring for this shm, and if so, how? Could a misconfigured health check be the cause? Can the retry count opts.shm_retries be adjusted? And which shm is causing the problem?
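For what it's worth, the "worker-events: ... no memory" lines point at the shared dict used by lua-resty-worker-events, declared as lua_shared_dict kong_worker_events in Kong's bundled NGINX template; in Kong 1.x its size is fixed in the template rather than exposed in kong.conf. A minimal sketch of one workaround, assuming you start from a copy of the stock template for your exact Kong version (the paths, the stock size, and the 50m value here are assumptions):

    # custom_nginx.template should be a copy of the stock template for your
    # Kong version; enlarge the worker-events shared dict (50m is an
    # arbitrary example size)
    sed -i 's/lua_shared_dict kong_worker_events .*;/lua_shared_dict kong_worker_events 50m;/' custom_nginx.template

    # Start Kong with the modified template
    kong start -c kong.conf --nginx-conf custom_nginx.template

The shm is reused as old entries expire or are evicted, but a burst of events larger than the dict can still exhaust it, which matches the "fragmented or cannot allocate" message. opts.shm_retries is an option of the lua-resty-worker-events library itself and, as far as I know, is not tunable through kong.conf in 1.5.x, so enlarging the shm, and upgrading (this event machinery was reworked in later releases), are the more practical levers. For monitoring, OpenResty exposes ngx.shared.DICT:capacity() and :free_space() from custom Lua code, if your OpenResty version provides them.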