When a container is terminated in Kubernetes, by default a SIGTERM signal is sent to PID 1 of the container to indicate that the process must stop. Envoy, by default, immediately closes all connections and exits upon receiving a SIGTERM, which can cause errors for in-flight connections. To avoid this, MARIN3R provides a mechanism to perform connection draining before shutting down the Envoy process.
This mechanism is called the shutdown manager and consists of an extra container that runs alongside the Envoy container and is in charge of draining connections by calling Envoy's admin API whenever the Envoy process is signaled to stop. The shutdown manager is not enabled by default, but you can activate it (and we strongly advise you to do so in production environments) both for Envoy sidecars and for EnvoyDeployments.
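For reference, when Envoy runs as an injected sidecar the shutdown manager is activated through Pod annotations instead of a spec field. The sketch below only illustrates the idea; the label and annotation names are written from memory of the MARIN3R annotations reference and may differ between versions, so verify them against the documentation of the release you run:
# Sketch only: a Deployment opting in to MARIN3R sidecar injection with the
# shutdown manager enabled. Annotation names are assumptions, double-check them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app   # hypothetical application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        marin3r.3scale.net/status: enabled            # requests sidecar injection
      annotations:
        marin3r.3scale.net/node-id: my-app            # nodeID of the EnvoyConfig to load
        marin3r.3scale.net/shutdown-manager.enabled: "true"
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            - containerPort: 8080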
In this walkthrough, we are going to enable the shutdown manager for an EnvoyDeployment resource and validate the functionality with a simple test.
You need to have the MARIN3R operator installed in the cluster and a DiscoveryService in the namespace you will be using. Follow the installation instructions to do so if you haven't already.
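If you do not have a DiscoveryService yet, one named discoveryservice (the name referenced by the EnvoyDeployment later in this walkthrough) can be created with a minimal manifest like the sketch below; adjust it to match whatever the installation instructions describe for your setup:
cat <<'EOF' | kubectl apply -f -
apiVersion: operator.marin3r.3scale.net/v1alpha1
kind: DiscoveryService
metadata:
  name: discoveryservice
EOF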
First of all, we need to create the Envoy configuration that our EnvoyDeployment will use. In this case, we are going to use a config that always returns a hardcoded 200 OK HTTP response.
cat <<'EOF' | kubectl apply -f -
apiVersion: marin3r.3scale.net/v1alpha1
kind: EnvoyConfig
metadata:
  name: envoy
spec:
  nodeID: envoy
  serialization: yaml
  envoyAPI: v3
  resources:
    - type: listener
      value:
        name: http
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 8080
        filter_chains:
          - filters:
              - name: envoy.filters.network.http_connection_manager
                typed_config:
                  '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                  http_filters:
                    - name: envoy.filters.http.router
                      typed_config:
                        "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                  route_config:
                    name: route
                    virtual_hosts:
                      - domains:
                          - '*'
                        name: any
                        routes:
                          - direct_response:
                              status: 200
                            match:
                              prefix: /
                  stat_prefix: ingress_http
EOF
Now create the EnvoyDeployment that will run the proxy, enabling the shutdown manager through the spec.shutdownManager field:
cat <<'EOF' | kubectl apply -f -
apiVersion: operator.marin3r.3scale.net/v1alpha1
kind: EnvoyDeployment
metadata:
  name: envoy
spec:
  discoveryServiceRef: discoveryservice
  envoyConfigRef: envoy
  ports:
    - name: http
      port: 8080
  replicas:
    static: 1
  # this enables the shutdown manager
  shutdownManager: {}
EOF
After a few seconds you should see the following Pods in the namespace:
▶ kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
marin3r-discoveryservice-f8bc788bd-2296r         1/1     Running   0          2m39s
marin3r-envoydeployment-envoy-64c976564f-jc8q7   2/2     Running   0          94s
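The 2/2 READY count of the EnvoyDeployment Pod corresponds to the two containers described above: Envoy itself plus the shutdown manager. You can double-check by listing the container names (the label selector is the same one used for the logs command later in this walkthrough); you should see envoy-shtdn-mgr next to the Envoy container:
▶ kubectl get pods -l app.kubernetes.io/component=envoy-deployment,app.kubernetes.io/instance=envoy -o jsonpath='{.items[*].spec.containers[*].name}'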
We are going to use kubectl port-forward to access the Envoy Pod. As usual, you can also use a Service of type LoadBalancer if your cluster supports it.
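For reference, such a Service could look like the sketch below; the selector labels are the ones shown on the EnvoyDeployment Pods later in this walkthrough, but verify them against your own Pod's labels before relying on it. The rest of the walkthrough sticks to port-forwarding.
apiVersion: v1
kind: Service
metadata:
  name: envoy
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/component: envoy-deployment
    app.kubernetes.io/instance: envoy
  ports:
    - name: http
      port: 80
      targetPort: 8080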
In a different shell execute:
▶ kubectl port-forward deployment/marin3r-envoydeployment-envoy 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
We should now be able to curl our Envoy and get an HTTP 200 response back.
▶ curl http://localhost:8080 -i
HTTP/1.1 200 OK
date: Mon, 05 Jul 2021 15:23:46 GMT
server: envoy
content-length: 0
Let's now scale our EnvoyDeployment down to zero replicas and see what happens. Before doing so, we will open a persistent connection to the server using telnet. Open another shell and execute:
▶ telnet localhost 8080
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Leave the connection open.
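If telnet is not available in your environment, any client that simply keeps the TCP connection open works just as well; for example netcat, assuming it is installed:
▶ nc localhost 8080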
Patch the EnvoyDeployment resource to leave it with 0 replicas:
▶ kubectl patch envoydeployment envoy --type merge --patch '{"spec":{"replicas":{"static":0}}}'
envoydeployment.operator.marin3r.3scale.net/envoy patched
If you list the Pods now you will see that the Envoy Pod is in Terminating status, but not yet gone. Our open telnet connection is preventing the Pod from terminating, because the shutdown manager waits for all connections to be drained before proceeding with the shutdown of the server:
▶ kubectl get pods
NAME                                             READY   STATUS        RESTARTS   AGE
marin3r-discoveryservice-f8bc788bd-2296r         1/1     Running       0          20m
marin3r-envoydeployment-envoy-64c976564f-dhmct   1/2     Terminating   0          2m59s
We can check the shutdown manager's logs to see how it polls the number of open connections to determine whether it is safe to proceed with the shutdown of the server:
▶ kubectl logs -c envoy-shtdn-mgr -f -l app.kubernetes.io/component=envoy-deployment,app.kubernetes.io/instance=envoy
{"level":"info","ts":1625499723.074794,"logger":"shutdownmanager","msg":"file /tmp/shutdown-ok does not exist, recheck in 1s","context":"waitForDrainHandler"}
{"level":"info","ts":1625499723.890846,"logger":"shutdownmanager","msg":"polled open connections","context":"DrainListeners","open_connections":1,"min_connections":0}
{"level":"info","ts":1625499724.0751417,"logger":"shutdownmanager","msg":"file /tmp/shutdown-ok does not exist, recheck in 1s","context":"waitForDrainHandler"}
{"level":"info","ts":1625499725.0754566,"logger":"shutdownmanager","msg":"file /tmp/shutdown-ok does not exist, recheck in 1s","context":"waitForDrainHandler"}
{"level":"info","ts":1625499726.0758047,"logger":"shutdownmanager","msg":"file /tmp/shutdown-ok does not exist, recheck in 1s","context":"waitForDrainHandler"}
{"level":"info","ts":1625499727.0760198,"logger":"shutdownmanager","msg":"file /tmp/shutdown-ok does not exist, recheck in 1s","context":"waitForDrainHandler"}
{"level":"info","ts":1625499728.0763586,"logger":"shutdownmanager","msg":"file /tmp/shutdown-ok does not exist, recheck in 1s","context":"waitForDrainHandler"}
{"level":"info","ts":1625499728.964318,"logger":"shutdownmanager","msg":"polled open connections","context":"DrainListeners","open_connections":1,"min_connections":0}
{"level":"info","ts":1625499729.0767376,"logger":"shutdownmanager","msg":"file /tmp/shutdown-ok does not exist, recheck in 1s","context":"waitForDrainHandler"}
Let's go back to the shell where we opened our telnet connection and close it. You will see that soon after closing the connection the Pod is finally terminated:
▶ kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
marin3r-discoveryservice-f8bc788bd-2296r   1/1     Running   0          24m
In this walkthrough, we have showcased how the shutdown manager component of MARIN3R can be used to ensure a properly ordered shutdown of our proxies with connection draining, something usually desirable in production environments. Take into account that, even with connection draining enabled, MARIN3R configures the Envoy Pods to be terminated after 5 minutes if connection draining has not completed by then.
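This limit most likely maps to the termination grace period that the operator sets on the generated Deployment's Pod template; treating it that way is our assumption here, and the exact default may vary between MARIN3R versions. You can inspect the value that was set with:
▶ kubectl get deployment marin3r-envoydeployment-envoy -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'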
Execute the following commands to delete the resources created in this walkthrough:
kubectl delete envoydeployment envoy
kubectl delete envoyconfig envoy