
Elasticsearch in production always Back-off restarting failed container #84

Open

chalvern opened this issue Apr 28, 2018 · 10 comments

@chalvern

Elasticsearch version:

docker.elastic.co/elasticsearch/elasticsearch:5.6.0

k8s cluster version:

1.10

describe

# kubectl describe pods -n jaeger  elasticsearch-0

Name:           elasticsearch-0
Namespace:      jaeger
Node:           node-1/192.168.205.128
Start Time:     Sat, 28 Apr 2018 16:44:35 +0800
Labels:         app=jaeger-elasticsearch
                controller-revision-hash=elasticsearch-8684f69799
                jaeger-infra=elasticsearch-replica
                statefulset.kubernetes.io/pod-name=elasticsearch-0
Annotations:    <none>
Status:         Running
IP:             192.168.3.197
Controlled By:  StatefulSet/elasticsearch
Containers:
  elasticsearch:
    Container ID:  docker://941824d0c9186862372c793d41d578a5e34c0972c877771d00629dc375593530
    Image:         docker.elastic.co/elasticsearch/elasticsearch:5.6.0
    Image ID:      docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:f95e7d4256197a9bb866b166d9ad37963dc7c5764d6ae6400e551f4987a659d7
    Port:          <none>
    Host Port:     <none>
    Command:
      bin/elasticsearch
    Args:
      -Ehttp.host=0.0.0.0
      -Etransport.host=127.0.0.1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sat, 28 Apr 2018 16:50:57 +0800
      Finished:     Sat, 28 Apr 2018 16:50:57 +0800
    Ready:          False
    Restart Count:  6
    Readiness:      exec [curl --fail --silent --output /dev/null --user elastic:changeme localhost:9200] delay=5s timeout=4s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8l8qt (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  data:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-8l8qt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8l8qt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From                 Message
  ----     ------                 ----              ----                 -------
  Normal   Scheduled              7m                default-scheduler    Successfully assigned elasticsearch-0 to node-1
  Normal   SuccessfulMountVolume  7m                kubelet, node-1  MountVolume.SetUp succeeded for volume "data"
  Normal   SuccessfulMountVolume  7m                kubelet, node-1  MountVolume.SetUp succeeded for volume "default-token-8l8qt"
  Normal   Pulling                6m (x4 over 7m)   kubelet, node-1  pulling image "docker.elastic.co/elasticsearch/elasticsearch:5.6.0"
  Normal   Pulled                 6m (x4 over 7m)   kubelet, node-1  Successfully pulled image "docker.elastic.co/elasticsearch/elasticsearch:5.6.0"
  Normal   Created                6m (x4 over 7m)   kubelet, node-1  Created container
  Normal   Started                6m (x4 over 7m)   kubelet, node-1  Started container
  Warning  BackOff                2m (x22 over 7m)  kubelet, node-1  Back-off restarting failed container

log

# kubectl logs -n jaeger  elasticsearch-0
# nothing shown.
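
As a side note, one way to check whether the crashed instance left any output behind is kubectl's --previous flag, e.g.:

# kubectl logs -n jaeger elasticsearch-0 --previous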
@pavolloffay
Member

@chalvern hi, did you manage to solve it? As there are no logs, it's hard to find out what caused the issue.

@chalvern
Author

chalvern commented May 2, 2018

@pavolloffay I'm afraid not, but it's possibly a resource limit, as my k8s cluster is set up on 2 VMs, each with 2 CPUs and 2 GB of memory.
I will check it when I have some free time.

@chalvern
Author

chalvern commented May 3, 2018

As I said, it was out of memory:

May  3 21:27:03 xxx-1 kernel: [74354.386802] Out of memory: Kill process 35184 (java) score 1621 or sacrifice child
May  3 21:27:03 xxx-1 kernel: [74354.387300] Killed process 35184 (java) total-vm:2599788kB, anon-rss:1262648kB, file-rss:0kB
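
This also matches exit code 137 in the describe output above: 137 means the process was killed with SIGKILL, which is what the kernel OOM killer sends. A sketch of how explicit memory requests/limits could be added to the container spec to surface this earlier (the values are illustrative assumptions, not settings from this thread):

resources:
  requests:
    memory: "1Gi"       # reserve room for the JVM heap plus Elasticsearch overhead
  limits:
    memory: "1536Mi"    # above this the container is OOM-killed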

@pavolloffay
Member

Then it's an environment issue; I will close it. If anything pops up, feel free to reopen.

@chalvern
Author

chalvern commented May 3, 2018

Finally, my solution was to add the following env config to elasticsearch.yml:

env:
  - name: ES_JAVA_OPTS
    value: "-Xms256m -Xmx512m"
  - name: bootstrap.memory_lock
    value: "true"

@jpkrohling
Collaborator

I'm reopening this, so that we apply @chalvern's env vars to elasticsearch.yml.

@jpkrohling reopened this May 7, 2018
@jpkrohling
Collaborator

@chalvern would you be interested in contributing a fix to this?

@pavolloffay
Member

-Xms256m -Xmx512m seems very low for Elasticsearch. For example, OpenShift logging uses 8 GB by default.
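
A production-leaning sketch along those lines (the sizes are illustrative; the usual guidance is to set -Xms and -Xmx equal and keep the heap at no more than about half of the container's memory):

env:
  - name: ES_JAVA_OPTS
    value: "-Xms8g -Xmx8g"   # fixed heap, in line with the OpenShift logging default
resources:
  requests:
    memory: "16Gi"           # heap is ~50% of container memory
  limits:
    memory: "16Gi"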


@chalvern
Author

chalvern commented Jul 12, 2018

@jpkrohling I worry that -Xms256m -Xmx512m is too low to use in production, just as @pavolloffay mentioned. The "Elasticsearch in production" YAML actually looks like a test setup rather than a production one.

What I suggest is to treat it as a test setup. In production, there should be multiple replicas of Elasticsearch, i.e. a cluster.
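
A minimal sketch of that direction, assuming a 3-node StatefulSet (the replica count and discovery setting are illustrative, not something settled in this thread):

spec:
  replicas: 3   # three Elasticsearch nodes instead of one
  template:
    spec:
      containers:
        - name: elasticsearch
          env:
            - name: discovery.zen.minimum_master_nodes
              value: "2"   # majority of 3 master-eligible nodes, guards against split brain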
