
Elasticsearch in production always Back-off restarting failed container #84

Open

chalvern opened this issue Apr 28, 2018 · 10 comments

@chalvern

Elasticsearch version:

docker.elastic.co/elasticsearch/elasticsearch:5.6.0

k8s cluster version:

1.10

describe

# kubectl describe pods -n jaeger  elasticsearch-0

Name:           elasticsearch-0
Namespace:      jaeger
Node:           node-1/192.168.205.128
Start Time:     Sat, 28 Apr 2018 16:44:35 +0800
Labels:         app=jaeger-elasticsearch
                controller-revision-hash=elasticsearch-8684f69799
                jaeger-infra=elasticsearch-replica
                statefulset.kubernetes.io/pod-name=elasticsearch-0
Annotations:    <none>
Status:         Running
IP:             192.168.3.197
Controlled By:  StatefulSet/elasticsearch
Containers:
  elasticsearch:
    Container ID:  docker://941824d0c9186862372c793d41d578a5e34c0972c877771d00629dc375593530
    Image:         docker.elastic.co/elasticsearch/elasticsearch:5.6.0
    Image ID:      docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:f95e7d4256197a9bb866b166d9ad37963dc7c5764d6ae6400e551f4987a659d7
    Port:          <none>
    Host Port:     <none>
    Command:
      bin/elasticsearch
    Args:
      -Ehttp.host=0.0.0.0
      -Etransport.host=127.0.0.1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sat, 28 Apr 2018 16:50:57 +0800
      Finished:     Sat, 28 Apr 2018 16:50:57 +0800
    Ready:          False
    Restart Count:  6
    Readiness:      exec [curl --fail --silent --output /dev/null --user elastic:changeme localhost:9200] delay=5s timeout=4s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8l8qt (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  data:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-8l8qt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8l8qt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From                 Message
  ----     ------                 ----              ----                 -------
  Normal   Scheduled              7m                default-scheduler    Successfully assigned elasticsearch-0 to node-1
  Normal   SuccessfulMountVolume  7m                kubelet, node-1  MountVolume.SetUp succeeded for volume "data"
  Normal   SuccessfulMountVolume  7m                kubelet, node-1  MountVolume.SetUp succeeded for volume "default-token-8l8qt"
  Normal   Pulling                6m (x4 over 7m)   kubelet, node-1  pulling image "docker.elastic.co/elasticsearch/elasticsearch:5.6.0"
  Normal   Pulled                 6m (x4 over 7m)   kubelet, node-1  Successfully pulled image "docker.elastic.co/elasticsearch/elasticsearch:5.6.0"
  Normal   Created                6m (x4 over 7m)   kubelet, node-1  Created container
  Normal   Started                6m (x4 over 7m)   kubelet, node-1  Started container
  Warning  BackOff                2m (x22 over 7m)  kubelet, node-1  Back-off restarting failed container

log

# kubectl logs -n jaeger  elasticsearch-0
# nothing shown.
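
As a side note, one way to check whether the crashed instance left any output behind is kubectl's --previous flag, e.g.:

# kubectl logs -n jaeger elasticsearch-0 --previous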
@pavolloffay
Member

@chalvern hi, did you manage to solve it? As there are no logs, it's hard to find out what caused the issue.

@chalvern
Author

chalvern commented May 2, 2018

@pavolloffay I'm afraid not, but it's possibly a resource limit, as my k8s cluster is set up on 2 VMs, each with 2 CPUs and 2 GB of memory.
I will check it when I have some free time.

@chalvern
Author

chalvern commented May 3, 2018

As I said, it was out of memory:

May  3 21:27:03 xxx-1 kernel: [74354.386802] Out of memory: Kill process 35184 (java) score 1621 or sacrifice child
May  3 21:27:03 xxx-1 kernel: [74354.387300] Killed process 35184 (java) total-vm:2599788kB, anon-rss:1262648kB, file-rss:0kB
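
This also matches exit code 137 in the describe output above: 137 means the process was killed with SIGKILL, which is what the kernel OOM killer sends. A sketch of how explicit memory requests/limits could be added to the container spec to surface this earlier (the values are illustrative assumptions, not settings from this thread):

resources:
  requests:
    memory: "1Gi"       # reserve room for the JVM heap plus Elasticsearch overhead
  limits:
    memory: "1536Mi"    # above this the container is OOM-killed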

@pavolloffay
Member

Then it's an environment issue; I will close it. If anything pops up, feel free to reopen.

@chalvern
Author

chalvern commented May 3, 2018

Finally, my solution was to add the following env config to elasticsearch.yml:

env:
  - name: ES_JAVA_OPTS
    value: "-Xms256m -Xmx512m"
  - name: bootstrap.memory_lock
    value: "true"

@jpkrohling
Collaborator

I'm reopening this, so that we apply @chalvern's env vars to elasticsearch.yml.

@jpkrohling reopened this May 7, 2018
@jpkrohling
Collaborator

@chalvern would you be interested in contributing a fix to this?

@pavolloffay
Member

-Xms256m -Xmx512m seems very low for Elasticsearch. For example, OpenShift logging uses 8 GB by default.
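
A production-leaning sketch along those lines (the sizes are illustrative; the usual guidance is to set -Xms and -Xmx equal and keep the heap at no more than about half of the container's memory):

env:
  - name: ES_JAVA_OPTS
    value: "-Xms8g -Xmx8g"   # fixed heap, in line with the OpenShift logging default
resources:
  requests:
    memory: "16Gi"           # heap is ~50% of container memory
  limits:
    memory: "16Gi"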


@chalvern
Author

chalvern commented Jul 12, 2018

@jpkrohling I worry that -Xms256m -Xmx512m is too low to use in production, just as @pavolloffay mentioned. The "Elasticsearch in production" YAML actually looks like a test setup rather than a production one.

What I suggest is to treat it as a test setup. In production, there should be multiple replicas of Elasticsearch, i.e. a cluster.
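
A minimal sketch of that direction, assuming a 3-node StatefulSet (the replica count and discovery setting are illustrative, not something settled in this thread):

spec:
  replicas: 3   # three Elasticsearch nodes instead of one
  template:
    spec:
      containers:
        - name: elasticsearch
          env:
            - name: discovery.zen.minimum_master_nodes
              value: "2"   # majority of 3 master-eligible nodes, guards against split brain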
