
OpenSearch K8s Operator not restarting the bootstrap, coordinator, node and master pods after upgrading the version #849

Open
rameshar16 opened this issue Jun 22, 2024 · 5 comments
Labels
bug Something isn't working

Comments


What is the bug?

The OpenSearch K8s Operator does not restart the bootstrap, coordinator, node, and master pods after the version is upgraded.

How can one reproduce the bug?

Change the version from 2.4.0 to 2.6.0 in the OpenSearchCluster spec and run `helm upgrade` to deploy the new version.
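
A minimal reproduction sketch, assuming the same chart layout and release name used later in this thread (the exact `values.yaml` key that sets the cluster version depends on the chart revision):

```bash
# Bump spec.general.version from 2.4.0 to 2.6.0 in the chart values, then:
helm diff upgrade opensearch-cr ./charts/opensearch-cluster/ \
  --values ./charts/opensearch-cluster/values.yaml -n opensearch
helm upgrade opensearch-cr ./charts/opensearch-cluster/ \
  --values ./charts/opensearch-cluster/values.yaml -n opensearch

# Expected: the operator performs a rolling restart of the bootstrap, master,
# data and coordinator pods. Observed: only the first pod of each pool restarts.
kubectl get pods -n opensearch -w
```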

What is the expected behavior?

The OpenSearch K8s Operator should restart the bootstrap, coordinator, node, and master pods after the version is upgraded.

What is your host/environment?

AWS EKS cluster with the OpenSearch K8s Operator deployed. The OpenSearch cluster was deployed using the Helm chart.

Do you have any screenshots?

No

Do you have any additional context?

No

@rameshar16 rameshar16 added bug Something isn't working untriaged Issues that have not yet been triaged labels Jun 22, 2024

rameshar16 commented Jun 22, 2024

```
> helm diff upgrade opensearch-cr ./charts/opensearch-cluster/ --values ./charts/opensearch-cluster/values.yaml -n opensearch
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opensearch-cr
  namespace: opensearch
spec:
  general:
-   version: 2.4.0
+   version: 2.6.0
    httpPort: 9200
    vendor: opensearch
    serviceName: opensearch-svc
    setVMMaxMapCount: true
    drainDataNodes: true
  dashboards:
    service:
      type: NodePort
    tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: opensearch
    version: 2.3.0
    enable: true
    replicas: 1
    resources:
      requests:
        memory: 1Gi
        cpu: 500m
      limits:
        memory: 1Gi
        cpu: 500m
    tls:
      enable: true # Configure TLS
      secret:
        name: ssl-secret
  nodePools:
    - component: masters
      tolerations:
        - effect: NoSchedule
          key: dedicated
          operator: Equal
          value: opensearch
      replicas: 3
      pdb: # Add pdb configuration
        enable: true
        minAvailable: 3
      diskSize: "30Gi"
      nodeSelector:
        eks.amazonaws.com/nodegroup: opensearch-cluster
      resources:
        requests:
          memory: 2Gi
          cpu: 500m
        limits:
          memory: 2Gi
          cpu: 500m
      roles:
        - cluster_manager
      persistence:
        pvc:
          storageClass: opensearch
          accessModes:
            - ReadWriteOnce
    - component: nodes
      tolerations:
        - effect: NoSchedule
          key: dedicated
          operator: Equal
          value: opensearch
      replicas: 3
      pdb: # Add pdb configuration
        enable: true
        minAvailable: 2
      diskSize: "30Gi"
      nodeSelector:
        eks.amazonaws.com/nodegroup: opensearch-cluster
      resources:
        requests:
          memory: 2Gi
          cpu: 500m
        limits:
          memory: 2Gi
          cpu: 500m
      roles:
        - data
      persistence:
        pvc:
          storageClass: opensearch
          accessModes:
            - ReadWriteOnce
    - component: coordinators
      tolerations:
        - effect: NoSchedule
          key: dedicated
          operator: Equal
          value: opensearch
      replicas: 3
      pdb: # Add pdb configuration
        enable: true
        minAvailable: 2
      diskSize: "30Gi"
      nodeSelector:
        eks.amazonaws.com/nodegroup: opensearch-cluster
      resources:
        requests:
          memory: 2Gi
          cpu: 500m
        limits:
          memory: 2Gi
          cpu: 500m
      roles:
        - ingest
      persistence:
        pvc:
          storageClass: opensearch
          accessModes:
            - ReadWriteOnce
  security:
    tls:
      transport:
      http:

> helm upgrade opensearch-cr ./charts/opensearch-cluster/ --values ./charts/opensearch-cluster/values.yaml -n opensearch
Release "opensearch-cr" has been upgraded. Happy Helming!
NAME: opensearch-cr
LAST DEPLOYED: Sat Jun 22 11:45:33 2024
NAMESPACE: opensearch
STATUS: deployed
REVISION: 10
TEST SUITE: None
```


rameshar16 commented Jun 22, 2024

```
> k get po -n opensearch -w
NAME                                                      READY   STATUS    RESTARTS   AGE
opensearch-cr-coordinators-0                              1/1     Running   0          17h
opensearch-cr-coordinators-1                              1/1     Running   0          16h
opensearch-cr-coordinators-2                              1/1     Running   0          16h
opensearch-cr-dashboards-54d6ccb67c-nw9sp                 1/1     Running   0          17h
opensearch-cr-masters-0                                   1/1     Running   0          15m
opensearch-cr-masters-1                                   1/1     Running   0          16h
opensearch-cr-masters-2                                   1/1     Running   0          16h
opensearch-cr-nodes-0                                     1/1     Running   0          26m
opensearch-cr-nodes-1                                     1/1     Running   0          16h
opensearch-cr-nodes-2                                     1/1     Running   0          16h
opensearch-operator-controller-manager-68f76ffd94-knsw4   2/2     Running   0          13m
```

@prudhvigodithi (Member)

[Triage]
Adding @swoehrl-mw to please verify this. I assume we have seen a similar issue in the past that was fixed with PR #789, right?
Thank you

@prudhvigodithi prudhvigodithi removed the untriaged Issues that have not yet been triaged label Jun 24, 2024
@prudhvigodithi (Member)

Also @rameshar16, I assume you are using the latest version of the operator; can you please confirm?
Thank you
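
One way to confirm the running operator version (a hypothetical check, assuming the controller-manager deployment name matches the pod listing above) is to inspect the image tag:

```bash
# Print the image tag of the operator controller-manager deployment
kubectl -n opensearch get deployment opensearch-operator-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'
```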

@swoehrl-mw (Collaborator)

@prudhvigodithi This does not look like the parallel recovery bug.

@rameshar16 Please check the OpenSearchCluster CR status (`kubectl describe opensearchcluster opensearch-cr`) to see what the operator is reporting. Also, please verify that your OpenSearch cluster is healthy (green status). It looks like the first pod of each pool was restarted but the operator is not continuing, and that points towards the cluster not being green.
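
For example, a sketch of those checks (assuming the service name `opensearch-svc` from the CR above and default admin credentials, which may differ in your setup):

```bash
# What is the operator reporting in the CR status/conditions?
kubectl -n opensearch describe opensearchcluster opensearch-cr

# Is the cluster green? Port-forward the service and query cluster health.
kubectl -n opensearch port-forward svc/opensearch-svc 9200:9200 &
curl -sk -u admin:admin "https://localhost:9200/_cluster/health?pretty"
```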
