
Adding extra precondition for replicas and antiAffinity #223

Open
Spedoske opened this issue Jun 9, 2023 · 4 comments
Spedoske (Collaborator) commented Jun 9, 2023

The test will fail because there are 5 replicas, but the podAntiAffinity rule enforces at most one pod of the cluster per node. We need to add a precondition that detects this situation.

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: test-cluster
  namespace: rabbitmq-system
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: null
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - test-cluster
        topologyKey: kubernetes.io/hostname
  image: null
  imagePullSecrets: null
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers: []
            ephemeralContainers:
            - name: p
              resources:
                requests:
                  hugepages-2Mi: 1000m
  persistence:
    storage: 50Gi
  rabbitmq:
    additionalConfig: 'cluster_partition_handling = pause_minority

      vm_memory_high_watermark_paging_ratio = 0.99

      disk_free_limit.relative = 1.0

      collect_statistics_interval = 10000

      '
  replicas: 5
  resources:
    limits:
      cpu: 1
      memory: 4Gi
    requests:
      cpu: 1
      memory: 4Gi
  secretBackend: null
  service:
    type: ClusterIP
  skipPostDeploySteps: false
  terminationGracePeriodSeconds: 1024
  tls:
    caSecretName: null
    disableNonTLSListeners: false
    secretName: null
  tolerations: null
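
As a rough sketch of such a precondition (the function and parameter names here are hypothetical, not the existing Acto interface): given the CR spec and the number of schedulable worker nodes, the test generator could skip or reclassify cases where a required hostname-level podAntiAffinity is combined with more replicas than nodes.

# Sketch of the proposed precondition; names are illustrative only.
HOSTNAME_TOPOLOGY_KEY = "kubernetes.io/hostname"

def anti_affinity_exceeds_capacity(spec: dict, schedulable_nodes: int) -> bool:
    """True if a required hostname-level podAntiAffinity makes the requested
    replica count unschedulable (at most one pod can land on each node)."""
    replicas = spec.get("replicas", 1)
    anti_affinity = (spec.get("affinity") or {}).get("podAntiAffinity") or {}
    required_terms = anti_affinity.get(
        "requiredDuringSchedulingIgnoredDuringExecution") or []
    for term in required_terms:
        if term.get("topologyKey") == HOSTNAME_TOPOLOGY_KEY:
            return replicas > schedulable_nodes
    return False

For the spec above (replicas: 5) on a default single-worker kind cluster, this would return True, so the case could be treated as a misoperation instead of being reported as an alarm.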

tianyin (Member) commented Jun 9, 2023

This is an interesting finding.

@tylergu why were these not surfaced earlier? Because the cluster happened to have the right resource configuration?

tianyin (Member) commented Jun 9, 2023

@Spedoske What would be the right way to fix this?

Spedoske (Collaborator, Author) commented Jun 9, 2023

According to the kind documentation, we cannot add a node to a running cluster. Also, the RabbitMQ operator states that it refuses to scale down. Therefore, I think it is impossible to test antiAffinity together with a replicas increment when the initial replica count is 3. So before we test scaling up, we should remove the podAntiAffinity section from the config.
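
A minimal sketch of that pruning step, assuming we operate on the CR as a plain dict (the helper name is made up, not an existing Acto function):

import copy

def drop_pod_anti_affinity(cr: dict) -> dict:
    """Return a copy of the CR with spec.affinity.podAntiAffinity removed,
    so a later replicas increase stays schedulable on a small kind cluster."""
    pruned = copy.deepcopy(cr)
    affinity = pruned.get("spec", {}).get("affinity") or {}
    affinity.pop("podAntiAffinity", None)
    if not affinity:
        # Drop the now-empty (or explicitly null) affinity stanza as well.
        pruned.get("spec", {}).pop("affinity", None)
    return pruned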

tylergu (Member) commented Jun 10, 2023

why were these not surfaced earlier? Because the cluster happened to have the right resource configuration?

This was considered a misoperation before. The antiAffinity rule appears to be a valid one, but it leads to an unhealthy state because the topology constraint cannot be satisfied at the moment.

@Spedoske To fix this, we can consider configuring the cluster with more nodes when we are going to run this test.
But there is a tension between the number of nodes in each cluster and how many clusters we can run in parallel, because each node is its own kind node instance and thus takes up a lot of resources. We can think about a smarter approach that only provisions a larger cluster when a test needs it.
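
As one possible shape for that (using the standard kind Cluster config format and assuming PyYAML is available; the sizing policy itself is hypothetical), the harness could provision one worker per expected replica only for tests that need it:

import yaml

def kind_config_for(max_replicas: int, default_workers: int = 1) -> str:
    """Render a kind cluster config with enough workers to satisfy a required
    hostname-level podAntiAffinity for up to max_replicas pods."""
    workers = max(default_workers, max_replicas)
    config = {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        "nodes": [{"role": "control-plane"}] + [{"role": "worker"}] * workers,
    }
    return yaml.safe_dump(config, sort_keys=False)

# Example: the 5-replica test above would get a 5-worker cluster.
print(kind_config_for(max_replicas=5))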
