poddisruptionbudget is not allowing any disruptions #675

Open
balonik opened this issue Apr 24, 2023 · 1 comment

balonik commented Apr 24, 2023

I don't know the intent of #353, or whether it is supposed to apply only to jobs or also to the TaskManager and JobManager. In the current setup there is only one PodDisruptionBudget per cluster, and it covers all pods (jobs, TaskManager, JobManager, ...) because the selector labels are not specific enough. Alternatively, the logic that calculates the desired number of pods is faulty.

Spec and status of the existing PodDisruptionBudget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2023-04-24T13:58:26Z"
  generation: 1
  labels:
    app: flink
    cluster: flink-cluster-new
  name: flink-flink-cluster-new
  namespace: gorr
  ownerReferences:
  - apiVersion: flinkoperator.k8s.io/v1beta1
    blockOwnerDeletion: false
    controller: true
    kind: FlinkCluster
    name: flink-cluster-new
    uid: 91f1c563-ca72-4539-aeaa-586d57942cd5
  resourceVersion: "2016776591"
  uid: aa825413-a554-4fa0-a154-67120afdc135
spec:
  maxUnavailable: 0%
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
status:
  conditions:
  - lastTransitionTime: "2023-04-24T13:58:26Z"
    message: jobs.batch does not implement the scale subresource
    observedGeneration: 1
    reason: SyncFailed
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 0
  desiredHealthy: 4
  disruptionsAllowed: 0
  expectedPods: 4
  observedGeneration: 1

and running pods:

$ kgpol app=flink,cluster=flink-cluster-new
NAME                                    READY   STATUS    RESTARTS   AGE
flink-cluster-new-job-submitter-8nh27   1/1     Running   0          25m
flink-cluster-new-jobmanager-0          1/1     Running   0          25m
flink-cluster-new-taskmanager-0         1/1     Running   0          25m
flink-cluster-new-taskmanager-1         1/1     Running   0          25m
flink-cluster-new-taskmanager-2         1/1     Running   0          25m
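
For context on the SyncFailed condition above: as far as I understand the Kubernetes documentation, maxUnavailable (and any percentage value) requires the disruption controller to look up the scale of the controller that owns each selected pod, which a Job cannot provide. A minimal sketch of a budget that avoids this lookup uses an integer minAvailable instead. The value 3 is an illustrative assumption, and because the PDB is owned by the FlinkCluster resource, the operator may well revert a manual change like this.

```yaml
# Sketch only: the minAvailable value is an illustrative assumption.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-flink-cluster-new
  namespace: gorr
spec:
  # An integer minAvailable is evaluated against the selected pods directly,
  # so no scale lookup on the owning controller (here a Job) is needed.
  minAvailable: 3
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
```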

With disruptionsAllowed stuck at 0, a Pod can never be safely evicted to another node; it simply dies when its node is removed from the cluster or shut down. I would prefer to have a PDB per Pod type.

I am on version 0.4.0. I will try 0.5.0 to see if anything has changed around this.

Contributor

regadas commented Apr 28, 2023

Hi @balonik, yes, with 0.5.0 you will be able to customize the PDB further. Also note that the PDB is now opt-in and is no longer created by default.

That said, I think it would be great to support a PDB per type (JobManager / TaskManager) instead of a single global one. A PR for this is very welcome if you are interested.
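
As a rough illustration of what a PDB per type could look like, here is a sketch with one budget for the JobManager and one for the TaskManagers. The component label key, its values, the resource names and the maxUnavailable values are all assumptions made for the example, not something the operator is confirmed to produce. It also assumes those pods are managed by a controller that implements the scale subresource (for example a StatefulSet or Deployment), which maxUnavailable needs.

```yaml
# Sketch only: the "component" label and both names are hypothetical.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-cluster-new-jobmanager
  namespace: gorr
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
      component: jobmanager    # assumed label, narrows the selector to one type
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-cluster-new-taskmanager
  namespace: gorr
spec:
  maxUnavailable: 1            # evict at most one TaskManager at a time
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
      component: taskmanager   # assumed label
```

Because neither selector matches the job submitter pod, the Job-owned pod would no longer be part of any budget.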
