poddisruptionbudget is not allowing any disruptions #675

balonik · 2023-04-24T14:28:39Z

I don't know the intent behind #353, or whether the PodDisruptionBudget is meant to cover only jobs or also the TaskManager and JobManager. In the current setup there is a single PodDisruptionBudget per cluster, and because its selector labels are not specific enough it matches every pod: job, TaskManager, JobManager, and so on. Either that, or the logic that calculates the desired number of pods is faulty.

Spec of the existing PodDisruptionBudget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2023-04-24T13:58:26Z"
  generation: 1
  labels:
    app: flink
    cluster: flink-cluster-new
  name: flink-flink-cluster-new
  namespace: gorr
  ownerReferences:
  - apiVersion: flinkoperator.k8s.io/v1beta1
    blockOwnerDeletion: false
    controller: true
    kind: FlinkCluster
    name: flink-cluster-new
    uid: 91f1c563-ca72-4539-aeaa-586d57942cd5
  resourceVersion: "2016776591"
  uid: aa825413-a554-4fa0-a154-67120afdc135
spec:
  maxUnavailable: 0%
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
status:
  conditions:
  - lastTransitionTime: "2023-04-24T13:58:26Z"
    message: jobs.batch does not implement the scale subresource
    observedGeneration: 1
    reason: SyncFailed
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 0
  desiredHealthy: 4
  disruptionsAllowed: 0
  expectedPods: 4
  observedGeneration: 1

and running pods:

$ kgpol app=flink,cluster=flink-cluster-new
NAME                                    READY   STATUS    RESTARTS   AGE
flink-cluster-new-job-submitter-8nh27   1/1     Running   0          25m
flink-cluster-new-jobmanager-0          1/1     Running   0          25m
flink-cluster-new-taskmanager-0         1/1     Running   0          25m
flink-cluster-new-taskmanager-1         1/1     Running   0          25m
flink-cluster-new-taskmanager-2         1/1     Running   0          25m

This means that a Pod can never be safely evicted to another node; it simply dies when its node is removed from the cluster or shut down. I would prefer to have a PDB per Pod type.
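To illustrate what a PDB per Pod type could look like, here is a rough sketch for the TaskManager pods only. It is not something the operator generates today. The `component: taskmanager` label, the PDB name, and the `maxUnavailable: 1` value are assumptions for illustration; the real pod labels should be checked with `kubectl get pods --show-labels` first.

```yaml
# Hypothetical per-component PDB (sketch, not operator output).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-cluster-new-taskmanager   # illustrative name
  namespace: gorr
spec:
  maxUnavailable: 1                     # illustrative value; allows one eviction at a time
  selector:
    matchLabels:
      app: flink
      cluster: flink-cluster-new
      component: taskmanager            # assumed label; verify on the actual pods
```

A narrower selector like this would presumably also leave out the job submitter pod, which is owned by a `jobs.batch` resource and appears to be what triggers the `SyncFailed` condition above.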

I am on version 0.4.0; I will try 0.5.0 to see whether anything has changed around this.


regadas · 2023-04-28T10:14:24Z

Hi @balonik, yes, with 0.5.0 you will be able to customize the PDB further. Also note that the PDB is now opt-in and is not created by default.

That said, I think it would be great if we supported a PDB per type (JobManager / TaskManager) instead of a global one. A PR for this is very welcome if you are interested.
