Skip to content

Pod Disruption Budgets

A PodDisruptionBudget (PDB) limits how many Gateway pods can be voluntarily evicted at the same time — by kubectl drain, node autoscaler scale-down, or rolling updates. The operator supports creating a PDB per Gateway, but it is opt-in.

Why opt-in?

The default replica count for a Gateway Deployment is 1. A PDB with minAvailable: 1 on a single-replica Deployment blocks all voluntary evictions indefinitely — node drains hang, cluster autoscaler cannot reclaim the node, and rolling upgrades stall.

Only enable a PDB if you have also arranged for more than one replica, either via an HPA (see horizontal-pod-autoscaling.md) or by manually scaling the Deployment.

Enabling a PDB

Set spec.podDisruptionBudget on the GatewayClassParameters referenced by your GatewayClass. Exactly one of minAvailable or maxUnavailable must be set — the semantics mirror Kubernetes' PodDisruptionBudgetSpec.

yaml apiVersion: gateway.varnish-software.com/v1alpha1 kind: GatewayClassParameters metadata: name: varnish-params namespace: varnish-gateway-system spec: podDisruptionBudget: minAvailable: 1

The operator creates one PDB per Gateway in the Gateway's namespace, named after the Gateway and selecting the Gateway's pods. It is owned by the Gateway and garbage-collected on deletion.

There is no single correct value — it depends on your replica count and your tolerance for reduced capacity during disruptions. Some guidance:

  • minAvailable: N-1 (absolute) — keep all but one pod available. Safest for small replica counts (2–4). Works well when pods are roughly equal in cache fill.
  • maxUnavailable: "25%" (percentage) — scales naturally with HPA. Good default once replicas are in the double digits.
  • minAvailable: "50%" — conservative; keeps at least half the fleet serving. Use when each pod carries a meaningful share of the working set and cache loss on drain matters.

Avoid minAvailable: <replicas> or maxUnavailable: 0 — both disable voluntary disruption entirely, which is almost never what you want.

Interaction with HPA

PDB is evaluated against currently ready pods, not the HPA target. During a scale-down, the HPA removes pods one at a time and the PDB is not consulted. HPA-triggered removals are not voluntary disruptions from the PDB's perspective. PDB does, however, gate node drains and rolling restarts even while the HPA is active.

If your PDB uses a percentage, it is computed against the current replica count at the time of eviction, so it tracks HPA scaling automatically. Absolute minAvailable values do not — a minAvailable: 3 with a replica count of 3 blocks drains just like the single-replica case.

Rolling restarts

Rolling restarts — triggered by infrastructure changes such as listener edits, image bumps, or varnishdExtraArgs changes, as described in the architecture overview — respect the PDB. With minAvailable: 1 and two replicas, a rolling restart proceeds one pod at a time. Keep this in mind when sizing: a tight PDB on a small replica count extends upgrade duration.

Disabling a PDB

Remove spec.podDisruptionBudget from the GatewayClassParameters (or set it to null). The operator deletes the PDB it owns on the next reconcile. This is important: a stale minAvailable: 1 left behind after scaling down to a single replica would silently block future drains.

See also