Observability

This page covers the signals available from a running Varnish Gateway: health endpoints, Prometheus metrics, the built-in dashboard, and log access.

Health endpoints

Chaperone exposes HTTP endpoints on the health address. In operator-managed pods this is port 8081, because the operator sets HEALTH_ADDR=:8081 and points the readiness probe at it. A standalone chaperone falls back to :8080 when HEALTH_ADDR is unset.

Path	Method	Description
`/health`	GET	Returns 200 when the pod is ready, 503 when not ready or draining.
`/drain`	GET	Initiates graceful shutdown. The pod stops accepting new connections.
`/debug/backends`	GET	Exposes `varnishadm backend.list` output. Accepts `format=json` and `detailed=true` query parameters.
`/metrics`	GET	Prometheus metrics endpoint.

The Kubernetes readiness probe uses the /health endpoint. A pod becomes ready once the initial VCL load and the first ghost reload have both completed. On SIGTERM the pod enters the draining state and /health returns 503, which gives the Service time to remove the pod from the endpoint list before connections close.
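As a rough illustration of how a client might interpret these responses, the sketch below issues a single GET against /health and maps the status code to the states described above. It assumes the health port has been made reachable at http://localhost:8081 (for example with kubectl port-forward); the function names are hypothetical and not part of chaperone.

```python
import urllib.error
import urllib.request


def classify_health(status_code: int) -> str:
    """Map a /health status code to the states described above."""
    if status_code == 200:
        return "ready"
    if status_code == 503:
        return "not ready or draining"
    return "unexpected"


def check_health(base_url: str = "http://localhost:8081", timeout: float = 2.0) -> str:
    """Issue one GET /health and classify the response.

    base_url is an assumption: it presumes the health port is reachable
    locally, for example through kubectl port-forward.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return classify_health(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses, so the 503 arrives here.
        return classify_health(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return "unreachable"
```

Note that a 503 alone does not distinguish a pod that is still starting from one that is draining; the chaperone_ready and chaperone_draining gauges on /metrics carry that distinction.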

Prometheus metrics

Chaperone metrics

Chaperone registers its own metrics on the /metrics endpoint alongside Go runtime and process collectors.

Counters

Metric	Description
`chaperone_ghost_reloads_total`	Ghost reload attempts
`chaperone_ghost_reload_errors_total`	Failed ghost reloads
`chaperone_vcl_reloads_total`	VCL hot-reload attempts
`chaperone_vcl_reload_errors_total`	Failed VCL reloads
`chaperone_tls_reloads_total`	TLS certificate reload attempts
`chaperone_tls_reload_errors_total`	Failed TLS reloads
`chaperone_endpoint_changes_total`	EndpointSlice change events

Gauges

Metric	Description
`chaperone_ready`	1 when the pod is ready, 0 otherwise
`chaperone_draining`	1 when the pod is draining
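To show how these counters and gauges might be consumed outside Prometheus, here is a minimal sketch that parses the text exposition format and derives a reload error ratio. It assumes the chaperone metrics are exposed without labels, which the tables above do not confirm; the helper names are hypothetical and the sample values are made up for illustration.

```python
# Illustrative scrape output; the values are invented, not real measurements.
SAMPLE = """\
# HELP chaperone_ready 1 when the pod is ready, 0 otherwise
# TYPE chaperone_ready gauge
chaperone_ready 1
chaperone_draining 0
chaperone_vcl_reloads_total 20
chaperone_vcl_reload_errors_total 1
chaperone_ghost_reloads_total 40
chaperone_ghost_reload_errors_total 0
"""


def parse_samples(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, rest = line.partition(" ")
        if "{" in name:
            continue  # labelled series are out of scope for this sketch
        samples[name] = float(rest.split()[0])
    return samples


def error_ratio(samples: dict[str, float], kind: str) -> float:
    """Failed reloads divided by attempts for kind in {"ghost", "vcl", "tls"}."""
    attempts = samples.get(f"chaperone_{kind}_reloads_total", 0.0)
    errors = samples.get(f"chaperone_{kind}_reload_errors_total", 0.0)
    return errors / attempts if attempts else 0.0
```

With the sample above, error_ratio(parse_samples(SAMPLE), "vcl") yields 1 failure out of 20 attempts. In a real deployment the same ratio is usually better expressed as a PromQL rate over a time window, since raw totals accumulate over the pod's lifetime.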

Varnish metrics

Chaperone runs varnishstat -1 -j periodically and exposes each counter as a Prometheus metric. Counter names are lowercased, dots are replaced by underscores, and the result is prefixed with varnish_: for example, MAIN.cache_hit becomes varnish_main_cache_hit. Varnishstat counters flagged as cumulative are exposed as Prometheus counters; all others are exposed as gauges.
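The naming and typing rules can be sketched as follows. This is not chaperone's implementation (chaperone is not written in Python), only an illustration of the stated rule. It assumes the varnishstat JSON layout used by recent Varnish releases, with a top-level "counters" object and a per-counter "flag" of "c" for cumulative values; older releases emit a flatter document, and names containing characters other than dots are not handled here. The sample values are invented.

```python
import json

# Illustrative varnishstat -1 -j output, trimmed to two counters.
SAMPLE_JSON = """
{
  "version": 1,
  "counters": {
    "MAIN.cache_hit": {"flag": "c", "format": "i", "value": 1234},
    "MAIN.n_object": {"flag": "g", "format": "i", "value": 56}
  }
}
"""


def prometheus_name(counter: str) -> str:
    """Lowercase the name, replace dots with underscores, add the varnish_ prefix."""
    return "varnish_" + counter.lower().replace(".", "_")


def prometheus_type(flag: str) -> str:
    # Assumption: flag "c" marks a cumulative counter; anything else is a gauge.
    return "counter" if flag == "c" else "gauge"


def convert(varnishstat_json: str) -> dict[str, tuple[str, float]]:
    """Return {metric name: (metric type, value)} for every counter."""
    doc = json.loads(varnishstat_json)
    counters = doc["counters"]  # assumed layout, see the note above
    return {
        prometheus_name(name): (prometheus_type(c.get("flag", "")), c["value"])
        for name, c in counters.items()
    }
```

Here convert(SAMPLE_JSON) maps MAIN.cache_hit to a counter named varnish_main_cache_hit and MAIN.n_object to a gauge named varnish_main_n_object.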

Operator metrics

The operator exposes controller-runtime metrics on its own metrics address, port 8080 by default. These are the standard controller-runtime metrics documented upstream, including reconciliation latency, work queue depth, and error counts.

Prometheus scraping

The Helm chart can create ServiceMonitor (operator) and PodMonitor (chaperone) resources for auto-discovery by the prometheus-operator. This requires the monitoring.coreos.com CRDs to be installed — for example, via kube-prometheus-stack.

# values.yaml
serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
  # Match your Prometheus instance's serviceMonitorSelector / podMonitorSelector.
  # For kube-prometheus-stack the default selector is `release: <helm-release>`.
  labels:
    release: kube-prometheus-stack

The operator metrics Service ({release}-varnish-gateway-operator-metrics) is created unconditionally so you can port-forward it without enabling the monitor objects.

The chaperone PodMonitor selects all Gateway pods across namespaces by the app.kubernetes.io/managed-by: varnish-gateway-operator label, so a single monitor covers every Gateway the operator manages.

Grafana dashboards

The chart ships four Grafana dashboards under charts/varnish-gateway/dashboards/:

Dashboard	UID	Focus
Varnish	`varnish-gateway-varnish`	Client rate, hit ratio, backend errors, threads, storage, bans
Chaperone	`varnish-gateway-chaperone`	Ready/draining state, reload rates and errors, endpoint churn
Operator	`varnish-gateway-operator`	Reconcile latency, workqueue queue/work duration, client-go codes
Soak — Resources	`varnish-gateway-soak-resources`	Heap, RSS, goroutines, FDs, GC, CPU, restarts — side-by-side leak detector

The Varnish and Chaperone dashboards target day-to-day operations; the Operator and Soak dashboards target regression detection during soak tests and upgrades.

Two packaging options:

Helm (auto-discovery via the Grafana sidecar). Set dashboards.enabled=true. Each JSON file becomes a ConfigMap carrying the grafana_dashboard: "1" label, which kube-prometheus-stack's Grafana sidecar watches by default:

# values.yaml
dashboards:
  enabled: true
  # Optional folder hint for grafana-operator-style sidecars.
  # annotations:
  #   grafana_folder: Varnish Gateway

Manual import. Open Grafana → Dashboards → Import → paste the JSON from charts/varnish-gateway/dashboards/*.json.

Dashboards target Grafana v12 (schemaVersion: 41).

Dashboard

Chaperone includes an embedded dashboard, enabled by default on port 9000 inside the container. It is not exposed outside the pod via the Service, but you can reach it with kubectl port-forward:

kubectl port-forward <gateway-pod> 9000:9000

Then open http://localhost:9000 in a browser.

The dashboard exposes:

Path	Description
`/`	HTML dashboard UI
`/api/state`	JSON snapshot: ready/draining state, uptime, vhosts, services, recent events
`/api/events`	Server-Sent Events stream of reload, endpoint, and state-change events
`/api/varnishlog`	Server-Sent Events stream of live varnishlog-json output, with query parameter filtering

The /api/varnishlog endpoint accepts query parameters for filtering:

Parameter	Description
`q`	VSL query filter
`g`	Grouping: `request`, `vxid`, `session`, `raw`
`R`	Rate limit, e.g. `10/s`
`mode`	`b` for backend only, `c` for client only, empty for both
`i`	Include tags, comma-separated
`x`	Exclude tags

Each pod allows at most two concurrent varnishlog sessions.
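Several of these values contain characters that must be URL-encoded, such as spaces and comparison operators in a VSL query or the slash in a rate limit. A small helper along the following lines can build the request URL. The base URL assumes the port-forward shown earlier, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Query parameters listed in the table above.
ALLOWED = {"q", "g", "R", "mode", "i", "x"}


def varnishlog_url(base: str = "http://localhost:9000", **params: str) -> str:
    """Build an /api/varnishlog URL with URL-encoded filter parameters."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Drop empty values so that, for example, mode="" means "both".
    query = urlencode({k: v for k, v in params.items() if v})
    return f"{base}/api/varnishlog" + (f"?{query}" if query else "")
```

For example, varnishlog_url(q="RespStatus >= 500", g="request", R="10/s", mode="c") requests client-side transactions with a 5xx response status, grouped by request and limited to ten per second.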

Logs

Operator logs

The operator writes structured logs to stdout via log/slog:

kubectl logs -n varnish-gateway-system -l app.kubernetes.io/component=operator -f

Chaperone logs

Chaperone also writes structured logs to stdout. These cover varnishd lifecycle events, reload successes and failures, and endpoint changes:

kubectl logs -f <gateway-pod> -c varnish-gateway

Varnish request logs

Varnish request logs are not emitted by default. There are three ways to access them:

Logging sidecar — enable spec.logging on your GatewayClassParameters to run a sidecar that streams varnishlog, varnishlog-json, or varnishncsa to stdout. See guides/logging.md for configuration and examples.

kubectl logs -f <gateway-pod> -c varnish-log

kubectl exec — for ad-hoc debugging, run varnishlog directly inside the chaperone container:

kubectl exec -it <gateway-pod> -c varnish-gateway -- \
  varnishlog -n /var/run/varnish/vsm -g request

Dashboard — if the dashboard is enabled, the /api/varnishlog endpoint streams filtered varnishlog-json output over Server-Sent Events.