Observability¶
This page covers the signals available from a running Varnish Gateway: health endpoints, Prometheus metrics, the built-in dashboard, and log access.
Health endpoints¶
Chaperone exposes HTTP endpoints on the health address. In
operator-managed pods this is port 8081 (the operator sets
HEALTH_ADDR=:8081 and points the readiness probe at it); standalone
chaperone falls back to :8080 if the variable is unset.
| Path | Method | Description |
|---|---|---|
| /health | GET | Returns 200 when the pod is ready, 503 when not ready or draining. |
| /drain | GET | Initiates graceful shutdown. The pod stops accepting new connections. |
| /debug/backends | GET | Exposes varnishadm backend.list output. Accepts format=json and detailed=true query parameters. |
| /metrics | GET | Prometheus metrics endpoint. |
The /health endpoint is used by the Kubernetes readiness probe. A pod
becomes ready after the initial VCL load and the first ghost reload both
complete. On SIGTERM the pod enters draining state and /health
returns 503, giving the Service time to remove the pod from the
endpoint list before connections close.
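As a rough illustration of how these endpoints can be checked by hand, the sketch below probes /health and /debug/backends with curl. It assumes the health port has been made reachable on localhost:8081 (for example through kubectl port-forward); the helper names health_label, check_health, and backend_list are hypothetical and not part of chaperone.

```shell
# Map an HTTP status code from /health to a readiness label.
health_label() {
  case "$1" in
    200) echo "ready" ;;
    503) echo "not ready or draining" ;;
    *)   echo "unknown ($1)" ;;
  esac
}

# Probe /health and print the label.
# Assumption: the health address is reachable on localhost:8081.
check_health() {
  base_url="${1:-http://localhost:8081}"
  code=$(curl -s -o /dev/null -w '%{http_code}' "$base_url/health")
  health_label "$code"
}

# Fetch the backend list as JSON with detail enabled,
# using the two query parameters listed in the table above.
backend_list() {
  base_url="${1:-http://localhost:8081}"
  curl -s "$base_url/debug/backends?format=json&detailed=true"
}
```

Calling check_health with no arguments prints one of the three labels; pass a different base URL as the first argument when the health address is exposed elsewhere, such as :8080 for standalone chaperone.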
Prometheus metrics¶
Chaperone metrics¶
Chaperone registers its own metrics on the /metrics endpoint alongside
Go runtime and process collectors.
Counters¶
| Metric | Description |
|---|---|
| chaperone_ghost_reloads_total | Ghost reload attempts |
| chaperone_ghost_reload_errors_total | Failed ghost reloads |
| chaperone_vcl_reloads_total | VCL hot-reload attempts |
| chaperone_vcl_reload_errors_total | Failed VCL reloads |
| chaperone_tls_reloads_total | TLS certificate reload attempts |
| chaperone_tls_reload_errors_total | Failed TLS reloads |
| chaperone_endpoint_changes_total | EndpointSlice change events |
Gauges¶
| Metric | Description |
|---|---|
| chaperone_ready | 1 when the pod is ready, 0 otherwise |
| chaperone_draining | 1 when the pod is draining |
Varnish metrics¶
Chaperone runs varnishstat -1 -j periodically and exposes each counter
as a Prometheus metric. Counter names are lowercased with dots replaced
by underscores — for example, MAIN.cache_hit becomes
varnish_main_cache_hit. Varnishstat counters flagged as cumulative
are exposed as Prometheus counters; all others as gauges.
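The naming rule can be sketched in a few lines of shell. This is an illustration of the mapping as described here, not chaperone's actual code: the varnish_ prefix is inferred from the example above, varnish_metric_name is a hypothetical helper, and characters other than dots and uppercase letters are left untouched.

```shell
# Convert a varnishstat counter name to its Prometheus metric name:
# add the "varnish_" prefix, lowercase everything, replace dots with underscores.
varnish_metric_name() {
  printf 'varnish_%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '.' '_'
}

varnish_metric_name MAIN.cache_hit   # prints varnish_main_cache_hit
varnish_metric_name MAIN.client_req  # prints varnish_main_client_req
```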
Operator metrics¶
The operator exposes controller-runtime metrics on its own metrics address, port 8080 by default. These include reconciliation latency, work queue depth, and error counts — standard controller-runtime metrics documented upstream.
Prometheus scraping¶
The Helm chart can create ServiceMonitor (operator) and PodMonitor
(chaperone) resources for auto-discovery by the
prometheus-operator.
This requires the monitoring.coreos.com CRDs to be installed — for
example, via kube-prometheus-stack.
```yaml
# values.yaml
serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
  # Match your Prometheus instance's serviceMonitorSelector / podMonitorSelector.
  # For kube-prometheus-stack the default selector is `release: <helm-release>`.
  labels:
    release: kube-prometheus-stack
```
The operator metrics Service ({release}-varnish-gateway-operator-metrics)
is created unconditionally so you can port-forward it without enabling the
monitor objects.
The chaperone PodMonitor selects all Gateway pods across namespaces by the
app.kubernetes.io/managed-by: varnish-gateway-operator label, so a single
monitor covers every Gateway the operator manages.
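To preview which pods that selector matches, the same label can be passed to kubectl. This is a small sketch; list_gateway_pods is a hypothetical helper name, and it assumes your kubeconfig has permission to list pods in all namespaces.

```shell
# List every pod carrying the label the chaperone PodMonitor selects on,
# across all namespaces.
list_gateway_pods() {
  kubectl get pods --all-namespaces \
    -l app.kubernetes.io/managed-by=varnish-gateway-operator
}
```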
Grafana dashboards¶
The chart ships four Grafana dashboards under
charts/varnish-gateway/dashboards/:
| Dashboard | UID | Focus |
|---|---|---|
| Varnish | varnish-gateway-varnish | Client rate, hit ratio, backend errors, threads, storage, bans |
| Chaperone | varnish-gateway-chaperone | Ready/draining state, reload rates and errors, endpoint churn |
| Operator | varnish-gateway-operator | Reconcile latency, workqueue queue/work duration, client-go codes |
| Soak — Resources | varnish-gateway-soak-resources | Heap, RSS, goroutines, FDs, GC, CPU, restarts (side-by-side leak detector) |
The Varnish and Chaperone dashboards target day-to-day operations; the Operator and Soak dashboards target regression detection during soak tests and upgrades.
Two packaging options:
Helm (auto-discovery via the Grafana sidecar). Set
dashboards.enabled=true. Each JSON file becomes a ConfigMap carrying
the grafana_dashboard: "1" label, which kube-prometheus-stack's
Grafana sidecar watches by default:
```yaml
# values.yaml
dashboards:
  enabled: true
  # Optional folder hint for grafana-operator-style sidecars.
  # annotations:
  #   grafana_folder: Varnish Gateway
```
Manual import. Open Grafana → Dashboards → Import → paste the JSON
from charts/varnish-gateway/dashboards/*.json.
Dashboards target Grafana v12 (schemaVersion: 41).
Dashboard¶
Chaperone includes an embedded dashboard, enabled by default on port
9000 inside the container. It is not exposed outside the pod via the
Service, but you can reach it with kubectl port-forward:
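A minimal sketch of that port-forward, wrapped in a hypothetical helper named forward_dashboard. The pod name is whatever your Gateway pod is called, and the default namespace used here is an assumption.

```shell
# Forward local port 9000 to the dashboard port inside a Gateway pod.
# Usage: forward_dashboard <gateway-pod> [namespace]
forward_dashboard() {
  pod="$1"
  ns="${2:-default}"  # assumption: falls back to the "default" namespace
  kubectl port-forward -n "$ns" "pod/$pod" 9000:9000
}
```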
Then open http://localhost:9000 in a browser.
The dashboard exposes:
| Path | Description |
|---|---|
| / | HTML dashboard UI |
| /api/state | JSON snapshot: ready/draining state, uptime, vhosts, services, recent events |
| /api/events | Server-Sent Events stream of reload, endpoint, and state-change events |
| /api/varnishlog | Server-Sent Events stream of live varnishlog-json output, with query parameter filtering |
The /api/varnishlog endpoint accepts query parameters for filtering:
| Parameter | Description |
|---|---|
| q | VSL query filter |
| g | Grouping: request, vxid, session, raw |
| R | Rate limit, e.g. 10/s |
| mode | b for backend only, c for client only, empty for both |
| i | Include tags, comma-separated |
| x | Exclude tags |
There is a limit of two concurrent varnishlog sessions per pod.
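As an example of combining these parameters, the sketch below streams client-side transactions that ended in a 5xx response. It assumes the dashboard has been forwarded to localhost:9000; stream_varnishlog is a hypothetical helper, and the query string is only an illustration of VSL query syntax.

```shell
# Stream filtered varnishlog-json output over Server-Sent Events.
# -N disables output buffering so events appear as they arrive;
# -G sends the url-encoded pairs as a query string on a GET request.
stream_varnishlog() {
  base_url="${1:-http://localhost:9000}"  # assumption: port-forwarded dashboard
  curl -N -G \
    --data-urlencode 'q=RespStatus >= 500' \
    --data-urlencode 'g=request' \
    --data-urlencode 'mode=c' \
    --data-urlencode 'R=10/s' \
    "$base_url/api/varnishlog"
}
```

Each running stream counts toward the per-pod session limit, so stop it with Ctrl-C when you are done.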
Logs¶
Operator logs¶
The operator writes structured logs to stdout via log/slog.
Chaperone logs¶
Chaperone also writes structured logs to stdout. These cover varnishd lifecycle events, reload successes and failures, and endpoint changes.
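One way to follow these logs is kubectl logs against the chaperone container. This is a sketch: chaperone_logs is a hypothetical helper, the container name is taken from the kubectl exec example on this page, and the default namespace is an assumption.

```shell
# Follow chaperone logs from a Gateway pod.
# Usage: chaperone_logs <gateway-pod> [namespace]
chaperone_logs() {
  pod="$1"
  ns="${2:-default}"  # assumption: falls back to the "default" namespace
  kubectl logs -n "$ns" "$pod" -c varnish-gateway --follow
}
```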
Varnish request logs¶
Varnish request logs are not emitted by default. There are three ways to access them:
- Logging sidecar: enable spec.logging on your GatewayClassParameters to run a sidecar that streams varnishlog, varnishlog-json, or varnishncsa to stdout. See guides/logging.md for configuration and examples.
- kubectl exec: for ad-hoc debugging, run varnishlog directly inside the chaperone container:

  ```shell
  kubectl exec -it <gateway-pod> -c varnish-gateway -- \
    varnishlog -n /var/run/varnish/vsm -g request
  ```
- Dashboard: if the dashboard is enabled, the /api/varnishlog endpoint streams filtered varnishlog-json output over Server-Sent Events.
See also¶
- Logging guide — sidecar configuration and varnishlog query examples
- Troubleshooting
- GatewayClassParameters reference