
Verify a Velero backup

A backup that has never been restored isn't a backup. The backup-verify operator runs scheduled or ad-hoc verification jobs in two modes: a cheap checksum-only mode, and a full-restore mode that creates a sandbox namespace, applies a Velero Restore CR, diffs the result against the backup manifest, and then tears down both the sandbox and the Restore.

This recipe walks through full-restore end-to-end.

  • ugallu umbrella chart installed with backup-verify enabled.
  • Velero installed in velero namespace, with a working BackupStorageLocation (“BSL”) and at least one completed Backup.
  • Permissions: the backup-verify ServiceAccount must be able to CRUD Restores in the velero namespace and read/delete the sandbox namespace cluster-wide. The umbrella chart provisions this RBAC by default.
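
The provisioned RBAC amounts to something like the following sketch - resource names and exact rules here are illustrative, not the chart's literal output:

```yaml
# Sketch of the RBAC the umbrella chart provisions (illustrative names)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backup-verify-restores
  namespace: velero
rules:
  # CRUD on Restores in the velero namespace
  - apiGroups: ["velero.io"]
    resources: ["restores"]
    verbs: ["create", "get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backup-verify-sandbox
rules:
  # read/delete the sandbox namespace cluster-wide
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "delete"]
```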
```shell
kubectl -n velero get backups.velero.io
# nightly-2026-04-29   Completed   ...
```

The sandbox namespace name must end with -bvsandbox - a ValidatingAdmissionPolicy enforces the suffix so an accidental full-restore can't target a production namespace.
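
A minimal sketch of such a policy, assuming it validates BackupVerifyRun creates - the chart ships its own version, and a matching ValidatingAdmissionPolicyBinding (omitted here) is also required:

```yaml
# Illustrative sketch, not the chart's actual policy
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: bv-sandbox-suffix
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: ["security.ugallu.io"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE"]
        resources: ["backupverifyruns"]
  validations:
    - expression: "object.spec.sandboxNamespace.endsWith('-bvsandbox')"
      message: "spec.sandboxNamespace must end with -bvsandbox"
```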

bvr-nightly.yaml
```yaml
apiVersion: security.ugallu.io/v1alpha1
kind: BackupVerifyRun
metadata:
  name: nightly-velero-fullrestore
  namespace: ugallu-system
spec:
  backend: velero
  backupRef:
    name: nightly-2026-04-29
    namespace: velero
  mode: full-restore
  sandboxNamespace: nightly-2026-04-29-bvsandbox
  timeout: 10m
```
```shell
kubectl apply -f bvr-nightly.yaml
```
```shell
kubectl -n ugallu-system get backupverifyruns -w
```

You should see the Phase walk through:

Pending -> Running -> Succeeded

While Running, the controller:

  1. creates a velero.io/v1 Restore CR in the velero namespace with a namespaceMapping aimed at the sandbox namespace
  2. polls the Restore’s phase every 10s until terminal (Completed, Failed, PartiallyFailed)
  3. lists Pods, ConfigMaps, Secrets and ServiceAccounts in the sandbox and diffs counts against the Backup’s manifest
  4. emits BackupVerifyCompleted (or BackupVerifyMismatch if any finding crosses high severity)
  5. deletes the sandbox namespace and the Restore CR
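
The Restore CR from step 1 looks roughly like this - the name and the source namespace are assumptions about the controller's internals:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: nightly-velero-fullrestore   # derived from the BackupVerifyRun (assumed)
  namespace: velero
spec:
  backupName: nightly-2026-04-29
  namespaceMapping:
    # maps a namespace captured in the backup into the sandbox;
    # "production" is a stand-in for your backed-up namespace
    production: nightly-2026-04-29-bvsandbox
```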
```shell
kubectl -n ugallu-system get backupverifyresult nightly-velero-fullrestore-result -o yaml
```

Key fields:

  • status.worstSeverity - the headline grade. Anything medium or above is worth investigating.
  • findings[] - per-finding details: code, severity, detail, evidence. A velero-restore-completed finding at info severity means the restore ran clean.
  • restoredObjectCount - how many K8s objects materialised in the sandbox.
  • checksum - empty for full-restore mode (the diff is the validation, not a checksum).
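
Put together, a clean run's status looks roughly like this - values are illustrative, not real output:

```yaml
# Illustrative BackupVerifyResult status for a clean full-restore run
status:
  worstSeverity: info
  restoredObjectCount: 42      # illustrative count
  checksum: ""                 # empty in full-restore mode
  findings:
    - code: velero-restore-completed
      severity: info
      detail: restore reached Completed with no count mismatches
```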
```shell
kubectl get securityevents -A \
  -l ugallu.io/source=ugallu-backup-verify \
  --sort-by=.metadata.creationTimestamp
```

The SecurityEvent class flips from Compliance to Detection if any finding reaches high severity. That's the signal that drives forensics or paging - a clean run is Compliance / info, real corruption is Detection / critical.

For nightly verification, add a CronJob that creates a fresh BackupVerifyRun each run (the CR is immutable after creation):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-bvr
  namespace: ugallu-system
spec:
  schedule: "30 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ugallu-backup-verify-cron
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.31
              command:
                - /bin/sh
                - -c
                - |
                  TS=$(date -u +%Y%m%d%H%M%S)
                  # Verify the most recently created Velero backup
                  BACKUP=$(kubectl -n velero get backups.velero.io \
                    --sort-by=.metadata.creationTimestamp \
                    -o jsonpath='{.items[-1].metadata.name}')
                  cat <<EOF | kubectl apply -f -
                  apiVersion: security.ugallu.io/v1alpha1
                  kind: BackupVerifyRun
                  metadata:
                    name: nightly-${TS}
                    namespace: ugallu-system
                  spec:
                    backend: velero
                    backupRef:
                      name: ${BACKUP}
                      namespace: velero
                    mode: full-restore
                    sandboxNamespace: bv-${TS}-bvsandbox
                    timeout: 15m
                  EOF
```

(Adjust the bitnami/kubectl image to your preferred mirror - the project policy bans Bitnami in production.)

TTLConfig keeps BackupVerifyResult CRs for 30 days by default - enough to walk back through the last month of nightly runs without disk pressure.
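
To change the retention, override the chart's TTL setting - the value path below is an assumption; check the chart's values schema for the exact key:

```yaml
# values.yaml override (hypothetical key names)
backup-verify:
  ttlConfig:
    resultTTL: 1440h   # keep BackupVerifyResult CRs for 60 days
```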