
Verify a Velero backup

A backup that has never been restored isn't a backup. The backup-verify operator runs scheduled or ad-hoc verification jobs in two modes: a cheap checksum-only mode, and a full-restore mode that creates a sandbox namespace, applies a Velero Restore CR, diffs the result against the backup manifest, and then tears down both the sandbox and the Restore.

This recipe walks through full-restore end-to-end.

  • ugallu umbrella chart installed with backup-verify enabled.
  • Velero installed in velero namespace, with a working BackupStorageLocation (“BSL”) and at least one completed Backup.
  • Permissions: the backup-verify ServiceAccount must be able to CRUD Restores in the velero namespace and read/delete the sandbox namespace cluster-wide. The umbrella chart provisions this RBAC by default.
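
The provisioned RBAC amounts to something like the following sketch - resource names and exact rules here are illustrative, not the chart's literal output:

```yaml
# Sketch of the RBAC the umbrella chart provisions (illustrative names)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backup-verify-restores
  namespace: velero
rules:
  # CRUD on Restores in the velero namespace
  - apiGroups: ["velero.io"]
    resources: ["restores"]
    verbs: ["create", "get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backup-verify-sandbox
rules:
  # read/delete the sandbox namespace cluster-wide
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "delete"]
```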
```shell
kubectl -n velero get backups.velero.io
# nightly-2026-04-29   Completed   ...
```

The sandbox namespace name must end with -bvsandbox - a ValidatingAdmissionPolicy enforces the suffix so an accidental full-restore can't target a production namespace.
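
A minimal sketch of such a policy, assuming it validates BackupVerifyRun creates - the chart ships its own version, and a matching ValidatingAdmissionPolicyBinding (omitted here) is also required:

```yaml
# Illustrative sketch, not the chart's actual policy
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: bv-sandbox-suffix
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: ["security.ugallu.io"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE"]
        resources: ["backupverifyruns"]
  validations:
    - expression: "object.spec.sandboxNamespace.endsWith('-bvsandbox')"
      message: "spec.sandboxNamespace must end with -bvsandbox"
```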

bvr-nightly.yaml
```yaml
apiVersion: security.ugallu.io/v1alpha1
kind: BackupVerifyRun
metadata:
  name: nightly-velero-fullrestore
  namespace: ugallu-system
spec:
  backend: velero
  backupRef:
    name: nightly-2026-04-29
    namespace: velero
  mode: full-restore
  sandboxNamespace: nightly-2026-04-29-bvsandbox
  timeout: 10m
```
```shell
kubectl apply -f bvr-nightly.yaml
```
```shell
kubectl -n ugallu-system get backupverifyruns -w
```

You should see the Phase walk through:

Pending -> Running -> Succeeded

While Running, the controller:

  1. creates a velero.io/v1 Restore CR in the velero namespace with a namespaceMapping aimed at the sandbox namespace
  2. polls the Restore’s phase every 10s until terminal (Completed, Failed, PartiallyFailed)
  3. lists Pods, ConfigMaps, Secrets and ServiceAccounts in the sandbox and diffs counts against the Backup’s manifest
  4. emits BackupVerifyCompleted (or BackupVerifyMismatch if any finding crosses high severity)
  5. deletes the sandbox namespace and the Restore CR
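
The Restore CR from step 1 looks roughly like this - the name and the source namespace are assumptions about the controller's internals:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: nightly-velero-fullrestore   # derived from the BackupVerifyRun (assumed)
  namespace: velero
spec:
  backupName: nightly-2026-04-29
  namespaceMapping:
    # maps a namespace captured in the backup into the sandbox;
    # "production" is a stand-in for your backed-up namespace
    production: nightly-2026-04-29-bvsandbox
```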
```shell
kubectl -n ugallu-system get backupverifyresult nightly-velero-fullrestore-result -o yaml
```

Key fields:

  • status.worstSeverity - the headline grade. Anything medium or above is worth investigating.
  • findings[] - per-finding details: code, severity, detail, evidence. A velero-restore-completed finding at info severity means the restore ran clean.
  • restoredObjectCount - how many K8s objects materialised in the sandbox.
  • checksum - empty for full-restore mode (the diff is the validation, not a checksum).
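
Put together, a clean run's status looks roughly like this - values are illustrative, not real output:

```yaml
# Illustrative BackupVerifyResult status for a clean full-restore run
status:
  worstSeverity: info
  restoredObjectCount: 42      # illustrative count
  checksum: ""                 # empty in full-restore mode
  findings:
    - code: velero-restore-completed
      severity: info
      detail: restore reached Completed with no count mismatches
```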
```shell
kubectl get securityevents -A \
  -l ugallu.io/source=ugallu-backup-verify \
  --sort-by=.metadata.creationTimestamp
```

The SecurityEvent class flips from Compliance to Detection if any finding reaches high severity. That's the signal that drives forensics or paging - a clean run is Compliance / info, real corruption is Detection / critical.

For nightly verification, add a CronJob that creates a fresh BackupVerifyRun each run (the CR is immutable after creation):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-bvr
  namespace: ugallu-system
spec:
  schedule: "30 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ugallu-backup-verify-cron
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.31
              command:
                - /bin/sh
                - -c
                - |
                  TS=$(date -u +%Y%m%d%H%M%S)
                  # Verify the most recently created Velero backup
                  BACKUP=$(kubectl -n velero get backups.velero.io \
                    --sort-by=.metadata.creationTimestamp \
                    -o jsonpath='{.items[-1].metadata.name}')
                  cat <<EOF | kubectl apply -f -
                  apiVersion: security.ugallu.io/v1alpha1
                  kind: BackupVerifyRun
                  metadata:
                    name: nightly-${TS}
                    namespace: ugallu-system
                  spec:
                    backend: velero
                    backupRef:
                      name: ${BACKUP}
                      namespace: velero
                    mode: full-restore
                    sandboxNamespace: bv-${TS}-bvsandbox
                    timeout: 15m
                  EOF
```

(Adjust the bitnami/kubectl image to your preferred mirror - the project policy bans Bitnami in production.)

TTLConfig keeps BackupVerifyResult CRs for 30 days by default - enough to walk back through the last month of nightly runs without disk pressure.
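
To change the retention, override the chart's TTL setting - the value path below is an assumption; check the chart's values schema for the exact key:

```yaml
# values.yaml override (hypothetical key names)
backup-verify:
  ttlConfig:
    resultTTL: 1440h   # keep BackupVerifyResult CRs for 60 days
```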