Hardening

The umbrella chart’s defaults optimise for “boots on a fresh Kubernetes cluster on the first try”. For production you want a different set of switches flipped.

```yaml
hardening:
  strict: true
```

Setting hardening.strict=true is shorthand for the rest of this page. It flips on every preflight check the chart owns, and it makes the chart fail to install when a prerequisite is missing rather than degrade silently.

The preflights are runtime-evaluated, not just chart-rendered:

  • policy/v1beta1.PodSecurityPolicy is not present (PSPs were removed in Kubernetes 1.25; their presence signals a mis-versioned cluster).
  • the namespace ugallu-system-privileged carries the pod-security.kubernetes.io/enforce=privileged label - without it the privileged DaemonSets won’t be admitted.
  • the policy.sigstore.dev CRDs from sigstore-policy-controller are present (signature gating for ugallu’s own images).
  • spire.io CRDs are present when attestor.signingMode=fulcio-keyless - the operator needs a trusted SVID issuer.
  • the worm.endpoint hostname resolves in DNS and a TCP probe to it succeeds.
  • NTP skew reported by chronyd is under 250 ms (the attestor’s Rekor inclusion proofs are timestamp-sensitive).

A CRD missing from the cluster yields a Reason plus a remediation link in the chart’s helm install --debug output, so the operator team knows what to install before retrying.
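The skew preflight, for instance, reduces to parsing chronyd’s tracking report. A minimal sketch in Python, assuming the standard `chronyc tracking` output format (the helper name and parsing are illustrative, not chart code):

```python
import re

# Threshold mirrored from the preflight description: Rekor inclusion
# proofs are timestamp-sensitive, so skew must stay under 250 ms.
MAX_SKEW_SECONDS = 0.250

def ntp_skew_ok(tracking_output: str) -> bool:
    """Check the system-time offset in `chronyc tracking` output.

    The relevant line looks like:
        System time     : 0.000123456 seconds fast of NTP time
    """
    m = re.search(r"System time\s*:\s*([0-9.]+) seconds (fast|slow)",
                  tracking_output)
    if m is None:
        return False  # can't prove the skew is in bounds -> fail closed
    return float(m.group(1)) < MAX_SKEW_SECONDS
```

Failing closed on unparseable output matches the strict posture: a missing answer is treated like a bad one.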

```yaml
worm:
  endpoint: https://worm.example.internal
  bucket: ugallu-worm-prod
  encryption:
    mode: sse-kms
    kmsKeyID: arn:aws:kms:eu-west-1:...:key/...
  retention:
    bundle: 10y
    forensicsFs: 5y
    forensicsMem: 1y
```

Object Lock must be enabled at the bucket level in COMPLIANCE mode (not GOVERNANCE - GOVERNANCE allows a sufficiently privileged user to break retention). The chart can’t enforce this for you because the setting lives on the bucket, not in the cluster; it WILL emit a WormBucketUnlocked SecurityEvent on every startup if it sees a non-locked bucket.
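The startup check amounts to inspecting the bucket’s Object Lock configuration. A hedged sketch against the shape of S3’s GetObjectLockConfiguration response (the helper is illustrative, not chart code):

```python
def object_lock_is_compliance(lock_config: dict) -> bool:
    """Return True only if Object Lock is enabled in COMPLIANCE mode.

    `lock_config` is the ObjectLockConfiguration body of an S3
    GetObjectLockConfiguration response. GOVERNANCE is rejected on
    purpose: a principal with s3:BypassGovernanceRetention can shorten
    or remove retention, which defeats the WORM guarantee.
    """
    if lock_config.get("ObjectLockEnabled") != "Enabled":
        return False
    rule = lock_config.get("Rule", {}).get("DefaultRetention", {})
    return rule.get("Mode") == "COMPLIANCE"
```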

```yaml
attestor:
  signingMode: dual
  fulcio:
    issuer: https://kubernetes.default.svc.cluster.local
    fulcioURL: https://fulcio.sigstore.dev
  openbao:
    address: https://openbao.openbao:8200
    transitMount: transit
    keyName: ugallu-attestor-prod
    authRole: ugallu-attestor
  rekor:
    enabled: true
    url: https://rekor.sigstore.dev
```

signingMode=dual is the recommended production posture: every bundle is co-signed by Fulcio (identity-rooted) and OpenBao (key-rooted). If either backend goes down, bundles are still signed by the other; both going down at once is the signal that something infrastructural is broken, not just a flaky external dependency.
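The failure semantics can be sketched as a collect-what-you-can loop (a hedged Python illustration; `sign_bundle` and `AllSignersFailed` are our names, not the attestor’s API):

```python
class AllSignersFailed(Exception):
    """Raised only when every backend fails - the infrastructural signal."""

def sign_bundle(bundle: bytes, signers: dict) -> dict:
    """Collect signatures from every reachable backend.

    `signers` maps a backend name ("fulcio", "openbao") to a callable
    returning a signature for the bundle. Any single failure is
    tolerated: a bundle with at least one signature is still valid.
    """
    signatures, errors = {}, {}
    for name, sign in signers.items():
        try:
            signatures[name] = sign(bundle)
        except Exception as exc:  # one flaky backend is expected; keep going
            errors[name] = exc
    if not signatures:
        raise AllSignersFailed(errors)  # both down -> escalate, don't degrade
    return signatures
```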

```yaml
resolver:
  tls:
    mode: spire
    spire:
      trustDomain: example.internal
```

In mode=spire the resolver’s gRPC service mounts SPIRE-issued SVIDs and refuses connections without a verified peer certificate. The DNS-detect operator uses the same trust domain to authenticate, so a misconfigured trust domain produces a clear refusal at startup rather than a silent fallback.
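The refusal amounts to a trust-domain comparison on the peer’s SPIFFE ID. A minimal sketch (function name is ours; the real check lives in the mTLS handshake):

```python
def peer_allowed(spiffe_id: str, trust_domain: str = "example.internal") -> bool:
    """Accept a peer only if its SVID's SPIFFE ID is in our trust domain.

    A SPIFFE ID has the shape spiffe://<trust-domain>/<workload-path>.
    Matching the full "spiffe://<domain>/" prefix (including the slash)
    avoids accepting look-alike domains such as example.internal.evil.
    """
    return spiffe_id.startswith(f"spiffe://{trust_domain}/")
```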

The chart ships a default NetworkPolicy per operator under charts/ugallu/charts/<op>/templates/04-networkpolicy.yaml. The default permits:

  • ingress from the Prometheus operator scrape namespace (configured via monitoring.prometheusNamespace)
  • egress to kube-apiserver, the configured WORM endpoint, the resolver service, and (where applicable) the audit-bus / bridge

For Cilium clusters a parallel CiliumNetworkPolicy with stricter identity-based selection ships under the same template. Toggle the backend with umbrella.networkPolicy.backend=cilium|coreV1.
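For orientation, a coreV1 policy in the spirit of those defaults might look like the following (labels, namespace names, and ports are placeholders, not the chart’s actual selectors):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ugallu-resolver            # illustrative; the real name comes from the template
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: resolver
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring  # monitoring.prometheusNamespace
  egress:
    - to: []                       # the shipped template scopes this to the
      ports:                       # apiserver, WORM endpoint, and audit-bus
        - protocol: TCP
          port: 443
```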

```yaml
ui:
  auth:
    mode: oidc
    issuer: https://keycloak.example.internal/realms/ugallu
    clientID: ugallu-ui
    redirectURL: https://ugallu.example.internal/oauth/callback
```

The BFF supports OIDC + PKCE only - no password grant, no client credentials, no implicit flow. ServiceAccount impersonation (mode=sa-impersonation) is supported as a fallback for clusters without an OIDC issuer, but the audit log entries it produces attribute every UI action to the BFF’s ServiceAccount rather than to the human behind it.
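For reference, the PKCE S256 exchange the flow relies on is small enough to sketch (per RFC 7636; `pkce_pair` is an illustrative helper, not BFF code):

```python
import base64
import hashlib
import secrets

def pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge.

    The verifier is 32 random bytes, base64url-encoded without padding;
    the challenge is the base64url-encoded SHA-256 of the verifier.
    """
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The challenge goes with the authorization request, the verifier with the token exchange; the issuer recomputes the hash to bind the two, which is what makes the grant safe without a client secret.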

A few hardening responsibilities live outside the chart:

  • Container image signing for non-ugallu workloads. The chart signs ugallu’s own images and configures sigstore-policy-controller to gate them; gating workloads from other vendors is a policy you author.
  • Audit logging at the apiserver. The audit-detection operator consumes the audit log; it does not configure the apiserver to produce it. On managed control planes use the platform’s audit webhook output; on self-managed clusters add --audit-webhook-config-file.
  • Backup creation. backup-verify checks Velero / etcd snapshots; it doesn’t take them. Velero schedules and etcd snapshot CronJobs are the cluster operator’s responsibility.
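On the backup point, a minimal Velero Schedule that backup-verify could then check might look like this (name, cadence, and TTL are placeholders, not shipped defaults):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"          # 02:00 daily, cron syntax
  template:
    includedNamespaces: ["*"]
    ttl: 720h                    # keep 30 days; backup-verify checks, never creates
```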