tenant-escape

ugallu-tenant-escape answers one question: “did anything in tenant A do something it shouldn’t have against tenant B?” The operator maintains an in-memory index of TenantBoundary CRs (each declares the namespaces, ServiceAccounts, and network policies that make up a tenant) and runs four detectors against the audit-detection event bus plus an optional tetragon-bridge stream.

| SE type | Source | Trigger |
| --- | --- | --- |
| CrossTenantSecretAccess | audit-bus | get / list on a Secret whose namespace belongs to a different tenant than the requester’s SA. |
| CrossTenantHostPathOverlap | audit-bus | A Pod create whose volumes mount a host path already mounted by a Pod in a different tenant. |
| CrossTenantNetworkPolicy | audit-bus | A NetworkPolicy / CiliumNetworkPolicy is created/updated whose ingress or egress crosses a tenant boundary. |
| CrossTenantExec | tetragon-bridge | A kubectl exec (or any process entering a Pod’s net namespace from outside) where caller and target are in different tenants. Optional - needs the bridge. |
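To make the CrossTenantHostPathOverlap trigger concrete, here is a minimal Python sketch. It assumes that "overlap" means the two host paths are equal or one is a parent of the other; the operator's actual comparison rule is not stated here, and `paths_overlap` and `hostpath_conflicts` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """True if the two host paths are equal or one contains the other.

    Assumption: containment counts as overlap. Comparison is per path
    component, so /var/lib does not overlap /var/library.
    """
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def hostpath_conflicts(new_tenant, new_paths, mounted):
    """Return (other_tenant, existing_path, new_path) for each cross-tenant overlap.

    mounted: iterable of (tenant, host_path) pairs for Pods that already exist.
    """
    return [
        (tenant, existing, new)
        for new in new_paths
        for tenant, existing in mounted
        if tenant != new_tenant and paths_overlap(existing, new)
    ]
```

Mounts that belong to the same tenant as the new Pod are skipped, since only a path shared across tenants matches the trigger described in the table.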

A boundary is a tuple of {namespaces, serviceAccounts, networkPolicies} that the operator considers “one tenant”. Boundaries can be a singleton (cluster-wide tenancy model) or per-namespace. The index rebuilds on every Add/Update/Delete - there’s no warm-up window because boundaries change rarely and the rebuild is O(n).
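The full-rebuild strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the operator's code: `Boundary`, `BoundaryIndex`, and the rule that a ServiceAccount without an explicit entry falls back to its namespace's tenant are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """Hypothetical in-memory mirror of a TenantBoundary spec."""
    name: str
    namespaces: tuple[str, ...] = ()
    service_accounts: tuple[tuple[str, str], ...] = ()  # (namespace, name)


class BoundaryIndex:
    """Lookup tables that are rebuilt from scratch on every boundary change."""

    def __init__(self) -> None:
        self.ns_to_tenant: dict[str, str] = {}
        self.sa_to_tenant: dict[tuple[str, str], str] = {}

    def rebuild(self, boundaries: list[Boundary]) -> None:
        # One pass over all boundaries: linear in the number of declared entries.
        ns_map: dict[str, str] = {}
        sa_map: dict[tuple[str, str], str] = {}
        for b in boundaries:
            for ns in b.namespaces:
                ns_map.setdefault(ns, b.name)  # first claim wins in this sketch
            for sa in b.service_accounts:
                sa_map.setdefault(sa, b.name)
        # Swap in the new tables only once they are complete.
        self.ns_to_tenant, self.sa_to_tenant = ns_map, sa_map

    def tenant_of_namespace(self, namespace: str) -> str | None:
        return self.ns_to_tenant.get(namespace)

    def tenant_of_service_account(self, namespace: str, name: str) -> str | None:
        # Assumption: an unlisted SA belongs to the tenant of its namespace.
        explicit = self.sa_to_tenant.get((namespace, name))
        return explicit if explicit is not None else self.ns_to_tenant.get(namespace)
```

Because the tables are replaced wholesale, a deleted boundary disappears from lookups on the next rebuild with no incremental bookkeeping.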

apiVersion: security.ugallu.io/v1alpha1
kind: TenantBoundary
metadata: { name: team-payments, namespace: ugallu-system }
spec:
  namespaces: [payments, payments-staging]
  serviceAccounts:
    - { namespace: payments, name: deployer }
    - { namespace: payments, name: ci-runner }
  networkPolicies:
    - { namespace: payments, name: deny-from-other-tenants }

TenantBoundary has no phase. The reconciler treats every CR write as an index rebuild request and refreshes status.matchedNamespaces plus status.matchedPods per tick. Detection itself is event-driven and runs as a manager Runnable.

on each TenantBoundary event:
  boundary := Get(req)
  index.Rebuild() # O(n) over all boundaries
  patch Status.MatchedNamespaces, Status.MatchedPods, Status.LastReconcileAt
  if overlapsAnotherBoundary(boundary):
    emitSE(TenantBoundaryOverlap, critical)
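The overlap check in the last step could look like the following Python sketch. It assumes that overlap means two boundaries list the same namespace; `find_namespace_overlaps` is a hypothetical helper, and the real overlapsAnotherBoundary may also compare ServiceAccounts or network policies.

```python
from collections import defaultdict


def find_namespace_overlaps(boundaries: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each contested namespace to the sorted names of the boundaries claiming it.

    boundaries: {boundary name: [namespaces it declares]}
    """
    claims: dict[str, set[str]] = defaultdict(set)
    for name, namespaces in boundaries.items():
        for ns in namespaces:
            claims[ns].add(name)
    # Only namespaces with more than one claimant count as an overlap.
    return {ns: sorted(owners) for ns, owners in claims.items() if len(owners) > 1}
```

Each entry in the result would correspond to one TenantBoundaryOverlap SE in this sketch; the claims are reported rather than merged.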

Reconcile loop (audit-bus + bridge dispatchers)

audit-bus dispatcher (always on):
  for ev in audit-bus stream:
    for det in [Secrets, HostPathOverlap, NetworkPolicy]:
      if det.Match(ev, index): emitSE(det.Type, critical)
bridge dispatcher (optional, requires --bridge-endpoint):
  for ev in tetragon-bridge exec stream:
    if ExecDetector.Match(ev, index): emitSE(CrossTenantExec, critical)
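As a rough illustration of the audit-bus dispatcher, the Python sketch below runs a detector chain over a batch of events. The event field names (`verb`, `resource`, `namespace`, `callerNamespace`), the plain-dict index, and the choice to ignore namespaces that belong to no tenant are assumptions made for the example, not the bus's real schema.

```python
from typing import Callable, Iterable

Event = dict            # hypothetical flattened audit event
Index = dict            # namespace -> tenant name
Detector = tuple        # (SE type, match function)


def secret_access_match(ev: Event, index: Index) -> bool:
    """True for get/list on a Secret owned by a different tenant than the caller."""
    if ev.get("resource") != "secrets" or ev.get("verb") not in ("get", "list"):
        return False
    caller = index.get(ev.get("callerNamespace"))
    target = index.get(ev.get("namespace"))
    # Assumption: events touching namespaces outside any tenant are ignored.
    return caller is not None and target is not None and caller != target


def dispatch(events: Iterable[Event], detectors: list, index: Index) -> list:
    """Run every detector on every event; collect (SE type, severity) pairs."""
    fired = []
    for ev in events:
        for se_type, match in detectors:
            if match(ev, index):
                fired.append((se_type, "critical"))
    return fired
```

Each detector sees only the current event and the index, which matches the stateless-per-event design: a restart loses no detector state.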

Operator restart: index rebuilt from scratch by re-Listing every TenantBoundary CR. Boundaries are durable, so no detection data is lost. Audit-bus + bridge gRPC streams reconnect with exponential backoff. Detector chain is stateless per event.
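For readers unfamiliar with exponential backoff, here is a minimal Python sketch of a reconnect loop. The base delay, the 60-second cap, the attempt limit, and the optional jitter are illustrative assumptions; the operator's actual backoff parameters are not documented here.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=None) -> float:
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped.

    If an rng is supplied, return a uniformly random delay in [0, ceiling]
    (full jitter) so that many clients do not reconnect in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    if rng is None:
        return ceiling
    return rng.uniform(0, ceiling)


def connect_with_backoff(connect, max_attempts: int = 5, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.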

Pod killed mid-dispatch: the new pod reconnects to both streams and resumes from the bus’s current position (the bus does not replay missed events to a reconnecting consumer; the gap is visible as TenantEscapeSourceLagged if it crosses the threshold).

  • Empty selector defensive default. A TenantBoundary with an empty namespaceSelector matches nothing (not everything, which is the K8s default). Forces explicit tenancy modeling.
  • Cilium support is optional. Without ciliumNetworkPolicies enabled in the chart, only the v1 NetworkPolicy detector runs.
  • Bridge optional. Empty --bridge-endpoint disables the exec detector but the three audit-bus detectors keep working.
  • Cross-CR overlap reported, never silently merged. Two boundaries claiming the same namespace produce a TenantBoundaryOverlap SE.
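The empty-selector default from the first bullet can be illustrated with a short Python sketch. It assumes the selector is a flat label map in the style of `matchLabels`; the real namespaceSelector schema is not shown here, and `selector_matches` is a hypothetical name.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Match a namespace's labels against a boundary's selector.

    Unlike the usual Kubernetes convention, where an empty selector selects
    everything, an empty selector here selects nothing.
    """
    if not selector:
        return False  # defensive default: no selector, no tenant membership
    return all(labels.get(key) == value for key, value in selector.items())
```

With this rule, a boundary that forgets its selector claims no namespaces instead of silently claiming the whole cluster.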
# ClusterRole
rules:
  - apiGroups: [security.ugallu.io]
    resources: [tenantboundaries, tenantboundaries/status]
    verbs: [get, list, watch, update, patch]
  - apiGroups: [security.ugallu.io]
    resources: [securityevents]
    verbs: [create]
  - apiGroups: [""]
    resources: [namespaces, pods, secrets, serviceaccounts]
    verbs: [get, list, watch]
  - apiGroups: [networking.k8s.io]
    resources: [networkpolicies]
    verbs: [get, list, watch]
  - apiGroups: [cilium.io]
    resources: [ciliumnetworkpolicies]
    verbs: [get, list, watch] # optional
# Namespaced Role
  - apiGroups: [coordination.k8s.io]
    resources: [leases]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [""]
    resources: [events]
    verbs: [create, patch]
  • TenantBoundary
    • singleton or per-namespace; status carries detector counts.

--cluster-id, --cluster-name, --audit-bus-endpoint, --audit-bus-token, --audit-bus-consumer-name (default tenant-escape), --bridge-endpoint (empty disables the exec detector), --bridge-token.

Deployment (2 replicas) in ugallu-system, leader election on, priorityClassName=system-cluster-critical.

ugallu_tenant_escape_detector_fires_total{detector}, ugallu_tenant_boundaries_active, ugallu_tenant_escape_audit_bus_events_total, ugallu_tenant_escape_bridge_events_total.

The exec detector is the only one that requires the tetragon-bridge; without it, you still get the three audit-driven detectors at full fidelity. What the bridge adds is a narrower gap between “an SA crossed a boundary according to the apiserver” and “a process actually entered another tenant’s net namespace”.