tenant-escape
ugallu-tenant-escape answers: “did anything in tenant A do
something it shouldn’t have, against tenant B?”. The operator
maintains an in-memory index of TenantBoundary CRs (each
declares the namespaces / SAs / network policies that make up a
tenant) and runs four detectors against the audit-detection event
bus + an optional tetragon-bridge
stream.
Detectors
| SE type | Source | Trigger |
|---|---|---|
| CrossTenantSecretAccess | audit-bus | get / list on a Secret whose namespace belongs to a different tenant than the requester’s SA. |
| CrossTenantHostPathOverlap | audit-bus | A Pod create whose volumes mount a host path already mounted by a Pod in a different tenant. |
| CrossTenantNetworkPolicy | audit-bus | A NetworkPolicy / CiliumNetworkPolicy is created or updated whose ingress or egress crosses a tenant boundary. |
| CrossTenantExec | tetragon-bridge | A kubectl exec (or any process entering a Pod’s net namespace from outside) where caller and target are in different tenants. Optional - needs the bridge. |
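As an illustration of the first row, the check behind CrossTenantSecretAccess can be sketched as two index lookups and a comparison. This is a minimal sketch, not the operator's code: the `AuditEvent` fields, the two lookup tables, the `team-billing` tenant and the choice to ignore callers or namespaces that no boundary claims are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; in the operator these would come from the
# in-memory TenantBoundary index.
NAMESPACE_TENANT = {"payments": "team-payments", "billing": "team-billing"}
SA_TENANT = {("payments", "deployer"): "team-payments"}


@dataclass(frozen=True)
class AuditEvent:
    """Assumed subset of an audit event; field names are illustrative."""
    verb: str
    resource: str
    namespace: str      # namespace of the object being accessed
    sa_namespace: str   # namespace of the requesting ServiceAccount
    sa_name: str


def cross_tenant_secret_access(ev: AuditEvent) -> Optional[str]:
    """Return the SE type if the event crosses a tenant boundary, else None."""
    if ev.resource != "secrets" or ev.verb not in ("get", "list"):
        return None
    caller = SA_TENANT.get((ev.sa_namespace, ev.sa_name))
    target = NAMESPACE_TENANT.get(ev.namespace)
    # Assumption: if no boundary claims the caller or the target,
    # there is nothing to compare and the detector stays silent.
    if caller is None or target is None:
        return None
    return "CrossTenantSecretAccess" if caller != target else None
```

A `get` on a Secret in `billing` by `payments/deployer` would fire here, while the same request against a Secret in `payments` would not.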
TenantBoundary
A boundary is a tuple of {namespaces, serviceAccounts, networkPolicies}
that the operator considers “one tenant”. Boundaries can be a
singleton (cluster-wide tenancy model) or per-namespace. The index
rebuilds on every Add/Update/Delete - there’s no warm-up
window because boundaries change rarely and the rebuild is O(n).
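The full rebuild described above can be pictured as a single pass over all boundaries that repopulates two lookup maps. The sketch below is an assumption about the shape of that index, not the operator's actual data structure: `Boundary`, `Index` and `rebuild` are hypothetical names, and keeping the first claim on a namespace is a simplification for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Boundary:
    """Hypothetical in-memory view of one TenantBoundary spec."""
    name: str
    namespaces: List[str] = field(default_factory=list)
    service_accounts: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Index:
    """Assumed index shape: namespace -> tenant and (ns, sa) -> tenant."""
    by_namespace: Dict[str, str] = field(default_factory=dict)
    by_service_account: Dict[Tuple[str, str], str] = field(default_factory=dict)


def rebuild(boundaries: List[Boundary]) -> Index:
    """Build a fresh index in one pass over every boundary."""
    index = Index()
    for b in boundaries:
        for ns in b.namespaces:
            # First claim wins in this sketch; a second claim on the same
            # namespace is an overlap and would be reported separately.
            index.by_namespace.setdefault(ns, b.name)
        for sa in b.service_accounts:
            index.by_service_account.setdefault(sa, b.name)
    return index
```

Because the index is thrown away and rebuilt each time, there is no incremental state to get out of sync with the CRs.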
Example
```yaml
apiVersion: security.ugallu.io/v1alpha1
kind: TenantBoundary
metadata: { name: team-payments, namespace: ugallu-system }
spec:
  namespaces: [payments, payments-staging]
  serviceAccounts:
    - { namespace: payments, name: deployer }
    - { namespace: payments, name: ci-runner }
  networkPolicies:
    - { namespace: payments, name: deny-from-other-tenants }
```
Internals
State machine
TenantBoundary has no phase. The reconciler treats every CR
write as an index rebuild request and refreshes
status.matchedNamespaces plus status.matchedPods per tick.
Detection itself is event-driven and runs as a manager Runnable.
Reconcile loop (boundary index)
```
on each TenantBoundary event:
  boundary := Get(req)
  index.Rebuild()                       # O(n) over all boundaries
  patch Status.MatchedNamespaces, Status.MatchedPods,
        Status.LastReconcileAt
  if overlapsAnotherBoundary(boundary):
    emitSE(TenantBoundaryOverlap, critical)
```
Reconcile loop (audit-bus + bridge dispatchers)
```
audit-bus dispatcher (always on):
  for ev in audit-bus stream:
    for det in [Secrets, HostPathOverlap, NetworkPolicy]:
      if det.Match(ev, index):
        emitSE(det.Type, critical)

bridge dispatcher (optional, requires --bridge-endpoint):
  for ev in tetragon-bridge exec stream:
    if ExecDetector.Match(ev, index):
      emitSE(CrossTenantExec, critical)
```
Error recovery
Operator restart: the index is rebuilt from scratch by re-listing every
TenantBoundary CR. Boundaries are durable, so no detection
data is lost. The audit-bus and bridge gRPC streams reconnect with
exponential backoff. The detector chain is stateless per event.
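The reconnect behaviour can be illustrated with a generic exponential backoff loop. This is a sketch of the general technique only: the base delay, factor, cap and optional jitter below are assumed values, and `connect_with_backoff` is a hypothetical helper, not something the operator exposes.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 30.0, jitter: float = 0.0) -> Iterator[float]:
    """Yield delays that grow by `factor` up to `cap`, with optional jitter."""
    delay = base
    while True:
        # Jitter is a fraction of the delay; with jitter=0 the sequence
        # is deterministic. With jitter > 0 a value may slightly exceed cap.
        yield min(cap, delay) * (1 + random.uniform(-jitter, jitter))
        delay = min(cap, delay * factor)


def connect_with_backoff(connect: Callable[[], T],
                         sleep: Callable[[float], None] = time.sleep,
                         max_attempts: Optional[int] = None) -> T:
    """Call `connect` until it succeeds, sleeping between failed attempts."""
    for attempt, delay in enumerate(backoff_delays(), start=1):
        try:
            return connect()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
    raise RuntimeError("unreachable: backoff_delays is infinite")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; capping the delay bounds how long a recovered stream stays unnoticed.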
Crash recovery scenario
Pod killed mid-dispatch: the new pod reconnects to both streams
and resumes from the bus’s current position (the bus does not
replay missed events to a reconnecting consumer; the gap is
visible as TenantEscapeSourceLagged if it crosses the
threshold).
Edge cases
- Empty selector defensive default. A TenantBoundary with an empty namespaceSelector matches nothing (not everything, which is the K8s default). Forces explicit tenancy modeling.
- Cilium support is optional. Without ciliumNetworkPolicies enabled in the chart, only the v1 NetworkPolicy detector runs.
- Bridge optional. Empty --bridge-endpoint disables the exec detector but the three audit-bus detectors keep working.
- Cross-CR overlap reported, never silently merged. Two boundaries claiming the same namespace produce a TenantBoundaryOverlap SE.
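The overlap rule in the last bullet amounts to grouping boundaries by the namespaces they claim and flagging any namespace with more than one owner. The following is a minimal sketch under that reading; `find_overlaps` and its input shape (boundary name mapped to a namespace list) are hypothetical, and the `team-billing` boundary is invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_overlaps(boundaries: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Return (namespace, claiming boundaries) for every contested namespace."""
    claims: Dict[str, List[str]] = defaultdict(list)
    for name, namespaces in boundaries.items():
        # set() so a namespace repeated inside one boundary counts once.
        for ns in set(namespaces):
            claims[ns].append(name)
    return sorted(
        (ns, sorted(owners)) for ns, owners in claims.items() if len(owners) > 1
    )
```

For example, `find_overlaps({"team-payments": ["payments"], "team-billing": ["billing", "payments"]})` reports `payments` as claimed by both boundaries. A boundary with an empty namespace list contributes no claims, which is consistent with the "matches nothing" default.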
Full RBAC
```yaml
# ClusterRole
rules:
  - apiGroups: [security.ugallu.io]
    resources: [tenantboundaries, tenantboundaries/status]
    verbs: [get, list, watch, update, patch]
  - apiGroups: [security.ugallu.io]
    resources: [securityevents]
    verbs: [create]
  - apiGroups: [""]
    resources: [namespaces, pods, secrets, serviceaccounts]
    verbs: [get, list, watch]
  - apiGroups: [networking.k8s.io]
    resources: [networkpolicies]
    verbs: [get, list, watch]
  - apiGroups: [cilium.io]
    resources: [ciliumnetworkpolicies]
    verbs: [get, list, watch]   # optional
# Namespaced Role
  - apiGroups: [coordination.k8s.io]
    resources: [leases]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [""]
    resources: [events]
    verbs: [create, patch]
```
CRDs owned
TenantBoundary - singleton or per-namespace; status carries detector counts.
Key flags
--cluster-id, --cluster-name, --audit-bus-endpoint,
--audit-bus-token, --audit-bus-consumer-name (default
tenant-escape), --bridge-endpoint (empty disables the exec
detector), --bridge-token.
Deployment
Deployment (2 replicas) in ugallu-system, leader election on,
priorityClassName=system-cluster-critical.
Telemetry
ugallu_tenant_escape_detector_fires_total{detector},
ugallu_tenant_boundaries_active,
ugallu_tenant_escape_audit_bus_events_total,
ugallu_tenant_escape_bridge_events_total.
The exec detector is the only one that requires the tetragon-bridge - without it, you still get the three audit-driven detectors at full fidelity. The bridge mostly narrows the gap between “an SA crossed a boundary according to the apiserver” and “a process actually entered another tenant’s net namespace”.