seccomp-gen
ugallu-seccomp-gen produces seccomp profiles from observation
rather than guesswork. Point a SeccompTrainingRun at a pod, give
it a duration, and the operator subscribes to the
tetragon-bridge syscall stream
filtered to that pod for the training window. At the end, an OCI
seccomp.json is emitted as a SeccompTrainingProfile CR.
- User creates a SeccompTrainingRun with targetPodRef + duration.
- Reconciler spawns a per-run goroutine; each opens an independent subscription to the bridge - concurrent training sessions don’t interfere.
- The engine accumulates the unique syscall surface, debounced with a 30-second sliding window so a flurry of fork/exec at workload start doesn’t drown out the steady-state surface.
- On duration expiry, the engine assembles the seccomp profile in the OCI format: defaultAction: SCMP_ACT_ERRNO, plus an allow rule for every syscall observed.
- SeccompTrainingProfile is created with the profile bytes embedded; an optional ConfigMap is also created for easy securityContext.seccompProfile.localhostProfile consumption.
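The accumulation step can be pictured as a loop that drains a per-run event stream until the training window closes. The following is a minimal sketch, not the operator's actual engine: collectSyscalls, the string-typed event channel, and the short timeout are hypothetical stand-ins, and the 30-second debounce is left out because its exact semantics are not specified here.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// collectSyscalls drains events until the channel closes or done fires,
// then returns the unique syscall names in sorted order.
// Hypothetical helper: the real engine's types are not shown in this document.
func collectSyscalls(done <-chan struct{}, events <-chan string) []string {
	seen := make(map[string]struct{})
	for finished := false; !finished; {
		select {
		case <-done:
			finished = true
		case name, ok := <-events:
			if !ok {
				finished = true
				break // leaves the select; the loop condition ends the loop
			}
			seen[name] = struct{}{}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Stand-in for the training window; a real run would use spec.duration.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Stand-in for one run's bridge subscription.
	events := make(chan string, 8)
	for _, s := range []string{"read", "write", "read", "futex", "write"} {
		events <- s
	}
	close(events)

	fmt.Println(collectSyscalls(ctx.Done(), events))
}
```

Because each run would own its own channel and goroutine, concurrent runs never share the seen set, which is what keeps training sessions from interfering with each other.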
Profile shape
```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "openat", "close", "fstat", "mmap", "munmap", "brk", "rt_sigaction", "futex", "clone", "execve", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

The output is intentionally bare - no architecture filters, no syscall arg matchers. We trade granularity for portability: the profile loads on any architecture the kernel supports.
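To make the shape concrete, here is a minimal sketch of turning an observed syscall set into that JSON. The type and function names (seccompProfile, syscallRule, buildProfile) are hypothetical, and sorting the names is an assumption made only so the output is deterministic; the operator may keep observation order.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// syscallRule and seccompProfile mirror only the fields shown in the
// profile shape above; the names are illustrative.
type syscallRule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

type seccompProfile struct {
	DefaultAction string        `json:"defaultAction"`
	Syscalls      []syscallRule `json:"syscalls"`
}

// buildProfile deduplicates the observed names and wraps them in a single
// allow rule under the given default action.
func buildProfile(observed []string, defaultAction string) seccompProfile {
	seen := make(map[string]struct{}, len(observed))
	names := make([]string, 0, len(observed))
	for _, n := range observed {
		if _, dup := seen[n]; dup {
			continue
		}
		seen[n] = struct{}{}
		names = append(names, n)
	}
	sort.Strings(names) // assumption: sorted for stable output
	return seccompProfile{
		DefaultAction: defaultAction,
		Syscalls:      []syscallRule{{Names: names, Action: "SCMP_ACT_ALLOW"}},
	}
}

func main() {
	p := buildProfile([]string{"read", "write", "read", "openat"}, "SCMP_ACT_ERRNO")
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```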
Example
```yaml
apiVersion: security.ugallu.io/v1alpha1
kind: SeccompTrainingRun
metadata: { name: payments-canary, namespace: ugallu-system }
spec:
  targetPodRef:
    namespace: payments
    name: api-canary-7c4d
  duration: 10m
  emitConfigMap: true
```

Internals
State machine
SeccompTrainingRun.status.phase: "" (Pending) -> Running ->
Succeeded | Failed. No finalizer. The engine runs as a
detached goroutine bounded by spec.duration; the reconciler
polls engine.IsRunning(name) on a 30s requeue.
Reconcile loop
```
on each SeccompTrainingRun event:
  run := Get(req)
  if run.Status.Phase in {Succeeded, Failed}: return
  if run.Status.Phase == "":
    pods := selectTargetPods(run.Spec.TargetSelector, replicaRatio)
    engine.Start(run.Name, pods, run.Spec.Duration)
    patch Status.Phase=Running, SelectedReplicas=len(pods), StartTime=now
    emitSE(SeccompTrainingStarted, info)
    RequeueAfter: 30s; return
  if engine.IsRunning(run.Name):
    RequeueAfter: 30s; return
  result := engine.Result(run.Name)
  if result == nil:    # orphaned engine
    patch Status.Phase=Failed
    emitSE(SeccompTrainingFailed, high)
    return
  upsert SeccompTrainingProfile (carrying result.profileJSON)
  patch Status.Phase=Succeeded, ProfileRef, ObservedSyscallCount, CompletionTime
  emitSE(SeccompTrainingCompleted, info)
```

Error recovery
Engine ctx is detached from Reconcile ctx so a reconcile
completion does not stop the engine. On operator restart, the
engine in-memory state is lost. The new pod re-Gets the run; if
Phase=Running, engine.IsRunning(name) returns false (engine
was in the dead pod), so the run is marked Failed with
reason=engine-orphaned. The user re-applies a fresh
SeccompTrainingRun to retry.
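The restart behaviour falls out of the phase decision itself: a run that is Running with no live engine and no result can only be orphaned. A minimal sketch of that decision as a pure function follows; decide and the decision struct are hypothetical names, and the real reconciler also patches status and emits events, which are left out here.

```go
package main

import "fmt"

// decision is what the reconciler would act on next (illustrative type).
type decision struct {
	Phase   string // phase to record on the run
	Reason  string // set only on failure
	Requeue bool   // true when the run should be checked again later
}

// decide maps the current phase and engine state to the next step,
// following the order of checks described in the reconcile loop.
func decide(phase string, engineRunning, hasResult bool) decision {
	switch {
	case phase == "Succeeded" || phase == "Failed":
		return decision{Phase: phase} // terminal: nothing to do
	case phase == "":
		return decision{Phase: "Running", Requeue: true} // start the engine
	case engineRunning:
		return decision{Phase: "Running", Requeue: true} // still training
	case !hasResult:
		// Running, but this process has no engine and no result:
		// the engine lived in a pod that is gone.
		return decision{Phase: "Failed", Reason: "engine-orphaned"}
	default:
		return decision{Phase: "Succeeded"}
	}
}

func main() {
	fmt.Printf("%+v\n", decide("Running", false, false)) // after an operator restart
	fmt.Printf("%+v\n", decide("Running", true, false))  // mid-training
	fmt.Printf("%+v\n", decide("Running", false, true))  // engine finished
}
```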
Crash recovery scenario
If the operator pod is killed during a 30-minute training window, the user creates a new run with a longer duration covering the missed time. The operator does not auto-resume the old run because the syscall surface during the gap is unknown - producing a profile from partial data could lock out a code path that was about to run.
Edge cases
- ReplicaRatio. A bounded number of matched pods (default 50%) participate in the training; the rest run untraced as a control. Useful when training in production.
- Cancel on delete. Deleting the CR calls engine.Cancel(name) synchronously - the bridge subscription is closed cleanly.
- DefaultAction. Profile ships with defaultAction=SCMP_ACT_ERRNO; can be overridden to SCMP_ACT_KILL / SCMP_ACT_LOG / SCMP_ACT_TRACE per run.
- Profile is bare - no architecture filters, no syscall arg matchers. Trade granularity for portability.
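For the ReplicaRatio case, the selection can be sketched as taking a fraction of the matched pods. This is an illustration under stated assumptions, not the operator's selectTargetPods: the function name selectReplicas, rounding up, and taking the first pods in list order are all guesses made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// selectReplicas returns the subset of pods that would be traced for a
// given ratio. Assumptions: the count is rounded up so a non-empty match
// traces at least one pod, and pods are taken in list order.
func selectReplicas(pods []string, ratio float64) []string {
	if len(pods) == 0 || ratio <= 0 {
		return nil
	}
	if ratio > 1 {
		ratio = 1
	}
	n := int(math.Ceil(float64(len(pods)) * ratio))
	// Copy so callers cannot mutate the original slice through the result.
	return append([]string(nil), pods[:n]...)
}

func main() {
	pods := []string{"api-0", "api-1", "api-2"}
	fmt.Println(selectReplicas(pods, 0.5)) // default ratio of 50%
}
```

With three matched pods and the default 50%, this sketch traces two and leaves one untraced as the control.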
Full RBAC (ClusterRole)
```yaml
rules:
  - apiGroups: [security.ugallu.io]
    resources:
      - seccomptrainingruns
      - seccomptrainingruns/status
      - seccomptrainingprofiles
      - seccomptrainingprofiles/status
    verbs: [get, list, watch, create, update, patch]
  - apiGroups: [security.ugallu.io]
    resources: [securityevents]
    verbs: [create]
  - apiGroups: [""]
    resources: [pods]
    verbs: [get, list, watch]   # target selection
  - apiGroups: [""]
    resources: [events]
    verbs: [create, patch]
  - apiGroups: [coordination.k8s.io]
    resources: [leases]
    verbs: [get, list, watch, create, update, patch, delete]
```

CRDs owned
- SeccompTrainingRun - per-run, immutable spec.
- SeccompTrainingProfile - produced once when the training completes.
Key flags
--cluster-id, --cluster-name, --bridge-endpoint (default
ugallu-tetragon-bridge.ugallu-system-privileged.svc:50051),
--bridge-token.
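As an illustration of how these flags fit together, the sketch below parses them with Go's standard flag package. The flag names and the bridge endpoint default come from the list above; treating every flag as a string, the help texts, and the config and parseFlags names are assumptions.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the parsed flag values (illustrative type).
type config struct {
	ClusterID      string
	ClusterName    string
	BridgeEndpoint string
	BridgeToken    string
}

// parseFlags parses args (without the program name) into a config.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("seccomp-gen", flag.ContinueOnError)
	fs.StringVar(&c.ClusterID, "cluster-id", "", "cluster identifier")
	fs.StringVar(&c.ClusterName, "cluster-name", "", "human-readable cluster name")
	fs.StringVar(&c.BridgeEndpoint, "bridge-endpoint",
		"ugallu-tetragon-bridge.ugallu-system-privileged.svc:50051",
		"address of the tetragon-bridge service")
	fs.StringVar(&c.BridgeToken, "bridge-token", "", "token presented to the bridge")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return c, nil
}

func main() {
	cfg, err := parseFlags([]string{"--cluster-id", "c1", "--cluster-name", "staging"})
	if err != nil {
		panic(err)
	}
	// The token is deliberately not printed.
	fmt.Println(cfg.ClusterID, cfg.ClusterName, cfg.BridgeEndpoint)
}
```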
Deployment
Deployment (1 replica) in ugallu-system-privileged - the bridge
endpoint is in the privileged namespace so the network policy is
intra-namespace; the operator process itself doesn’t need
privileged caps. Leader election on.
Telemetry
ugallu_seccomp_runs_active,
ugallu_seccomp_runs_total{outcome},
ugallu_seccomp_syscalls_observed_total{run},
ugallu_seccomp_bridge_disconnects_total.
Note on safety
The profile is deny-by-default with an allowlist of observed syscalls - a workload that executes a code path not exercised during training will have the unlisted syscall rejected: it fails with an errno under the default SCMP_ACT_ERRNO, or the process is killed with SIGSYS if the run overrides the default to SCMP_ACT_KILL. Treat training duration like a coverage exercise: the longer the run and the broader the workload, the safer the resulting profile. The operator doesn’t deploy the profile for you; that’s an explicit human step, by design.
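One way to act on the coverage framing before applying a profile is to compare its allowlist against syscalls seen in a separate observation, such as a second training run on a broader workload. The sketch below is a hypothetical helper, not something the operator provides: deniedSyscalls lists what the profile would reject.

```go
package main

import (
	"fmt"
	"sort"
)

// deniedSyscalls returns the sorted, unique syscalls that appear in
// observed but are missing from the profile's allowlist.
func deniedSyscalls(allowed, observed []string) []string {
	allow := make(map[string]struct{}, len(allowed))
	for _, a := range allowed {
		allow[a] = struct{}{}
	}
	missing := make(map[string]struct{})
	for _, o := range observed {
		if _, ok := allow[o]; !ok {
			missing[o] = struct{}{}
		}
	}
	out := make([]string, 0, len(missing))
	for m := range missing {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

func main() {
	allowed := []string{"read", "write", "openat", "close"}
	observed := []string{"read", "socket", "connect", "socket"}
	fmt.Println(deniedSyscalls(allowed, observed))
}
```

An empty result does not prove the profile is complete; it only says the second observation found nothing new.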