webhook-auditor
ugallu-webhook-auditor watches every
MutatingWebhookConfiguration and ValidatingWebhookConfiguration
in the cluster, computes a risk score per webhook, and fires a
SecurityEvent when the score crosses the configured threshold.
Admission webhooks are the highest-leverage targets in a cluster -
a compromised webhook can rewrite every request to the apiserver -
and they tend to drift unnoticed.
Risk dimensions
The score is a weighted sum across these axes:
- Reach. A webhook with * in rules[].resources / apiGroups / operations scores higher than a narrowly-scoped one.
- Failure policy. failurePolicy: Ignore is higher risk than Fail - an attacker who DoSes the webhook gets traffic to bypass it entirely, so “Ignore” is paradoxically the dangerous setting.
- CA bundle. Self-signed (no cert-manager.io/inject-ca-from annotation, no recognised issuer) scores higher than a properly rotated one.
- Endpoint. Service refs in kube-system or unannotated namespaces score lower; URL-based endpoints (especially off-cluster) score higher.
- Side effects. sideEffects: NoneOnDryRun / Some scores higher than None.
The exact weights live in WebhookAuditorConfig.spec.weights. The
breakdown is published on each evaluation so dashboards can show
“why is this webhook a 78”.
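As a rough illustration of the weighted sum and the published breakdown, here is a minimal sketch. It assumes each risk dimension reduces to a named finding that either fired or did not, and that the finding names match the weight keys shown in the Example config on this page; `score_webhook` and that mapping are hypothetical, not the auditor's actual code.

```python
# Weights copied from the Example config on this page.
WEIGHTS = {
    "wildcardResource": 20,
    "failurePolicyIgnore": 15,
    "selfSignedCA": 25,
    "offClusterURL": 20,
    "sideEffectsNone": 5,
}


def score_webhook(findings, weights):
    """Return (total, breakdown) for the set of findings that fired.

    Assumption: a finding contributes its full weight or nothing.
    Unknown finding names are ignored rather than treated as errors.
    """
    breakdown = {name: weights[name] for name in sorted(findings) if name in weights}
    return sum(breakdown.values()), breakdown


total, breakdown = score_webhook(
    {"wildcardResource", "failurePolicyIgnore", "selfSignedCA", "offClusterURL"},
    WEIGHTS,
)
```

With these weights the four findings sum to 80, which is above the alertOn value of 70 from the same Example, and the breakdown dictionary is the kind of per-axis detail a dashboard could display.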
CA bundle resolution
To distinguish “the caBundle is cert-manager rotated” from “the
caBundle is hard-coded base64 from 2022”, the auditor follows the
cert-manager.io/inject-ca-from annotation and reads the
referenced Secret - but only from a configured allowlist of
namespaces (spec.trustedCASources), so a tenant can’t lure the
auditor into reading their Secret by setting the annotation.
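The allowlist gate can be sketched as below. This assumes the annotation value has the form namespace/name and that each trustedCASources entry matches on an exact namespace plus a name prefix, as the namespace and namePrefix fields in the Example suggest. `TrustedSource` and `check_ca_ref` are hypothetical names, and mapping a prefix mismatch to namespace_forbidden is also an assumption.

```python
from typing import NamedTuple, Optional


class TrustedSource(NamedTuple):
    namespace: str
    name_prefix: str


def check_ca_ref(annotation: str, trusted: list) -> Optional[str]:
    """Return None if the reference may be read, else a failure reason.

    The reason strings reuse the per-reason counter names listed under
    Error recovery; which reason applies in each case is an assumption.
    """
    parts = annotation.split("/")
    if len(parts) != 2 or not all(parts):
        return "annotation_parse_error"
    namespace, name = parts
    for src in trusted:
        if src.namespace == namespace and name.startswith(src.name_prefix):
            return None
    # Nothing in the allowlist matched, so the Secret is never read.
    return "namespace_forbidden"


TRUSTED = [
    TrustedSource("cert-manager", "webhook-cert-"),
    TrustedSource("kube-system", "ugallu-"),
]
```

The check runs before any Secret read, so a webhook whose annotation points into a tenant namespace is rejected without the auditor touching that namespace.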
Example
    apiVersion: security.ugallu.io/v1alpha1
    kind: WebhookAuditorConfig
    metadata: { name: default, namespace: ugallu-system }
    spec:
      thresholds:
        alertOn: 70
      weights:
        wildcardResource: 20
        failurePolicyIgnore: 15
        selfSignedCA: 25
        offClusterURL: 20
        sideEffectsNone: 5
      trustedCASources:
        - { namespace: cert-manager, namePrefix: webhook-cert- }
        - { namespace: kube-system, namePrefix: ugallu- }
Internals
State machine
WebhookAuditorConfig is a singleton with no phase. The
reconciler treats every Mutating/ValidatingWebhookConfiguration
event as a re-evaluation request, debounced across rapid
mutations. Status fields (observedWebhooks,
lastConfigLoadAt, per-namespace caBundle resolution counters)
are refreshed on a 30s tick.
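One way to picture the debounce step is the leading-edge sketch below. It assumes a webhook is skipped when its spec is unchanged since the last evaluation, or when the last evaluation happened within a fixed window. The `Debouncer` class, the window length, and the hashing scheme are all hypothetical; the page does not describe the real mechanism.

```python
import hashlib
import json


class Debouncer:
    """Skip re-evaluation of a webhook that was evaluated very recently."""

    def __init__(self, window: float):
        self.window = window
        self._seen = {}  # name -> (spec digest, time of last evaluation)

    def skip(self, name: str, spec: dict, now: float) -> bool:
        digest = hashlib.sha256(
            json.dumps(spec, sort_keys=True).encode()
        ).hexdigest()
        prev = self._seen.get(name)
        if prev is not None:
            prev_digest, prev_at = prev
            if prev_digest == digest or now - prev_at < self.window:
                return True
        # Record only the evaluations that actually run.
        self._seen[name] = (digest, now)
        return False
```

A leading-edge debounce like this can miss the final mutation in a burst, so a real controller would presumably also requeue a trailing evaluation once the window closes. Because the cache holds only digests and timestamps, it can be rebuilt from scratch after a restart.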
Reconcile loop (status)
    on each WebhookAuditorConfig event or 30s tick:
      cfg := Get("default")
      patch Status.ObservedWebhooks = listMWC().Count + listVWC().Count
      patch Status.LastConfigLoadAt = now
      RequeueAfter: 30s
Reconcile loop (per-webhook scoring)
    on each MWC / VWC event:
      if cfg.Ignore.Match(name): metric Skipped[ignored]++; return
      if debounce.Skip(name, spec): metric Skipped[debounced]++; return
      caScore := resolveCABundle(spec.caBundle, cfg.TrustedCASources)
      reach := scoreReach(spec.rules)
      fp := scoreFailurePolicy(spec.failurePolicy)
      endpoint := scoreEndpoint(spec.clientConfig)
      side := scoreSideEffects(spec.sideEffects)
      total := reach + fp + caScore + endpoint + side
      if total >= cfg.Thresholds.AlertOn:
        emitSE(MutatingWebhookHighRisk | ValidatingWebhookHighRisk)
      metric ScoreDistribution.Observe(total)
Error recovery
Stateless evaluator: each MWC/VWC event re-runs the full scoring
pass. Operator restart simply re-Lists all MWCs/VWCs and
re-evaluates each. The debounce cache is rebuilt on the fly.
caBundle resolution failures are counted per-reason
(annotation_parse_error / namespace_forbidden /
resolve_error / resolver_disabled) and recorded in status.
Crash recovery scenario
Pod killed during a per-MWC evaluation: the new pod observes the MWC again on its next informer sync, re-runs the scoring, emits the SE if it still crosses the threshold. No state to recover.
Edge cases
- Debounce. Rapid mutations to the same MWC (e.g. cert-manager hot-reload of the caBundle) collapse into a single evaluation.
- CA allowlist. caBundle resolution via cert-manager.io/inject-ca-from is gated by namespace allowlist (trustedCASources); a tenant cannot lure the auditor into reading their Secret by setting the annotation on a webhook outside the allowlist.
- Eval timeout budget. Per-MWC eval has a hard timeout that surfaces on ugallu_webhook_eval_timeouts_total and a WebhookEvalFailed SE.
- Risk score histogram lets dashboards show the distribution across all webhooks (“how many MWCs sit between 60 and 80?”).
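The histogram question in the last item can be answered from cumulative buckets, sketched below. This assumes Prometheus-style buckets, where each bucket counts observations less than or equal to its upper bound. The bucket boundaries and the `cumulative_buckets` helper are hypothetical; the page states only that the histogram spans 0-100.

```python
from bisect import bisect_right


def cumulative_buckets(scores, bounds):
    """Map each upper bound to the number of scores <= that bound."""
    ordered = sorted(scores)
    return {le: bisect_right(ordered, le) for le in bounds}


# Illustrative scores and boundaries, not real data.
buckets = cumulative_buckets([10, 60, 65, 80, 81, 95], [20, 40, 60, 80, 100])

# Webhooks with a score in the half-open range (60, 80]:
between_60_and_80 = buckets[80] - buckets[60]
```

Subtracting two cumulative buckets gives the count for the range between their bounds, which is why the answer is only as precise as the boundaries the histogram was configured with.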
Full RBAC
    # ClusterRole
    rules:
      - apiGroups: [admissionregistration.k8s.io]
        resources: [mutatingwebhookconfigurations, validatingwebhookconfigurations]
        verbs: [get, list, watch]
      - apiGroups: [apiextensions.k8s.io]
        resources: [customresourcedefinitions]
        verbs: [get, list, watch]  # subject kind mapping
      - apiGroups: [security.ugallu.io]
        resources: [webhookauditorconfigs, webhookauditorconfigs/status]
        verbs: [get, list, watch, update, patch]
      - apiGroups: [security.ugallu.io]
        resources: [securityevents]
        verbs: [create]
      - apiGroups: [""]
        resources: [events]
        verbs: [create, patch]
    # Per-namespace Role(s) gated by trustedCASources
      - apiGroups: [""]
        resources: [secrets]
        verbs: [get, list, watch]
    # Namespaced Role
      - apiGroups: [coordination.k8s.io]
        resources: [leases]
        verbs: [get, list, watch, create, update, patch, delete]
Telemetry (full)
- ugallu_webhook_score_total{type, severity}
- ugallu_webhook_eval_total
- ugallu_webhook_eval_skipped_total{reason} (ignored / debounced / missing_secret)
- ugallu_webhook_drop_total{reason} (rate-limited)
- ugallu_webhook_score_distribution (histogram 0-100)
- ugallu_webhook_observed_count
- ugallu_webhook_eval_timeouts_total
- ugallu_webhook_ca_resolve_fallback_total{reason}
CRDs owned
WebhookAuditorConfig - singleton; status carries per-webhook scores + last evaluation.
Key flags
--cluster-id, --cluster-name, --config-name (default
default).
Deployment
Deployment (2 replicas) in ugallu-system, leader election on,
priorityClassName=system-cluster-critical. RBAC: cluster-wide
read on MutatingWebhookConfiguration /
ValidatingWebhookConfiguration / CustomResourceDefinition,
namespace-scoped read on Secret (limited to
trustedCASources).
Telemetry
ugallu_webhook_auditor_score{name,kind},
ugallu_webhook_auditor_breaches_total,
ugallu_webhook_auditor_ca_resolves_total{outcome}.