webhook-auditor

ugallu-webhook-auditor watches every MutatingWebhookConfiguration and ValidatingWebhookConfiguration in the cluster, computes a risk score per webhook, and fires a SecurityEvent when the score crosses the configured threshold. Admission webhooks are the highest-leverage targets in a cluster - a compromised webhook can rewrite every request to the apiserver - and they tend to drift unnoticed.

Risk dimensions

The score is a weighted sum across these axes:

Reach. A webhook with * in rules[].resources / apiGroups / operations scores higher than a narrowly-scoped one.
Failure policy. failurePolicy: Ignore is higher risk than Fail - an attacker who DoSes the webhook makes requests bypass it entirely, so Ignore, which looks like the lenient choice, is the dangerous setting.
CA bundle. A self-signed caBundle (no cert-manager.io/inject-ca-from annotation, no recognised issuer) scores higher than a properly rotated one.
Endpoint. Service refs in kube-system or unannotated namespaces score lower; URL-based endpoints (especially off-cluster) score higher.
Side effects. sideEffects: NoneOnDryRun or Some scores higher than None.

The exact weights live in WebhookAuditorConfig.spec.weights. The breakdown is published on each evaluation so dashboards can show “why is this webhook a 78”.
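As an illustration of the Reach axis, here is a minimal Go sketch that reports which rule fields contain the * wildcard. The Rule type is a simplified stand-in for a webhook's rules[] entries, and WildcardFields and hasWildcard are hypothetical names. The sketch only checks for an exact "*" and does not claim to reproduce how the auditor weighs partial wildcards.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for one rules[] entry of a
// webhook; the real Kubernetes type carries more fields.
type Rule struct {
	APIGroups  []string
	Resources  []string
	Operations []string
}

// hasWildcard reports whether any value is exactly "*".
func hasWildcard(values []string) bool {
	for _, v := range values {
		if v == "*" {
			return true
		}
	}
	return false
}

// WildcardFields lists the rule fields that contain "*" in at least one rule.
// An empty result means the webhook is narrowly scoped on all three fields.
func WildcardFields(rules []Rule) []string {
	var groups, resources, operations bool
	for _, r := range rules {
		groups = groups || hasWildcard(r.APIGroups)
		resources = resources || hasWildcard(r.Resources)
		operations = operations || hasWildcard(r.Operations)
	}
	var fields []string
	if resources {
		fields = append(fields, "resources")
	}
	if groups {
		fields = append(fields, "apiGroups")
	}
	if operations {
		fields = append(fields, "operations")
	}
	return fields
}

func main() {
	narrow := []Rule{{
		APIGroups:  []string{"apps"},
		Resources:  []string{"deployments"},
		Operations: []string{"CREATE"},
	}}
	broad := []Rule{{
		APIGroups:  []string{"*"},
		Resources:  []string{"*"},
		Operations: []string{"CREATE", "UPDATE"},
	}}
	fmt.Println("narrow:", WildcardFields(narrow)) // no wildcard fields
	fmt.Println("broad: ", WildcardFields(broad))  // resources and apiGroups
}
```

Returning the field names rather than a bare number keeps the result usable for a per-axis breakdown.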

CA bundle resolution

To distinguish “the caBundle is cert-manager rotated” from “the caBundle is hard-coded base64 from 2022”, the auditor follows the cert-manager.io/inject-ca-from annotation and reads the referenced Secret - but only from a configured allowlist of namespaces (spec.trustedCASources), so a tenant can’t lure the auditor into reading their Secret by setting the annotation.
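The allowlist gate can be sketched in Go as follows. This is an assumption-laden illustration, not the auditor's code: it assumes the annotation value has the form namespace/name, that namePrefix is matched as a plain string prefix, and that a prefix mismatch is reported under the same namespace_forbidden reason as a namespace mismatch. ParseInjectCAFrom, Allowed and CheckReference are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// TrustedCASource mirrors one spec.trustedCASources entry.
type TrustedCASource struct {
	Namespace  string
	NamePrefix string
}

// Error values named after the resolution failure reasons the auditor counts.
var (
	ErrAnnotationParse    = errors.New("annotation_parse_error")
	ErrNamespaceForbidden = errors.New("namespace_forbidden")
)

// ParseInjectCAFrom splits an annotation value of the assumed form
// "namespace/name" into its two parts.
func ParseInjectCAFrom(value string) (namespace, name string, err error) {
	parts := strings.Split(value, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", ErrAnnotationParse
	}
	return parts[0], parts[1], nil
}

// Allowed reports whether namespace/name matches an allowlist entry.
func Allowed(sources []TrustedCASource, namespace, name string) bool {
	for _, s := range sources {
		if s.Namespace == namespace && strings.HasPrefix(name, s.NamePrefix) {
			return true
		}
	}
	return false
}

// CheckReference decides whether the referenced object may be read at all.
// It must run before any API call, so an untrusted annotation never
// triggers a Secret read.
func CheckReference(sources []TrustedCASource, annotation string) error {
	namespace, name, err := ParseInjectCAFrom(annotation)
	if err != nil {
		return err
	}
	if !Allowed(sources, namespace, name) {
		return ErrNamespaceForbidden
	}
	return nil
}

func main() {
	sources := []TrustedCASource{
		{Namespace: "cert-manager", NamePrefix: "webhook-cert-"},
		{Namespace: "kube-system", NamePrefix: "ugallu-"},
	}
	for _, ref := range []string{
		"cert-manager/webhook-cert-policy", // trusted namespace and prefix
		"tenant-a/webhook-cert-policy",     // namespace not in the allowlist
		"not-a-reference",                  // malformed value
	} {
		fmt.Printf("%-34s -> %v\n", ref, CheckReference(sources, ref))
	}
}
```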

Example

apiVersion: security.ugallu.io/v1alpha1
kind: WebhookAuditorConfig
metadata: { name: default, namespace: ugallu-system }
spec:
  thresholds:
    alertOn: 70
  weights:
    wildcardResource: 20
    failurePolicyIgnore: 15
    selfSignedCA: 25
    offClusterURL: 20
    sideEffectsNone: 5
  trustedCASources:
    - { namespace: cert-manager, namePrefix: webhook-cert- }
    - { namespace: kube-system, namePrefix: ugallu- }
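To make the arithmetic concrete, the Go sketch below applies the example weights to two hypothetical webhooks and compares the totals with alertOn: 70. The Signals type, the Breakdown and Total helpers, and the assumption that each axis contributes either its full weight or nothing are illustrative only; the config does not say when the sideEffectsNone weight applies, so the sketch models it as a generic side-effects flag.

```go
package main

import "fmt"

// Weights mirrors the spec.weights keys of the example config.
type Weights struct {
	WildcardResource    int
	FailurePolicyIgnore int
	SelfSignedCA        int
	OffClusterURL       int
	SideEffectsNone     int
}

// Signals records which axes fired for one webhook (hypothetical type).
type Signals struct {
	WildcardResource    bool
	FailurePolicyIgnore bool
	SelfSignedCA        bool
	OffClusterURL       bool
	SideEffects         bool // assumption: true when the side-effects axis fires
}

// ExampleWeights returns the weights from the example config.
func ExampleWeights() Weights {
	return Weights{
		WildcardResource:    20,
		FailurePolicyIgnore: 15,
		SelfSignedCA:        25,
		OffClusterURL:       20,
		SideEffectsNone:     5,
	}
}

// Breakdown returns the contribution of every axis that fired.
func Breakdown(w Weights, s Signals) map[string]int {
	b := map[string]int{}
	add := func(key string, fired bool, weight int) {
		if fired {
			b[key] = weight
		}
	}
	add("wildcardResource", s.WildcardResource, w.WildcardResource)
	add("failurePolicyIgnore", s.FailurePolicyIgnore, w.FailurePolicyIgnore)
	add("selfSignedCA", s.SelfSignedCA, w.SelfSignedCA)
	add("offClusterURL", s.OffClusterURL, w.OffClusterURL)
	add("sideEffectsNone", s.SideEffects, w.SideEffectsNone)
	return b
}

// Total sums a breakdown into the final score.
func Total(b map[string]int) int {
	sum := 0
	for _, v := range b {
		sum += v
	}
	return sum
}

func main() {
	const alertOn = 70
	w := ExampleWeights()

	// 20 + 15 + 25 + 20 = 80, which crosses the threshold.
	risky := Signals{WildcardResource: true, FailurePolicyIgnore: true, SelfSignedCA: true, OffClusterURL: true}
	// 20 + 15 + 25 = 60, which stays below it.
	mild := Signals{WildcardResource: true, FailurePolicyIgnore: true, SelfSignedCA: true}

	for name, s := range map[string]Signals{"risky": risky, "mild": mild} {
		total := Total(Breakdown(w, s))
		fmt.Printf("%s: total=%d alert=%v\n", name, total, total >= alertOn)
	}
}
```

With these weights the maximum possible total is 85, so a threshold of 70 requires at least four of the five axes to fire.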

Internals

State machine

WebhookAuditorConfig is a singleton with no phase. The reconciler treats every Mutating/ValidatingWebhookConfiguration event as a re-evaluation request, debounced across rapid mutations. Status fields (observedWebhooks, lastConfigLoadAt, per-namespace caBundle resolution counters) are refreshed on a 30s tick.

Reconcile loop (status)

on each WebhookAuditorConfig event or 30s tick:
  cfg := Get("default")
  patch Status.ObservedWebhooks   = listMWC().Count + listVWC().Count
  patch Status.LastConfigLoadAt   = now
  RequeueAfter: 30s

Reconcile loop (per-webhook scoring)

on each MWC / VWC event:
  if cfg.Ignore.Match(name): metric Skipped[ignored]++; return
  if debounce.Skip(name, spec):  metric Skipped[debounced]++; return
  caScore  := resolveCABundle(spec.caBundle, cfg.TrustedCASources)
  reach    := scoreReach(spec.rules)
  fp       := scoreFailurePolicy(spec.failurePolicy)
  endpoint := scoreEndpoint(spec.clientConfig)
  side     := scoreSideEffects(spec.sideEffects)
  total    := reach + fp + caScore + endpoint + side
  if total >= cfg.Thresholds.AlertOn:
    emitSE(MutatingWebhookHighRisk | ValidatingWebhookHighRisk)
  metric ScoreDistribution.Observe(total)
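The debounce.Skip(name, spec) step is not specified further here. One possible shape, sketched in Go under stated assumptions: an evaluation is skipped when the serialized spec is byte-identical to the last evaluated one, or when the previous evaluation of the same webhook happened inside a fixed window. The Debouncer type, the window length, the SHA-256 comparison and the explicit now parameter are all hypothetical choices made for the sketch.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

// entry remembers what was last evaluated for one webhook and when.
type entry struct {
	hash [sha256.Size]byte
	at   time.Time
}

// Debouncer collapses rapid mutations of the same webhook (hypothetical type).
type Debouncer struct {
	mu     sync.Mutex
	window time.Duration
	seen   map[string]entry
}

// NewDebouncer returns a Debouncer with the given quiet window.
func NewDebouncer(window time.Duration) *Debouncer {
	return &Debouncer{window: window, seen: map[string]entry{}}
}

// Skip reports whether the evaluation of name can be skipped at time now.
// When it returns false, the call is recorded as the latest evaluation.
func (d *Debouncer) Skip(name string, spec []byte, now time.Time) bool {
	h := sha256.Sum256(spec)

	d.mu.Lock()
	defer d.mu.Unlock()

	prev, ok := d.seen[name]
	if ok && (prev.hash == h || now.Sub(prev.at) < d.window) {
		return true
	}
	d.seen[name] = entry{hash: h, at: now}
	return false
}

// runDemo replays three mutations of one webhook against a 5s window.
func runDemo() []bool {
	d := NewDebouncer(5 * time.Second)
	t0 := time.Unix(0, 0)
	return []bool{
		d.Skip("policy-webhook", []byte("caBundle: v1"), t0),                     // first sighting
		d.Skip("policy-webhook", []byte("caBundle: v2"), t0.Add(1*time.Second)),  // inside the window
		d.Skip("policy-webhook", []byte("caBundle: v3"), t0.Add(10*time.Second)), // changed, window elapsed
	}
}

func main() {
	fmt.Println(runDemo()) // evaluate, skip, evaluate
}
```

A window that only skips would drop the final state of a burst, so a real implementation would also need to re-queue the skipped change once the window elapses; the sketch leaves that out.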

Error recovery

Stateless evaluator: each MWC/VWC event re-runs the full scoring pass. Operator restart simply re-Lists all MWCs/VWCs and re-evaluates each. The debounce cache is rebuilt on the fly. caBundle resolution failures are counted per-reason (annotation_parse_error / namespace_forbidden / resolve_error / resolver_disabled) and recorded in status.

Crash recovery scenario

Pod killed during a per-MWC evaluation: the new pod observes the MWC again on its next informer sync, re-runs the scoring, and emits the SE if the score still crosses the threshold. No state to recover.

Edge cases

Debounce. Rapid mutations to the same MWC (e.g. cert-manager hot-reload of the caBundle) collapse into a single evaluation.
CA allowlist. caBundle resolution via cert-manager.io/inject-ca-from is gated by namespace allowlist (trustedCASources); a tenant cannot lure the auditor into reading their Secret by setting the annotation on a webhook outside the allowlist.
Eval timeout budget. Per-MWC eval has a hard timeout that surfaces on ugallu_webhook_eval_timeouts_total and a WebhookEvalFailed SE.
Score histogram. The risk score histogram lets dashboards show the distribution across all webhooks (“how many MWCs sit between 60 and 80?”).
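The eval timeout budget can be illustrated with Go's context package. This is a minimal sketch, assuming the evaluation is a function that accepts a context; evalWithBudget, the budget values and the local timeouts counter (standing in for ugallu_webhook_eval_timeouts_total) are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// evalWithBudget runs eval under a hard deadline. It returns the score, or
// timedOut=true when the budget elapsed before eval finished.
func evalWithBudget(parent context.Context, budget time.Duration,
	eval func(context.Context) (int, error)) (score int, timedOut bool, err error) {

	ctx, cancel := context.WithTimeout(parent, budget)
	defer cancel()

	type result struct {
		score int
		err   error
	}
	// Buffered so the goroutine can finish even after a timeout.
	done := make(chan result, 1)
	go func() {
		s, e := eval(ctx)
		done <- result{s, e}
	}()

	select {
	case r := <-done:
		return r.score, false, r.err
	case <-ctx.Done():
		return 0, errors.Is(ctx.Err(), context.DeadlineExceeded), ctx.Err()
	}
}

// runDemo evaluates one fast and one slow webhook and counts timeouts.
func runDemo() (fastScore int, timeouts int) {
	fast := func(context.Context) (int, error) { return 42, nil }
	slow := func(context.Context) (int, error) {
		time.Sleep(200 * time.Millisecond) // ignores ctx to simulate a stuck evaluation
		return 90, nil
	}

	fastScore, timedOut, _ := evalWithBudget(context.Background(), time.Second, fast)
	if timedOut {
		timeouts++
	}
	if _, timedOut, _ = evalWithBudget(context.Background(), 20*time.Millisecond, slow); timedOut {
		timeouts++ // here the auditor would also emit a WebhookEvalFailed SE
	}
	return fastScore, timeouts
}

func main() {
	score, timeouts := runDemo()
	fmt.Printf("fast score=%d, timeouts=%d\n", score, timeouts)
}
```

Running the evaluation in a goroutine keeps the deadline hard even when the evaluation itself never checks the context.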

Full RBAC

# ClusterRole
rules:
  - apiGroups: [admissionregistration.k8s.io]
    resources: [mutatingwebhookconfigurations,
                validatingwebhookconfigurations]
    verbs: [get, list, watch]
  - apiGroups: [apiextensions.k8s.io]
    resources: [customresourcedefinitions]
    verbs: [get, list, watch]   # subject kind mapping
  - apiGroups: [security.ugallu.io]
    resources: [webhookauditorconfigs, webhookauditorconfigs/status]
    verbs: [get, list, watch, update, patch]
  - apiGroups: [security.ugallu.io]
    resources: [securityevents]
    verbs: [create]
  - apiGroups: [""]
    resources: [events]
    verbs: [create, patch]
# Per-namespace Role(s) gated by trustedCASources
  - apiGroups: [""]
    resources: [secrets]
    verbs: [get, list, watch]
# Namespaced Role
  - apiGroups: [coordination.k8s.io]
    resources: [leases]
    verbs: [get, list, watch, create, update, patch, delete]

Telemetry (full)

ugallu_webhook_score_total{type, severity}
ugallu_webhook_eval_total
ugallu_webhook_eval_skipped_total{reason} (ignored / debounced / missing_secret)
ugallu_webhook_drop_total{reason} (rate-limited)
ugallu_webhook_score_distribution (histogram 0-100)
ugallu_webhook_observed_count
ugallu_webhook_eval_timeouts_total
ugallu_webhook_ca_resolve_fallback_total{reason}

CRDs owned

WebhookAuditorConfig
- singleton; status carries per-webhook scores + last evaluation.

Key flags

--cluster-id, --cluster-name, --config-name (default: default).

Deployment

Deployment (2 replicas) in ugallu-system, leader election on, priorityClassName=system-cluster-critical. RBAC: cluster-wide read on MutatingWebhookConfiguration / ValidatingWebhookConfiguration / CustomResourceDefinition, namespace-scoped read on Secret (limited to trustedCASources).

Telemetry

ugallu_webhook_auditor_score{name,kind}, ugallu_webhook_auditor_breaches_total, ugallu_webhook_auditor_ca_resolves_total{outcome}.