What is PSP? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)


Quick Definition (30–60 words)

Pod Security Policy (PSP) is a deprecated Kubernetes admission control mechanism that enforced pod-level security constraints until its removal in Kubernetes 1.25. Analogy: PSP is like airport security rules for containers. Formal: PSP defines allowed pod spec features and validates pods at admission time against policy objects.


What is PSP?

What it is / what it is NOT

  • PSP is a Kubernetes admission control resource model used to restrict pod capabilities, e.g., privileged mode, hostPath, running as root.
  • PSP is NOT a runtime enforcement engine for already-running containers; it prevents creation rather than introspecting existing pods.
  • PSP is NOT a replacement for broader cluster security like network policies, workload identity, or image scanning.

Key properties and constraints

  • Admission-time enforcement: evaluates pod requests before creation.
  • Policy granularity: operates on pod spec fields and security context attributes.
  • RBAC binding: a policy took effect only when the requesting user or the pod's service account was granted use of it via a Role or ClusterRole binding.
  • Deprecated upstream: the built-in PSP API was deprecated in Kubernetes 1.21 and removed in 1.25; most clusters now use the PodSecurity admission controller or third-party engines such as Gatekeeper and Kyverno.
  • Compatibility constraints: behavior varies by Kubernetes version and vendor managed control planes.

Where it fits in modern cloud/SRE workflows

  • Preventive security gate in CI/CD pipeline and admission control.
  • Complement to runtime detection, image scanning, and network controls.
  • Integrated into shift-left security: admission policies are tested in pre-production so violations surface before deploys fail.
  • Used by platform teams to enforce organizational minimal privileges.

A text-only “diagram description” readers can visualize

  • Developer -> CI builds image -> Developer submits deployment -> API server admission chain: mutating webhooks run first -> PSP validates the pod spec -> If allowed, object written to etcd -> Scheduler places pod -> Kubelet runs pod -> Observability and runtime security tools monitor.

PSP in one sentence

PSP is an admission-time policy model for validating Kubernetes pod specs to enforce security constraints before pods are created.

PSP vs related terms

ID | Term | How it differs from PSP | Common confusion
T1 | PodSecurity admission | Newer built-in policy enforcement model | Often assumed identical to PSP
T2 | Gatekeeper | Policy engine built on OPA, not tied to PSP | People think Gatekeeper modifies PSP
T3 | PodSecurityPolicy API | The deprecated PSP API object itself | Confused with current admission models
T4 | NetworkPolicy | Controls networking, not pod security | Some expect it to block privileged containers
T5 | Runtime security | Detects behavior after a pod starts | Assumed to prevent pod creation like PSP
T6 | Image scanning | Examines images, not pod specs | Expected to block hostPath like PSP
T7 | RBAC | Authorization for subjects, not pod constraints | Mistaken for the policy itself rather than its binding mechanism
T8 | Admission webhook | A mechanism, not a policy model | Believed to be a PSP replacement by itself


Why does PSP matter?

Business impact (revenue, trust, risk)

  • Prevents privilege escalation and data exfiltration risks that can lead to breaches and regulatory fines.
  • Reduces blast radius from attacks, protecting customer trust and uptime.
  • Enables consistent enforcement across teams, lowering compliance audit costs.

Engineering impact (incident reduction, velocity)

  • Reduces production incidents due to insecure pod configurations.
  • Improves developer velocity by preventing security rework earlier in the lifecycle.
  • Lowers on-call load by removing a class of configuration-induced failures.

SRE framing (SLIs/SLOs/error budgets/toil/on-call) where applicable

  • SLIs: percent of pods compliant with baseline security policy; time-to-detect policy violations in CI.
  • SLOs: maintain compliance SLO versus audit requirements, e.g., 99.9% of production pods compliant.
  • Error budget: violations consume policy compliance budget; repeated violations trigger remediation.
  • Toil: manual review of pod specs becomes toil; automation via admission reduces it.
  • On-call: alerts for policy admission failures should be routed to platform or CI owners, not app on-call.
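As a quick sketch of the arithmetic behind these SLIs (illustrative numbers; the 99.9% target comes from the example SLO above):

```python
def compliance_sli(compliant_pods: int, total_pods: int) -> float:
    """SLI: fraction of production pods meeting the baseline policy."""
    return compliant_pods / total_pods if total_pods else 1.0

def error_budget_remaining(sli: float, slo: float = 0.999) -> float:
    """Fraction of the compliance error budget still unspent.
    Budget = 1 - SLO; spend = 1 - SLI."""
    budget = 1.0 - slo
    spent = 1.0 - sli
    return max(0.0, 1.0 - spent / budget)

# 4 non-compliant pods out of 2000: SLI is 0.998, so the 0.1% budget
# is fully spent and remediation should be triggered.
sli = compliance_sli(1996, 2000)
print(round(sli, 4), round(error_budget_remaining(sli), 2))
```

Tracking the budget this way makes "repeated violations trigger remediation" an objective threshold rather than a judgment call.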

3–5 realistic “what breaks in production” examples

  • A deployment uses hostPath to mount host directories leading to data corruption across nodes.
  • Containers run as root and write to node filesystems, enabling escape vectors.
  • Privileged containers granted CAP_SYS_ADMIN break security assumptions in multi-tenant clusters.
  • Use of hostNetwork unexpectedly exposes sensitive service endpoints to external traffic.
  • A missing or misconfigured seccomp profile leaves syscalls unfiltered, causing noisy kernel logs and performance degradation.

Where is PSP used?

ID | Layer/Area | How PSP appears | Typical telemetry | Common tools
L1 | Edge / Ingress | Prevents hostNetwork and hostPort usage | Admission deny logs | Admission webhooks
L2 | Node / Kubelet | Disallows privileged pods | kube-apiserver audit logs | kube-apiserver audit
L3 | Service / App | Blocks hostPath and runAsRoot | Pod creation failures | PodSecurity admission
L4 | Data / Storage | Restricts volume types | PVC bind failures | StorageClass policies
L5 | Kubernetes control plane | Enforces RBAC-bound policies | Authz audit events | OPA Gatekeeper
L6 | Serverless / FaaS | Limits container capabilities | Platform invocation errors | Platform admission hooks
L7 | CI/CD pipeline | Pre-commit or admission testing | CI job pass/fail rates | Policy-as-code in CI
L8 | Observability / Security | Feeds SIEM for compliance | Alert counts and dashboards | Falco, Kyverno


When should you use PSP?

When it’s necessary

  • Multi-tenant clusters where isolation is required.
  • Regulated environments with compliance requirements.
  • Platform teams enforcing minimal privileges across teams.

When it’s optional

  • Single-team clusters with trusted developers and tight review processes.
  • Short-lived experimental clusters that are isolated and ephemeral.

When NOT to use / overuse it

  • Avoid overly strict global policies that block legitimate developer workflows.
  • Don’t use PSP as the only security control; combine with runtime and network controls.
  • Avoid per-pod micromanagement that creates constant friction for developers.

Decision checklist

  • If multi-tenant AND compliance required -> enforce baseline policies at admission.
  • If single-team AND rapid experimentation -> start with advisory policies in CI.
  • If many legacy workloads break on first rollout -> use graduated enforcement (audit -> enforce).
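The checklist above can be sketched as a small decision helper (the mode names are illustrative labels, not a real API):

```python
def choose_enforcement(multi_tenant: bool, compliance_required: bool,
                       legacy_breakage: bool) -> str:
    """Map the decision checklist to a rollout mode."""
    if legacy_breakage:
        return "audit-then-enforce"   # graduated enforcement
    if multi_tenant and compliance_required:
        return "enforce"              # baseline policies at admission
    return "advisory-in-ci"           # start with CI-only checks

print(choose_enforcement(True, True, False))
```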

Maturity ladder: Beginner -> Intermediate -> Advanced

  • Beginner: Add an admission gate that denies privileged and hostPath.
  • Intermediate: Implement policy-as-code in CI and enforce minimal runAsUser and seccomp.
  • Advanced: Combine PodSecurity, OPA/Gatekeeper, runtime enforcement, and automated remediations.

How does PSP work?

Explain step-by-step

  • Policy authoring: Define constraints (e.g., allowPrivilegeEscalation: false).
  • Policy binding: Bind policy to service accounts, groups or namespaces via RBAC.
  • Admission-time evaluation: API server or admission controller evaluates pod spec against policies.
  • Decision: Admit, deny, or mutate (depending on controller capability).
  • Audit and reporting: Log admission decisions to kube-apiserver audit and SIEM.
  • Remediation: CI tests or automated tools fix violations or notify owners.
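The admission-time evaluation step can be illustrated with a minimal validating check over a pod spec. The field names (hostNetwork, hostPath, securityContext, allowPrivilegeEscalation) are real pod spec fields, but the check itself is a toy sketch of a restricted-style baseline, not a real controller:

```python
def validate_pod(pod: dict) -> tuple[bool, list[str]]:
    """Admission-time checks over a pod spec dict (subset of a baseline policy)."""
    violations = []
    spec = pod.get("spec", {})
    if spec.get("hostNetwork"):
        violations.append("hostNetwork is not allowed")
    for vol in spec.get("volumes", []):
        if "hostPath" in vol:
            violations.append(f"hostPath volume {vol.get('name')!r} is not allowed")
    for c in spec.get("containers", []):
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            violations.append(f"container {c['name']!r} must not be privileged")
        # Unset defaults to true in Kubernetes, so require an explicit false.
        if sc.get("allowPrivilegeEscalation", True):
            violations.append(f"container {c['name']!r} must set "
                              "allowPrivilegeEscalation: false")
    return (not violations, violations)

pod = {"spec": {"hostNetwork": True,
                "containers": [{"name": "app",
                                "securityContext": {"privileged": True}}]}}
allowed, why = validate_pod(pod)
print(allowed, why)
```

Returning the full violation list, not just a deny, is what makes deny messages actionable for developers.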

Components and workflow

  • Policy storage: Policy objects stored in etcd or external Git (policy-as-code).
  • Admission chain: kube-apiserver calls controllers/webhooks in order.
  • Matchers: Rules match namespaces, service accounts, labels.
  • Action: deny, audit, or mutate pod specs.
  • Observability: Audit logs, metrics, and dashboards feed SRE workflows.

Data flow and lifecycle

  • Developer pushes manifest -> CI runs policy checks -> Developer deploys -> API server admission checks -> Pod admitted/denied -> Runtime monitoring observes behavior.

Edge cases and failure modes

  • Admission webhook outage can block all pod creations if webhook is synchronous and misconfigured.
  • Version skew: older PSP objects may not be honored in newer clusters.
  • RBAC misconfiguration leads to over- or under-enforcement.
  • Exceptions: Some system pods require elevated privileges; misclassifying them breaks control plane.

Typical architecture patterns for PSP

  1. Baseline enforcement pattern – Use for: quick minimal security across all namespaces. – Implementation: deny privileged, enforce non-root.
  2. Namespace-tiered pattern – Use for: multi-tenant clusters with dev/prod tiers. – Implementation: different policies per namespace tier.
  3. GitOps policy-as-code pattern – Use for: teams using GitOps and automated reviews. – Implementation: policies stored in Git, validated by CI, applied via controllers.
  4. Advisory-to-enforce pattern – Use for: migrations from permissive to strict enforcement. – Implementation: audit first, then enforce after remediation windows.
  5. Mutating + validating pattern – Use for: automatic hardening (e.g., adding seccomp profiles). – Implementation: mutating webhook injects defaults, validating webhook enforces.
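Pattern 5 can be sketched as two tiny functions that show why ordering matters: the mutating step injects a seccomp default so the validating step passes. The helper names are illustrative; seccompProfile with type RuntimeDefault is the real pod-level field this pattern typically injects:

```python
def mutate_pod(pod: dict) -> dict:
    """Mutating step: inject a seccomp default if the pod omits one."""
    spec = pod.setdefault("spec", {})
    sc = spec.setdefault("securityContext", {})
    sc.setdefault("seccompProfile", {"type": "RuntimeDefault"})
    return pod

def validate_seccomp(pod: dict) -> bool:
    """Validating step: require an explicit seccomp profile."""
    sc = pod.get("spec", {}).get("securityContext", {})
    return "seccompProfile" in sc

pod = {"spec": {"containers": [{"name": "app", "image": "app:1"}]}}
print(validate_seccomp(mutate_pod(pod)))  # mutation makes validation pass
```

If the webhook order were reversed, the same manifest would be denied, which is the "order of webhooks matters" pitfall noted in the glossary.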

Failure modes & mitigation

ID | Failure mode | Symptom | Likely cause | Mitigation | Observability signal
F1 | Webhook outage | Pod creation blocked | Synchronous webhook down | Set timeouts; fail open to audit where acceptable | Increased admission errors
F2 | Overly strict policy | Many deployment failures | Broad deny rules | Audit mode, then incremental enforcement | Spike in deny audit logs
F3 | RBAC misbind | Policy not applied | Incorrect role binding | Correct bindings and test in staging | Discrepancy between expected and actual denies
F4 | Version incompatibility | PSP ignored or errors | Kubernetes API removal | Migrate to PodSecurity or OPA | API errors in controller logs
F5 | Privileged system pods blocked | Control plane degraded | Policy applied to system namespaces | Exclude system namespaces | Control plane pod restarts
F6 | Silent drift | Policies diverge from Git | Manual in-cluster edits | Enforce GitOps reconciliation | Config drift alerts


Key Concepts, Keywords & Terminology for PSP

Each entry: Term — definition — why it matters — common pitfall.

  1. PodSecurityPolicy — Deprecated Kubernetes API for pod admission controls — Central historic model — Pitfall: removed in newer K8s.
  2. PodSecurity admission — Replacement builtin admission controller enforcing pod security standards — Important as current recommended model — Pitfall: behavior differs from PSP.
  3. Admission controller — Component that intercepts API requests — Core enforcement point — Pitfall: misconfigured webhook can block cluster.
  4. Admission webhook — External service called during admission — Enables custom policies — Pitfall: availability impacts pod creation.
  5. OPA Gatekeeper — Policy engine using Open Policy Agent — Flexible policy-as-code — Pitfall: complexity and performance considerations.
  6. Kyverno — Kubernetes native policy engine — Simpler policy syntax for K8s — Pitfall: version compatibility.
  7. RBAC — Role-based access control for subjects — Defines who can create pods — Pitfall: over-permissive roles.
  8. Namespace — K8s logical partition — Allows per-namespace policies — Pitfall: forgetting system namespaces.
  9. ServiceAccount — Identity for workloads — Bind policies to SA for least privilege — Pitfall: default SA surprises.
  10. seccomp — Kernel syscall filtering for containers — Reduces attack surface — Pitfall: missing profile causes permissive syscalls.
  11. runAsUser — Security context setting to avoid root — Prevents privilege escalation — Pitfall: legacy images require root.
  12. runAsNonRoot — Enforce non-root container processes — Simple safety check — Pitfall: false positives in init containers.
  13. allowPrivilegeEscalation — Controls setuid usage — Prevents kernel privilege escalation — Pitfall: needed for some debuggers.
  14. hostPath — Mount host filesystem into pod — Dangerous for isolation — Pitfall: used for convenience in prod.
  15. hostNetwork — Shares node network namespace — Exposes node ports — Pitfall: unexpected external exposure.
  16. hostPID — Shares node process namespace — Security risk for node introspection — Pitfall: needed by some debugging tools.
  17. capabilities — Linux capabilities granting fine-grained privileges — Controls powerful ops like NET_ADMIN — Pitfall: granting CAP_SYS_ADMIN is near-root.
  18. privileged container — Full host access like root — Highest risk — Pitfall: used for convenience in init workloads.
  19. SELinux — Mandatory access control for processes — Adds defense layer — Pitfall: complex labels and policy tuning.
  20. AppArmor — Kernel security module for confinement — Reduces program actions — Pitfall: profile maintenance overhead.
  21. Mutating webhook — Alters requests, e.g., inject seccomp — Used for auto-hardening — Pitfall: unexpected changes to manifests.
  22. Validating webhook — Accept/deny admission requests — Enforces policies — Pitfall: blocks without clear remediation.
  23. GitOps — Policy-as-code workflows stored in Git — Enables reproducibility — Pitfall: delayed reconciliation can cause drift.
  24. Policy-as-code — Express policies in versioned code — Improves reviewability — Pitfall: overcomplex rules.
  25. Audit logs — Records of admission decisions — Required for compliance — Pitfall: noisy logs if policy too verbose.
  26. SIEM — Security information and event management — Centralizes alerts — Pitfall: high signal-to-noise if unfiltered.
  27. Least privilege — Principle to minimize permissions — Core security idea — Pitfall: too strict may break apps.
  28. Mutate-and-validate pattern — Inject defaults then enforce — Reduces friction — Pitfall: order of webhooks matters.
  29. Admission latency — Time added by webhooks — Affects deployment speed — Pitfall: slow webhooks slow CI.
  30. Fail-open vs fail-closed — Webhook failure behavior — Decides blocking behavior — Pitfall: fail-open may permit bad pods.
  31. PodSecurity standard levels — e.g., privileged, baseline, restricted — Defines graded constraints — Pitfall: mislabeling namespaces.
  32. Scanning vs enforcement — Image scanning looks at images, PSP checks pod specs — Complementary controls — Pitfall: relying on one alone.
  33. Runtime security (Falco) — Detects behavioral anomalies — Covers runtime gaps — Pitfall: alerts without context.
  34. Immutable infrastructure — Avoid manual in-cluster edits — Promotes reproducibility — Pitfall: manual fixes create drift.
  35. Canary policies — Gradual enforcement approach — Useful for migration — Pitfall: partial enforcement complexity.
  36. Policy templates — Reusable rule patterns — Aid consistency — Pitfall: hidden complexity in templates.
  37. Compliance baseline — Organization policy requirements — Guides PSP design — Pitfall: baselines too generic.
  38. Policy reconciliation — Ensure desired state applied — Keeps clusters consistent — Pitfall: reconciliation lag.
  39. Cluster-wide vs namespace policies — Different scope impacts — Pitfall: cluster policies can break system components.
  40. Emergency allowlist — Temporary exemptions for critical fixes — Operational necessity — Pitfall: abused and left in place.
  41. Capability bounding — Limit set of Linux capabilities — Prevent escalation — Pitfall: misidentifying required caps.
  42. Pod security context — Aggregated security settings per pod — Central to PSP checks — Pitfall: omissions cause denials.

How to Measure PSP (Metrics, SLIs, SLOs)

ID | Metric/SLI | What it tells you | How to measure | Starting target | Gotchas
M1 | Pod compliance rate | Percent of pods meeting policy | Compliant pods / total pods | 99% for prod | Some system pods excluded
M2 | Admission deny rate | Fraction of admissions denied | Deny events / total admissions | <1% after rollout | Deny spikes indicate dev friction
M3 | Time to remediate violation | Time from deny to fix | Tracked in ticketing | <48 hours for prod | Long lead times from cross-team handoffs
M4 | Audit denial alerts | Denied events reaching ops | Count denies from audit logs | Configurable threshold | High noise if policy is verbose
M5 | Policy drift frequency | In-cluster edits not in Git | Drift events per week | 0 for GitOps | Requires detection tooling
M6 | Admission latency | Extra time added by policy checks | Median webhook latency | <200 ms | Long latencies slow CI
M7 | Unauthorized privilege escalations | Runtime detections post-admit | Runtime alerts correlated to pods | 0 for prod | Runtime tooling needed
M8 | Exceptions count | Emergency allowlist uses | Count per time window | Low and audited | Allowlist abuse possible
M9 | CI policy failure rate | CI jobs failing policy checks | Failures / CI policy jobs | <2% post-stabilization | Early migration may spike
M10 | Coverage of namespaces | Percent of namespaces covered | Covered namespaces / total namespaces | 100% for regulated clusters | System namespaces may be exempt
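A sketch of computing M2 (admission deny rate) and M3 (median time to remediate) from a stream of audit events; the event shape here is hypothetical:

```python
from statistics import median

# Hypothetical audit events: (decision, remediation_hours or None)
events = [("allow", None)] * 970 + [("deny", 6), ("deny", 30), ("deny", 52)]

denies = [e for e in events if e[0] == "deny"]
deny_rate = len(denies) / len(events)                  # M2
median_remediation_h = median(h for _, h in denies)    # M3

print(f"deny rate {deny_rate:.2%}, median remediation {median_remediation_h}h")
```

In practice the same numbers would come from Prometheus counters or audit-log queries; the point is that both SLIs reduce to simple ratios and medians.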


Best tools to measure PSP

Tool — Prometheus + kube-state-metrics

  • What it measures for PSP: Pod counts, admission events, webhook metrics.
  • Best-fit environment: Kubernetes clusters with metrics stack.
  • Setup outline:
  • Deploy kube-state-metrics.
  • Instrument admission controllers to expose metrics.
  • Create Prometheus rules to compute compliance rates.
  • Configure Alertmanager alerts for deny spikes.
  • Strengths:
  • Flexible queries and alerting.
  • Widely used in cloud-native stacks.
  • Limitations:
  • Requires metric exposition from webhooks.
  • Not a SIEM replacement.

Tool — Fluentd / Fluent Bit + ELK

  • What it measures for PSP: Collects audit logs and denial events.
  • Best-fit environment: Clusters with centralized logging.
  • Setup outline:
  • Enable kube-apiserver audit logs.
  • Forward logs to Elasticsearch.
  • Create dashboards for deny events.
  • Strengths:
  • Rich search across logs.
  • Good for compliance audits.
  • Limitations:
  • Storage costs for large logs.
  • Requires field normalization.

Tool — OPA Gatekeeper

  • What it measures for PSP: Policy violations and audit reports.
  • Best-fit environment: Policy-as-code users.
  • Setup outline:
  • Install Gatekeeper & ConstraintTemplates.
  • Create Constraints for desired rules.
  • Use audit mode and capture reports.
  • Strengths:
  • Expressive Rego policies.
  • Audit capabilities.
  • Limitations:
  • Rego learning curve.
  • Performance tuning may be needed.

Tool — Kyverno

  • What it measures for PSP: Validation, mutation, and policy audit events.
  • Best-fit environment: Kubernetes-native policy needs.
  • Setup outline:
  • Install Kyverno.
  • Define policies in YAML.
  • Use mutate to inject defaults and validate to enforce.
  • Strengths:
  • K8s-like policy syntax.
  • Easier onboarding.
  • Limitations:
  • May lack some advanced Rego features.

Tool — Falco

  • What it measures for PSP: Runtime violations that indicate admission gaps.
  • Best-fit environment: Runtime security observability.
  • Setup outline:
  • Deploy Falco as DaemonSet.
  • Configure rules for privilege escalation patterns.
  • Forward alerts to SIEM/Alertmanager.
  • Strengths:
  • Detects behavioral anomalies.
  • Complements admission controls.
  • Limitations:
  • False positives if rules not tuned.

Recommended dashboards & alerts for PSP

Executive dashboard

  • Panels:
  • Pod compliance rate (trend).
  • Number of denied admissions by namespace.
  • Time-to-remediate median.
  • Policy drift count.
  • Why:
  • Shows compliance health to leadership.

On-call dashboard

  • Panels:
  • Recent admission denies with full deny messages.
  • Admission webhook latency and error rate.
  • Namespaces with repeated denies.
  • Active exceptions/allowlist entries.
  • Why:
  • Rapid triage during incidents and deployment failures.

Debug dashboard

  • Panels:
  • Raw audit log stream filtered for policy events.
  • Per-webhook latency and error logs.
  • Pod spec differences between requested and mutated.
  • Timeline of CI fail rate for policy checks.
  • Why:
  • Deep troubleshooting for policy failures.

Alerting guidance

  • What should page vs ticket:
  • Page: Admission webhook down or high error rate impacting pod creation.
  • Ticket: Individual deployment denies for developers.
  • Burn-rate guidance:
  • If denied admissions consume more than 25% of the weekly compliance error budget, trigger a policy review.
  • Noise reduction tactics:
  • Deduplicate identical denies.
  • Group alerts by namespace or service account.
  • Suppress during maintenance windows and known rollouts.
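The deduplicate-and-group tactics can be sketched over a list of hypothetical deny alerts:

```python
from collections import Counter

# Hypothetical deny alerts: (namespace, service_account, reason)
alerts = [
    ("team-a", "default", "privileged container"),
    ("team-a", "default", "privileged container"),   # exact duplicate
    ("team-a", "ci-runner", "hostPath volume"),
    ("team-b", "default", "privileged container"),
]

# Deduplicate identical denies, then group by namespace and reason for routing
grouped = Counter((ns, reason) for ns, _, reason in set(alerts))
for (ns, reason), count in sorted(grouped.items()):
    print(f"{ns}: {reason} (x{count})")
```

Grouping by namespace mirrors the routing guidance above: each grouped entry becomes one ticket for the owning team instead of a page per deny.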

Implementation Guide (Step-by-step)

1) Prerequisites – Cluster admin privileges or platform team involvement. – CI and GitOps pipelines in place. – Observability stack capturing audit logs and metrics.

2) Instrumentation plan – Add metrics to admission controllers. – Ensure audit logging is enabled on kube-apiserver. – Plan policies in Git with review workflows.

3) Data collection – Forward audit logs to central logging. – Export metrics to Prometheus. – Store policy state in Git.

4) SLO design – Define SLOs for compliance rate and time-to-remediate. – Map SLOs to services and namespaces.

5) Dashboards – Build executive, on-call, and debug dashboards. – Provide drill-down links from exec panels to debug panels.

6) Alerts & routing – Create alerts for webhook health and deny spikes. – Route to platform on-call for blocking issues. – Route policy violations to application owners via ticketing.

7) Runbooks & automation – Create runbooks for webhook outages, policy denial investigations, and emergency allowlist processes. – Automate remediation for common fixes (e.g., add runAsUser where safe).

8) Validation (load/chaos/game days) – Run load tests for admission latency impacts. – Conduct chaos tests that simulate webhook failure. – Run game days to validate incident response.

9) Continuous improvement – Weekly policy reviews. – Quarterly audits and SLO reviews. – Postmortem-driven policy adjustments.
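A minimal sketch of the CI policy test referenced in the instrumentation plan: scan a Deployment-like manifest for privileged containers and report violations. The manifest here is an inline dict for brevity; a real gate would parse YAML files and exit non-zero on any finding:

```python
def manifest_violations(manifest: dict) -> list[str]:
    """CI-side check: flag privileged containers in a Deployment-like manifest."""
    out = []
    name = manifest.get("metadata", {}).get("name")
    pod = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for c in pod.get("containers", []):
        if c.get("securityContext", {}).get("privileged"):
            out.append(f"{name}: container {c['name']!r} is privileged")
    return out

deploy = {"metadata": {"name": "web"},
          "spec": {"template": {"spec": {"containers": [
              {"name": "app", "securityContext": {"privileged": True}}]}}}}

problems = manifest_violations(deploy)
for p in problems:
    print(p)
# A real CI gate would exit non-zero here when problems is non-empty.
```

Running the same rules in CI and at admission keeps the two gates from drifting apart.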

Pre-production checklist

  • Policies authored and stored in Git.
  • CI policy tests added to pipeline.
  • Staging cluster mirrors production policy enforcement.
  • Observability capturing admission and audit logs.
  • Runbooks for expected failures.

Production readiness checklist

  • Canary rollout with audit-mode first.
  • Metrics and alerts configured and tested.
  • Emergency allowlist process documented and limited.
  • Training for app teams on common fixes.

Incident checklist specific to PSP

  • Confirm scope: which namespaces and service accounts affected.
  • Check webhook health and API server logs.
  • Determine if deny is expected or due to policy drift.
  • If webhook down, assess fail-open configuration and restore service.
  • Apply temporary allowlist if safe and document.

Use Cases of PSP


1) Multi-tenant SaaS cluster – Context: Multiple customers share cluster. – Problem: Isolation breaches risk data leaks. – Why PSP helps: Enforce least privilege for tenants. – What to measure: Pod compliance rate, unauthorized privileges. – Typical tools: PodSecurity, Gatekeeper, Prometheus.

2) Regulated environment (PCI/ISO) – Context: Compliance auditing required. – Problem: Inconsistent security posture across teams. – Why PSP helps: Standardize enforcement and produce audit logs. – What to measure: Policy drift, compliance SLOs. – Typical tools: PodSecurity, audit logging, SIEM.

3) Platform-as-a-Service team – Context: Platform team provides managed namespaces. – Problem: Developers bypassing guidelines. – Why PSP helps: Prevent risky pods before they run. – What to measure: CI policy failure rates. – Typical tools: Kyverno, GitOps.

4) CI/CD hardening – Context: Deployments automated via pipelines. – Problem: Broken deployments due to runtime privilege assumptions. – Why PSP helps: Fail early in CI to avoid prod incidents. – What to measure: CI policy failures, remediation time. – Typical tools: Policy-as-code, CI plugins.

5) Securing edge workloads – Context: Edge nodes run untrusted workloads. – Problem: Attack on edge node affects fleet. – Why PSP helps: Block hostNetwork and hostPath on edge pods. – What to measure: HostPath denies, hostNetwork usage. – Typical tools: PodSecurity admission, Falco.

6) Legacy migration – Context: Moving older workloads to K8s. – Problem: Many containers require root. – Why PSP helps: Gradual enforcement to modernize apps. – What to measure: Number of exemptions and trend. – Typical tools: Audit-mode policies, canary enforcement.

7) Serverless platform constraints – Context: Managed FaaS on K8s underneath. – Problem: Function runtimes gaining unintended capabilities. – Why PSP helps: Enforce minimal syscall surfaces. – What to measure: Runtime detections and denials. – Typical tools: Kyverno, seccomp profiles.

8) Incident containment automation – Context: Post-breach containment required. – Problem: Need to quickly limit new risky pods. – Why PSP helps: Quickly apply stricter policies cluster-wide. – What to measure: Time to apply emergency policy, deny rate. – Typical tools: GitOps for fast policy deployment.

9) Cost control (indirect) – Context: Privileged pods accessing node-level resources. – Problem: Unintended resource reserves and scheduling inefficiencies. – Why PSP helps: Prevent hostResource claims that remove capacity. – What to measure: Host-bound deployments and node utilization. – Typical tools: Admission policies and scheduler metrics.

10) Platform onboarding – Context: New team joining shared cluster. – Problem: Lack of standardized practices increases risk. – Why PSP helps: Provide baseline constraints and onboarding templates. – What to measure: First-week compliance rate and ROX. – Typical tools: Templates in Git, CI tests.


Scenario Examples (Realistic, End-to-End)

Scenario #1 — Kubernetes: Multi-tenant baseline enforcement

Context: A SaaS provider runs multiple customer namespaces in one cluster.
Goal: Enforce baseline security without breaking existing workloads.
Why PSP matters here: Prevents privilege escalation and protects shared nodes.
Architecture / workflow: GitOps for policy definitions, Kyverno for mutation/validation, Prometheus for metrics, Fluentd for audits.
Step-by-step implementation:

  1. Inventory current pod specs in prod.
  2. Create baseline policies denying privileged, hostPath, hostNetwork.
  3. Deploy policies in audit mode for 2 weeks.
  4. Fix violations and provide developer guidance.
  5. Switch to enforce mode for non-system namespaces.

What to measure: Pod compliance rate, deny events per namespace, time-to-remediate.
Tools to use and why: Kyverno for easy K8s-style policies; Prometheus for metrics; Fluentd for audit logs.
Common pitfalls: Not exempting kube-system, causing control plane failures.
Validation: Run canary deployments with known-good manifests.
Outcome: Baseline enforced with minimal disruption and measurable compliance.

Scenario #2 — Serverless / Managed-PaaS: Function sandboxing

Context: A company operates an internal FaaS platform on K8s.
Goal: Ensure functions cannot use host resources or escalate privileges.
Why PSP matters here: Functions are highly dynamic and riskier if permissive.
Architecture / workflow: Platform admission webhooks that mutate function pods to include seccomp and drop capabilities; Gatekeeper validates.
Step-by-step implementation:

  1. Define seccomp and capability baselines.
  2. Mutating webhook injects defaults into function pods.
  3. Gatekeeper validates no hostPath or privileged flags.
  4. CI validates functions against policies before deployment.

What to measure: Function pod compliance and runtime detections.
Tools to use and why: Mutating webhook for injection, Gatekeeper for validation, Falco for runtime.
Common pitfalls: Breakage of native libraries requiring specific capabilities.
Validation: Canary with synthetic functions and runtime checks.
Outcome: Functions run in tighter sandboxes with reduced blast radius.

Scenario #3 — Incident response / Postmortem: Privilege exploit mitigation

Context: A container escape vulnerability exploited by an attacker to read node files.
Goal: Contain ongoing exploitation and prevent new risky pods.
Why PSP matters here: Quickly restrict new pods from using hostPath or privileged modes.
Architecture / workflow: Emergency policy pushed via GitOps; admission validates new pods; Falco watches for post-admit anomalies.
Step-by-step implementation:

  1. Declare incident and notify platform on-call.
  2. Apply emergency deny policy cluster-wide excluding kube-system.
  3. Monitor for new deny events and retroactive runtime alerts.
  4. Remediate running risky pods via orchestration.
  5. Postmortem to remove allowlists and refine policies.

What to measure: Time to apply emergency policy and reduction in risky pods.
Tools to use and why: GitOps for quick policy rollouts, Falco for runtime monitoring.
Common pitfalls: Overly broad emergency rules breaking legitimate jobs.
Validation: Confirm new pods are denied and runtime anomalies decline.
Outcome: Containment of the exploit vector while follow-up patches are deployed.

Scenario #4 — Cost / Performance trade-off: Limiting node-affinity privileged workloads

Context: Privileged workloads were allowed to reserve host devices causing scheduling hotspots.
Goal: Reduce node contention and improve cost efficiency.
Why PSP matters here: Prevents pods from requesting host resources unnecessarily.
Architecture / workflow: Enforcement via policy to forbid hostPath and hostNetwork for non-admin namespaces; scheduler metrics track node load.
Step-by-step implementation:

  1. Identify pods using hostPath/hostNetwork.
  2. Create policy disallowing host bindings in app namespaces.
  3. Educate teams on alternatives (CSI drivers, local PVs with eviction).
  4. Enforce policy and monitor node utilization and costs.

What to measure: Host-bound pod count, node utilization, cost per workload.
Tools to use and why: Prometheus for node metrics, Gatekeeper for enforcement.
Common pitfalls: Legacy storage needs requiring migration effort.
Validation: Observe reduced node saturation and lower costs.
Outcome: Improved packing and cost reduction while maintaining app functionality.

Common Mistakes, Anti-patterns, and Troubleshooting

Symptom -> Root cause -> Fix

  1. Symptom: Many deployments suddenly fail. -> Root cause: Enforced policy applied cluster-wide without audit stage. -> Fix: Roll back to audit mode and stage enforcement.
  2. Symptom: Control plane pods restart. -> Root cause: Policy applied to kube-system namespaces. -> Fix: Exempt kube-system or bind policy selectively.
  3. Symptom: CI pipeline fails for legacy apps. -> Root cause: No migration path or advisory checks. -> Fix: Add migration tasks and config transforms in CI.
  4. Symptom: Webhook timeout blocking deploys. -> Root cause: Synchronous webhook slow response. -> Fix: Increase timeouts, optimize webhook, use caching, fail-open if acceptable.
  5. Symptom: High alert noise for denies. -> Root cause: Broad deny rules catching benign patterns. -> Fix: Tweak rules, add exceptions, group alerts.
  6. Symptom: Runtime escape detected after admission. -> Root cause: Admission misses runtime behavior. -> Fix: Add runtime security tools like Falco and correlate events.
  7. Symptom: Policy drift between Git and cluster. -> Root cause: Manual in-cluster edits. -> Fix: Enforce GitOps reconciliation and audit.
  8. Symptom: Developers request privileges frequently. -> Root cause: Missing capabilities/incompatible images. -> Fix: Provide developer guidance, alternative images, or safe allowlists.
  9. Symptom: Slow troubleshooting for denied pods. -> Root cause: Poor audit log indexing. -> Fix: Improve logging pipeline and searchable fields.
  10. Symptom: Too many exceptions. -> Root cause: Emergency allowlist overused. -> Fix: Time-bound allowlists and post-incident review.
  11. Symptom: False positives in runtime alerts. -> Root cause: Un-tuned rules. -> Fix: Tune Falco/IDS rules for environment.
  12. Symptom: Admission controller memory pressure. -> Root cause: Complex policy evaluation. -> Fix: Simplify policies or scale controller replicas.
  13. Symptom: Privilege escalations go undetected. -> Root cause: No runtime coverage. -> Fix: Deploy additional runtime sensors and process baselines.
  14. Symptom: Policy regressions after upgrade. -> Root cause: API behavior changes across K8s versions. -> Fix: Test policies during cluster upgrades in staging.
  15. Symptom: Too many one-off policies. -> Root cause: Lack of reuse and templates. -> Fix: Create reusable policy templates.
  16. Symptom: Missing seccomp profiles. -> Root cause: OS/container runtime mismatch. -> Fix: Standardize runtimes and maintain profiles.
  17. Symptom: App failures masked by admission denies. -> Root cause: Poor error messaging in deny responses. -> Fix: Provide detailed deny messages and remediation steps.
  18. Symptom: Observability blind spots. -> Root cause: Not collecting admission metrics. -> Fix: Instrument webhooks and export metrics.
  19. Symptom: Performance regressions with mutation. -> Root cause: Mutating webhook injects heavy sidecars. -> Fix: Re-evaluate injected artifacts and tune.
  20. Symptom: Security policy conflicts. -> Root cause: Multiple policy engines with overlapping rules. -> Fix: Consolidate or document precedence.
  21. Symptom: Unmonitored allowlist usage. -> Root cause: Lack of audit for exemptions. -> Fix: Log and review all allowlist entries periodically.
  22. Symptom: Poor developer adoption. -> Root cause: No training and unclear guidance. -> Fix: Provide examples, templates, and office hours.
  23. Symptom: Excessive manual remediation work. -> Root cause: No automation for common fixes. -> Fix: Create automation playbooks and PR bots.

Observability pitfalls covered above: noisy logs, missing metrics, indexing gaps, false positives, and no correlation between admission and runtime events.
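Deny spikes and alert noise can be caught with a single alerting rule. A sketch in Prometheus rule-file format, assuming the policy engine exports a violations gauge such as Gatekeeper's gatekeeper_violations (metric and label names vary by engine and version, so verify against your deployment):

```yaml
# Sketch: alert on sustained violation spikes rather than single denies,
# which reduces noise. Assumes Gatekeeper's gatekeeper_violations gauge;
# adjust metric/label names for your policy engine.
groups:
  - name: admission-policy
    rules:
      - alert: AdmissionDenySpike
        expr: sum(gatekeeper_violations{enforcement_action="deny"}) > 20
        for: 10m              # must persist 10 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "Sustained spike in admission policy violations"
          description: "More than 20 active violations for 10m; check for a bad policy rollout before paging app teams."
```

The `for:` clause acts as a suppression window, so one-off denies from a single bad deploy never page anyone.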


Best Practices & Operating Model

Ownership and on-call

  • Platform team owns policy lifecycle and on-call for blocking webhook failures.
  • Application teams own remediation for their violations.
  • Create a clear escalation path when enforcement blocks critical business workflows.

Runbooks vs playbooks

  • Runbooks: Step-by-step operational procedures for common issues (webhook down, emergency allowlist).
  • Playbooks: Higher-level decision guides for incidents requiring judgment (policy rollback vs enforce).

Safe deployments (canary/rollback)

  • Start in audit mode, then small-namespace canary, then full enforce.
  • Keep quick rollback paths and automated tests in CI to detect breakages.
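The audit-then-enforce progression maps directly onto the built-in PodSecurity admission labels. A minimal sketch for a canary namespace (the namespace name is hypothetical):

```yaml
# Canary rollout sketch using built-in PodSecurity admission:
# warn+audit at "restricted" first, enforce only "baseline" until the
# namespace is clean, then tighten enforce to "restricted".
apiVersion: v1
kind: Namespace
metadata:
  name: payments-canary        # hypothetical canary namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Because these are plain namespace labels, rollback is a one-line label change and fits naturally into GitOps-managed manifests.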

Toil reduction and automation

  • Automate remediation PRs for simple fixes (e.g., inject runAsUser).
  • Use policy templates and GitOps to avoid manual edits.
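The runAsNonRoot remediation mentioned above can be automated with a mutating policy rather than a PR bot. A Kyverno sketch, assuming Kyverno is installed; verify the field paths against your Kyverno version before use:

```yaml
# Sketch: default runAsNonRoot when a pod omits it. The +() anchor adds
# the field only if it is not already set, so explicit values win.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: default-run-as-non-root
spec:
  rules:
    - name: set-run-as-non-root
      match:
        any:
          - resources:
              kinds: ["Pod"]
      mutate:
        patchStrategicMerge:
          spec:
            securityContext:
              +(runAsNonRoot): true   # add only if absent
```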

Security basics

  • Default deny for capabilities and privileged flags.
  • Enforce non-root where possible.
  • Apply seccomp and AppArmor profiles.
  • Monitor runtime for deviations.
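A pod spec that satisfies these basics might look like the following (the image name is hypothetical):

```yaml
# Sketch: a pod hardened per the basics above — non-root, no privilege
# escalation, all capabilities dropped, RuntimeDefault seccomp.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

This spec passes the PodSecurity "restricted" level, making it a useful template for application teams.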

Weekly/monthly routines

  • Weekly: Review deny spikes and open remediation tickets.
  • Monthly: Audit allowlist entries and drift reports.
  • Quarterly: SLO review and policy effectiveness report.

What to review in postmortems related to PSP

  • Whether policy prevented or caused the incident.
  • Time to detect and remediate policy violations.
  • Any changes to allowlists and their justification.
  • Lessons to tighten or relax policies.

Tooling & Integration Map for PSP

| ID  | Category         | What it does                           | Key integrations             | Notes                                     |
|-----|------------------|----------------------------------------|------------------------------|-------------------------------------------|
| I1  | Policy engine    | Validates and mutates pod specs        | Kubernetes admission, GitOps | Gatekeeper and Kyverno are common choices |
| I2  | Audit logging    | Collects admission decisions           | SIEM, ELK, cloud logging     | kube-apiserver audit must be enabled      |
| I3  | Metrics store    | Stores compliance and latency metrics  | Prometheus, Alertmanager     | Needs metrics exported by controllers     |
| I4  | Runtime security | Detects runtime anomalies              | Falco, runtime scanners      | Complements admission controls            |
| I5  | GitOps           | Manages policy-as-code                 | ArgoCD, Flux                 | Ensures reconciliation                    |
| I6  | CI integration   | Runs policy checks pre-deploy          | Jenkins, GitHub Actions      | Prevents violations before admission      |
| I7  | Dashboarding     | Visualizes compliance                  | Grafana, Kibana              | Executive and debug views                 |
| I8  | Identity / AuthN | Maps service accounts and users        | OIDC, IAM                    | Critical for correct policy binding       |
| I9  | Secrets & config | Stores seccomp/AppArmor artifacts      | Vault, K8s Secrets           | Secure storage for sensitive files        |
| I10 | Incident mgmt    | Routes alerts and tickets              | PagerDuty, Opsgenie          | On-call routing for platform team         |


Frequently Asked Questions (FAQs)

What does PSP stand for in Kubernetes?

Pod Security Policy. It is a historical Kubernetes API, replaced by the built-in PodSecurity admission controller and third-party engines in newer releases.

Is PSP still supported in Kubernetes 1.27+?

No. The PSP API was deprecated in Kubernetes 1.21 and removed in 1.25, so it is unavailable in 1.27+. Use PodSecurity admission, OPA Gatekeeper, or Kyverno instead.

What’s the difference between PodSecurity and PSP?

PodSecurity is the newer built-in admission controller that enforces the standardized Pod Security Standards levels (privileged, baseline, restricted) per namespace; PSP was a more flexible, fine-grained API that has been deprecated and removed.

Can PSP mutate pod specs?

Only in a limited way: PSP could apply defaults (for example, default seccomp annotations or defaultAddCapabilities). General-purpose mutation requires mutating webhooks such as Kyverno or custom controllers.

Should I use Gatekeeper or Kyverno?

It depends on team skills and requirements: Gatekeeper uses Rego, which is powerful but has a steeper learning curve; Kyverno is Kubernetes-native YAML and simpler for many common use cases.

How do I migrate from PSP to PodSecurity?

Inventory PSP usage, map rules to PodSecurity levels or Gatekeeper/Kyverno constraints, test in staging, and roll out audit-first.

What are common PSP-equivalent policy rules?

Disallow privileged containers, disallow hostPath volumes, enforce runAsNonRoot, and require seccomp/AppArmor profiles.
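These rules translate directly into policy-as-code. A Kyverno sketch covering the first two (start with validationFailureAction: Audit, switch to Enforce once workloads are clean); the pattern syntax follows Kyverno's policy library, so verify against your Kyverno version:

```yaml
# Sketch: PSP-equivalent validate rules in Kyverno — deny privileged
# containers and hostPath volumes.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: psp-equivalents
spec:
  validationFailureAction: Audit   # flip to Enforce after cleanup
  rules:
    - name: disallow-privileged
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            containers:
              - =(securityContext):
                  =(privileged): "false"
    - name: disallow-host-path
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "hostPath volumes are not allowed."
        pattern:
          spec:
            =(volumes):
              - X(hostPath): "null"
```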

How do I test policies without breaking prod?

Use audit mode first, run policy checks in CI, and deploy to a staging cluster that mirrors production.

Who should own PSP policies?

The platform or security team owns policies; application teams own remediation and exceptions.

How do I measure the impact of PSP?

Track compliance rate, deny rate, remediation time, and runtime anomalies.

What happens if admission webhook fails?

It depends on the webhook's failurePolicy: fail-closed (Fail) blocks resource creation while the webhook is unavailable; fail-open (Ignore) lets requests through. Choose based on your tolerance for availability risk versus security risk.
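The fail-open/fail-closed choice is configured per webhook via the failurePolicy field on the webhook registration. A sketch (webhook and service names are hypothetical):

```yaml
# Sketch: webhook registration showing failurePolicy. "Fail" blocks
# creation when the webhook is unreachable; "Ignore" fails open.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy-webhook          # hypothetical name
webhooks:
  - name: validate.policy.example.com
    failurePolicy: Fail             # or Ignore for fail-open
    timeoutSeconds: 5               # keep short to bound deploy latency
    clientConfig:
      service:
        name: policy-controller     # hypothetical service
        namespace: policy-system
        path: /validate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
```

A short timeoutSeconds plus fail-closed is a common compromise: outages are visible quickly without leaving a long admission stall.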

Are PSPs enough for security?

No; they are preventive controls. Combine them with image scanning, network policy, and runtime security.

How to handle legacy workloads requiring root?

Provide an exemption path with strict auditing and time-bound allowlists while modernizing the workloads.

Can I auto-remediate policy violations?

Yes, for safe deterministic changes like injecting non-root users, but vet the consequences first.

How to avoid noisy deny alerts?

Tune policies, aggregate alerts, and use grouping and suppression windows.

What telemetry is most valuable for PSP?

Admission deny events, webhook latencies, policy drift, and runtime detections.

How to handle cross-cluster policies?

Use GitOps and central policy templates with per-cluster overrides.

Do managed Kubernetes providers enforce PSP?

It varies by provider. Managed offerings such as EKS, GKE, and AKS removed PSP along with upstream Kubernetes and now rely on PodSecurity admission or provider-specific policy add-ons; check your provider's documentation.


Conclusion

PSP historically provided pod-level admission controls in Kubernetes and remains a core concept for enforcing pod security. By 2026, upstream PSP is deprecated, but the principles persist via PodSecurity, OPA Gatekeeper, Kyverno, mutating/admission webhooks, and runtime tools. A robust approach combines preventive admission checks, policy-as-code in GitOps, runtime detection, and clear SRE ownership and observability.

Next 7 days plan

  • Day 1: Inventory current pod specs and identify risky pod attributes.
  • Day 2: Enable kube-apiserver audit logging and forward to central logs.
  • Day 3: Create baseline policies in Git in audit mode.
  • Day 4: Add CI checks to run policy validations for merge requests.
  • Day 5: Build Prometheus/Grafana panels for basic compliance metrics.
  • Day 6: Run a small canary enforcement in a non-critical namespace.
  • Day 7: Review results, open remediation tickets, and plan next-week enforcement.

Appendix — PSP Keyword Cluster (SEO)

  • Primary keywords

  • Pod Security Policy
  • PSP Kubernetes
  • PodSecurity admission
  • Kubernetes pod security
  • Pod security best practices
  • Secondary keywords

  • Kubernetes admission controllers
  • PodSecurityPolicy deprecation
  • OPA Gatekeeper policies
  • Kyverno pod policies
  • seccomp profiles Kubernetes
  • AppArmor Kubernetes
  • runAsNonRoot enforcement
  • hostPath policy Kubernetes

  • Long-tail questions

  • How to migrate from PSP to PodSecurity
  • What replaces PodSecurityPolicy in Kubernetes
  • How to enforce non-root containers in Kubernetes
  • How to audit PSP in Kubernetes clusters
  • How to prevent privileged containers in K8s
  • How to measure pod security compliance
  • How to design pod admission policies
  • How to use Gatekeeper for pod validation
  • How to use Kyverno to mutate pod specs
  • How to integrate pod security with CI/CD
  • What is the impact of admission webhook latency
  • How to handle legacy apps with PSP rules
  • How to author seccomp profiles for pods
  • How to monitor admission deny rates
  • How to create policy-as-code for Kubernetes

  • Related terminology

  • Admission webhook
  • Mutating webhook
  • Validating webhook
  • Audit logs
  • Policy-as-code
  • GitOps policy management
  • Runtime security
  • Falco rules
  • kube-apiserver audit
  • Prometheus metrics for admission
  • Alertmanager deny alerts
  • Emergency allowlist
  • Policy drift
  • Compliance baseline
  • Cluster role binding
  • ServiceAccount policies
  • Canary policy rollout
  • Fail-open webhook
  • Fail-closed webhook
  • Policy reconciliation
  • Seccomp profile injection
  • AppArmor profile injection
  • Capability bounding
  • Least privilege enforcement
  • Pod security context
  • Node hostPath restrictions
  • HostNetwork prevention
  • Privileged container prevention
  • Mutate-and-validate pattern
  • Admission latency monitoring
  • Policy audit reports
  • SIEM integration for denies
  • Kube-state-metrics compliance
  • Kubernetes policy templates
  • Policy testing in CI
  • Postmortem policy review
  • Emergency policy rollout
  • Policy exclusion lists
  • Policy coverage by namespace
  • Policy SLOs and SLIs
