What is PSP? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)

Quick Definition

Pod Security Policy (PSP) is a Kubernetes admission control mechanism that enforces pod-level security constraints; the built-in API was deprecated in Kubernetes v1.21 and removed in v1.25. Analogy: PSP is like airport security screening for containers. Formal: PSP defines allowed pod spec features and validates pods at admission time against policy objects.

What is PSP?

What it is / what it is NOT

PSP is a Kubernetes admission control resource model used to restrict pod capabilities, e.g., privileged mode, hostPath, running as root.
PSP is NOT a runtime enforcement engine for already-running containers; it prevents creation rather than introspecting existing pods.
PSP is NOT a replacement for broader cluster security like network policies, workload identity, or image scanning.

Key properties and constraints

Admission-time enforcement: evaluates pod requests before creation.
Policy granularity: operates on pod spec fields and security context attributes.
RBAC binding: policies are applied via role or clusterrole bindings to service accounts and users.
Deprecated upstream: the built-in PSP API was deprecated in Kubernetes v1.21 and removed in v1.25; clusters on newer versions use PodSecurity admission or third-party controllers instead.
Compatibility constraints: behavior varies by Kubernetes version and vendor managed control planes.
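
The admission-time check described above can be pictured as a pure function over the pod spec. The following is a minimal sketch, not the PSP API: the policy dictionary, its keys, and the find_violations helper are hypothetical names chosen for illustration, and only three fields are checked.

```python
def find_violations(pod_spec, policy):
    """Return human-readable violations of a (hypothetical) policy dict."""
    violations = []
    pod_sc = pod_spec.get("securityContext", {})

    # Volume check: hostPath mounts expose the node filesystem.
    if not policy.get("allow_host_path", False):
        for volume in pod_spec.get("volumes", []):
            if "hostPath" in volume:
                violations.append(f"volume {volume.get('name')!r} uses hostPath")

    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("privileged", False) and not policy.get("allow_privileged", False):
            violations.append(f"container {name!r} is privileged")
        # A container-level runAsNonRoot overrides the pod-level setting.
        non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False))
        if policy.get("require_non_root", True) and not non_root:
            violations.append(f"container {name!r} does not set runAsNonRoot")
    return violations


# Example policy and pod spec (illustrative values only).
baseline = {"allow_privileged": False, "allow_host_path": False, "require_non_root": True}
risky_pod = {
    "volumes": [{"name": "logs", "hostPath": {"path": "/var/log"}}],
    "containers": [{"name": "app", "securityContext": {"privileged": True}}],
}
for problem in find_violations(risky_pod, baseline):
    print(problem)
```

An empty result means the pod would be admitted under this simplified policy; any entry would translate into a denial at admission time.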

Where it fits in modern cloud/SRE workflows

Preventive security gate in CI/CD pipeline and admission control.
Complement to runtime detection, image scanning, and network controls.
Integrated into shift-left security: admission control policies are tested in CI and pre-production so violations surface before a production deploy.
Used by platform teams to enforce organizational minimal privileges.

Text-only diagram description

Developer -> CI builds image -> Developer submits deployment -> API server admission chain: first webhook checks -> PSP evaluates pod spec -> If allowed, write to etcd -> Scheduler places pod -> Kubelet runs pod -> Observability and runtime security tools monitor.

PSP in one sentence

PSP is an admission-time policy model for validating Kubernetes pod specs to enforce security constraints before pods are created.

PSP vs related terms

ID	Term	How it differs from PSP	Common confusion
T1	PodSecurity admission	Built-in successor that enforces Pod Security Standards via namespace labels	Often assumed identical to PSP
T2	Gatekeeper	Policy engine using OPA not PSP	People think Gatekeeper modifies PSP
T3	PodSecurityPolicy API	The deprecated PSP API object	Confused with current admission models
T4	NetworkPolicy	Controls networking not pod security	Some expect it blocks privileged containers
T5	Runtime security	Detects behavior post-start	Assumed to prevent pod creation like PSP
T6	Image scanning	Examines images not pod specs	Expected to block hostPath like PSP
T7	RBAC	Authz for subjects not pod constraints	Mistaken for policy application method
T8	Admission webhook	Mechanism not policy model	Believed to be a PSP replacement

Why does PSP matter?

Business impact (revenue, trust, risk)

Prevents privilege escalation and data exfiltration risks that can lead to breaches and regulatory fines.
Reduces blast radius from attacks, protecting customer trust and uptime.
Enables consistent enforcement across teams, lowering compliance audit costs.

Engineering impact (incident reduction, velocity)

Reduces production incidents due to insecure pod configurations.
Improves developer velocity by preventing security rework earlier in the lifecycle.
Lowers on-call load by removing a class of configuration-induced failures.

SRE framing (SLIs/SLOs/error budgets/toil/on-call) where applicable

SLIs: percent of pods compliant with baseline security policy; time-to-detect policy violations in CI.
SLOs: maintain compliance SLO versus audit requirements, e.g., 99.9% of production pods compliant.
Error budget: violations consume policy compliance budget; repeated violations trigger remediation.
Toil: manual review of pod specs becomes toil; automation via admission reduces it.
On-call: alerts for policy admission failures should be routed to platform or CI owners, not app on-call.
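
The compliance SLI and its error budget reduce to simple arithmetic. The sketch below assumes pod counts are already available from some inventory source; the function names are hypothetical and the 99.9% target is taken from the SLO example above.

```python
def compliance_sli(compliant_pods, total_pods):
    """Fraction of pods that meet the baseline policy (1.0 when there are no pods)."""
    if total_pods == 0:
        return 1.0
    return compliant_pods / total_pods


def error_budget_remaining(sli, slo_target):
    """Fraction of the error budget left: 1.0 untouched, 0.0 exhausted, negative if overspent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0 to leave a budget")
    return 1.0 - (1.0 - sli) / budget


# 9,995 of 10,000 pods compliant against a 99.9% SLO uses about half the budget.
sli = compliance_sli(9995, 10000)
print(f"SLI={sli:.4f} budget remaining={error_budget_remaining(sli, 0.999):.2f}")
```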

Realistic “what breaks in production” examples

A deployment uses hostPath to mount host directories leading to data corruption across nodes.
Containers run as root and write to node filesystems, enabling escape vectors.
Privileged containers, or containers granted CAP_SYS_ADMIN, break security assumptions in multi-tenant clusters.
Use of hostNetwork unexpectedly exposes sensitive service endpoints to external traffic.
A missing or misconfigured seccomp profile can cause noisy kernel logs and performance degradation.

Where is PSP used?

ID	Layer/Area	How PSP appears	Typical telemetry	Common tools
L1	Edge / Ingress	Prevents hostNetwork hostPort usage	Admission deny logs	Admission webhooks
L2	Node / Kubelet	Disallow privileged pods	Kube-apiserver audit	kube-apiserver audit
L3	Service / App	Block hostPath and runAsRoot	Pod creation failures	PodSecurity admission
L4	Data / Storage	Restrict volume types	PVC bind failures	StorageClass policies
L5	Kubernetes control plane	Enforce RBAC-bound policies	Authz audit events	OPA Gatekeeper
L6	Serverless / FaaS	Limit container capabilities	Platform invocation errors	Platform admission hooks
L7	CI/CD pipeline	Pre-commit or admission testing	CI job pass/fail rates	Policy-as-code in CI
L8	Observability / Security	Feed to SIEM for compliance	Alert counts and dashboards	Falco, Kyverno

When should you use PSP?

When it’s necessary

Multi-tenant clusters where isolation is required.
Regulated environments with compliance requirements.
Platform teams enforcing minimal privileges across teams.

When it’s optional

Single-team clusters with trusted developers and tight review processes.
Short-lived experimental clusters that are isolated and ephemeral.

When NOT to use / overuse it

Avoid overly strict global policies that block legitimate Dev workflows.
Don’t use PSP as the only security control; combine with runtime and network controls.
Avoid per-pod micromanagement that creates constant friction for developers.

Decision checklist

If multi-tenant AND compliance required -> enforce baseline policies at admission.
If single-team AND rapid experimentation -> start with advisory policies in CI.
If many legacy workloads break on first rollout -> use graduated enforcement (audit -> enforce).

Maturity ladder: Beginner -> Intermediate -> Advanced

Beginner: Add an admission gate that denies privileged and hostPath.
Intermediate: Implement policy-as-code in CI and enforce minimal runAsUser and seccomp.
Advanced: Combine PodSecurity, OPA/Gatekeeper, runtime enforcement, and automated remediations.

How does PSP work?

Step-by-step

Policy authoring: Define constraints (e.g., allowPrivilegeEscalation: false).
Policy binding: Grant the use verb on a policy to service accounts, users, or groups via RBAC roles and bindings; a RoleBinding scopes the grant to one namespace.
Admission-time evaluation: API server or admission controller evaluates pod spec against policies.
Decision: Admit, deny, or mutate (depending on controller capability).
Audit and reporting: Log admission decisions to kube-apiserver audit and SIEM.
Remediation: CI tests or automated tools fix violations or notify owners.
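
The binding and evaluation steps can be sketched as follows. With the original PSP API, a pod was admitted when at least one policy the requesting subject was authorized to use accepted it. The POLICIES and BINDINGS tables and both helpers are hypothetical stand-ins; a real cluster resolves bindings through RBAC, and this sketch simply takes the first matching policy in list order.

```python
# Hypothetical policy table: name -> allowed features.
POLICIES = {
    "restricted": {"allow_privileged": False, "allow_host_network": False},
    "privileged": {"allow_privileged": True, "allow_host_network": True},
}

# Hypothetical binding table: subject -> policies it may use.
BINDINGS = {
    "system:serviceaccount:payments:default": ["restricted"],
    "system:serviceaccount:kube-system:cni": ["restricted", "privileged"],
}


def policy_accepts(policy, pod_spec):
    """Check the two features modelled by this simplified policy."""
    if pod_spec.get("hostNetwork", False) and not policy["allow_host_network"]:
        return False
    for container in pod_spec.get("containers", []):
        privileged = container.get("securityContext", {}).get("privileged", False)
        if privileged and not policy["allow_privileged"]:
            return False
    return True


def admit(subject, pod_spec):
    """Return the name of the first bound policy that accepts the pod, else None."""
    for name in BINDINGS.get(subject, []):
        if policy_accepts(POLICIES[name], pod_spec):
            return name
    return None  # no usable policy accepts the pod: deny


privileged_pod = {"containers": [{"name": "agent", "securityContext": {"privileged": True}}]}
print(admit("system:serviceaccount:payments:default", privileged_pod))
print(admit("system:serviceaccount:kube-system:cni", privileged_pod))
```

A subject with no bound policy gets None for every pod, which mirrors how an RBAC misbinding shows up as blanket denial.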

Components and workflow

Policy storage: Policy objects stored in etcd or external Git (policy-as-code).
Admission chain: kube-apiserver calls controllers/webhooks in order.
Matchers: Rules match namespaces, service accounts, labels.
Action: deny, audit, or mutate pod specs.
Observability: Audit logs, metrics, and dashboards feed SRE workflows.
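
For webhook-based controllers, the deny action is returned to the API server as an AdmissionReview response. The sketch below builds that response in the admission.k8s.io/v1 shape; the helper name and the message wording are illustrative assumptions, and a real webhook would also need TLS serving and request parsing.

```python
import json


def build_admission_response(review, violations):
    """Build an admission.k8s.io/v1 AdmissionReview response for a validating webhook."""
    response = {
        # The uid must be copied from the incoming request.
        "uid": review["request"]["uid"],
        "allowed": not violations,
    }
    if violations:
        # A specific message helps developers fix the manifest without a ticket.
        response["status"] = {
            "code": 403,
            "message": "pod rejected: " + "; ".join(violations),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


incoming = {"request": {"uid": "example-uid"}}
print(json.dumps(build_admission_response(incoming, ["container 'app' is privileged"]), indent=2))
```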

Data flow and lifecycle

Developer pushes manifest -> CI runs policy checks -> Developer deploys -> API server admission checks -> Pod admitted/denied -> Runtime monitoring observes behavior.

Edge cases and failure modes

Admission webhook outage can block all pod creations if webhook is synchronous and misconfigured.
Version skew: the PSP API no longer exists in Kubernetes v1.25 and later, so PSP objects and manifests that reference them are not honored after an upgrade.
RBAC misconfiguration leads to over- or under-enforcement.
Exceptions: Some system pods require elevated privileges; misclassifying them breaks control plane.
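
The webhook outage case comes down to what happens when the call fails. Kubernetes webhook configurations expose this as the failurePolicy field with the values Fail and Ignore. The sketch below models that choice with a hypothetical decide helper and a stand-in webhook callable; it is not how the API server is implemented.

```python
def decide(call_webhook, pod_spec, failure_policy="Fail"):
    """Return (allowed, reason); failure_policy mirrors the webhook failurePolicy field."""
    try:
        return call_webhook(pod_spec), "webhook decision"
    except (TimeoutError, ConnectionError) as exc:
        if failure_policy == "Ignore":
            # Fail open: pods keep flowing, but unchecked pods may slip in.
            return True, f"webhook unavailable ({exc}); failing open"
        # Fail closed: safe by default, but an outage blocks pod creation.
        return False, f"webhook unavailable ({exc}); failing closed"


def unavailable_webhook(pod_spec):
    """Stand-in for a webhook that never answers in time."""
    raise TimeoutError("timed out")


print(decide(unavailable_webhook, {}, failure_policy="Fail"))
print(decide(unavailable_webhook, {}, failure_policy="Ignore"))
```

Failing open trades security for availability, so it is usually paired with audit logging of the skipped checks.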

Typical architecture patterns for PSP

Baseline enforcement pattern – Use for: quick minimal security across all namespaces. – Implementation: deny privileged, enforce non-root.
Namespace-tiered pattern – Use for: multi-tenant clusters with dev/prod tiers. – Implementation: different policies per namespace tier.
GitOps policy-as-code pattern – Use for: teams using GitOps and automated reviews. – Implementation: policies stored in Git, validated by CI, applied via controllers.
Advisory-to-enforce pattern – Use for: migrations from permissive to strict enforcement. – Implementation: audit first, then enforce after remediation windows.
Mutating + validating pattern – Use for: automatic hardening (e.g., adding seccomp profiles). – Implementation: mutating webhook injects defaults, validating webhook enforces.
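
The mutating + validating pattern can be sketched as two functions applied in order: one injects a default, the other enforces the rule. The seccompProfile field and the RuntimeDefault and Localhost values are real pod spec settings; the mutate and validate helpers are hypothetical and cover only a pod-level seccomp default plus a privileged check.

```python
import copy


def mutate(pod_spec):
    """Return a copy with a RuntimeDefault seccomp profile injected if none is set."""
    mutated = copy.deepcopy(pod_spec)
    security_context = mutated.setdefault("securityContext", {})
    security_context.setdefault("seccompProfile", {"type": "RuntimeDefault"})
    return mutated


def validate(pod_spec):
    """Return a list of errors; an empty list means the pod passes."""
    errors = []
    profile = pod_spec.get("securityContext", {}).get("seccompProfile", {})
    if profile.get("type") not in ("RuntimeDefault", "Localhost"):
        errors.append("seccompProfile.type must be RuntimeDefault or Localhost")
    for container in pod_spec.get("containers", []):
        if container.get("securityContext", {}).get("privileged", False):
            errors.append(f"container {container.get('name')!r} is privileged")
    return errors


submitted = {"containers": [{"name": "fn"}]}
print(validate(submitted))          # fails: no seccomp profile set
print(validate(mutate(submitted)))  # passes after the default is injected
```

Because the mutation only fills in a missing value, a pod that explicitly asks for an Unconfined profile is left untouched and then rejected by validation.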

Failure modes & mitigation

ID	Failure mode	Symptom	Likely cause	Mitigation	Observability signal
F1	Webhook outage	Pod creation blocked	Synchronous webhook down	Set a short timeout; fail open with auditing where acceptable	Increased admission errors
F2	Overly strict policy	Many deployment failures	Broad deny rules	Audit mode then incrementally enforce	Spike in deny audit logs
F3	RBAC misbind	Policy not applied	Incorrect role binding	Correct bindings and test in staging	Discrepancy in expected vs actual denies
F4	Version incompatibility	PSP ignored or errors	Kubernetes API removal	Migrate to PodSecurity or OPA	API errors in controller logs
F5	Privileged system pods blocked	Control plane degraded	Policy applied to system ns	Exclude system namespaces	Control plane pod restarts
F6	Silent drift	Policies diverge from Git	Manual edits in-cluster	Enforce GitOps reconciliation	Config drift alerts

Key Concepts, Keywords & Terminology for PSP

Each entry lists the term, a short definition, why it matters, and a common pitfall.

PodSecurityPolicy — Deprecated Kubernetes API for pod admission controls — Central historic model — Pitfall: removed in newer K8s.
PodSecurity admission — Replacement builtin admission controller enforcing pod security standards — Important as current recommended model — Pitfall: behavior differs from PSP.
Admission controller — Component that intercepts API requests — Core enforcement point — Pitfall: misconfigured webhook can block cluster.
Admission webhook — External service called during admission — Enables custom policies — Pitfall: availability impacts pod creation.
OPA Gatekeeper — Policy engine using Open Policy Agent — Flexible policy-as-code — Pitfall: complexity and performance considerations.
Kyverno — Kubernetes native policy engine — Simpler policy syntax for K8s — Pitfall: version compatibility.
RBAC — Role-based access control for subjects — Defines who can create pods — Pitfall: over-permissive roles.
Namespace — K8s logical partition — Allows per-namespace policies — Pitfall: forgetting system namespaces.
ServiceAccount — Identity for workloads — Bind policies to SA for least privilege — Pitfall: default SA surprises.
seccomp — Kernel syscall filtering for containers — Reduces attack surface — Pitfall: missing profile causes permissive syscalls.
runAsUser — Security context setting to avoid root — Prevents privilege escalation — Pitfall: legacy images require root.
runAsNonRoot — Enforce non-root container processes — Simple safety check — Pitfall: false positives in init containers.
allowPrivilegeEscalation — Controls whether a process can gain more privileges than its parent, for example via setuid binaries — Blocks a common escalation path inside the container — Pitfall: needed for some debuggers.
hostPath — Mount host filesystem into pod — Dangerous for isolation — Pitfall: used for convenience in prod.
hostNetwork — Shares node network namespace — Exposes node ports — Pitfall: unexpected external exposure.
hostPID — Shares node process namespace — Security risk for node introspection — Pitfall: needed by some debugging tools.
capabilities — Linux capabilities granting fine-grained privileges — Controls powerful ops like NET_ADMIN — Pitfall: granting CAP_SYS_ADMIN is near-root.
privileged container — Full host access like root — Highest risk — Pitfall: used for convenience in init workloads.
SELinux — Mandatory access control for processes — Adds defense layer — Pitfall: complex labels and policy tuning.
AppArmor — Kernel security module for confinement — Reduces program actions — Pitfall: profile maintenance overhead.
Mutating webhook — Alters requests, e.g., inject seccomp — Used for auto-hardening — Pitfall: unexpected changes to manifests.
Validating webhook — Accept/deny admission requests — Enforces policies — Pitfall: blocks without clear remediation.
GitOps — Policy-as-code workflows stored in Git — Enables reproducibility — Pitfall: delayed reconciliation can cause drift.
Policy-as-code — Express policies in versioned code — Improves reviewability — Pitfall: overcomplex rules.
Audit logs — Records of admission decisions — Required for compliance — Pitfall: noisy logs if policy too verbose.
SIEM — Security information and event management — Centralizes alerts — Pitfall: low signal-to-noise if unfiltered.
Least privilege — Principle to minimize permissions — Core security idea — Pitfall: too strict may break apps.
Mutate-and-validate pattern — Inject defaults then enforce — Reduces friction — Pitfall: order of webhooks matters.
Admission latency — Time added by webhooks — Affects deployment speed — Pitfall: slow webhooks slow CI.
Fail-open vs fail-closed — Webhook failure behavior — Decides blocking behavior — Pitfall: fail-open may permit bad pods.
PodSecurity standard levels — e.g., privileged, baseline, restricted — Defines graded constraints — Pitfall: mislabeling namespaces.
Scanning vs enforcement — Image scanning looks at images, PSP checks pod specs — Complementary controls — Pitfall: relying on one alone.
Runtime security (Falco) — Detects behavioral anomalies — Covers runtime gaps — Pitfall: alerts without context.
Immutable infrastructure — Avoid manual in-cluster edits — Promotes reproducibility — Pitfall: manual fixes create drift.
Canary policies — Gradual enforcement approach — Useful for migration — Pitfall: partial enforcement complexity.
Policy templates — Reusable rule patterns — Aid consistency — Pitfall: hidden complexity in templates.
Compliance baseline — Organization policy requirements — Guides PSP design — Pitfall: baselines too generic.
Policy reconciliation — Ensure desired state applied — Keeps clusters consistent — Pitfall: reconciliation lag.
Cluster-wide vs namespace policies — Different scope impacts — Pitfall: cluster policies can break system components.
Emergency allowlist — Temporary exemptions for critical fixes — Operational necessity — Pitfall: abused and left in place.
Capability bounding — Limit set of Linux capabilities — Prevent escalation — Pitfall: misidentifying required caps.
Pod security context — Aggregated security settings per pod — Central to PSP checks — Pitfall: omissions cause denials.

How to Measure PSP (Metrics, SLIs, SLOs)

ID	Metric/SLI	What it tells you	How to measure	Starting target	Gotchas
M1	Pod compliance rate	Percent pods meeting policy	Count compliant pods / total pods	99% for prod	Some system pods excluded
M2	Admission deny rate	Fraction of admissions denied	Deny events / total admissions	<1% after rollout	Deny spikes indicate dev friction
M3	Time to remediate violation	Time from deny to fix	Time tracked in ticketing	<48 hours for prod	Long lead due to cross-team handoffs
M4	Audit denial alerts	Number of denied events alerting ops	Count denies from audit logs	Configurable threshold	High noise if policy verbose
M5	Policy drift frequency	Number of in-cluster edits not in Git	Drift events per week	0 for GitOps	Requires detection tooling
M6	Admission latency	Extra ms added by policy checks	Median webhook latency	<200ms	Long latencies slow CI
M7	Unauthorized privilege escalations	Runtime detections post-admit	Runtime alerts correlated to pod	0 for prod	Runtime tools needed
M8	Exceptions count	Number of emergency allowlist uses	Count per time window	Low and audited	Abuse of allowlist possible
M9	CI policy failure rate	CI jobs failing policy checks	Failures / CI policy jobs	<2% post stabilization	Early migration may spike
M10	Coverage of namespaces	Percent namespaces covered by PSP	CoveredNamespaces / totalNamespaces	100% for regulated clusters	System namespaces may be exempt
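
Two of the metrics above, admission deny rate (M2) and admission latency (M6), can be derived from a stream of admission records. The record shape used here (allowed, latency_ms) is a hypothetical assumption; adapt it to whatever your controller actually emits.

```python
from statistics import median


def admission_metrics(events):
    """Compute deny rate and median latency from admission records."""
    total = len(events)
    if total == 0:
        return {"deny_rate": 0.0, "median_latency_ms": 0.0}
    denied = sum(1 for event in events if not event["allowed"])
    return {
        "deny_rate": denied / total,
        "median_latency_ms": median(event["latency_ms"] for event in events),
    }


sample = [
    {"allowed": True, "latency_ms": 10},
    {"allowed": True, "latency_ms": 20},
    {"allowed": False, "latency_ms": 30},
    {"allowed": True, "latency_ms": 400},
]
print(admission_metrics(sample))
```

The median is used rather than the mean so that a single slow call, like the 400 ms outlier in the sample, does not dominate the latency figure.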

Best tools to measure PSP

Tool — Prometheus + kube-state-metrics

What it measures for PSP: Pod counts, admission events, webhook metrics.
Best-fit environment: Kubernetes clusters with metrics stack.
Setup outline:
Deploy kube-state-metrics.
Instrument admission controllers to expose metrics.
Create Prometheus rules to compute compliance rates.
Configure Alertmanager with alarms for deny spikes.
Strengths:
Flexible queries and alerting.
Widely used in cloud-native stacks.
Limitations:
Requires metric exposition from webhooks.
Not a SIEM replacement.
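
As a sketch of what "expose metrics" means for a custom admission controller, the function below renders decision counts in the Prometheus text exposition format using only the standard library. The metric name policy_admission_decisions_total and its labels are hypothetical; a production controller would more likely use an official client library.

```python
from collections import Counter


def render_metrics(decisions):
    """Render (namespace, decision) pairs as Prometheus text exposition lines."""
    counts = Counter(decisions)
    lines = [
        "# HELP policy_admission_decisions_total Admission decisions by namespace.",
        "# TYPE policy_admission_decisions_total counter",
    ]
    for (namespace, decision), value in sorted(counts.items()):
        lines.append(
            f'policy_admission_decisions_total{{namespace="{namespace}",decision="{decision}"}} {value}'
        )
    return "\n".join(lines) + "\n"


observed = [("dev", "deny"), ("dev", "allow"), ("dev", "deny"), ("prod", "allow")]
print(render_metrics(observed), end="")
```

With a counter like this scraped, a deny-rate rule could divide the rate of series labelled decision="deny" by the rate of all series. Label values are written unescaped here, which is safe for namespace names but not for arbitrary strings.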

Tool — Fluentd / Fluent Bit + ELK

What it measures for PSP: Collects audit logs and denial events.
Best-fit environment: Clusters with centralized logging.
Setup outline:
Enable kube-apiserver audit logs.
Forward logs to Elasticsearch.
Create dashboards for deny events.
Strengths:
Rich search across logs.
Good for compliance audits.
Limitations:
Storage costs for large logs.
Requires field normalization.
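
Before building dashboards, it can help to confirm the audit stream actually contains the deny events you expect. The sketch below counts rejected pod creations per namespace from JSON-lines audit events, using fields from the audit.k8s.io/v1 Event schema (stage, verb, objectRef, responseStatus). Treating HTTP 403 as a policy denial is an assumption: RBAC authorization failures also return 403, and webhooks may return other codes, so the deny_codes parameter is adjustable.

```python
import json
from collections import Counter


def count_pod_denials(audit_lines, deny_codes=(403,)):
    """Count rejected pod creations per namespace from JSON-lines audit events."""
    denials = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef", {})
        # Only completed responses carry the final status code.
        if event.get("stage") != "ResponseComplete":
            continue
        if event.get("verb") != "create" or ref.get("resource") != "pods":
            continue
        if ref.get("subresource"):  # skip pods/exec, pods/binding, and similar
            continue
        if event.get("responseStatus", {}).get("code") in deny_codes:
            denials[ref.get("namespace", "")] += 1
    return denials


sample_lines = [
    json.dumps({"stage": "ResponseComplete", "verb": "create",
                "objectRef": {"resource": "pods", "namespace": "dev"},
                "responseStatus": {"code": 403}}),
    json.dumps({"stage": "ResponseComplete", "verb": "create",
                "objectRef": {"resource": "pods", "namespace": "dev"},
                "responseStatus": {"code": 201}}),
]
print(dict(count_pod_denials(sample_lines)))
```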

Tool — OPA Gatekeeper

What it measures for PSP: Policy violations and audit reports.
Best-fit environment: Policy-as-code users.
Setup outline:
Install Gatekeeper & ConstraintTemplates.
Create Constraints for desired rules.
Use audit mode and capture reports.
Strengths:
Expressive Rego policies.
Audit capabilities.
Limitations:
Rego learning curve.
Performance tuning may be needed.

Tool — Kyverno

What it measures for PSP: Validation, mutation, and policy audit events.
Best-fit environment: Kubernetes-native policy needs.
Setup outline:
Install Kyverno.
Define policies in YAML.
Use mutate to inject defaults and validate to enforce.
Strengths:
K8s-like policy syntax.
Easier onboarding.
Limitations:
May lack some advanced Rego features.

Tool — Falco

What it measures for PSP: Runtime violations that indicate admission gaps.
Best-fit environment: Runtime security observability.
Setup outline:
Deploy Falco as DaemonSet.
Configure rules for privilege escalation patterns.
Forward alerts to SIEM/Alertmanager.
Strengths:
Detects behavioral anomalies.
Complements admission controls.
Limitations:
False positives if rules not tuned.

Recommended dashboards & alerts for PSP

Executive dashboard

Panels:
Pod compliance rate (trend).
Number of denied admissions by namespace.
Time-to-remediate median.
Policy drift count.
Why:
Shows compliance health to leadership.

On-call dashboard

Panels:
Recent admission denies with deny message and reason.
Admission webhook latency and error rate.
Namespaces with repeated denies.
Active exceptions/allowlist entries.
Why:
Rapid triage during incidents and deployment failures.

Debug dashboard

Panels:
Raw audit log stream filtered for policy events.
Per-webhook latency and error logs.
Pod spec differences between requested and mutated.
Timeline of CI fail rate for policy checks.
Why:
Deep troubleshooting for policy failures.

Alerting guidance

What should page vs ticket:
Page: Admission webhook down or high error rate impacting pod creation.
Ticket: Individual deployment denies for developers.
Burn-rate guidance:
If the deny rate consumes more than 25% of the weekly error budget set aside for change-related failures, trigger a policy review.
Noise reduction tactics:
Deduplicate identical denies.
Group alerts by namespace or service account.
Suppress during maintenance windows and known rollouts.
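
The first two noise reduction tactics, deduplication and grouping, can be sketched in a few lines. The deny record shape (namespace, service_account, message) is a hypothetical assumption about what your alert pipeline receives.

```python
from collections import defaultdict


def group_denies(denies):
    """Collapse identical denies and group them by (namespace, service_account)."""
    groups = defaultdict(lambda: defaultdict(int))
    for deny in denies:
        key = (deny["namespace"], deny["service_account"])
        groups[key][deny["message"]] += 1
    # Convert to plain dicts so the result is easy to serialize.
    return {key: dict(messages) for key, messages in groups.items()}


raw = [
    {"namespace": "dev", "service_account": "ci", "message": "hostPath not allowed"},
    {"namespace": "dev", "service_account": "ci", "message": "hostPath not allowed"},
    {"namespace": "dev", "service_account": "ci", "message": "privileged not allowed"},
    {"namespace": "prod", "service_account": "web", "message": "hostPath not allowed"},
]
for key, messages in group_denies(raw).items():
    print(key, messages)
```

One alert per group with a count per message is usually easier to act on than one alert per denied request.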

Implementation Guide (Step-by-step)

1) Prerequisites – Cluster admin privileges or platform team involvement. – CI and GitOps pipelines in place. – Observability stack capturing audit logs and metrics.

2) Instrumentation plan – Add metrics to admission controllers. – Ensure audit logging is enabled on kube-apiserver. – Plan policies in Git with review workflows.

3) Data collection – Forward audit logs to central logging. – Export metrics to Prometheus. – Store policy state in Git.

4) SLO design – Define SLOs for compliance rate and time-to-remediate. – Map SLOs to services and namespaces.

5) Dashboards – Build executive, on-call, and debug dashboards. – Provide drill-down links from exec panels to debug panels.

6) Alerts & routing – Create alerts for webhook health and deny spikes. – Route to platform on-call for blocking issues. – Route policy violations to application owners via ticketing.

7) Runbooks & automation – Create runbooks for webhook outages, policy denial investigations, and emergency allowlist processes. – Automate remediation for common fixes (e.g., add runAsUser where safe).

8) Validation (load/chaos/game days) – Run load tests for admission latency impacts. – Conduct chaos tests that simulate webhook failure. – Run game days to validate incident response.

9) Continuous improvement – Weekly policy reviews. – Quarterly audits and SLO reviews. – Postmortem-driven policy adjustments.

Pre-production checklist

Policies authored and stored in Git.
CI policy tests added to pipeline.
Staging cluster mirrors production policy enforcement.
Observability capturing admission and audit logs.
Runbooks for expected failures.
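
A CI policy test has to find the pod spec inside each workload kind before it can check anything. The sketch below shows that lookup for common kinds; the field paths are the standard Kubernetes ones, while pod_spec_of and ci_check are hypothetical helpers that check only the privileged flag. Manifests are assumed to be already parsed into dictionaries.

```python
def pod_spec_of(manifest):
    """Return the embedded pod spec for common workload kinds, or None."""
    kind = manifest.get("kind")
    spec = manifest.get("spec", {})
    if kind == "Pod":
        return spec
    if kind in {"Deployment", "StatefulSet", "DaemonSet", "ReplicaSet", "Job"}:
        return spec.get("template", {}).get("spec")
    if kind == "CronJob":
        job_spec = spec.get("jobTemplate", {}).get("spec", {})
        return job_spec.get("template", {}).get("spec")
    return None


def ci_check(manifests):
    """Return (resource name, reason) failures; an empty list means the check passes."""
    failures = []
    for manifest in manifests:
        pod_spec = pod_spec_of(manifest)
        if pod_spec is None:
            continue  # not a workload, nothing to check
        for container in pod_spec.get("containers", []):
            if container.get("securityContext", {}).get("privileged", False):
                name = manifest.get("metadata", {}).get("name", "<unnamed>")
                failures.append((name, f"container {container.get('name')} is privileged"))
    return failures


deployment = {
    "kind": "Deployment",
    "metadata": {"name": "api"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "api", "securityContext": {"privileged": True}},
    ]}}},
}
print(ci_check([deployment]))
```

A pipeline step would typically exit non-zero when the returned list is non-empty.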

Production readiness checklist

Canary rollout with audit-mode first.
Metrics and alerts configured and tested.
Emergency allowlist process documented and limited.
Training for app teams on common fixes.

Incident checklist specific to PSP

Confirm scope: which namespaces and service accounts affected.
Check webhook health and API server logs.
Determine if deny is expected or due to policy drift.
If webhook down, assess fail-open configuration and restore service.
Apply temporary allowlist if safe and document.
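
Temporary allowlist entries are easier to keep temporary when every entry carries an expiry. The sketch below is one hypothetical way to model that; the entry fields and helper names are illustrative, and persistence and approval workflow are out of scope.

```python
from datetime import datetime, timedelta, timezone


def grant_exception(allowlist, namespace, reason, ttl_hours, now=None):
    """Append a time-bound exemption and return it."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "namespace": namespace,
        "reason": reason,  # documented justification for later review
        "expires_at": now + timedelta(hours=ttl_hours),
    }
    allowlist.append(entry)
    return entry


def is_exempt(allowlist, namespace, now=None):
    """True if the namespace has an exemption that has not yet expired."""
    now = now or datetime.now(timezone.utc)
    return any(
        entry["namespace"] == namespace and entry["expires_at"] > now
        for entry in allowlist
    )


allowlist = []
start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
grant_exception(allowlist, "payments", "incident hotfix", ttl_hours=4, now=start)
print(is_exempt(allowlist, "payments", now=start + timedelta(hours=1)))
print(is_exempt(allowlist, "payments", now=start + timedelta(hours=5)))
```

Expired entries stay in the list, which keeps a record for the post-incident review.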

Use Cases of PSP

1) Multi-tenant SaaS cluster – Context: Multiple customers share cluster. – Problem: Isolation breaches risk data leaks. – Why PSP helps: Enforce least privilege for tenants. – What to measure: Pod compliance rate, unauthorized privileges. – Typical tools: PodSecurity, Gatekeeper, Prometheus.

2) Regulated environment (PCI/ISO) – Context: Compliance auditing required. – Problem: Inconsistent security posture across teams. – Why PSP helps: Standardize enforcement and produce audit logs. – What to measure: Policy drift, compliance SLOs. – Typical tools: PodSecurity, audit logging, SIEM.

3) Platform-as-a-Service team – Context: Platform team provides managed namespaces. – Problem: Developers bypassing guidelines. – Why PSP helps: Prevent risky pods before they run. – What to measure: CI policy failure rates. – Typical tools: Kyverno, GitOps.

4) CI/CD hardening – Context: Deployments automated via pipelines. – Problem: Broken deployments due to runtime privilege assumptions. – Why PSP helps: Fail early in CI to avoid prod incidents. – What to measure: CI policy failures, remediation time. – Typical tools: Policy-as-code, CI plugins.

5) Securing edge workloads – Context: Edge nodes run untrusted workloads. – Problem: Attack on edge node affects fleet. – Why PSP helps: Block hostNetwork and hostPath on edge pods. – What to measure: HostPath denies, hostNetwork usage. – Typical tools: PodSecurity admission, Falco.

6) Legacy migration – Context: Moving older workloads to K8s. – Problem: Many containers require root. – Why PSP helps: Gradual enforcement to modernize apps. – What to measure: Number of exemptions and trend. – Typical tools: Audit-mode policies, canary enforcement.

7) Serverless platform constraints – Context: Managed FaaS on K8s underneath. – Problem: Function runtimes gaining unintended capabilities. – Why PSP helps: Enforce minimal syscall surfaces. – What to measure: Runtime detections and denials. – Typical tools: Kyverno, seccomp profiles.

8) Incident containment automation – Context: Post-breach containment required. – Problem: Need to quickly limit new risky pods. – Why PSP helps: Quickly apply stricter policies cluster-wide. – What to measure: Time to apply emergency policy, deny rate. – Typical tools: GitOps for fast policy deployment.

9) Cost control (indirect) – Context: Privileged pods accessing node-level resources. – Problem: Unintended resource reserves and scheduling inefficiencies. – Why PSP helps: Prevent host-level resource claims (hostPath, hostNetwork, host devices) that remove capacity. – What to measure: Host-bound deployments and node utilization. – Typical tools: Admission policies and scheduler metrics.

10) Platform onboarding – Context: New team joining shared cluster. – Problem: Lack of standardized practices increases risk. – Why PSP helps: Provide baseline constraints and onboarding templates. – What to measure: First-week compliance rate and ROX. – Typical tools: Templates in Git, CI tests.

Scenario Examples (Realistic, End-to-End)

Scenario #1 — Kubernetes: Multi-tenant baseline enforcement

Context: A SaaS provider runs multiple customer namespaces in one cluster.
Goal: Enforce baseline security without breaking existing workloads.
Why PSP matters here: Prevents privilege escalation and protects shared nodes.
Architecture / workflow: GitOps for policy definitions, Kyverno for mutation/validation, Prometheus for metrics, Fluentd for audits.
Step-by-step implementation:

Inventory current pod specs in prod.
Create baseline policies denying privileged, hostPath, hostNetwork.
Deploy policies in audit mode for 2 weeks.
Fix violations and provide developer guidance.
Switch to enforce mode for non-system namespaces.
What to measure: Pod compliance rate, deny events per namespace, time-to-remediate.
Tools to use and why: Kyverno for easy K8s-style policies; Prometheus for metrics; Fluentd for audit logs.
Common pitfalls: Not exempting kube-system causing control plane failures.
Validation: Run canary deployments with known-good manifests.
Outcome: Baseline enforced with minimal disruptions and measurable compliance.

Scenario #2 — Serverless / Managed-PaaS: Function sandboxing

Context: A company operates an internal FaaS platform on K8s.
Goal: Ensure functions cannot use host resources or escalate privileges.
Why PSP matters here: Functions are highly dynamic and riskier if permissive.
Architecture / workflow: Platform admission webhooks that mutate function pods to include seccomp and drop capabilities; Gatekeeper validates.
Step-by-step implementation:

Define seccomp and capability baselines.
Mutating webhook injects defaults into function pods.
Gatekeeper validates no hostPath or privileged flags.
CI validates functions against policies before deployment.
What to measure: Function pod compliance and runtime detections.
Tools to use and why: Mutating webhook for injection, Gatekeeper for validation, Falco for runtime.
Common pitfalls: Breakage of native libs requiring specific capabilities.
Validation: Canary with synthetic functions and runtime checks.
Outcome: Functions run in tighter sandboxes with reduced blast radius.

Scenario #3 — Incident response / Postmortem: Privilege exploit mitigation

Context: A container escape vulnerability exploited by an attacker to read node files.
Goal: Contain ongoing exploitation and prevent new risky pods.
Why PSP matters here: Quickly restrict new pods from using hostPath or privileged modes.
Architecture / workflow: Emergency policy pushed via GitOps; admission validates new pods; Falco watches for post-admit anomalies.
Step-by-step implementation:

Declare incident and notify platform on-call.
Apply emergency deny policy cluster-wide excluding kube-system.
Monitor for new deny events and retroactive runtime alerts.
Remediate running risky pods via orchestration.
Postmortem to remove allowlists and refine policies.
What to measure: Time to apply emergency policy and reduction in risky pods.
Tools to use and why: GitOps for quick policy rollouts, Falco for runtime monitoring.
Common pitfalls: Overly broad emergency rules breaking legitimate jobs.
Validation: Confirm new pods are denied and runtime anomalies decline.
Outcome: Containment of the exploit vector while follow-up patches are deployed.

Scenario #4 — Cost / Performance trade-off: Limiting node-affinity privileged workloads

Context: Privileged workloads were allowed to reserve host devices causing scheduling hotspots.
Goal: Reduce node contention and improve cost efficiency.
Why PSP matters here: Prevents pods from requesting host resources unnecessarily.
Architecture / workflow: Enforcement via policy to forbid hostPath and hostNetwork for non-admin namespaces; scheduler metrics track node load.
Step-by-step implementation:

Identify pods using hostPath/hostNetwork.
Create policy disallowing host bindings in app namespaces.
Educate teams on alternatives (CSI drivers, local PVs with eviction).
Enforce policy and monitor node utilization and costs.
What to measure: Host-bound pod count, node utilization, cost per workload.
Tools to use and why: Prometheus for node metrics, Gatekeeper for enforcement.
Common pitfalls: Legacy storage dependencies that require migration effort.
Validation: Observe reduced node saturation and lower costs.
Outcome: Improved packing and cost reduction while maintaining app functionality.

Common Mistakes, Anti-patterns, and Troubleshooting

Each item follows the pattern Symptom -> Root cause -> Fix.

Symptom: Many deployments suddenly fail. -> Root cause: Enforced policy applied cluster-wide without audit stage. -> Fix: Roll back to audit mode and stage enforcement.
Symptom: Control plane pods restart. -> Root cause: Policy applied to kube-system namespaces. -> Fix: Exempt kube-system or bind policy selectively.
Symptom: CI pipeline fails for legacy apps. -> Root cause: No migration path or advisory checks. -> Fix: Add migration tasks and config transforms in CI.
Symptom: Webhook timeout blocking deploys. -> Root cause: Synchronous webhook slow response. -> Fix: Increase timeouts, optimize webhook, use caching, fail-open if acceptable.
Symptom: High alert noise for denies. -> Root cause: Broad deny rules catching benign patterns. -> Fix: Tweak rules, add exceptions, group alerts.
Symptom: Runtime escape detected after admission. -> Root cause: Admission misses runtime behavior. -> Fix: Add runtime security tools like Falco and correlate events.
Symptom: Policy drift between Git and cluster. -> Root cause: Manual in-cluster edits. -> Fix: Enforce GitOps reconciliation and audit.
Symptom: Developers request privileges frequently. -> Root cause: Missing capabilities/incompatible images. -> Fix: Provide developer guidance, alternative images, or safe allowlists.
Symptom: Slow troubleshooting for denied pods. -> Root cause: Poor audit log indexing. -> Fix: Improve logging pipeline and searchable fields.
Symptom: Too many exceptions. -> Root cause: Emergency allowlist overused. -> Fix: Time-bound allowlists and post-incident review.
Symptom: False positives in runtime alerts. -> Root cause: Un-tuned rules. -> Fix: Tune Falco/IDS rules for environment.
Symptom: Admission controller memory pressure. -> Root cause: Complex policy evaluation. -> Fix: Simplify policies or scale controller replicas.
Symptom: Unauthorized privilege escapes not detected. -> Root cause: No runtime coverage. -> Fix: Deploy additional runtime sensors and process baselines.
Symptom: Policy regressions after upgrade. -> Root cause: API behavior changes across K8s versions. -> Fix: Test policies during cluster upgrades in staging.
Symptom: Too many one-off policies. -> Root cause: Lack of reuse and templates. -> Fix: Create reusable policy templates.
Symptom: Missing seccomp profiles. -> Root cause: OS/container runtime mismatch. -> Fix: Standardize runtimes and maintain profiles.
Symptom: App failures masked by admission denies. -> Root cause: Poor error messaging in deny responses. -> Fix: Provide detailed deny messages and remediation steps.
Symptom: Observability blind spots. -> Root cause: Not collecting admission metrics. -> Fix: Instrument webhooks and export metrics.
Symptom: Performance regressions with mutation. -> Root cause: Mutating webhook injects heavy sidecars. -> Fix: Re-evaluate injected artifacts and tune.
Symptom: Security policy conflicts. -> Root cause: Multiple policy engines with overlapping rules. -> Fix: Consolidate or document precedence.
Symptom: Unmonitored allowlist usage. -> Root cause: Lack of audit for exemptions. -> Fix: Log and review all allowlist entries periodically.
Symptom: Poor developer adoption. -> Root cause: No training and unclear guidance. -> Fix: Provide examples, templates, and office hours.
Symptom: Excessive manual remediation work. -> Root cause: No automation for common fixes. -> Fix: Create automation playbooks and PR bots.
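
The policy-drift symptom in the list above can be detected by comparing what Git declares with what the cluster reports. The following is a minimal Python sketch, assuming both sets of policies have already been loaded into dictionaries keyed by policy name. The function names `fingerprint` and `find_drift` and the sample policies are hypothetical, not part of any tool named in this guide.

```python
import hashlib
import json


def fingerprint(policy: dict) -> str:
    """Return a stable hash of a policy, independent of key order."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(git_policies: dict, cluster_policies: dict) -> dict:
    """Classify policies as missing, unexpected, or modified in the cluster."""
    git_names = set(git_policies)
    cluster_names = set(cluster_policies)
    modified = sorted(
        name
        for name in git_names & cluster_names
        if fingerprint(git_policies[name]) != fingerprint(cluster_policies[name])
    )
    return {
        "missing_in_cluster": sorted(git_names - cluster_names),
        "unexpected_in_cluster": sorted(cluster_names - git_names),
        "modified": modified,
    }


# Hypothetical sample data: one policy was edited by hand in the cluster,
# one was never applied, and one exists only in the cluster.
git = {"no-privileged": {"privileged": False}, "no-hostpath": {"hostPath": False}}
cluster = {"no-privileged": {"privileged": True}, "temp-debug": {"hostPath": True}}
drift = find_drift(git, cluster)
print(drift)
```

In practice a GitOps controller performs this reconciliation; a standalone check like this is mainly useful for a periodic drift report.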

Observability pitfalls covered in the list above: noisy logs, missing metrics, indexing gaps, false positives, and lack of correlation between admission and runtime events.

Best Practices & Operating Model

Ownership and on-call

Platform team owns policy lifecycle and on-call for blocking webhook failures.
Application teams own remediation for their violations.
Create a clear escalation path when enforcement blocks critical business workflows.

Runbooks vs playbooks

Runbooks: Step-by-step operational procedures for common issues (webhook down, emergency allowlist).
Playbooks: Higher-level decision guides for incidents requiring judgment (policy rollback vs enforce).

Safe deployments (canary/rollback)

Start in audit mode, then small-namespace canary, then full enforce.
Keep quick rollback paths and automated tests in CI to detect breakages.
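
The audit, canary, then enforce progression can be sketched with the namespace labels used by the built-in PodSecurity admission controller (`pod-security.kubernetes.io/audit`, `warn`, and `enforce`). This Python sketch only computes the labels per namespace; applying them to the cluster is left out. The helper names, the namespace names, and the default `baseline` level are assumptions for illustration.

```python
PSA_PREFIX = "pod-security.kubernetes.io/"
LEVELS = {"privileged", "baseline", "restricted"}


def namespace_labels(level: str, enforce: bool) -> dict:
    """Build Pod Security Admission labels for one namespace."""
    if level not in LEVELS:
        raise ValueError(f"unknown Pod Security Standards level: {level}")
    # audit and warn never block pods; they only record and surface violations.
    labels = {PSA_PREFIX + "audit": level, PSA_PREFIX + "warn": level}
    if enforce:
        labels[PSA_PREFIX + "enforce"] = level
    return labels


def plan_rollout(namespaces, canaries, phase, level="baseline"):
    """Return {namespace: labels} for phase 'audit', 'canary', or 'enforce'."""
    if phase not in {"audit", "canary", "enforce"}:
        raise ValueError(f"unknown rollout phase: {phase}")
    plan = {}
    for ns in namespaces:
        # In the canary phase only the listed canary namespaces enforce.
        enforcing = phase == "enforce" or (phase == "canary" and ns in canaries)
        plan[ns] = namespace_labels(level, enforcing)
    return plan


# Hypothetical namespaces: "sandbox" is the non-critical canary.
plan = plan_rollout(["payments", "sandbox"], {"sandbox"}, "canary")
for ns, labels in plan.items():
    print(ns, labels)
```

Rolling back is then a matter of re-running the plan with the previous phase, which keeps the rollback path as simple as the rollout.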

Toil reduction and automation

Automate remediation PRs for simple fixes (e.g., inject runAsUser).
Use policy templates and GitOps to avoid manual edits.
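
As an illustration of a simple automated fix, the sketch below adds non-root defaults to a pod spec represented as a Python dictionary and reports what it changed, which could feed a remediation PR. The function name, the default UID of 1000, and the sample spec are assumptions; whether an image actually runs under that UID must be verified per workload.

```python
import copy


def add_non_root_defaults(pod_spec: dict, default_uid: int = 1000) -> tuple:
    """Return (patched_spec, changes) without modifying the input spec."""
    patched = copy.deepcopy(pod_spec)
    # A pod that explicitly requests UID 0 is left for human review:
    # forcing runAsNonRoot on it would make the pod fail to start.
    if patched.get("securityContext", {}).get("runAsUser") == 0:
        return patched, []
    changes = []
    ctx = patched.setdefault("securityContext", {})
    if "runAsNonRoot" not in ctx:
        ctx["runAsNonRoot"] = True
        changes.append("securityContext.runAsNonRoot=true")
    if "runAsUser" not in ctx:
        ctx["runAsUser"] = default_uid
        changes.append(f"securityContext.runAsUser={default_uid}")
    return patched, changes


# Hypothetical pod spec with no security context at all.
spec = {"containers": [{"name": "app", "image": "example/app:1.0"}]}
patched, changes = add_non_root_defaults(spec)
print(changes)
```

The function returns a copy rather than editing in place so the original and patched specs can be diffed in the generated PR.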

Security basics

Default deny for capabilities and privileged flags.
Enforce non-root where possible.
Apply seccomp and AppArmor profiles.
Monitor runtime for deviations.
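
The basics above can be expressed as a small pre-admission check. This Python sketch inspects a pod spec held in a dictionary and lists violations; it is a simplified illustration that ignores initContainers and ephemeral containers and does not check AppArmor. The function name and sample pod are hypothetical.

```python
def find_violations(pod_spec: dict) -> list:
    """List basic security violations for each container in a pod spec."""
    violations = []
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        if ctx.get("privileged", False):
            violations.append(f"{name}: privileged container")
        added = ctx.get("capabilities", {}).get("add", [])
        if added:
            violations.append(f"{name}: adds capabilities {sorted(added)}")
        # Container-level settings take precedence over pod-level settings.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            violations.append(f"{name}: runAsNonRoot not set")
        if not ctx.get("seccompProfile", pod_ctx.get("seccompProfile")):
            violations.append(f"{name}: no seccomp profile")
    return violations


# Hypothetical pod: safe pod-level defaults, but one container opts out.
pod = {
    "securityContext": {
        "runAsNonRoot": True,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [
        {"name": "web"},
        {
            "name": "agent",
            "securityContext": {
                "privileged": True,
                "capabilities": {"add": ["NET_ADMIN"]},
            },
        },
    ],
}
violations = find_violations(pod)
print(violations)
```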

Weekly/monthly routines

Weekly: Review deny spikes and open remediation tickets.
Monthly: Audit allowlist entries and drift reports.
Quarterly: SLO review and policy effectiveness report.

What to review in postmortems related to PSP

Whether policy prevented or caused the incident.
Time to detect and remediate policy violations.
Any changes to allowlists and their justification.
Lessons to tighten or relax policies.

Tooling & Integration Map for PSP

ID	Category	What it does	Key integrations	Notes
I1	Policy engine	Validates and mutates pod specs	Kubernetes admission, GitOps	Gatekeeper and Kyverno common choices
I2	Audit logging	Collects admission decisions	SIEM, ELK, cloud logging	kube-apiserver audit must be enabled
I3	Metrics store	Stores compliance and latency metrics	Prometheus, Alertmanager	Needs metrics from controllers
I4	Runtime security	Detects runtime anomalies	Falco, runtime scanners	Complements admission controls
I5	GitOps	Manages policy-as-code	ArgoCD, Flux	Ensures reconciliation
I6	CI integration	Runs policy checks pre-deploy	Jenkins, GitHub Actions	Prevents violations before admission
I7	Dashboarding	Visualizes compliance	Grafana, Kibana	Executive and debug views
I8	Identity / AuthN	Maps service accounts and users	OIDC, IAM	Critical for correct policy binding
I9	Secrets & config	Securely store seccomp/AppArmor files	Vault, K8s Secrets	Sensitive artifacts storage
I10	Incident mgmt	Routes alerts and tickets	PagerDuty, Opsgenie	On-call routing for platform team

Frequently Asked Questions (FAQs)

What does PSP stand for in Kubernetes?

PSP stands for Pod Security Policy, a historical Kubernetes admission control API. In newer Kubernetes versions it has been replaced by PodSecurity admission and third-party policy engines.

Is PSP still supported in Kubernetes 1.27+?

No. The built-in PodSecurityPolicy API was deprecated in Kubernetes 1.21 and removed in 1.25. Use PodSecurity admission, OPA Gatekeeper, or Kyverno instead.

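What’s the difference between PodSecurity and PSP?

PodSecurity is the newer built-in admission controller. It enforces the three Pod Security Standards levels (privileged, baseline, restricted) through namespace labels. PSP was a more flexible, per-field API that has since been removed.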

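Can PSP mutate pod specs?

Only in a limited way. The original PSP admission controller could apply certain defaults (for example, default added capabilities) in addition to validating, but it was not a general mutation mechanism. PodSecurity admission is validating-only, so general mutation requires a mutating webhook such as Kyverno or a custom controller.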

Should I use Gatekeeper or Kyverno?

Depends on team skills: Gatekeeper is powerful with Rego, Kyverno is Kubernetes-native and simpler for many use cases.

How do I migrate from PSP to PodSecurity?

Inventory PSP usage, map rules to PodSecurity levels or Gatekeeper constraints, test in staging, and roll out audit-first.
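
The mapping step can start from a rough first guess per policy. The Python sketch below suggests a candidate PodSecurity level from a PSP spec held in a dictionary, using real PSP field names (`privileged`, `hostNetwork`, `hostPID`, `hostIPC`, `volumes`, `allowedCapabilities`, `runAsUser.rule`, `allowPrivilegeEscalation`). The heuristic itself is an assumption for illustration, not an official mapping, and every suggestion should be confirmed in audit mode.

```python
def suggest_level(psp_spec: dict) -> str:
    """Suggest a candidate Pod Security Standards level for a PSP spec."""
    host_access = any(
        psp_spec.get(field, False)
        for field in ("privileged", "hostNetwork", "hostPID", "hostIPC")
    )
    volumes = psp_spec.get("volumes", [])
    caps = psp_spec.get("allowedCapabilities", [])
    # Anything that opens up the host can only map to the privileged level.
    if host_access or "hostPath" in volumes or "*" in volumes or "*" in caps:
        return "privileged"
    non_root = psp_spec.get("runAsUser", {}).get("rule") == "MustRunAsNonRoot"
    no_escalation = psp_spec.get("allowPrivilegeEscalation", True) is False
    if non_root and no_escalation and not caps:
        return "restricted"
    return "baseline"


# Hypothetical PSP specs from an inventory.
inventory = {
    "node-agent": {"privileged": True, "volumes": ["hostPath"]},
    "web-default": {"volumes": ["configMap", "secret"], "runAsUser": {"rule": "RunAsAny"}},
    "hardened": {
        "volumes": ["configMap"],
        "runAsUser": {"rule": "MustRunAsNonRoot"},
        "allowPrivilegeEscalation": False,
    },
}
suggestions = {name: suggest_level(spec) for name, spec in inventory.items()}
print(suggestions)
```

Rules that do not fit any of the three levels are the ones to carry over as Gatekeeper constraints or Kyverno policies.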

What are common PSP-equivalent policy rules?

Disallow privileged, disallow hostPath, enforce runAsNonRoot, require seccomp/AppArmor.

How do I test policies without breaking prod?

Use audit mode and CI policy checks, and deploy to staging mirroring prod.

Who should own PSP policies?

Platform or security team owns policies; application teams own remediation and exceptions.

How do I measure the impact of PSP?

Track compliance rate, deny rate, remediation time, and runtime anomalies.
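
A minimal Python sketch of how three of these numbers could be computed, assuming the raw counts and ticket timestamps have already been collected elsewhere. The function names and all sample figures are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def ratio(part: int, whole: int) -> float:
    """Return part/whole, or 0.0 when there is nothing to measure."""
    return part / whole if whole else 0.0


def median_remediation_time(tickets: list) -> timedelta:
    """Median time between a violation ticket being opened and closed."""
    seconds = [(closed - opened).total_seconds() for opened, closed in tickets]
    return timedelta(seconds=median(seconds))


# Hypothetical sample data for one reporting period.
start = datetime(2026, 1, 5, 9, 0)
tickets = [(start, start + timedelta(hours=h)) for h in (2, 4, 10)]

compliance_rate = ratio(188, 200)   # compliant pods / all pods
deny_rate = ratio(15, 1000)         # denied requests / admission requests
remediation = median_remediation_time(tickets)

print(f"compliance={compliance_rate:.1%} deny={deny_rate:.1%} remediation={remediation}")
```

The median is used rather than the mean so that one long-running ticket does not dominate the remediation figure. Note that `median` raises an error on an empty ticket list.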

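What happens if admission webhook fails?

It depends on the webhook's failurePolicy. With failurePolicy: Fail (fail-closed), matching requests are rejected while the webhook is unavailable; with failurePolicy: Ignore (fail-open), they are admitted unchecked. Choose based on risk.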
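
The two behaviours can be modelled in a few lines of Python. This is a simplified model for reasoning about the trade-off, not the API server's implementation; it uses the real `failurePolicy` values `Fail` (fail-closed) and `Ignore` (fail-open), while the function name is hypothetical.

```python
def admission_outcome(webhook_response, failure_policy: str) -> bool:
    """Return True if the request is admitted.

    webhook_response is True (allow), False (deny), or None when the
    webhook timed out or could not be reached.
    """
    if failure_policy not in ("Fail", "Ignore"):
        raise ValueError(f"unknown failurePolicy: {failure_policy}")
    if webhook_response is None:
        # Webhook unavailable: Ignore lets the request through, Fail blocks it.
        return failure_policy == "Ignore"
    return webhook_response


# Enumerate every combination to make the trade-off visible.
outcomes = {
    (policy, response): admission_outcome(response, policy)
    for policy in ("Fail", "Ignore")
    for response in (True, False, None)
}
for key, admitted in outcomes.items():
    print(key, "->", "admitted" if admitted else "rejected")
```

The policies differ only in the outage case: fail-closed trades availability for security, and fail-open does the reverse.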

Are PSPs enough for security?

No, they are preventive controls; combine with image scanning, network policy, and runtime security.

How to handle legacy workloads requiring root?

Provide an exemption path with strict auditing and time-bound allowlists while modernizing.
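
A time-bound allowlist can be as simple as an expiry timestamp on each exemption. The Python sketch below assumes exemptions are stored as dictionaries with hypothetical fields `namespace`, `workload`, `reason`, and `expires`; the entries shown are invented sample data.

```python
from datetime import datetime, timezone


def split_exemptions(entries: list, now: datetime) -> tuple:
    """Split exemptions into (active, expired) relative to 'now'."""
    active = [e for e in entries if e["expires"] > now]
    expired = [e for e in entries if e["expires"] <= now]
    return active, expired


def is_exempt(namespace: str, workload: str, entries: list, now: datetime) -> bool:
    """True only if a matching exemption exists and has not expired."""
    active, _ = split_exemptions(entries, now)
    return any(
        e["namespace"] == namespace and e["workload"] == workload for e in active
    )


# A fixed clock keeps the example deterministic.
now = datetime(2026, 1, 15, tzinfo=timezone.utc)
entries = [
    {
        "namespace": "legacy",
        "workload": "billing-batch",
        "reason": "needs root until image rebuild",
        "expires": datetime(2026, 2, 1, tzinfo=timezone.utc),
    },
    {
        "namespace": "legacy",
        "workload": "old-cron",
        "reason": "incident workaround",
        "expires": datetime(2026, 1, 1, tzinfo=timezone.utc),
    },
]
active, expired = split_exemptions(entries, now)
print("expired, needs review:", [e["workload"] for e in expired])
```

The expired list doubles as the input for the periodic allowlist audit: each entry either gets a justified renewal or is removed.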

Can I auto-remediate policy violations?

Yes, for safe deterministic changes like injecting non-root users, but vet consequences.

How to avoid noisy deny alerts?

Tune policies, aggregate alerts, use grouping and suppression windows.
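
Grouping can be sketched as counting deny events per namespace and policy, then alerting only on groups that cross a threshold. In this Python sketch the event fields `namespace` and `policy`, the threshold of 5, and the sample events are assumptions for illustration.

```python
from collections import Counter


def group_denies(events: list, threshold: int = 5) -> list:
    """Collapse deny events into one alert per (namespace, policy) group."""
    counts = Counter((e["namespace"], e["policy"]) for e in events)
    return [
        {"namespace": ns, "policy": policy, "count": count}
        for (ns, policy), count in counts.most_common()
        if count >= threshold
    ]


# Hypothetical events from one evaluation window: six denies from a
# misconfigured deployment, two isolated denies from a dev namespace.
events = (
    [{"namespace": "shop", "policy": "no-privileged"}] * 6
    + [{"namespace": "dev", "policy": "no-hostpath"}] * 2
)
alerts = group_denies(events)
print(alerts)
```

Eight raw events become a single alert here; the two low-volume denies stay visible in dashboards without notifying anyone.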

What telemetry is most valuable for PSP?

Admission deny events, webhook latencies, policy drift, and runtime detections.

How to handle cross-cluster policies?

Use GitOps and central policy templates with per-cluster overrides.
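
Per-cluster overrides amount to a recursive merge of a small override document onto the central template. The following Python sketch shows that merge on plain dictionaries; the template fields (`level`, `mode`, `exemptions`) and cluster names are hypothetical, and GitOps tools provide their own overlay mechanisms that would normally do this job.

```python
import copy


def merge_policy(base: dict, override: dict) -> dict:
    """Return base with override applied; nested dicts merge, other values replace."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policy(merged[key], value)
        else:
            # Lists and scalars are replaced wholesale, not combined.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical central template and per-cluster overrides.
base = {
    "level": "restricted",
    "mode": "enforce",
    "exemptions": {"namespaces": ["kube-system"]},
}
overrides = {
    "staging": {"mode": "audit"},
    "edge": {"exemptions": {"namespaces": ["kube-system", "node-agents"]}},
}
rendered = {cluster: merge_policy(base, o) for cluster, o in overrides.items()}
for cluster, policy in rendered.items():
    print(cluster, policy)
```

Replacing lists instead of appending to them is a deliberate choice in this sketch: each cluster's exemption list is then fully visible in its own override file.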

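Do managed Kubernetes providers enforce PSP?

It varies by provider and Kubernetes version. Because the PSP API was removed upstream in Kubernetes 1.25, clusters on that version or later cannot use built-in PSP; check your provider's documentation for the replacement it supports.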

Conclusion

PSP historically provided pod-level admission controls in Kubernetes, and its concepts remain central to enforcing pod security. The upstream PSP API has been removed, but the principles persist via PodSecurity admission, OPA Gatekeeper, Kyverno, custom mutating and validating admission webhooks, and runtime tools. A robust approach combines preventive admission checks, policy-as-code in GitOps, runtime detection, and clear SRE ownership and observability.

Next 7 days plan

Day 1: Inventory current pod specs and identify risky pod attributes.
Day 2: Enable kube-apiserver audit logging and forward to central logs.
Day 3: Create baseline policies in Git in audit mode.
Day 4: Add CI checks to run policy validations for merge requests.
Day 5: Build Prometheus/Grafana panels for basic compliance metrics.
Day 6: Run a small canary enforcement in a non-critical namespace.
Day 7: Review results, open remediation tickets, and plan next-week enforcement.

Appendix — PSP Keyword Cluster (SEO)

Primary keywords
Pod Security Policy
PSP Kubernetes
PodSecurity admission
Kubernetes pod security
Pod security best practices
Secondary keywords
Kubernetes admission controllers
PodSecurityPolicy deprecation
OPA Gatekeeper policies
Kyverno pod policies
seccomp profiles Kubernetes
AppArmor Kubernetes
runAsNonRoot enforcement
hostPath policy Kubernetes
Long-tail questions
How to migrate from PSP to PodSecurity
What replaces PodSecurityPolicy in Kubernetes
How to enforce non-root containers in Kubernetes
How to audit PSP in Kubernetes clusters
How to prevent privileged containers in K8s
How to measure pod security compliance
How to design pod admission policies
How to use Gatekeeper for pod validation
How to use Kyverno to mutate pod specs
How to integrate pod security with CI/CD
What is the impact of admission webhook latency
How to handle legacy apps with PSP rules
How to author seccomp profiles for pods
How to monitor admission deny rates
How to create policy-as-code for Kubernetes
Related terminology
Admission webhook
Mutating webhook
Validating webhook
Audit logs
Policy-as-code
GitOps policy management
Runtime security
Falco rules
kube-apiserver audit
Prometheus metrics for admission
Alertmanager deny alerts
Emergency allowlist
Policy drift
Compliance baseline
Cluster role binding
ServiceAccount policies
Canary policy rollout
Fail-open webhook
Fail-closed webhook
Policy reconciliation
Seccomp profile injection
AppArmor profile injection
Capability bounding
Least privilege enforcement
Pod security context
Node hostPath restrictions
HostNetwork prevention
Privileged container prevention
Mutate-and-validate pattern
Admission latency monitoring
Policy audit reports
SIEM integration for denies
Kube-state-metrics compliance
Kubernetes policy templates
Policy testing in CI
Postmortem policy review
Emergency policy rollout
Policy exclusion lists
Policy coverage by namespace
Policy SLOs and SLIs
