What is Pod Security Policy? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)

Quick Definition

Pod Security Policy is a Kubernetes admission control mechanism that enforces security constraints on pod specifications. Analogy: a building code that inspectors use to accept or reject new construction. Formal: an API-driven policy object evaluated at pod admission to allow or deny pod creation.

What is Pod Security Policy?

Pod Security Policy (PSP) is a Kubernetes mechanism designed to control the security-sensitive aspects of a pod specification at admission time. It governs privileged mode, privilege escalation, host namespaces, volume types, Linux capabilities, seccomp, SELinux contexts, and more. A PSP is a cluster-scoped API resource that is evaluated by the PodSecurityPolicy admission plugin and paired with RBAC bindings that grant the ability to use specific policies.

What it is NOT:

Not a runtime enforcement agent for live processes.
Not a replacement for network policies, image scanning, or node hardening.
Not a fine-grained workload runtime monitor.

Key properties and constraints:

Declarative policy objects applied at admission.
RBAC controls which service accounts or users can use which PSP.
Evaluated during pod creation and certain pod updates.
Coexists with other admission controllers but ordering and priorities matter.
PSP was deprecated in Kubernetes v1.21 and removed in v1.25; upstream replaced it with Pod Security Admission, and managed distributions moved to newer mechanisms on their own schedules, so implementations vary.

Where it fits in modern cloud/SRE workflows:

Policy-as-code checkpoint: validated in CI/CD as a pre-deployment test and enforced in-cluster at admission.
Part of defense-in-depth: complements image policy, runtime detection, network controls.
Useful for reducing blast radius and enforcing least privilege across multi-tenant clusters.
Integrated with GitOps workflows to validate manifests before merge.

Diagram description (text-only):

Cluster control plane receives pod create request -> Admission controller chain runs -> PSP evaluation step checks pod spec vs matched PSPs -> If permitted and user/serviceaccount has use permission -> Admission allows pod to be persisted -> Pod scheduled to node -> Kubelet and container runtime apply the admitted security context.

Pod Security Policy in one sentence

Pod Security Policy is an admission-time policy object in Kubernetes that enforces pod-level security constraints to ensure pods run with approved privileges and environment settings.

Pod Security Policy vs related terms

ID	Term	How it differs from Pod Security Policy	Common confusion
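T1	PodSecurityAdmission	Controls pods via namespace labels that select Pod Security Standards levels, not PSP objects	Assumed to be a drop-in PSP replacement; it is the built-in successor but does not mutate pods or support custom per-field policies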
T2	OPA Gatekeeper	Policy engine for admission decisions not native PSP	People assume identical APIs
T3	Kyverno	Kubernetes-native policy controller that mutates and validates	Often confused with static PSP behavior
T4	NetworkPolicy	Controls traffic not pod privileges	Mistaken as pod-level security control
T5	RuntimeSecurity	Monitors running processes rather than admission	Confused with admission enforcement
T6	ImagePolicy	Validates images not pod spec privileges	Often conflated with PSP scope
T7	RBAC	Controls who can use PSP but not the pod constraints	People think RBAC enforces pod settings
T8	SELinux	OS-level labeling not admission object	Mistaken as admission-only control
T9	Seccomp	Kernel sandboxing profile not full PSP replacement	Believed to be equivalent to PSP
T10	AdmissionController	Framework hosting PSP among others	People call PSP an admission controller itself

Why does Pod Security Policy matter?

Business impact:

Reduces risk of data breaches by preventing privileged workloads from running on shared clusters.
Protects revenue and trust by limiting attack surfaces that could lead to downtime, data loss, or compliance failures.
Helps meet regulatory expectations for least privilege and controlled execution environments.

Engineering impact:

Decreases incident frequency from misconfigured pods (e.g., hostPath mounts exposing secrets).
Reduces blast radius when privilege escalation is prevented.
Slightly increases initial development friction but enables safer velocity at scale.

SRE framing:

SLIs: percentage of pods compliant at admission, mean time to remediate noncompliant manifests.
SLOs: e.g., 99.9% of production pods match approved policies.
Error budget: use to justify exceptions or temporary policy loosening during launches.
Toil: centralize policy to reduce manual reviews; automate exceptions.
On-call: reduces noisy security incidents from misconfigurations but introduces admission-failure incidents if policies are too strict.

What breaks in production (realistic examples):

CI deploys a pod with hostPath to mount node secrets -> attacker escalates -> data exfiltration.
An application requires NET_ADMIN capability but policy denies it, causing service outage due to lack of permissions.
Overly permissive PSP allows privileged containers that compromise node kernel, leading to cluster-wide impact.
Misconfigured PSP RBAC blocks a system component's service account, preventing pod creation and causing cascading failures.

Where is Pod Security Policy used?

ID	Layer/Area	How Pod Security Policy appears	Typical telemetry	Common tools
L1	Cluster control	Admission-time policy objects enforced at API server	Admission failure logs	kube-apiserver
L2	CI/CD pipeline	Validation tests rejecting noncompliant manifests	Pre-merge test results	CI runners
L3	Platform teams	Standardized PSP templates for tenants	Policy drift alerts	GitOps tools
L4	Multi-tenant isolation	Per-namespace PSP bindings	Tenant violation events	RBAC systems
L5	Compliance	Audit trails of denied pods	Audit logs and reports	Audit log aggregators
L6	Runtime defense	Supports runtime constraints but not runtime monitoring	Runtime mismatches	Runtime security agents
L7	Managed Kubernetes	Provider may replace PSP with similar controls	Provider audit events	Provider control plane
L8	Serverless/PaaS	Restricts allowed container features for functions	Invocation failures	Platform controllers

When should you use Pod Security Policy?

When it’s necessary:

Multi-tenant clusters where lateral movement risk must be mitigated.
Regulated environments requiring controlled execution contexts.
Environments with shared node resources or privileged CI workloads.

When it’s optional:

Single-team clusters with strict network isolation and strong image controls.
Development environments where speed is prioritized and there are compensating controls.

When NOT to use / overuse:

Avoid over-restricting early-stage developer clusters; it increases rework.
Don’t rely solely on PSP to solve runtime detection or image supply-chain issues.
Avoid complex per-pod exceptions unless automated; they become unmaintainable.

Decision checklist:

If multi-tenant AND shared nodes -> enforce strict PSP.
If you have GitOps pipeline AND automated tests -> validate PSP in CI.
If serverless provider enforces runtime constraints -> evaluate provider controls instead of PSP.

Maturity ladder:

Beginner: Enforce basic non-root, forbidden hostPath, no privileged containers.
Intermediate: Add seccomp/SELinux policies, restrict capabilities, automate CI checks.
Advanced: Fine-grained per-service account policies, integration with OPA/Gatekeeper, automated exception workflows, and telemetry with SLIs.
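The Beginner rung above can be sketched as a policy manifest. This is a minimal illustration, assuming the policy/v1beta1 PodSecurityPolicy schema that was served before the API was removed; the policy name baseline-restricted and the exact volume allowlist are hypothetical choices, not a recommended standard. The manifest is built as a Python dictionary and serialized to JSON, which kubectl accepts alongside YAML.

```python
import json

# Hypothetical policy name; field names follow the policy/v1beta1 schema.
baseline_psp = {
    "apiVersion": "policy/v1beta1",
    "kind": "PodSecurityPolicy",
    "metadata": {"name": "baseline-restricted"},
    "spec": {
        # No privileged containers and no setuid-style escalation.
        "privileged": False,
        "allowPrivilegeEscalation": False,
        # Keep pods out of the host namespaces.
        "hostNetwork": False,
        "hostPID": False,
        "hostIPC": False,
        # Allowlist of volume types; hostPath is deliberately absent.
        "volumes": [
            "configMap", "secret", "emptyDir", "projected",
            "downwardAPI", "persistentVolumeClaim",
        ],
        # Reject containers that would run as UID 0.
        "runAsUser": {"rule": "MustRunAsNonRoot"},
        # The remaining strategy fields are required by the schema.
        "seLinux": {"rule": "RunAsAny"},
        "supplementalGroups": {"rule": "RunAsAny"},
        "fsGroup": {"rule": "RunAsAny"},
    },
}

manifest_json = json.dumps(baseline_psp, indent=2)
print(manifest_json)
```

The Intermediate rung would extend the same spec with capability and seccomp/SELinux restrictions rather than replacing it.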

How does Pod Security Policy work?

Components and workflow:

PSP objects: declarative YAML/JSON policy definitions specifying allowed fields.
Admission controller: PSP plugin evaluates incoming pod specs against available PSPs.
RBAC bindings: grant subjects permission to use a PSP via the use verb on the podsecuritypolicies resource.
Matching logic: the admission step collects all PSPs that the requesting user or the pod's service account may use; the pod must satisfy at least one of them.
Admission result: allow or deny; if denied, API response contains explanation.

Data flow and lifecycle:

User requests pod creation.
API server runs admission chain.
PSP admission step obtains subject info and pod spec.
It finds PSPs the subject may use based on RBAC.
Each PSP is evaluated for compliance; success on any -> allowed.
Pod persists; scheduler and kubelet handle runtime.
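The lifecycle above can be modeled in a few lines. The sketch below is a simplified simulation, not the real admission plugin: the Policy class, the grants mapping, and the three checks (privileged mode, added capabilities, volume types) are assumptions chosen for illustration, and sorting by name only approximates the plugin's ordering. It does show the key rule: a pod is admitted if any single policy usable by the subject validates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Simplified stand-in for a PSP; real policies have many more fields."""
    name: str
    privileged: bool = False
    volumes: frozenset = frozenset({"configMap", "secret", "emptyDir"})
    allowed_capabilities: frozenset = frozenset()


def violations(policy, pod_spec):
    """Return human-readable reasons why pod_spec fails this policy."""
    found = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext", {})
        if ctx.get("privileged") and not policy.privileged:
            found.append(f"{container['name']}: privileged containers not allowed")
        added = set(ctx.get("capabilities", {}).get("add", []))
        for cap in sorted(added - policy.allowed_capabilities):
            found.append(f"{container['name']}: capability {cap} not allowed")
    for volume in pod_spec.get("volumes", []):
        # A volume entry has a name plus one key naming its type.
        for kind in sorted(set(volume) - {"name"}):
            if kind not in policy.volumes:
                found.append(f"volume {volume['name']}: type {kind} not allowed")
    return found


def admit(pod_spec, subject, policies, grants):
    """Allow the pod if any policy usable by the subject has no violations."""
    usable = sorted(
        (p for p in policies if p.name in grants.get(subject, set())),
        key=lambda p: p.name,
    )
    if not usable:
        return False, "no usable policy for subject"
    for policy in usable:
        if not violations(policy, pod_spec):
            return True, f"validated by {policy.name}"
    return False, "pod violates every usable policy"


policies = [
    Policy("restricted"),
    Policy("host-access", privileged=True,
           volumes=frozenset({"secret", "hostPath"})),
]
# Hypothetical RBAC result: which policy names each subject may use.
grants = {
    "system:serviceaccount:tenant-a:default": {"restricted"},
    "system:serviceaccount:kube-system:node-agent": {"restricted", "host-access"},
}
pod_spec = {
    "containers": [{"name": "agent", "securityContext": {"privileged": True}}],
    "volumes": [{"name": "host", "hostPath": {"path": "/var/lib"}}],
}

print(admit(pod_spec, "system:serviceaccount:tenant-a:default", policies, grants))
print(admit(pod_spec, "system:serviceaccount:kube-system:node-agent", policies, grants))
```

The same pod spec is denied for the tenant service account and admitted for the node agent, because only the latter can use the host-access policy. This is also why overlapping grants lead to unintended permissiveness: adding one permissive policy to a subject widens everything that subject can create.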

Edge cases and failure modes:

No PSP matched -> default deny when the PSP admission plugin is enabled, which blocks pod creation.
Multiple PSPs with conflicting allowances -> pod allowed if any PSP permits; can result in unintentional permissiveness.
RBAC misconfiguration where system components lose use permissions -> critical components fail.
Deprecated or removed PSP support -> Kubernetes v1.25 and later no longer serve the PSP API, and distribution timelines vary; migration is required before upgrading.
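For the migration case, the usual target is Pod Security Admission, which is configured with namespace labels rather than policy objects. The sketch below builds such a Namespace manifest. The pod-security.kubernetes.io label keys and the three level names come from upstream Kubernetes; the helper function, its defaults, and the namespace name are hypothetical.

```python
import json

POD_SECURITY_LEVELS = {"privileged", "baseline", "restricted"}


def pod_security_namespace(name, enforce="baseline",
                           audit="restricted", warn="restricted"):
    """Build a Namespace manifest labeled for Pod Security Admission."""
    for level in (enforce, audit, warn):
        if level not in POD_SECURITY_LEVELS:
            raise ValueError(f"unknown Pod Security Standards level: {level}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                # enforce rejects violating pods; audit and warn only report.
                "pod-security.kubernetes.io/enforce": enforce,
                "pod-security.kubernetes.io/audit": audit,
                "pod-security.kubernetes.io/warn": warn,
            },
        },
    }


namespace = pod_security_namespace("tenant-a")
print(json.dumps(namespace, indent=2))
```

Setting audit and warn one level stricter than enforce is one way to preview the impact of tightening before it blocks anything.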

Typical architecture patterns for Pod Security Policy

Centralized PSP catalog: platform team maintains PSPs enforced cluster-wide; use for consistent baseline.
Namespace-per-tenant PSPs: distinct PSPs per namespace mapped to tenants; use when tenant needs differ.
Service-account-scoped PSPs: tie policies to service accounts for per-application granularity.
CI-first validation: PSP rules enforced in CI via policy engines before deployment; use to prevent rejections in prod.
Hybrid with OPA/Gatekeeper: PSP used for basic constraints, OPA for complex policies and audit.
PSP as a policy fallback: defaulting and mutation are left to mutating webhooks, while non-mutating PSPs act as a last line of defense at admission.
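The CI-first validation pattern can be approximated with a small manifest check that runs before deployment. This is a sketch under stated assumptions: manifests are already parsed into dictionaries (a real pipeline would load YAML with a parser), the three rules mirror the beginner baseline rather than any specific PSP, and the function names are hypothetical.

```python
def pod_spec_of(manifest):
    """Locate the pod spec inside common workload kinds."""
    kind = manifest.get("kind")
    spec = manifest.get("spec", {})
    if kind == "Pod":
        return spec
    if kind == "CronJob":
        return (spec.get("jobTemplate", {}).get("spec", {})
                .get("template", {}).get("spec", {}))
    # Deployment, StatefulSet, DaemonSet, ReplicaSet, and Job share this shape.
    return spec.get("template", {}).get("spec", {})


def check_manifest(manifest):
    """Return a list of baseline violations; an empty list means compliant."""
    spec = pod_spec_of(manifest)
    problems = []
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            problems.append(f"volume {volume.get('name')}: hostPath is forbidden")
    pod_ctx = spec.get("securityContext", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        ctx = container.get("securityContext", {})
        if ctx.get("privileged"):
            problems.append(f"container {container.get('name')}: privileged is forbidden")
        # Container-level runAsNonRoot overrides the pod-level setting.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            problems.append(f"container {container.get('name')}: runAsNonRoot must be true")
    return problems


deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {
        "containers": [{"name": "web", "securityContext": {"privileged": True}}],
        "volumes": [{"name": "host-etc", "hostPath": {"path": "/etc"}}],
    }}},
}
for problem in check_manifest(deployment):
    print(problem)
```

A CI job would fail the build when the returned list is non-empty, so developers see the same class of rejection before the cluster does.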

Failure modes & mitigation

ID	Failure mode	Symptom	Likely cause	Mitigation	Observability signal
F1	Deployment blocked	No pods are created; ReplicaSet or Job reports FailedCreate events with a forbidden error	No usable PSP admits the pod	Create appropriate PSP or adjust RBAC	API server audit deny events
F2	Overly permissive	Pods run privileged unexpectedly	Multiple permissive PSPs allow it	Tighten PSPs and reduce overlaps	Spike in privileged pods metric
F3	Critical SA denied	System components fail to create pods	RBAC removed use on PSP	Restore RBAC or exempt system SAs	Error logs from controllers
F4	Policy drift	Manifests diverge from enforced PSPs	Changes in repo not synced	Enforce GitOps sync and CI checks	Policy drift alerts
F5	Migration breaks	PSP removed by distro upgrade	Deprecated PSP behavior change	Migrate to supported controls	Increased admission errors
F6	False negatives	Malicious pod uses allowed fields	PSP too coarse-grained	Add capability and host path restrictions	Security incident alerts
F7	No visibility	Teams unaware of denials	Missing audit aggregation	Centralize logs and normalize events	Missing policy metrics

Key Concepts, Keywords & Terminology for Pod Security Policy

Glossary

Term — definition — why it matters — common pitfall

Pod Security Policy — Admission object defining pod constraints — Primary enforcement at admission — Treating as runtime monitor
Admission Controller — Plugin executing on API operations — Hosts PSP evaluation — Confusing it with PSP object
RBAC — Role-based access control — Grants use permissions for PSPs — Misbinding service accounts
ServiceAccount — Identity for workloads — PSPs often bound to SAs — Using user accounts instead
seccomp — Kernel syscall filtering profile — Limits system calls pods can make — Not applied by default
SELinux — Linux security labels and policies — Adds MAC enforcement — Complex labeling mistakes
Capabilities — Fine-grained Linux privileges — Controls capability additions to containers — Granting broad capabilities like SYS_ADMIN
Privileged container — Container with full privileges — High risk, often unnecessary — Overuse for quick debugging
HostPath — Volume mount into node filesystem — Can expose host secrets — Allowing wide hostPath access
HostNetwork — Pod uses host network namespace — Risks port and network conflicts — Enabling for convenience
HostPID — Pod uses host PID namespace — Allows process access to node — Rarely needed and dangerous
RunAsUser — UID the container runs as — Enforce non-root execution — Using root for legacy apps
AllowPrivilegeEscalation — Controls execve privilege escalation — Prevents setuid exploits — Ignoring for processes needing suid
ReadOnlyRootFilesystem — Marks root fs read-only — Limits write-based attacks — Apps may require write paths
Volume types — Allowed volume plugin list — Controls what mounts pods can use — Allowing unsafe volume plugins
SELinuxOptions — SELinux context settings — Restricts file and process access — Misconfigured labels block app
AppArmor — Kernel LSM for process confinement — Adds runtime restrictions — Not universally available
ImmutablePods — When admission prevents mutable edits — Protects runtime integrity — Limits live updates
OPA Gatekeeper — Policy engine for Kubernetes — Powerful for complex policies — Requires policy management skills
Kyverno — Kubernetes policy engine focused on mutate/validate — Simplifies policy-as-code — Different semantics than PSP
PodSecurityAdmission — Built-in admission controller that enforces Pod Security Standards levels via namespace labels — Upstream successor to PSP — Not identical to PSP objects
GitOps — Declarative sync workflow — Ensures PSP configs in repo match cluster — Drift if not enforced
Audit logs — Recorded API events — Crucial for compliance and debugging — Large volume needs aggregation
AdmissionReview — API object exchanged during admission — Carries pod spec for evaluation — Complex for custom controllers
PSP Deprecated — Deprecated in Kubernetes v1.21 and removed in v1.25 — Plan migration — Assuming continued support is risky
ClusterRole — Grant cluster-wide permissions — Used to give PSP use permissions cluster-wide — Overbroad grants increase risk
Role (namespaced) — RBAC Role limited to one namespace — Allows per-namespace PSP use — Requires careful mapping
MutatingAdmissionWebhook — Can alter incoming objects — Can be used to set compliant fields — Risk of conflicting mutations
ValidatingAdmissionWebhook — Validates incoming objects — Can complement PSPs — Adds policy complexity
AdmissionOrder — Order controllers run — Affects which controller sees original vs mutated pod — Misordering causes surprises
LeastPrivilege — Principle limiting rights — Core rationale for PSP — Trade-off with developer ergonomics
Policy-as-Code — Defining policies declaratively/versioned — Enables auditability — Requires governance
Exception workflow — Process for temporary allowlisting — Needed to reduce friction — Manual exceptions become permanent
Telemetry — Metrics and logs about policy enforcement — Enables SLIs/SLOs — Often missing by default
DenyReason — Message returned on denial — Helps developers debug — Can be cryptic without mapping
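EnforcementMode — Whether policy is dry-run (audit) or enforced; PSP has no native dry-run, so this comes from Pod Security Admission, Gatekeeper, or Kyverno — Useful for rollout — Forgetting to flip from dry-run causes gaps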
Auto-remediation — Automated fixes for noncompliant pods — Reduces toil — Can inadvertently change app behavior
DefaultDeny — Fallback behavior when no PSP applies — Can cause sudden outages — Need cautious rollout
CompatibilityMatrix — Mapping of PSP support across distros — Affects migration planning — Often not publicly standardized
BlastRadius — Scope of impact when compromise happens — PSP reduces blast radius — Overreliance on PSP gives false security

How to Measure Pod Security Policy (Metrics, SLIs, SLOs)

ID	Metric/SLI	What it tells you	How to measure	Starting target	Gotchas
M1	PodAdmissionComplianceRate	Percent pods compliant at admission	compliant creations / total creations	99% for prod	CI blocks may inflate failures
M2	DeniedPodCount	Count of pods denied by PSP	aggregate audit deny events	0 for prod critical namespaces	Noise from dev namespaces
M3	PrivilegedPodPercentage	Percent of pods running privileged	privileged pods / total pods	<0.1%	Some system pods need privilege
M4	HostPathUsageRate	Percent pods using hostPath	hostPath mounts detected in pod specs	<1%	Legitimate infra apps may need hostPath
M5	NonRootEnforcementRate	Percent of pods running as non-root	pods with RunAsUser!=0 / total pods	98%	Legacy apps may require root
M6	CapabilityAdditionsRate	Rate of pods adding NET_ADMIN or SYS_ADMIN	capability additions / total pods	<0.5%	Network plugins may need NET_ADMIN
M7	PSPExceptionRequests	Number of exception requests filed	ticket or PR count against PSP	Track trend	Exceptions may be informal
M8	TimeToRemediatePolicyViolation	Median time to fix noncompliant manifest	time from deny to fix commit	<1 day for prod	Complex apps require longer
M9	PolicyDriftEvents	Number of times cluster differs from repo	drift detection runs	0	GitOps lag causes transient drift
M10	AdmissionLatencyIncrease	Added ms to admission chain	time difference with/without PSP	<10ms	Webhook slowness can spike
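Several of the ratios in the table can be computed directly from a pod listing. The sketch below assumes pods are available as dictionaries in the shape returned by the Kubernetes API (for example, the items of a pod list exported as JSON); the function names are hypothetical. It covers M3, M4, and M5 only, and it treats an unset runAsUser as not enforced, which is a conservative reading of the M5 definition.

```python
def effective_run_as_user(pod_spec, container):
    """Container-level runAsUser overrides the pod-level value."""
    ctx = container.get("securityContext", {})
    if "runAsUser" in ctx:
        return ctx["runAsUser"]
    return pod_spec.get("securityContext", {}).get("runAsUser")


def psp_slis(pods):
    """Compute M3, M4, and M5 as percentages over the given pods."""
    total = len(pods)
    if total == 0:
        return {}
    privileged = host_path = non_root = 0
    for pod in pods:
        spec = pod.get("spec", {})
        containers = spec.get("containers", [])
        if any(c.get("securityContext", {}).get("privileged") for c in containers):
            privileged += 1
        if any("hostPath" in v for v in spec.get("volumes", [])):
            host_path += 1
        uids = [effective_run_as_user(spec, c) for c in containers]
        # Count a pod only when every container has an explicit non-zero UID.
        if containers and all(uid is not None and uid != 0 for uid in uids):
            non_root += 1
    return {
        "PrivilegedPodPercentage": 100.0 * privileged / total,
        "HostPathUsageRate": 100.0 * host_path / total,
        "NonRootEnforcementRate": 100.0 * non_root / total,
    }


pods = [
    {"spec": {"containers": [{"name": "a", "securityContext": {"privileged": True}}]}},
    {"spec": {"securityContext": {"runAsUser": 1000},
              "containers": [{"name": "b"}],
              "volumes": [{"name": "h", "hostPath": {"path": "/var/log"}}]}},
    {"spec": {"containers": [{"name": "c", "securityContext": {"runAsUser": 1000}}]}},
    {"spec": {"containers": [{"name": "d", "securityContext": {"runAsUser": 0}}]}},
]
print(psp_slis(pods))
```

In practice these values would be exported as gauges per namespace so that system namespaces, which legitimately need privilege, can be excluded from the targets.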

Best tools to measure Pod Security Policy

Tool — Prometheus

What it measures for Pod Security Policy: Metrics about admission controller latency and custom exported PSP metrics.
Best-fit environment: Kubernetes clusters with metric scraping.
Setup outline:
Add exporters for admission metrics.
Convert audit log events into metrics through a log pipeline such as fluentd.
Define recording rules for compliance ratios.
Strengths:
Flexible query language.
Many integrations and dashboard options.
Limitations:
Requires instrumentation; not out-of-the-box PSP metrics.
Long-term storage requires extra components.

Tool — Grafana

What it measures for Pod Security Policy: Visualization of Prometheus metrics for PSP compliance and denials.
Best-fit environment: Teams using Prometheus.
Setup outline:
Import dashboards or create panels for SLI metrics.
Configure alerts based on Prometheus rules.
Strengths:
Rich dashboarding.
Alerting and annotations.
Limitations:
Relies on upstream metric quality.

Tool — Elasticsearch + Kibana

What it measures for Pod Security Policy: Aggregation and search of audit logs and deny events.
Best-fit environment: Large clusters with log aggregation.
Setup outline:
Ship Kubernetes audit logs to Elasticsearch.
Create Kibana visualizations for deny patterns.
Strengths:
Powerful search and drill-down.
Limitations:
Storage and scaling costs.

Tool — OPA Gatekeeper

What it measures for Pod Security Policy: Policy violations as constraint template metrics and audit reports.
Best-fit environment: Clusters needing complex policy-as-code.
Setup outline:
Deploy Gatekeeper and define ConstraintTemplates mirroring PSP intent.
Enable audit mode for continuous reporting.
Strengths:
Rich policy language (Rego).
Audit capabilities.
Limitations:
Learning curve for Rego; performance tuning needed.

Tool — Kyverno

What it measures for Pod Security Policy: Validation and mutation results and policy violation metrics.
Best-fit environment: Kubernetes-native policy workflows.
Setup outline:
Install Kyverno and create policies for non-root, hostPath, capabilities.
Use mutate rules to auto-fix common issues (generate rules create new resources rather than modify pods).
Strengths:
Easier policy authoring for Kubernetes YAML.
Mutation simplifies adoption.
Limitations:
Less expressive than Rego for complex logic.

Recommended dashboards & alerts for Pod Security Policy

Executive dashboard:

Panels: Compliance rate (M1), Trend of denied pods (M2), Number of exceptions (M7), Time to remediate (M8).
Why: Provides leadership view of security posture and operational trends.

On-call dashboard:

Panels: Recent denied pod events, admission latency, impacted namespaces, top denied reasons.
Why: Helps SREs triage blocked deployments and rolling incidents.

Debug dashboard:

Panels: Raw admission audit logs filtered by deny reason, recent mutations, per-SA PSP matches, scheduler events.
Why: For deep investigation into denial causes and race conditions.

Alerting guidance:

Page (pager) alerts for: sudden spike in denied pods in production namespaces that blocks deployments, system service accounts denied.
Ticket alerts for: non-critical policy violation trends, exceptions pending review.
Burn-rate guidance: use error budget style for exceptions; alert if exception rate consumes >50% of allowed deviation.
Noise reduction: group alerts by namespace and reason, deduplicate similar events, use suppression windows during planned maintenance.
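The burn-rate guidance above reduces to simple arithmetic. This is a minimal sketch, assuming the error budget is defined as the share of admissions allowed to be noncompliant under the compliance SLO; the function names, the 99% default, and the sample counts are illustrative.

```python
def exception_budget_consumed(total_admissions, noncompliant_admissions,
                              slo_target=0.99):
    """Fraction of the error budget used; 1.0 means fully spent."""
    if total_admissions <= 0:
        return 0.0
    allowed = (1.0 - slo_target) * total_admissions
    if allowed <= 0:
        return float("inf") if noncompliant_admissions else 0.0
    return noncompliant_admissions / allowed


def should_alert(consumed, threshold=0.5):
    """Alert when more than the threshold share of the budget is used."""
    return consumed > threshold


# With a 99% SLO, 10,000 admissions leave a budget of about 100 deviations.
consumed = exception_budget_consumed(10_000, 60)
print(round(consumed, 3), should_alert(consumed))
```

Sixty deviations against a budget of roughly one hundred crosses the 50% threshold, so this window would raise a ticket or page depending on routing.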

Implementation Guide (Step-by-step)

1) Prerequisites

Kubernetes cluster with admission controller capability.
RBAC governance model and inventory of service accounts.
GitOps repo for PSP objects and policy-as-code.
Audit log aggregation and metrics pipeline.

2) Instrumentation plan

Export admission denies as metrics.
Add labels to deny events (namespace, serviceAccount, reason).
Create recording rules for compliance SLIs.

3) Data collection

Route audit logs to central store.
Scrape metrics for admission latency and denial counts.
Capture CI validation results for policy checks.

4) SLO design

Define production SLOs for compliance rate, remediation time, and privileged pod percentage.
Set error budgets tied to exception workflows.

5) Dashboards

Create executive, on-call, and debug dashboards as above.
Add burn-rate panel to track exception consumption.

6) Alerts & routing

Configure paging for critical denials; ticketing for trend alerts.
Route to platform/owner groups based on namespace mapping.

7) Runbooks & automation

Document runbooks for common denials (hostPath, privileged, non-root).
Automate exception requests via templated PRs and approval workflows.

8) Validation (load/chaos/game days)

Run game days that include policy rollouts to validate emergency rollback paths.
Use chaos tests that create pods violating PSP to validate alerting and remediation.

9) Continuous improvement

Review exception trends weekly.
Tighten PSPs incrementally with developer feedback loop.
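Steps 2 and 3 depend on turning audit events into labeled counts. The sketch below assumes audit logs arrive as JSON lines in the audit.k8s.io/v1 event shape (stage, verb, objectRef, user, responseStatus). Matching on the phrase "pod security policy" in the denial message is an assumption about the message text and should be checked against real logs from the cluster in question; the function name is hypothetical.

```python
import json
from collections import Counter

# Assumed substring of the PSP denial message; verify against real audit logs.
PSP_DENY_MARKER = "pod security policy"


def count_psp_denials(audit_lines):
    """Count PSP pod-create denials keyed by (namespace, username)."""
    counts = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        # Each request can emit several stages; count the final one only.
        if event.get("stage") != "ResponseComplete":
            continue
        ref = event.get("objectRef") or {}
        status = event.get("responseStatus") or {}
        if event.get("verb") != "create" or ref.get("resource") != "pods":
            continue
        if status.get("code") != 403:
            continue
        if PSP_DENY_MARKER not in status.get("message", "").lower():
            continue
        user = (event.get("user") or {}).get("username", "<unknown>")
        counts[(ref.get("namespace", "<none>"), user)] += 1
    return counts


sample = [
    json.dumps({
        "stage": "ResponseComplete", "verb": "create",
        "objectRef": {"resource": "pods", "namespace": "tenant-a"},
        "user": {"username": "system:serviceaccount:tenant-a:default"},
        "responseStatus": {"code": 403, "message":
            "pods \"web\" is forbidden: unable to validate against any pod security policy"},
    }),
    json.dumps({
        "stage": "ResponseComplete", "verb": "create",
        "objectRef": {"resource": "pods", "namespace": "tenant-a"},
        "user": {"username": "alice"},
        "responseStatus": {"code": 201},
    }),
    "not json",
]
for key, value in count_psp_denials(sample).items():
    print(key, value)
```

The resulting keys map directly onto the namespace and serviceAccount labels in the instrumentation plan; a reason label would need additional parsing of the message.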

Pre-production checklist:

PSP objects in Git and peer-reviewed.
CI tests reject noncompliant manifests.
Audit logs routed to analysis pipeline.
Exception request mechanism in place.
Dry-run validation completed in CI or an audit-mode policy engine (PSP has no native dry-run mode).

Production readiness checklist:

SLI dashboards active and alerting configured.
Emergency rollback playbook documented.
Owners and on-call rota assigned for PSP incidents.
Exception governance policy enforced.

Incident checklist specific to Pod Security Policy:

Identify denied pods and affected namespaces.
Confirm whether denial is expected or misconfiguration.
If misconfiguration: apply hotfix PSP or RBAC change with rollback plan.
Post-incident: add test to CI to prevent recurrence.

Use Cases of Pod Security Policy

Multi-tenant SaaS cluster – Context: Multiple customers share cluster. – Problem: Tenant workloads could access node or other tenants. – Why PSP helps: Prevents privileged pods and host mounts. – What to measure: PrivilegedPodPercentage, HostPathUsageRate. – Typical tools: RBAC, PSP, audit logs, Prometheus.
Regulated environment (PCI/ISO) – Context: Compliance requirements for least privilege. – Problem: Need auditable controls on pod capabilities. – Why PSP helps: Enforce non-root and limited volumes. – What to measure: PodAdmissionComplianceRate, TimeToRemediatePolicyViolation. – Typical tools: Audit aggregation, compliance reporting.
Platform standardization – Context: Platform team provides opinionated runtime settings. – Problem: Teams diverge in configs causing maintainability issues. – Why PSP helps: Enforce standard runtime constraints. – What to measure: PolicyDriftEvents. – Typical tools: GitOps, PSP, CI validation.
Secure CI runners – Context: Self-hosted CI runners on cluster. – Problem: Build jobs gaining host access. – Why PSP helps: Prevent privileged build containers. – What to measure: DeniedPodCount, PrivilegedPodPercentage. – Typical tools: PSP, namespace-bound RBAC.
Immutable infrastructure pipelines – Context: Immutable deployment practice. – Problem: Pod mutation or privilege drift during rollout. – Why PSP helps: Enforce immutable expectations at admission. – What to measure: AdmissionLatencyIncrease, PolicyDriftEvents. – Typical tools: PSP, MutatingAdmissionWebhook (careful coordination).
HostPath-dependent legacy apps – Context: Some apps require hostPath but risky. – Problem: Blanket allow exposes other apps. – Why PSP helps: Restrict hostPath by service account. – What to measure: HostPathUsageRate, PSPExceptionRequests. – Typical tools: PSP, service account mappings.
Serverless platform on Kubernetes – Context: Functions run in containers on shared nodes. – Problem: Functions could request excessive privileges. – Why PSP helps: Limit capabilities for function pods. – What to measure: NonRootEnforcementRate, CapabilityAdditionsRate. – Typical tools: PSP, function controller.
Incident containment – Context: Compromised workload detected. – Problem: Need to prevent similar pods from being created. – Why PSP helps: Temporarily tighten PSP to block certain features. – What to measure: DeniedPodCount change, TimeToRemediatePolicyViolation. – Typical tools: PSP, CI rollback, emergency runbook.
Progressive hardening program – Context: Long-term security posture improvement. – Problem: How to tighten without breaking delivery. – Why PSP helps: Provides the enforcement target; tighten in stages after validating changes in dry-run tooling. – What to measure: PolicyDriftEvents, TimeToRemediatePolicyViolation. – Typical tools: PSP, CI validation or an audit-mode policy engine (PSP has no native dry-run mode), dashboards.
Controlled capability rollout – Context: New networking features require NET_ADMIN. – Problem: Limit NET_ADMIN to specific workloads. – Why PSP helps: Grant capability only to named service accounts. – What to measure: CapabilityAdditionsRate, PSPExceptionRequests. – Typical tools: PSP, RBAC.

Scenario Examples (Realistic, End-to-End)

Scenario #1 — Kubernetes multi-tenant SaaS

Context: A SaaS platform hosts multiple customers in a single Kubernetes cluster.
Goal: Prevent tenant pods from using hostPath and running privileged containers.
Why Pod Security Policy matters here: Minimizes risk of one tenant compromising node or other tenants.
Architecture / workflow: Platform defines PSPs in GitOps repo; CI validates manifests; RBAC binds PSPs to tenant service accounts.
Step-by-step implementation:

Inventory service accounts per tenant.
Create PSP disallowing privileged and hostPath.
Define ClusterRole granting use of PSP.
Bind the ClusterRole to each tenant namespace's service accounts with a namespaced RoleBinding.
Add CI checks rejecting hostPath/privileged pods.
Monitor deny events and iterate.

What to measure: PrivilegedPodPercentage, HostPathUsageRate, DeniedPodCount.
Tools to use and why: PSP for admission, Prometheus/Grafana for metrics, GitOps for sync.
Common pitfalls: Overbroad RBAC bindings grant PSP use cluster-wide.
Validation: Deploy tenant app; CI simulates noncompliant manifest to ensure rejection.
Outcome: Tenants cannot create privileged pods; risky volume patterns blocked.
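The RBAC steps in this scenario can be sketched as two manifests: a ClusterRole that grants the use verb on one named policy, and a RoleBinding that scopes that grant to a single tenant namespace. The rbac.authorization.k8s.io/v1 fields and the system:serviceaccounts:<namespace> group are standard Kubernetes; the helper functions, object names, and the tenant-a namespace are hypothetical.

```python
import json


def psp_use_role(psp_name):
    """ClusterRole allowing use of exactly one named PSP."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": f"psp-use-{psp_name}"},
        "rules": [{
            "apiGroups": ["policy"],
            "resources": ["podsecuritypolicies"],
            "verbs": ["use"],
            "resourceNames": [psp_name],
        }],
    }


def tenant_binding(namespace, role_name):
    """RoleBinding limiting the grant to service accounts in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role_name,
        },
        "subjects": [{
            "kind": "Group",
            "apiGroup": "rbac.authorization.k8s.io",
            # Built-in group covering every service account in the namespace.
            "name": f"system:serviceaccounts:{namespace}",
        }],
    }


role = psp_use_role("baseline-restricted")
binding = tenant_binding("tenant-a", role["metadata"]["name"])
print(json.dumps([role, binding], indent=2))
```

Using a RoleBinding rather than a ClusterRoleBinding is the design choice that avoids the pitfall named in this scenario: the ClusterRole is reusable, but each grant stays inside one tenant namespace.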

Scenario #2 — Serverless function platform (managed PaaS)

Context: A managed PaaS runs user functions in containers on shared nodes.
Goal: Ensure functions run with minimal capabilities and no host access.
Why Pod Security Policy matters here: Limits risk from third-party function code.
Architecture / workflow: Platform controller assigns specific service account to function pods; PSP bound to that SA enforces constraints.
Step-by-step implementation:

Create a function-specific PSP allowing minimal volumes and non-root.
Ensure function controller uses a dedicated SA.
Bind PSP use to the SA.
Run CI tests for function images to validate non-root execution.
Apply runtime monitoring for anomalies.

What to measure: NonRootEnforcementRate, CapabilityAdditionsRate, DeniedPodCount.
Tools to use and why: PSP, Kyverno for mutation, Prometheus for metrics.
Common pitfalls: Platform updates accidentally change SA assignment causing denials.
Validation: Deploy sample function and intentionally try privileged settings to see rejection.
Outcome: Functions constrained; platform reduces attack surface.

Scenario #3 — Incident-response/postmortem

Context: An incident shows a compromised pod used hostPath to access host secrets.
Goal: Prevent recurrence and close the vulnerability.
Why Pod Security Policy matters here: Blocks hostPath usage cluster-wide or by specific namespaces.
Architecture / workflow: Emergency PSP created and applied, then refined into longer-term policy.
Step-by-step implementation:

Triage incident and identify exploit vector.
Create an emergency PSP denying hostPath and apply to affected namespaces.
Monitor the denial events and ensure no critical services break.
Update CI and GitOps repo with PSP changes for permanence.
Conduct postmortem and add tests.

What to measure: HostPathUsageRate, TimeToRemediatePolicyViolation, DeniedPodCount.
Tools to use and why: PSP, audit logs, incident management tools.
Common pitfalls: Blocking legitimate infrastructure components that need hostPath.
Validation: Run remediation tests and schedule a follow-up game day.
Outcome: HostPath usage reduced and control loops added to prevent recurrence.

Scenario #4 — Cost/performance trade-off

Context: A high-throughput service requires NET_ADMIN capability for SR-IOV networking but NET_ADMIN is risky.
Goal: Limit NET_ADMIN usage without impacting performance-sensitive workloads.
Why Pod Security Policy matters here: Allows capability to approved workloads while denying general use.
Architecture / workflow: Create PSP that allows NET_ADMIN only for a service account used by the high-performance app.
Step-by-step implementation:

Identify network-capable workloads and assign dedicated SA.
Create PSP permitting NET_ADMIN and hostNetwork for that SA.
Bind PSP use to the SA only.
Monitor privilege use and performance metrics.

What to measure: CapabilityAdditionsRate, PrivilegedPodPercentage, application latency.
Tools to use and why: PSP for targeted allowance, Prometheus for performance metrics.
Common pitfalls: Broadly granting NET_ADMIN via namespace-level RBAC.
Validation: Deploy app and run performance benchmark vs controls.
Outcome: Performance retained for approved workloads while overall cluster risk reduced.
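The targeted allowance in this scenario can be sketched as a policy that permits NET_ADMIN and hostNetwork and nothing else unusual, plus a small check of which requested capabilities fall outside it. The field names follow the policy/v1beta1 schema; the policy name, the volume allowlist, and the helper function are hypothetical, and the check ignores the runtime's default capability set.

```python
# Hypothetical policy for the high-performance workload's service account.
net_admin_psp = {
    "apiVersion": "policy/v1beta1",
    "kind": "PodSecurityPolicy",
    "metadata": {"name": "net-admin-only"},
    "spec": {
        "privileged": False,
        "allowPrivilegeEscalation": False,
        "hostNetwork": True,
        # Only this capability may be added by containers.
        "allowedCapabilities": ["NET_ADMIN"],
        "volumes": ["configMap", "secret", "emptyDir"],
        "runAsUser": {"rule": "RunAsAny"},
        "seLinux": {"rule": "RunAsAny"},
        "supplementalGroups": {"rule": "RunAsAny"},
        "fsGroup": {"rule": "RunAsAny"},
    },
}


def disallowed_capabilities(psp, pod_spec):
    """Return added capabilities that the policy does not allow."""
    allowed = set(psp["spec"].get("allowedCapabilities", []))
    requested = set()
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext", {})
        requested.update(ctx.get("capabilities", {}).get("add", []))
    return sorted(requested - allowed)


approved = {"containers": [{"name": "dataplane", "securityContext": {
    "capabilities": {"add": ["NET_ADMIN"]}}}]}
too_broad = {"containers": [{"name": "dataplane", "securityContext": {
    "capabilities": {"add": ["NET_ADMIN", "SYS_ADMIN"]}}}]}

print(disallowed_capabilities(net_admin_psp, approved))
print(disallowed_capabilities(net_admin_psp, too_broad))
```

The policy only narrows risk if the use grant is bound to the dedicated service account; binding it to a whole namespace recreates the pitfall listed for this scenario.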

Common Mistakes, Anti-patterns, and Troubleshooting

Symptom: Sudden pod creation failures across namespaces -> Root cause: PSP denies with no matching allow -> Fix: Check RBAC bindings and create appropriate PSP.
Symptom: System controllers failing to start -> Root cause: system service accounts lost PSP use -> Fix: Restore ClusterRole bindings for system SAs.
Symptom: High privileged pod percentage -> Root cause: Overly permissive PSPs or misapplied legacy templates -> Fix: Audit PSPs and tighten capability lists.
Symptom: Developers repeatedly requesting exceptions -> Root cause: Policies too strict or no automation for exceptions -> Fix: Provide templated exception workflows and migration guidance.
Symptom: No metrics about denials -> Root cause: Audit logs not routed or not instrumented -> Fix: Configure audit log shipping and create metrics.
Symptom: Denials only visible in API server logs -> Root cause: Lack of centralized observability -> Fix: Aggregate logs into searchable store and build dashboards.
Symptom: Policy rollout causes mass failures -> Root cause: Enforced mode without dry-run testing -> Fix: Validate in CI or an audit-mode policy engine first, then stagger the rollout.
Symptom: Conflicting policy results -> Root cause: Multiple PSPs with overlapping allows and RBAC misconfig -> Fix: Rationalize PSP catalog and tighten RBAC.
Symptom: PSP removed after distro upgrade -> Root cause: Deprecation/migration path not followed -> Fix: Plan migration to supported admission mechanisms.
Symptom: False-negative security assumptions -> Root cause: Belief PSP prevents runtime exploits -> Fix: Add runtime security and monitoring layers.
Symptom: No audit trail for exception approvals -> Root cause: Manual chat approvals outside ticketing -> Fix: Integrate exception process into tracked PR/ticket systems.
Symptom: Admission latency spikes -> Root cause: Slow validating/mutating webhooks or metrics exporters -> Fix: Profile webhooks and add timeouts and retries.
Symptom: Overly complex PSPs per namespace -> Root cause: Micro-policy proliferation -> Fix: Consolidate into fewer policy classes mapped by SA or label.
Symptom: PSP blocking critical upgrades -> Root cause: Policy denies new kube-system pods -> Fix: Tag system components and create safe PSP allowances.
Symptom: Metrics show high policy drift -> Root cause: GitOps not enforced or broken sync -> Fix: Fix sync pipeline and alert on drift.
Symptom: Developers bypassing PSP via sidecar -> Root cause: Not validating mutated sidecars and admission ordering -> Fix: Coordinate mutating webhooks and PSP rules.
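Symptom: Excessive noise from dev denies -> Root cause: No separation between dev and prod policy tiers -> Fix: Separate dev and prod namespaces or clusters and apply a more relaxed policy tier to development.
Symptom: Missing context in deny messages -> Root cause: Admission rejection messages are terse or poorly formatted -> Fix: Improve admission responses with actionable messages that name the offending field and the policy that rejected it.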
Symptom: Unable to test PSP in CI -> Root cause: CI runner lacks RBAC to simulate PSP -> Fix: Provide test harness with impersonation or dedicated test SA.
Symptom: Observability gaps during incidents -> Root cause: No correlation between audit logs and alerting system -> Fix: Add correlation keys and enrich events.
Symptom: PSP prevents autoscaling pods -> Root cause: PSP denies necessary host settings for autoscaler -> Fix: Identify required permissions and give minimal allowances.
Symptom: Security team frustrated with exceptions -> Root cause: No SLA for exception review -> Fix: Define SLA and automated vetting steps.
Symptom: PSP definitions not versioned -> Root cause: Direct cluster edits -> Fix: Adopt GitOps for PSP objects.
Symptom: Relying solely on PSP for compliance -> Root cause: Misunderstanding scope of PSP -> Fix: Add other controls like runtime security and image signing.

Observability pitfalls included above: lack of metrics, missing audit aggregation, admission latency invisibility, poor deny messages, lack of correlation in logs.
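
The denial-metrics gap described above can be closed with a small log-processing step. The following is a minimal sketch, not a definitive implementation: it assumes the API server audit log is available as JSON lines, that the audit policy records responseStatus.message, and that PSP rejections contain the phrase "unable to validate against any pod security policy". The function name and the sample events are hypothetical.

```python
import json
from collections import Counter

# Substring the PSP admission plugin puts in its rejection message.
PSP_DENY_MARKER = "unable to validate against any pod security policy"


def count_psp_denials(audit_lines):
    """Count PSP-denied pod creations per namespace from audit log JSON lines."""
    denials = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole run
        ref = event.get("objectRef") or {}
        status = event.get("responseStatus") or {}
        is_pod_create = event.get("verb") == "create" and ref.get("resource") == "pods"
        is_psp_denial = status.get("code") == 403 and PSP_DENY_MARKER in (
            status.get("message") or ""
        )
        if is_pod_create and is_psp_denial:
            denials[ref.get("namespace", "<unknown>")] += 1
    return denials


def _sample_event(namespace, code, message):
    """Build a hypothetical audit event trimmed to the fields used above."""
    return json.dumps({
        "kind": "Event",
        "verb": "create",
        "objectRef": {"resource": "pods", "namespace": namespace},
        "responseStatus": {"code": code, "message": message},
    })


SAMPLE_LINES = [
    _sample_event("team-a", 403, 'pods "web" is forbidden: ' + PSP_DENY_MARKER),
    _sample_event("team-a", 403, 'pods "job" is forbidden: ' + PSP_DENY_MARKER),
    _sample_event("team-b", 403, 'pods "api" is forbidden: ' + PSP_DENY_MARKER),
    _sample_event("kube-system", 201, ""),
    "not json",
]

if __name__ == "__main__":
    for namespace, count in sorted(count_psp_denials(SAMPLE_LINES).items()):
        print(f"{namespace}: {count}")
```

The per-namespace counts can then be exported as a metric or fed into a dashboard. Matching on the message text is brittle, so treat the marker string as something to verify against your own cluster's audit output.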

Best Practices & Operating Model

Ownership and on-call:

Platform team owns PSP catalog and exception workflows.
Define on-call rotation for platform security incidents.
Assign namespace owners for policy impacts.

Runbooks vs playbooks:

Runbook: Operational steps to remediate blocked deployment.
Playbook: Postmortem and policy tuning steps for systemic issues.

Safe deployments:

Canary PSP enforcement: Validate policies in CI and a non-production cluster first (PSP itself has no dry-run mode), then stage enforcement namespace by namespace.
Rollback strategy: Pre-approved RBAC change to lift denial temporarily.

Toil reduction and automation:

Automate exception PR templates.
Use mutation policies to add safe defaults (e.g., runAsNonRoot: true).
Auto-remediation for trivial fixes like adding non-root fields.

Security basics:

Enforce least privilege, non-root, disallow privileged containers, restrict hostPath and dangerous capabilities.
Regularly review service accounts for unnecessary permissions.
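
The security basics above can also be checked before a manifest ever reaches the cluster. The sketch below is illustrative only: it assumes the pod spec has already been parsed into a Python dict, and the function name, the capability deny-list, and the sample specs are hypothetical. It does not reproduce PSP's actual evaluation logic.

```python
# Illustrative deny-list; choose the capabilities your own policy forbids.
DISALLOWED_CAPABILITIES = {"SYS_ADMIN", "NET_ADMIN", "SYS_PTRACE"}


def find_violations(pod_spec):
    """Return a list of baseline violations for a pod spec given as a dict."""
    violations = []
    pod_sc = pod_spec.get("securityContext") or {}

    # Host namespaces widen the blast radius of a compromised pod.
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(field):
            violations.append(f"{field} is enabled")

    for volume in pod_spec.get("volumes") or []:
        if "hostPath" in volume:
            violations.append(f"volume {volume.get('name')} uses hostPath")

    containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
    for container in containers:
        name = container.get("name")
        sc = container.get("securityContext") or {}
        if sc.get("privileged"):
            violations.append(f"container {name} is privileged")
        # A container-level runAsNonRoot overrides the pod-level value.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            violations.append(f"container {name} does not set runAsNonRoot")
        added = set((sc.get("capabilities") or {}).get("add") or [])
        for cap in sorted(added & DISALLOWED_CAPABILITIES):
            violations.append(f"container {name} adds capability {cap}")
    return violations


# Hypothetical pod specs used only to exercise the checker.
BAD_POD = {
    "volumes": [{"name": "host-logs", "hostPath": {"path": "/var/log"}}],
    "containers": [{
        "name": "agent",
        "image": "example/agent:1.0",
        "securityContext": {"privileged": True, "capabilities": {"add": ["NET_ADMIN"]}},
    }],
}
GOOD_POD = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{"name": "web", "image": "example/web:1.0"}],
}

if __name__ == "__main__":
    for violation in find_violations(BAD_POD):
        print(violation)
```

A check like this fits in a CI job that fails the build when the returned list is non-empty; in-cluster admission remains the authoritative control.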

Weekly/monthly routines:

Weekly: Review PSP exception requests and deny trends.
Monthly: Audit PSP definitions and RBAC bindings.
Quarterly: Run game day and tests on PSP migration scenarios.

What to review in postmortems related to Pod Security Policy:

Was PSP a contributing factor to outage?
Were deny messages actionable?
Did exception process cause delay?
Any RBAC misconfigurations enabling outage?
Action items: CI tests, dry-run prior to enforcement, better messaging.

Tooling & Integration Map for Pod Security Policy

ID	Category	What it does	Key integrations	Notes
I1	Admission Control	Enforces pod constraints at API	RBAC, API server	Native PSP plugin or replacement
I2	Policy Engine	Complex policies and audit	OPA, Gatekeeper	Rego-based policies
I3	Policy Engine	Kubernetes-native mutate/validate	Kyverno, GitOps	Easier authoring for YAML
I4	Logging	Aggregates audit logs	Fluentd, Elasticsearch	Enables denial analysis
I5	Metrics	Stores and queries metrics	Prometheus, Thanos	SLI and alerting source
I6	Visualization	Dashboards and alerts	Grafana	Executive and on-call views
I7	CI/CD	Fails builds with policy violations	GitLab CI, GitHub Actions	Shift-left validation
I8	GitOps	Sync policies from repo to cluster	ArgoCD, Flux	Prevents manual drift
I9	Runtime Security	Detects runtime anomalies	Falco, runtime EDR	Complements admission controls
I10	Managed Control	Provider-managed policy alternatives	Cloud provider controls	Behavior and support vary

Frequently Asked Questions (FAQs)

What is the current status of PSP in Kubernetes?

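PSP was deprecated in Kubernetes v1.21 and removed in v1.25. The built-in replacement is Pod Security Admission, which enforces the Pod Security Standards; policy engines such as OPA Gatekeeper and Kyverno are common alternatives. Migration timing varies by distribution.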

Can PSP change runtime behavior of a running pod?

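No. PSP is evaluated only at admission time, so it does not affect pods that are already running. Mutating webhooks also act only at admission; detecting or changing the behavior of a running pod requires runtime security agents.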

How do I test PSP before enforcing?

PSP has no built-in dry-run mode. Mirror the policy checks in CI to validate manifests, and exercise the policies in a non-production cluster before enabling enforcement.

How do PSP and OPA Gatekeeper relate?

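PSP is a built-in admission plugin configured through PodSecurityPolicy objects; OPA Gatekeeper is an external policy engine that runs as an admission webhook, evaluates Rego policies, and can audit resources that already exist.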

Are PSPs enough for compliance?

PSPs contribute to compliance but are not sufficient alone; combine with audit logging, image signing, and runtime detection.

How to handle exceptions at scale?

Automate exception requests via templated PRs, track with tickets, and apply short-lived exceptions with audits.

What telemetry is essential for PSP?

Admission deny counts, compliance rates, privileged pod counts, and time-to-remediate are critical.

Will PSP cause latency in pod creation?

Minimal if native; however, slow validating/mutating webhooks and heavy audit ingestion can add latency.

Can I restrict hostPath only for some namespaces?

Yes. Define a PSP that allows hostPath and grant use of it through RoleBindings only in the namespaces, or to the service accounts, that need it.

How to migrate from PSP to Pod Security Admission or Gatekeeper?

Plan: inventory policies, map constraints, run parallel dry-runs, update CI, and schedule migration window.
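
For the parallel dry-run step, Pod Security Admission supports per-namespace labels that separate enforcement from audit and warning. The sketch below generates a Namespace manifest that keeps enforcement permissive while auditing and warning against a stricter level. The label keys and level names come from Pod Security Admission; the function name and the namespace name are hypothetical, and the chosen levels are only an example.

```python
import json

PSA_LABEL_PREFIX = "pod-security.kubernetes.io"
PSA_LEVELS = ("privileged", "baseline", "restricted")


def namespace_manifest(name, enforce, audit, warn):
    """Build a Namespace manifest carrying Pod Security Admission mode labels."""
    for level in (enforce, audit, warn):
        if level not in PSA_LEVELS:
            raise ValueError(f"unknown Pod Security Standards level: {level}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                f"{PSA_LABEL_PREFIX}/enforce": enforce,  # violations are rejected
                f"{PSA_LABEL_PREFIX}/audit": audit,      # violations are annotated in audit logs
                f"{PSA_LABEL_PREFIX}/warn": warn,        # violations return a client warning
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace: enforce nothing yet, observe against "restricted".
    manifest = namespace_manifest("team-a", "privileged", "restricted", "restricted")
    print(json.dumps(manifest, indent=2))
```

The JSON output can be applied with kubectl, which accepts JSON as well as YAML. Once audit and warn output is clean, the enforce level can be raised in a later change.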

Who should own PSPs in an organization?

Platform/security team usually owns global PSPs; namespace owners manage local exceptions.

Do PSPs work with serverless platforms?

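Yes, where the platform runs functions as pods on a cluster with PSP enabled. Ensure the function controller creates pods under the intended service accounts and that those accounts are bound to the right PSPs.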

What common mistakes should I avoid?

Avoid overbroad RBAC, skipping dry-run, and lacking observability for denies.

How often should PSPs be reviewed?

Monthly reviews are recommended, or after any major platform change.

Can PSPs be applied per service account?

Yes. RBAC bindings allow you to grant PSP use to specific service accounts.
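
As a rough illustration of that binding, the sketch below builds the two RBAC objects involved: a ClusterRole that grants the use verb on one named PSP, and a RoleBinding that attaches it to a single service account in one namespace. The function name, the PSP name, and the service account are hypothetical; the API groups, resource name, and verb are the ones PSP authorization relies on.

```python
import json


def psp_use_rbac(psp_name, namespace, service_account):
    """Return a ClusterRole and RoleBinding granting one service account use of one PSP."""
    role_name = f"psp-use-{psp_name}"
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [{
            "apiGroups": ["policy"],
            "resources": ["podsecuritypolicies"],
            "verbs": ["use"],
            "resourceNames": [psp_name],  # limit the grant to this single policy
        }],
    }
    # A RoleBinding (not a ClusterRoleBinding) scopes the grant to one namespace.
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role_name,
        },
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }],
    }
    return [cluster_role, role_binding]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    for obj in psp_use_rbac("restricted", "team-a", "web-sa"):
        print(json.dumps(obj, indent=2))
```

Keeping the ClusterRole generic and varying only the RoleBinding makes it easy to see, per namespace, which service accounts may use which policy.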

How do I measure PSP effectiveness?

Track SLIs like compliance rate and remediation time and compare against SLOs.
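
A minimal sketch of those two SLIs follows, assuming you already have pod counts and remediation durations from your own telemetry. The function names, the sample numbers, and the 99% target are hypothetical and are not recommended values.

```python
from statistics import median


def compliance_rate(compliant_pods, total_pods):
    """Fraction of pods that satisfy policy; an empty cluster counts as compliant."""
    if total_pods == 0:
        return 1.0
    return compliant_pods / total_pods


def meets_slo(rate, target):
    """True when the measured SLI is at or above the SLO target."""
    return rate >= target


def median_remediation_hours(durations_hours):
    """Median time from denial to fix, or None when there is no data."""
    return median(durations_hours) if durations_hours else None


if __name__ == "__main__":
    # Hypothetical measurements for one review period.
    rate = compliance_rate(compliant_pods=970, total_pods=1000)
    print(f"compliance rate: {rate:.1%}")
    print(f"meets 99% SLO: {meets_slo(rate, 0.99)}")
    print(f"median remediation (h): {median_remediation_hours([2, 5, 30])}")
```

The median is used here instead of the mean so that one long-running exception does not dominate the remediation figure; a percentile may suit your reporting better.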

Is PSP replacement standardized across cloud providers?

No. It varies by provider; each implements different controls and migration paths.

What’s the best way to onboard teams to PSPs?

Provide clear docs, CI validations, migration tooling, and an automated exception workflow.

Conclusion

Pod Security Policy provides admission-time controls that reduce risk and enforce least privilege for Kubernetes workloads. While PSP is a powerful tool, it must be part of a broader security and observability strategy. Plan rollouts carefully, test policies in CI and non-production clusters before enforcement, instrument denials and compliance metrics, and automate exception handling to maintain developer velocity.

Next 7 days plan:

Day 1: Inventory service accounts and PSP needs.
Day 2: Draft PSPs for non-root and no hostPath and trial them in a non-production cluster.
Day 3: Add CI checks to validate PSP compliance.
Day 4: Configure audit log shipping and basic metrics.
Day 5: Create executive and on-call dashboards.
Day 6: Run a small game day testing policy denials and rollback.
Day 7: Document exception process and schedule policy review cadence.

Appendix — Pod Security Policy Keyword Cluster (SEO)

Primary keywords:

Pod Security Policy
Kubernetes PSP
PSP admission controller
pod security admission
Kubernetes security policies

Secondary keywords:

PSP vs PodSecurityAdmission
PSP migration guide
Kubernetes admission control
pod admission policies
cluster security policies

Long-tail questions:

How does Pod Security Policy work in Kubernetes?
What replaces PSP in modern Kubernetes distributions?
How to enforce non-root pods with PSP?
How to restrict hostPath mounts with PSP?
How to audit PSP denials in production?
How to migrate from PSP to OPA Gatekeeper?
How to test PSP in CI pipelines?
What metrics should I track for PSP?
How to manage PSP exceptions at scale?
How to reduce noise from PSP denials?

Related terminology:

admission controller
RBAC bindings
service account security
seccomp profiles
SELinux context
AppArmor profile
capabilities NET_ADMIN
privileged containers
hostPath volume
readOnlyRootFilesystem
runAsNonRoot
mutate admission webhook
validating admission webhook
OPA Gatekeeper
Kyverno policies
GitOps policy sync
audit logs aggregation
Prometheus metrics
Grafana dashboards
exception request workflow
dry-run enforcement
policy-as-code
least privilege principle
runtime security agents
Falco alerts
admission latency
compliance reporting
policy drift detection
cluster role bindings
namespace isolation
multi-tenant security
incident response playbook
policy remediation SLA
burn-rate alerts
CI policy checks
policy mutation automation
PSP deprecation
PodSecurityAdmission profiles
capability white-listing
mutating controller ordering
deny reason parsing
audit event enrichment
exception ticketing
emergency rollback plan
game day policy tests
policy telemetry design
compliance audit trail
container security posture
container runtime constraints
