What Is Azure Kubernetes Service (AKS)? A Defender's Guide

Azure Kubernetes Service (AKS) is a managed Kubernetes service for deploying and running containerized applications on Azure, where Microsoft operates the control plane and the customer secures the workloads, access, and configuration on top.

A pod in your AKS cluster gets compromised through a vulnerable image. The attacker reads a Kubernetes Secret mounted into the pod, finds a service account token, and calls the Kubernetes API. From there they enumerate the cluster, find a second namespace with weaker controls, and pivot. None of this touched the Azure control plane. Microsoft's managed API server did exactly what it was built to do: serve authenticated, authorized requests. The break happened entirely in the layer you configure: the image, the RBAC binding, the network policy that was never written, and the audit log that was never turned on.

That is the shape of most AKS incidents. The cluster itself was not "hacked." A workload, a permission, or a missing control was.

Azure Kubernetes Service is managed Kubernetes, and managed means a line is drawn between what Microsoft secures and what you secure. This guide is written for the defender on the customer side of that line: SOC analysts, incident responders, and cloud security engineers who have to harden, monitor, and investigate AKS. It covers what AKS is, how the control plane and node split works, the shared responsibility model in practice, the security controls you own (RBAC, network policy, pod security, secrets, image scanning), how to turn on the audit logging that makes detection possible, and the threats that actually land.

What is Azure Kubernetes Service (AKS)?

Azure Kubernetes Service is a managed Kubernetes service for deploying and running containerized applications on Azure. Kubernetes is the open-source system that schedules containers across a fleet of machines, restarts them when they die, scales them under load, and routes traffic to them. Running Kubernetes yourself means operating the control plane (the API server, scheduler, etcd, and controllers) plus every worker node. AKS takes the control plane off your hands and leaves you the workloads.

When you create an AKS cluster, Azure creates and configures the Kubernetes control plane for you at no cost, and you pay only for the worker nodes that run your applications. Microsoft handles control-plane health monitoring and maintenance. Each AKS cluster gets its own single-tenant, dedicated Kubernetes control plane that provides the API server, scheduler, and the other core control-plane components.

The service offers two cluster modes, and the mode changes how much you configure and therefore how much you secure:

  • AKS Automatic is a fully managed experience with production-ready, hardened defaults. Azure handles node provisioning, scaling, security baselines, monitoring, and upgrades. Several security controls are on by default.
  • AKS Standard gives you full control over cluster configuration and node pools. Security features are opt-in, one by one. You choose the networking model, the authorization model, and when to upgrade.

For a defender, the mode is the first thing to establish on any cluster you inherit. AKS Automatic ships with Azure RBAC, workload identity, deployment safeguards in enforcement mode, and baseline Pod Security Standards turned on. AKS Standard ships with those available but not necessarily enabled. The same attack that fails against a hardened Automatic cluster can succeed against a Standard cluster where nobody enabled the equivalent control.

The service is CNCF-certified and compliant with SOC, ISO, PCI DSS, and HIPAA, which matters for regulated workloads but does not transfer the configuration work to Microsoft. Compliant infrastructure is necessary, not sufficient.

The AKS control plane and node split

The architecture is the security boundary, so it is worth being precise about which side runs what.

Component | Who runs it | What it does
Kubernetes API server, scheduler, etcd, controllers (control plane) | Microsoft (managed) | Accepts and authorizes API requests, schedules pods, stores cluster state
Worker nodes (Azure VMs) | Shared: Microsoft provisions, you configure | Run your pods; in Standard you manage node pool lifecycle
Node OS and runtime | Microsoft images, you patch via upgrades | Optimized Ubuntu or Azure Linux, containerd runtime
Pods, containers, images | You | Your application workloads
RBAC, network policy, secrets, configuration | You | The controls that actually contain an attacker

The control plane is single-tenant and dedicated per cluster. By default the Kubernetes API server uses a public IP and an FQDN, which is a detail defenders should not skip past: a cluster left at that default exposes its API server to the internet, authenticated but reachable. You can restrict it with authorized IP ranges or make it a fully private cluster reachable only from your virtual network. AKS Automatic preconfigures API server virtual network integration; Standard makes you choose.
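
One way to triage this on an inherited cluster is to classify the API server endpoint from the cluster's JSON description. The sketch below is a minimal illustration, assuming the input mirrors the `apiServerAccessProfile` object printed by `az aks show`. The function name and the three labels are hypothetical, and the key names should be checked against your CLI version.

```python
from typing import Any


def classify_api_server_exposure(cluster: dict[str, Any]) -> str:
    """Classify API server reachability from a cluster description dict.

    Assumption: the dict mirrors `az aks show` output, where
    `apiServerAccessProfile` may be null on a cluster left at defaults.
    """
    profile = cluster.get("apiServerAccessProfile") or {}
    if profile.get("enablePrivateCluster"):
        return "private"            # reachable only from the virtual network
    if profile.get("authorizedIpRanges"):
        return "public-restricted"  # public endpoint limited to listed CIDRs
    return "public-open"            # public endpoint, gated only by credentials


if __name__ == "__main__":
    # Hypothetical cluster left at the default API server settings.
    print(classify_api_server_exposure({"name": "demo", "apiServerAccessProfile": None}))
```

Anything that comes back as "public-open" is a candidate for authorized IP ranges or private-cluster mode.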

Nodes are Azure VMs deployed onto a private virtual network subnet with no public IP addresses. They run optimized Ubuntu or Azure Linux, with containerd as the container runtime on current Kubernetes versions. When a cluster is created or scaled up, new nodes come with the latest OS security updates. Patching after that is the friction point: node OS and Kubernetes version updates happen through cluster upgrades, which are automatic in AKS Automatic and manual by default in AKS Standard. A Standard cluster nobody upgrades runs a stale node image and an aging Kubernetes version, both of which accumulate known vulnerabilities.

The AKS shared responsibility model

Figure: AKS Shared Responsibility
Microsoft secures the control plane. You secure the rest. Managed Kubernetes removes the operational work, not the work that stops a breach.

Microsoft manages:

  • Kubernetes API server, scheduler, etcd, controllers
  • Control plane availability, patching, hardening
  • Single-tenant dedicated control plane per cluster
  • Node OS images with security updates
  • Underlying Azure fabric

You secure:

  • Container images, dependencies, vulnerabilities
  • Kubernetes RBAC and Azure RBAC bindings
  • Network policy (default-deny pod paths)
  • Pod Security Standards, least privilege
  • Secrets handling and etcd encryption
  • Node and Kubernetes upgrades (manual in Standard)
  • Audit and control plane logging

Defender takeaway: The managed control plane is rarely the weak point. Nearly every AKS breach lands in the customer-configured layer: a vulnerable image, an over-broad RBAC binding, a missing network policy, or audit logging that was never turned on.

Every cloud service splits security duties between provider and customer. For managed Kubernetes the split is specific, and misreading it is how teams leave gaps.

The short version: Microsoft secures and operates the Kubernetes control plane and the underlying Azure fabric. You secure everything you deploy and configure on top of it. This is the same principle that governs cloud security generally, applied to a Kubernetes cluster.

Security area | AKS Automatic | AKS Standard
Control plane availability, patching, hardening | Microsoft | Microsoft
Node OS security updates, Kubernetes version upgrades | Automatic (managed) | You (manual by default)
Authorization model (Kubernetes RBAC / Azure RBAC) | Azure RBAC on by default | You choose and configure
Network policy, ingress/egress controls | Secure baseline preconfigured | You select and enable
Pod Security Standards, deployment safeguards | Enforce mode by default | Opt-in
Image provenance, scanning, admission | You (build and registry gates) | You (build and registry gates)
Workload identity, secrets handling | Workload identity preconfigured | You enable
Audit and control-plane logging | You create the diagnostic setting | You create the diagnostic setting

Two things never leave you regardless of mode. First, what runs in your containers is yours: the image, its dependencies, its vulnerabilities, the privileges it requests. Microsoft will not catch a vulnerable library or a container that runs as root. Second, how access is configured is yours: the RBAC bindings, the service account tokens, the network paths between pods. Even on a hardened Automatic cluster, a permissive RoleBinding you create is a permissive RoleBinding.

The practical read for a defender: a managed control plane removes a class of work (you are not patching etcd at 2 a.m.), but it removes none of the work that actually stops a breach. The audit logging, the RBAC review, the image scanning, the network segmentation, all of it lives on your side.

Securing an AKS cluster: the controls you own

AKS gives you the Kubernetes and Azure security primitives. Turning them on and configuring them correctly is the job. These are the controls that move risk, roughly in the order an attacker tests them.

Authentication and authorization. Control access to the API server with Microsoft Entra ID for authentication and a role-based model for authorization. AKS supports both Kubernetes RBAC and Azure RBAC for Kubernetes; pick one and scope roles to the minimum each user and workload needs. The common failure is a cluster-admin ClusterRoleBinding handed out broadly because it was easy. Tight, namespace-scoped access control is what limits a stolen credential to one namespace instead of the whole cluster.
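
A quick review for that failure mode is to list every subject bound to cluster-admin. The following is a minimal sketch, assuming the input is the JSON produced by `kubectl get clusterrolebindings -o json`. The function name and the sample binding are hypothetical.

```python
from typing import Any


def find_cluster_admin_subjects(bindings: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (binding name, subject kind, subject name) for each cluster-admin grant."""
    findings: list[tuple[str, str, str]] = []
    for item in bindings.get("items", []):
        role_ref = item.get("roleRef") or {}
        if role_ref.get("kind") != "ClusterRole" or role_ref.get("name") != "cluster-admin":
            continue
        binding_name = (item.get("metadata") or {}).get("name", "")
        # A binding may have no subjects at all, so guard against None.
        for subject in item.get("subjects") or []:
            findings.append((binding_name, subject.get("kind", ""), subject.get("name", "")))
    return findings


if __name__ == "__main__":
    # Hypothetical sample shaped like `kubectl get clusterrolebindings -o json`.
    sample = {
        "items": [
            {
                "metadata": {"name": "ci-admin"},
                "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
                "subjects": [{"kind": "ServiceAccount", "name": "ci-runner", "namespace": "build"}],
            },
            {
                "metadata": {"name": "viewers"},
                "roleRef": {"kind": "ClusterRole", "name": "view"},
                "subjects": [{"kind": "Group", "name": "devs"}],
            },
        ]
    }
    for finding in find_cluster_admin_subjects(sample):
        print(finding)
```

Each result is a subject whose stolen credential would expose the whole cluster, so the list should be short and every entry should have an owner who can justify it.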

Network policy. By default, every pod in a Kubernetes cluster can talk to every other pod. That flat network is what lets a single compromised pod reach the rest of the cluster. The service supports Kubernetes network policies that allow or deny pod-to-pod paths by namespace and label. Writing default-deny policies and explicitly allowing only required paths is the single most effective control against lateral movement inside a cluster.
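
As an illustration of the default-deny pattern, the sketch below builds a `networking.k8s.io/v1` NetworkPolicy that selects every pod in a namespace and allows no ingress or egress. The helper name, the policy name, and the `payments` namespace are hypothetical. It also assumes the cluster has a network policy engine enabled, since the policy has no effect without one.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to
    # `kubectl apply -f -` after review.
    print(json.dumps(default_deny_policy("payments"), indent=2))
```

Denying egress also blocks DNS lookups, so a policy like this is normally paired with explicit allow rules for DNS and for each required pod-to-pod path.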

Pod security and least privilege. A container that runs as root, mounts the host filesystem, or requests escalated privileges turns a workload compromise into a node compromise. Apply Pod Security Standards (AKS Automatic enforces a baseline by default) and use deployment safeguards to block risky configurations at admission. The Microsoft-recommended baseline includes avoiding privileged containers and using Linux features like AppArmor and seccomp to constrain what a container can do.
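
To make those risky settings concrete, here is a minimal sketch of a check that flags them in a pod spec, assuming the spec has already been parsed into a Python dict (for example from `kubectl get pod -o json`). The function name and finding strings are hypothetical, and this is not a substitute for admission-time enforcement.

```python
from typing import Any


def risky_pod_settings(pod_spec: dict[str, Any]) -> list[str]:
    """List settings in a pod spec that widen the blast radius of a compromise."""
    findings: list[str] = []
    pod_sc = pod_spec.get("securityContext") or {}

    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag):
            findings.append(f"pod sets {flag}")

    for volume in pod_spec.get("volumes") or []:
        if "hostPath" in volume:
            findings.append(f"volume {volume.get('name', '?')} mounts a hostPath")

    # Only regular containers are checked; initContainers are out of scope here.
    for container in pod_spec.get("containers") or []:
        name = container.get("name", "?")
        sc = container.get("securityContext") or {}
        if sc.get("privileged"):
            findings.append(f"container {name} is privileged")
        if sc.get("allowPrivilegeEscalation"):
            findings.append(f"container {name} allows privilege escalation")
        # Container-level runAsUser overrides the pod-level value. A missing
        # value is not proof of non-root: the image may still default to root.
        if sc.get("runAsUser", pod_sc.get("runAsUser")) == 0:
            findings.append(f"container {name} runs as UID 0")
    return findings


if __name__ == "__main__":
    # Hypothetical spec with a host mount and a privileged root container.
    spec = {
        "volumes": [{"name": "host", "hostPath": {"path": "/"}}],
        "containers": [{"name": "app", "securityContext": {"privileged": True, "runAsUser": 0}}],
    }
    for finding in risky_pod_settings(spec):
        print(finding)
```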

Secrets. Kubernetes Secrets inject credentials and keys into pods. AKS stores Secrets in tmpfs on the node rather than on disk, scopes them to a namespace, and delivers a Secret only to nodes running a pod that needs it. But the raw Secret manifests hold the data in base64, which is encoding, not encryption, so they must never be committed to source control. Kubernetes Secrets live in etcd; AKS supports encrypting them at rest in etcd with customer-managed keys. For higher assurance, integrate Azure Key Vault rather than relying on plain Kubernetes Secrets.
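
The point about base64 is easy to demonstrate. This short sketch decodes the `data` map of a Secret manifest with nothing but the standard library. The helper name and the sample value are hypothetical.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode the base64 values in a Secret's `data` map. No key is required."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


if __name__ == "__main__":
    # Hypothetical `data` section as it would appear in a committed manifest.
    manifest_data = {"password": base64.b64encode(b"hunter2").decode("ascii")}
    print(manifest_data)
    print(decode_secret_data(manifest_data))
```

Anyone with read access to the repository can do the same, which is why a committed Secret manifest should be treated as a leaked credential.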

Image and supply-chain security. Most cluster compromises start with the image. Scan images for vulnerabilities in CI before they are promoted, and scan them again in the registry, because new vulnerabilities are disclosed after build. Use image signing and verification (such as the Notary Project's Notation tooling, formerly known as Notary V2) so only trusted, provenance-verified images deploy. AKS Automatic runs an Image Cleaner to remove unused vulnerable images from nodes. None of this is automatic on a Standard cluster you have not configured.
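
A trusted-registry gate can be sketched as a simple allowlist check on image references. The example below is illustrative only. It assumes the common convention that the first path component is a registry host when it contains a dot or colon or equals `localhost`, and that anything else resolves to Docker Hub. The registry name `contoso.azurecr.io` and both function names are hypothetical.

```python
def image_registry(image: str) -> str:
    """Return the registry host of an image reference, defaulting to docker.io."""
    first, sep, _rest = image.partition("/")
    # Without a slash, or with a first component that does not look like a
    # host, the reference is a Docker Hub image such as "nginx:1.25".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def untrusted_images(images: list[str], allowed: set[str]) -> list[str]:
    """Return the images whose registry is not on the allowlist."""
    return [image for image in images if image_registry(image) not in allowed]


if __name__ == "__main__":
    running = ["contoso.azurecr.io/shop/api:1.4", "nginx:1.25", "ghcr.io/someone/tool:latest"]
    print(untrusted_images(running, allowed={"contoso.azurecr.io"}))
```

In a real cluster this decision belongs at admission, where it can block the deployment, rather than in a report run after the pod is already scheduled.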

Runtime protection. Microsoft Defender for Containers detects and restricts attacks against running pods, and surfaces drift in the vulnerability state of your workloads. It is the closest thing AKS has to EDR for containers, and like EDR it is a detection layer, not a substitute for getting the configuration right underneath it.

Node hardening. Node authorization, which restricts the API requests a kubelet is allowed to make, is on by default on AKS clusters running Kubernetes 1.24 and higher. Generation 2 VM node pools support Trusted Launch (secure boot plus a virtual TPM) to protect the node boot chain. For workloads needing strong isolation, AKS offers confidential and kernel-isolated node options.

Monitoring and audit logging for AKS

You cannot detect or investigate what you never recorded, and AKS control-plane activity is invisible by default. Turning on the right logs is the precondition for treating an AKS cluster as something a SOC can actually watch.

Observability here has two distinct layers, and defenders need both:

Operational telemetry. Container Insights (part of Azure Monitor) collects stdout/stderr logs, Kubernetes events, and node and pod metrics. Managed Prometheus collects metrics, and Azure Managed Grafana visualizes them. In AKS Automatic these are on by default; in Standard they are opt-in. This layer tells you a pod is crash-looping or a node is saturated.

Control-plane audit logs. This is the security-relevant layer, and it is the one teams forget. AKS implements control-plane logs as Azure Monitor resource logs, and they are not collected until you create a diagnostic setting that routes them to a destination, typically a Log Analytics workspace. The categories that matter for security:

Log category | What it captures | Defender use
kube-audit | Every request to the Kubernetes API server, including get/list reads | Full audit trail; high volume and cost
kube-audit-admin | API requests excluding get/list read events | Practical default for detection; catches changes and access
kube-apiserver | API server operations | Control-plane behavior
guard | Microsoft Entra and Azure RBAC authorization decisions | Who was allowed or denied

Once routed, in resource-specific mode these land in the AKSAudit, AKSAuditAdmin, and AKSControlPlane tables in Log Analytics, where you query them with KQL or forward them to a SIEM such as Microsoft Sentinel for correlation and alerting. The cost caution is real: kube-audit is high volume, so Microsoft recommends enabling kube-audit-admin (which drops the noisy read events), using resource-specific mode, and configuring the audit table as Basic logs to control ingestion cost.

For a defender, the order is: enable the diagnostic setting first (no log, no investigation), prefer kube-audit-admin plus guard for the signal-to-cost ratio, ship it to a SIEM, and write detections for the cluster-specific behaviors below.
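
As a rough sketch of the first step, the helper below builds the list of log categories for a diagnostic setting. It assumes the JSON shape (`category` plus `enabled`) that the `--logs` argument of `az monitor diagnostic-settings create` accepts, which is worth confirming against the CLI reference before use. The function and constant names are hypothetical.

```python
import json
from typing import Iterable

# Categories named in the table above.
KNOWN_CATEGORIES = {"kube-audit", "kube-audit-admin", "kube-apiserver", "guard"}

# Default selection: the signal-to-cost choice, not the full audit stream.
SECURITY_CATEGORIES = ("kube-audit-admin", "guard")


def diagnostic_logs_payload(categories: Iterable[str] = SECURITY_CATEGORIES) -> list[dict]:
    """Build the log-category list for a diagnostic setting."""
    payload = []
    for category in categories:
        if category not in KNOWN_CATEGORIES:
            raise ValueError(f"unknown AKS log category: {category}")
        payload.append({"category": category, "enabled": True})
    return payload


if __name__ == "__main__":
    # Assumption: this JSON is what gets passed as the --logs value.
    print(json.dumps(diagnostic_logs_payload()))
```

Keeping the category list in code makes it reviewable, so a change that switches a cluster to the full `kube-audit` stream is a visible decision about cost.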

The threats AKS defenders actually face

AKS attacks rarely look like a network exploit. They look like authenticated Kubernetes API calls and container behavior that is technically allowed.

Vulnerable or malicious images. An image with a known-vulnerable dependency, or a poisoned image pulled from an untrusted registry, is the most common entry point. The fix is upstream: scan and sign before deployment, enforce trusted registries at admission.

Over-privileged RBAC and service accounts. A workload bound to a role broader than it needs, or a default service account token an attacker can read, becomes the pivot from one pod to the API server. This is the cloud-native form of privilege escalation: the attacker does not exploit a CVE, they use permissions that were granted.

Lateral movement across a flat pod network. Without network policies, a compromised pod can reach every service in the cluster. Default-deny network policy is the containment.

Exposed API server. A cluster left on the default public API endpoint, without authorized IP ranges or private-cluster mode, is reachable from the internet. Credentials still gate it, but it widens the attack surface unnecessarily.

Container escape to the node. A privileged container, a host-path mount, or an unpatched kernel can let a workload break out onto the node VM, where it can reach other pods and node credentials. Pod Security Standards and current node images are the defense.

Secrets exposure. Secrets committed to a repo in base64, or readable by pods that do not need them, leak credentials without any exploit. Scope secrets tightly, encrypt etcd with customer-managed keys, and keep manifests out of source control.

The through-line matches the rest of cloud security: nearly every one of these lives in the customer-owned layer. The managed control plane is not the weak point. The configuration on top of it is.

How blue teams operate AKS

The shared responsibility model earns its keep when a SOC treats an AKS cluster like any other monitored asset, not a black box the platform team owns.

Baseline the cluster. Establish which mode it runs (Automatic or Standard), whether the API server is public or restricted, whether audit logging is on, and what the RBAC bindings actually grant. Most AKS gaps are visible in this inventory before any attack.

Get the audit log into the SIEM. A cluster whose kube-audit-admin and guard logs are not flowing into your SIEM is a cluster you cannot investigate. This is the highest-leverage single action on an unmonitored cluster.

Write detections for Kubernetes behavior. Alert on anomalous API activity: a service account suddenly listing secrets across namespaces, an exec into a running pod, a new ClusterRoleBinding, an image pulled from an unapproved registry. These map to real attacker steps and are visible in AKSAudit. Note that listing secrets is a read: kube-audit-admin excludes get and list events, so that detection needs the full kube-audit category.
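
The logic behind a few of these detections can be sketched against the raw Kubernetes audit event format (`verb`, `user.username`, `objectRef`). This is a simplified illustration, not a production rule set. The function name and labels are hypothetical, and mapping these fields onto Log Analytics columns is left as an assumption to verify in your workspace.

```python
from typing import Any, Optional


def classify_audit_event(event: dict[str, Any]) -> Optional[str]:
    """Return a detection label for a Kubernetes audit event, or None."""
    verb = event.get("verb", "")
    ref = event.get("objectRef") or {}
    resource = ref.get("resource", "")
    username = (event.get("user") or {}).get("username", "")

    if resource == "clusterrolebindings" and verb == "create":
        return "new ClusterRoleBinding"
    if resource == "pods" and ref.get("subresource") == "exec":
        return "exec into pod"
    # A cluster-wide list carries no namespace in objectRef. List events are
    # only present when the full kube-audit category is collected.
    if (
        resource == "secrets"
        and verb == "list"
        and not ref.get("namespace")
        and username.startswith("system:serviceaccount:")
    ):
        return "service account listed secrets cluster-wide"
    return None


if __name__ == "__main__":
    # Hypothetical event: a workload identity enumerating every secret.
    event = {
        "verb": "list",
        "user": {"username": "system:serviceaccount:web:default"},
        "objectRef": {"resource": "secrets"},
    }
    print(classify_audit_event(event))
```

Real rules would add baselining, for example suppressing controllers that legitimately list secrets, before anything like this is allowed to page an analyst.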

Practice the investigation. AKS forensics reads differently from disk forensics: the evidence is control-plane audit logs, pod and container telemetry, and node VM data, and pods can vanish before you reach them. Knowing how to pull and pivot through those logs is a learned skill, the same constraint that binds threat hunting everywhere.

The bottom line

Azure Kubernetes Service is managed Kubernetes: Microsoft runs the control plane for free and keeps it healthy and patched, and you run and secure the workloads on top. The managed control plane is rarely the weak point. The breaches happen in the layer you own: a vulnerable image, an over-broad RBAC binding, a missing network policy, secrets in a repo, or audit logging that was never turned on.

The defender's job is the work the managed service does not do for you: scope RBAC tight, write default-deny network policies, enforce pod security, scan and sign images, keep nodes and Kubernetes current (especially on Standard clusters), and above all enable control-plane audit logging and get it into a SIEM. AKS Automatic gives you a hardened head start; AKS Standard hands you the controls and the responsibility to turn them on. Either way, treat an AKS cluster like any other asset your SOC has to see, and the way to build that instinct is to work real cloud intrusions and read the logs they leave behind.

Frequently asked questions

What is Azure Kubernetes Service (AKS) in simple terms?

Azure Kubernetes Service is Microsoft's managed Kubernetes offering. It runs the Kubernetes control plane (the API server, scheduler, and cluster state) for you at no charge, and you pay only for the worker node VMs that run your containers. AKS handles control-plane availability, maintenance, and patching, while you deploy and secure your own workloads, access controls, and configuration.

Is AKS secure by default?

It depends on the cluster mode. AKS Automatic ships with a hardened baseline: Azure RBAC, workload identity, deployment safeguards, baseline Pod Security Standards in enforce mode, and an image cleaner are all on by default. AKS Standard makes most of those opt-in, so a Standard cluster is only as secure as the controls you have explicitly enabled. Either way, image security, RBAC scoping, and audit logging remain your responsibility.

What does Microsoft secure versus the customer in AKS?

Microsoft secures and operates the Kubernetes control plane (API server, scheduler, etcd, controllers) and the underlying Azure fabric, and provides node OS images with security updates. The customer secures the workloads and images they deploy, the RBAC and network configuration, and secrets handling, and is responsible for applying node and Kubernetes upgrades (manual in Standard) and enabling audit logging. Most AKS breaches happen in the customer-owned layer.

How do I enable audit logging in AKS?

AKS control-plane logs are Azure Monitor resource logs and are off until you create a diagnostic setting routing them to a destination such as a Log Analytics workspace. Enable the kube-audit-admin category for a practical audit trail (it excludes high-volume read events), plus guard for authorization decisions, then query the AKSAuditAdmin and AKSControlPlane tables in Log Analytics or forward them to a SIEM. To control cost, use resource-specific mode and the Basic logs tier for audit tables.

How is AKS different from running Kubernetes yourself?

Self-managed Kubernetes means you operate the control plane (API server, etcd, scheduler, controllers) and every node, including their availability and patching. AKS offloads control-plane operation to Microsoft and provides managed, security-updated node images. You still own workload security, RBAC, network policy, secrets, upgrades, and logging, so AKS removes the operational burden of running Kubernetes but not the work of securing what you deploy on it.

What are the main security threats to AKS clusters?

The most common AKS threats are vulnerable or untrusted container images, over-privileged RBAC and service account tokens, lateral movement across a flat pod network with no network policies, an API server left exposed on its default public endpoint, container escape to the node from privileged containers, and secrets leaked through committed manifests. Almost all of these are weaknesses in the customer-configured layer rather than exploits against Microsoft's control plane.
