What Is Cloud Threat Hunting? A Practitioner Guide
Cloud threat hunting is the proactive, hypothesis-driven search for attacker activity in cloud environments that automated detection did not catch.
A compromised access key does not trip an alarm. The attacker holding it authenticates, calls sts:AssumeRole, enumerates a few buckets, and reads data the key was technically allowed to read. Every API call is valid. Every credential is real. The automated detection rules stay quiet because nothing matched a signature, and the activity sat inside the permissions the account already held. Weeks later, someone notices an egress charge for a region the company has never deployed to, and the investigation starts cold.
That gap, between what automated cloud detection catches and what actually happened, is where cloud threat hunting lives. It is the proactive, hypothesis-driven search for attacker activity in cloud environments that evaded the alerts. A hunter does not wait for a rule to fire. They form a hypothesis about how an attacker would operate against this specific account, pull the cloud telemetry that would show it, and look. This guide covers what cloud threat hunting is, how it differs from hunting on-prem, the hunting loop you run, concrete hunts mapped to MITRE ATT&CK for cloud, and the tooling and skills the work demands. It is written for SOC analysts, threat hunters, and DFIR responders who have to reason about cloud activity after the controls missed it.
What is cloud threat hunting?
Cloud threat hunting is the practice of proactively searching cloud environments for threats that automated detection did not catch, driven by a hypothesis rather than an alert. The starting point is an assumption: that a capable attacker is already inside, or could be, and that the existing detections would not see them. The hunter's job is to test that assumption against real telemetry and either find the activity or rule it out with evidence.
It is the cloud-specific application of threat hunting, and the distinction matters because the cloud changes both what an attacker does and what evidence they leave. An attacker in a cloud account rarely drops malware on a disk. They abuse identity, call APIs, and reconfigure the control plane. The hunt follows that shift. Instead of hunting for a malicious process tree on an endpoint, you hunt for an anomalous sequence of API calls in an audit log, a new credential on an old role, or a resource that was made reachable from the public internet.
Cloud threat hunting is not the same as cloud detection, and it is not a replacement for it. Automated detection handles the known and the high-volume: the rule that fires on a disabled trail, the alert on impossible travel. Hunting handles the unknown and the quiet: the attacker who stays inside granted permissions, the technique no rule covers yet, the slow reconnaissance that never crosses a threshold. The two feed each other. A good hunt that finds something ends by becoming a new detection rule.
How cloud hunting differs from on-prem hunting
The instinct to treat the cloud as "someone else's data center" is the fastest way to hunt badly. The telemetry, the asset model, and the boundary of what you can even see are all different.
The telemetry is API-shaped, not host-shaped. On-prem, you hunt across endpoint process logs, EDR telemetry, Windows event logs, and network captures. In the cloud, the richest source is the control-plane audit log: AWS CloudTrail, Azure activity and sign-in logs, Google Cloud audit logs. Every meaningful action (creating a user, assuming a role, changing a bucket policy, reading a secret) is an API call recorded with the identity that made it, the source IP, the region, and the parameters. Data-plane reads are the gap to check first: in AWS, object-level S3 access appears only if CloudTrail data events are enabled, and Google Cloud's Data Access audit logs are off by default for most services. Identity logs, runtime telemetry from workloads, and network flow logs (VPC Flow Logs and their equivalents) round out the picture. The hunter's primary artifact is the API call, not the process.
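Here is a minimal Python sketch of what "the API call is the artifact" means in practice. It flattens one CloudTrail record into the handful of fields a hunter pivots on. The sketch assumes the records have already been parsed from CloudTrail's JSON log files into dicts. `ApiCall` and `to_api_call` are hypothetical helper names, not part of any AWS SDK.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass
class ApiCall:
    """The fields a hunter usually pivots on, pulled out of one CloudTrail record."""
    time: datetime
    principal: str          # userIdentity.arn: who made the call
    event: str              # eventSource plus eventName, e.g. "sts.amazonaws.com:AssumeRole"
    region: str             # awsRegion
    source_ip: str          # sourceIPAddress
    params: dict[str, Any]  # requestParameters (can be null in the raw record)
    failed: bool            # True when CloudTrail recorded an errorCode


def to_api_call(record: dict[str, Any]) -> ApiCall:
    identity = record.get("userIdentity", {})
    return ApiCall(
        # CloudTrail timestamps are UTC strings such as "2024-05-01T12:00:00Z"
        time=datetime.fromisoformat(record["eventTime"].replace("Z", "+00:00")),
        principal=identity.get("arn") or identity.get("principalId", "unknown"),
        event=f'{record["eventSource"]}:{record["eventName"]}',
        region=record.get("awsRegion", "unknown"),
        source_ip=record.get("sourceIPAddress", "unknown"),
        params=record.get("requestParameters") or {},
        failed="errorCode" in record,
    )
```

Failed calls are marked rather than dropped. A run of AccessDenied errors from one identity can itself point to someone probing what a stolen credential is allowed to do.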
Assets are ephemeral. A container or a serverless function may exist for seconds. The instance an attacker used to stage data may be terminated before anyone looks. On-prem, the compromised host is usually still there to image; in the cloud, the asset can be gone and only the log of its creation and deletion remains. This raises the value of log analysis sharply, because the log is frequently the only durable evidence left.
The fight is on the control plane. On-prem attackers move laterally host to host. Cloud attackers move through identity and the API. They assume roles, mint credentials, escalate by attaching a policy, and pivot across accounts. The control plane, the set of APIs that manage the environment, is the terrain. Much of cloud lateral movement is a chain of AssumeRole calls and policy changes, not a sequence of remote logins.
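The next sketch illustrates the AssumeRole-chain idea. It turns successful AssumeRole events into caller-to-role edges, walks every path from a starting identity, and flags paths that cross account boundaries. It assumes CloudTrail-shaped dicts and maps an assumed-role session back to its role through `userIdentity.sessionContext.sessionIssuer.arn`. The function names are hypothetical. A real investigation would also need events from every account's trail (or an organization trail), not just one.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def caller_role(record: Record) -> str:
    """Map an assumed-role session back to the role that issued it; otherwise use the caller ARN."""
    identity = record.get("userIdentity", {})
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    return issuer.get("arn") or identity.get("arn", "unknown")


def account_of(arn: str) -> str:
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":")
    return parts[4] if len(parts) > 5 else ""


def assume_role_edges(records: list[Record]) -> dict[str, set[str]]:
    """Caller -> set of roles it successfully assumed."""
    edges: dict[str, set[str]] = defaultdict(set)
    for r in records:
        if r.get("eventName") != "AssumeRole" or "errorCode" in r:
            continue
        target = (r.get("requestParameters") or {}).get("roleArn")
        if target:
            edges[caller_role(r)].add(target)
    return edges


def chains_from(start: str, edges: dict[str, set[str]]) -> list[list[str]]:
    """Every maximal AssumeRole path reachable from `start`, skipping cycles."""
    paths: list[list[str]] = []

    def walk(path: list[str]) -> None:
        nexts = sorted(n for n in edges.get(path[-1], ()) if n not in path)
        if not nexts and len(path) > 1:
            paths.append(path)
        for n in nexts:
            walk(path + [n])

    walk([start])
    return paths


def crosses_accounts(path: list[str]) -> bool:
    return len({account_of(arn) for arn in path}) > 1
```

Given the edges, `chains_from` returns each role path an identity opened, and `crosses_accounts` singles out the ones that left the starting account. Those are the paths worth reading first.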
Shared responsibility draws a hard line. The cloud provider secures the infrastructure; you secure your configuration, identities, and data. That line dictates what you can hunt. You will not get hypervisor logs. You will get the audit trail of your own account's actions, and that is exactly where attacker activity in your tenant shows up. Hunting respects the boundary: hunt the layers you own and can observe.
The cloud threat hunting loop
Hunting is a loop, not a one-time scan. The same five steps repeat, and each completed hunt sharpens the next.
1. Form a hypothesis. Start with a specific, testable statement about attacker behavior in this environment. Not "is there an attacker?" but "an attacker who stole a developer's access key would use it from an IP and region the developer never uses." A good hypothesis names the technique, the expected evidence, and where that evidence would live. Threat intelligence and ATT&CK for cloud are the usual sources.
2. Gather the cloud telemetry. Pull the data that would confirm or refute the hypothesis: the CloudTrail events for that principal, the sign-in logs, the flow logs for that subnet, the configuration history for that resource. Scope it tightly to the hypothesis so you analyze signal, not the entire account.
3. Analyze. Look for the pattern the hypothesis predicts. Baseline normal first (which regions, IPs, and API calls are routine for this identity), then surface the deviation. Most cloud hunting is anomaly work: this principal has never called this API, this role has never been assumed from this country, this bucket has never been public.
4. Confirm or respond. If the evidence supports the hypothesis, you have an incident: confirm scope, preserve the relevant logs before ephemeral context ages out, and hand off to incident response. If it does not, you have ruled out a threat with evidence, which is a real result, not a failed hunt.
5. Operationalize. Turn what you learned into a durable detection. If the hunt found a real technique, write the rule that would have caught it. If it found nothing but taught you the environment's baseline, encode that baseline so the next anomaly is easier to see. This is the step that keeps the SOC from hunting the same thing twice.
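The analyze step (baseline first, then deviation) can be sketched in a few lines of Python. The example learns, per principal, which event names, regions, and source IPs appeared in a baseline window. It then reports every never-before-seen value in the hunt window. The feature list, the helper names, and the choice of "first seen" as the anomaly test are assumptions for illustration. A fuller baseline might also weigh call frequency and time of day.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator

Record = dict[str, Any]
# Attributes compared against each principal's history (an assumption; extend as needed).
FEATURES = ("eventName", "awsRegion", "sourceIPAddress")


def principal_of(r: Record) -> str:
    return r.get("userIdentity", {}).get("arn", "unknown")


def build_baseline(history: Iterable[Record]) -> dict[str, dict[str, set[str]]]:
    """Per principal, every value seen for each feature during the baseline window."""
    baseline: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for r in history:
        for f in FEATURES:
            baseline[principal_of(r)][f].add(r.get(f, ""))
    return baseline


def deviations(
    window: Iterable[Record], baseline: dict[str, dict[str, set[str]]]
) -> Iterator[tuple[str, str, str, Record]]:
    """Yield (principal, feature, value, record) for each never-before-seen value."""
    for r in window:
        who = principal_of(r)
        seen = baseline.get(who)
        if seen is None:
            yield who, "principal", who, r  # no history at all: one finding, not three
            continue
        for f in FEATURES:
            value = r.get(f, "")
            if value not in seen.get(f, set()):
                yield who, f, value, r
```

The same baseline is what step 5 encodes. Saving it lets the next hunt start from known normal instead of rebuilding it.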
The loop runs whether or not it finds an attacker. A hunt that ends in "ruled out, and here is a new detection rule" has done its job.
Cloud hunts mapped to MITRE ATT&CK
The fastest way to generate good hypotheses is to walk the MITRE ATT&CK Cloud matrices and ask, for each technique, "what would this look like in my logs?" Below are five hunts that recur, each tied to the technique it tests. Reproduce the technique IDs exactly when you document a hunt; a wrong ID corrupts your coverage mapping and points the next analyst at the wrong technique.
Anomalous role assumption (T1078.004, Valid Accounts: Cloud Accounts). A stolen credential is a valid one, so the abuse hides in valid-account activity. Hunt for AssumeRole calls that break the principal's baseline: a role assumed from a new source IP, a new country, at an unusual hour, or assumed by a principal that has never used it before. Chains of role assumptions across accounts are the cloud's lateral movement.
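Here is a sketch of this hunt, assuming CloudTrail-shaped records split into a baseline window and a hunt window. It flags an AssumeRole when the caller has never assumed that role, has never called from that source IP, or is active in a UTC hour it has not used before. Assumed-role callers are keyed by their issuing role so that changing session names do not fragment their history. Country-level checks would need a GeoIP enrichment step that is not shown, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Iterable, Iterator

Record = dict[str, Any]


def _caller(r: Record) -> str:
    # Prefer the issuing role for assumed-role sessions so session names don't fragment history.
    identity = r.get("userIdentity", {})
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    return issuer.get("arn") or identity.get("arn", "unknown")


def _hour(r: Record) -> int:
    # CloudTrail eventTime is UTC, e.g. "2024-05-01T03:12:45Z"
    return datetime.fromisoformat(r["eventTime"].replace("Z", "+00:00")).hour


def _assume_roles(records: Iterable[Record]) -> Iterator[Record]:
    return (r for r in records if r.get("eventName") == "AssumeRole" and "errorCode" not in r)


def assume_role_profile(history: Iterable[Record]):
    """Per caller: roles assumed, source IPs used, and UTC hours active in the baseline."""
    roles, ips, hours = defaultdict(set), defaultdict(set), defaultdict(set)
    for r in _assume_roles(history):
        who = _caller(r)
        roles[who].add(r["requestParameters"]["roleArn"])
        ips[who].add(r.get("sourceIPAddress"))
        hours[who].add(_hour(r))
    return roles, ips, hours


def anomalous_assume_roles(window: Iterable[Record], profile) -> Iterator[tuple[str, str, list[str]]]:
    roles, ips, hours = profile
    for r in _assume_roles(window):
        who, role = _caller(r), r["requestParameters"]["roleArn"]
        reasons = []
        if role not in roles.get(who, set()):
            reasons.append("role never assumed by this principal")
        if r.get("sourceIPAddress") not in ips.get(who, set()):
            reasons.append("new source IP")
        if _hour(r) not in hours.get(who, set()):
            reasons.append("unusual hour (UTC)")
        if reasons:
            yield who, role, reasons
```

Failed AssumeRole attempts are excluded here to keep the baseline clean. They deserve their own hunt, because repeated denials against many roles look like a credential being tested.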
New access keys or credentials (T1098.001, Account Manipulation: Additional Cloud Credentials). A common persistence move is to add a second credential to an account the attacker has compromised, so they keep access even if the first is rotated. Hunt for CreateAccessKey, new service principal secrets, or added SSH keys, especially when the principal creating the key is not the one it belongs to, or when a normally interactive user suddenly mints a programmatic key.
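For this hunt, the key question in a CreateAccessKey event is whose key it is. In CloudTrail, `requestParameters.userName` names the target user when the caller specifies one. When it is absent, IAM creates the key for the calling user. The sketch uses that rule to flag keys minted for someone else. It also flags keys minted for users in a hypothetical `interactive_users` set, which you would derive from your own baseline (for example, users seen only in ConsoleLogin events). The fallback to `responseElements.accessKey.userName` is an assumption about the response shape.

```python
from typing import Any, Iterable, Iterator

Record = dict[str, Any]


def key_owner(r: Record) -> str:
    """The IAM user a new key belongs to. With no userName in the request, IAM uses the caller."""
    params = r.get("requestParameters") or {}
    # Assumption: responseElements.accessKey.userName, when present, names the owner too.
    resp = (r.get("responseElements") or {}).get("accessKey") or {}
    return params.get("userName") or resp.get("userName") or r.get("userIdentity", {}).get("userName", "")


def suspicious_new_keys(records: Iterable[Record], interactive_users: set[str]) -> Iterator[tuple[str, list[str]]]:
    for r in records:
        if r.get("eventName") != "CreateAccessKey" or "errorCode" in r:
            continue
        caller = r.get("userIdentity", {})
        owner = key_owner(r)
        reasons = []
        if caller.get("userName") != owner:
            # Includes role sessions, which carry no userName of their own.
            reasons.append(f"created by {caller.get('arn', 'unknown')}, not by {owner}")
        if owner in interactive_users:
            reasons.append("programmatic key on a normally interactive user")
        if reasons:
            yield owner, reasons
```

Legitimate automation that rotates keys for other users will also match the first rule. Allow-listing those principals is usually the first tuning pass.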
Activity in unused regions (T1535, Unused/Unsupported Cloud Regions). Attackers provision resources in regions an organization never uses, betting that detection is only deployed where the business operates. Hunt for any control-plane or resource-creation activity in regions outside your known footprint: RunInstances in a region with no production presence is a strong signal, often tied to resource hijacking for crypto mining.
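Here is a sketch of the unused-region hunt. It counts every event outside a declared footprint and pulls out the creation calls. The footprint set is a hypothetical placeholder. One caveat shapes the code: CloudTrail records events from global services such as IAM with `awsRegion` set to us-east-1. The sketch skips those events so they are not read as regional activity. Treating any `Run*` or `Create*` call as resource creation is a rough heuristic.

```python
from collections import Counter
from typing import Any, Iterable

Record = dict[str, Any]

# Hypothetical footprint: replace with the regions your organization actually uses.
KNOWN_REGIONS = {"us-east-1", "eu-west-1"}
# Global-service events are logged with awsRegion us-east-1; skip them so they aren't misread.
GLOBAL_SOURCES = {"iam.amazonaws.com"}
# Rough heuristic for resource-creation calls such as RunInstances or CreateFunction.
CREATE_PREFIXES = ("Run", "Create")


def out_of_footprint(records: Iterable[Record], known_regions: set[str] = KNOWN_REGIONS):
    """Return (event count per unexpected region, creation calls in those regions)."""
    counts: Counter = Counter()
    creations: list[Record] = []
    for r in records:
        region = r.get("awsRegion", "")
        if region in known_regions or r.get("eventSource") in GLOBAL_SOURCES:
            continue
        counts[region] += 1
        if r.get("eventName", "").startswith(CREATE_PREFIXES):
            creations.append(r)
    return counts, creations
```

The counts answer "is anything happening out there at all?" The creation list is where a RunInstances call for crypto mining would surface.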
Disabled or modified logging (T1562.008, Impair Defenses: Disable or Modify Cloud Logs). An attacker who can see the audit trail will try to blind it. Hunt for StopLogging, DeleteTrail, changes to a trail's configuration, or edits that stop events from reaching your SIEM. This is high signal: legitimate administrators rarely disable logging, and the event itself is one of the last things logged before the gap.
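This hunt has two halves, and the sketch covers both. The first is explicit tamper calls against CloudTrail: the set below lists StopLogging, DeleteTrail, UpdateTrail, and PutEventSelectors. The second is silent gaps where events simply stop arriving. The 30-minute quiet threshold is an arbitrary assumption. A small account can legitimately go quiet for longer at night, so tune the threshold against your own baseline.

```python
from datetime import datetime, timedelta
from typing import Any, Iterable

Record = dict[str, Any]

# CloudTrail API calls that stop, delete, or narrow what a trail records.
TAMPER_EVENTS = {"StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"}


def _ts(r: Record) -> datetime:
    return datetime.fromisoformat(r["eventTime"].replace("Z", "+00:00"))


def tamper_events(records: Iterable[Record]) -> list[Record]:
    return [r for r in records
            if r.get("eventSource") == "cloudtrail.amazonaws.com"
            and r.get("eventName") in TAMPER_EVENTS]


def ingestion_gaps(records: Iterable[Record], max_quiet: timedelta = timedelta(minutes=30)) -> list[tuple[datetime, datetime]]:
    """Pairs of consecutive event times further apart than max_quiet (threshold is an assumption)."""
    times = sorted(_ts(r) for r in records)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_quiet]
```

A tamper event followed closely by a gap is the strongest combination. The gap is the window to reconstruct from other sources, such as flow logs or the provider's own event history.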
Reconnaissance and public exposure (T1580, Cloud Infrastructure Discovery, and T1530, Data from Cloud Storage). Before acting, attackers enumerate. Hunt for bursts of discovery calls (DescribeInstances, ListBuckets, GetBucketPolicy) from a single principal in a short window, which signals automated reconnaissance. Pair it with a hunt for newly public resources: a bucket policy changed to allow anonymous access, or storage made reachable from the internet, is how an exposure becomes a breach.
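Both halves of this hunt can be sketched in Python. The first is a per-principal sliding window that flags bursts of Describe/List/Get calls. The second is a check for a PutBucketPolicy that grants access to any principal. Several details are assumptions to verify against your own logs: the five-minute window, the threshold of 50 calls, and the policy appearing under `requestParameters.bucketPolicy` as a parsed object or a JSON string. The policy check ignores Condition blocks, so it can over-report policies that a condition narrows.

```python
import json
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Any, Iterable

Record = dict[str, Any]
DISCOVERY_PREFIXES = ("Describe", "List", "Get")


def _ts(r: Record) -> datetime:
    return datetime.fromisoformat(r["eventTime"].replace("Z", "+00:00"))


def recon_bursts(records: Iterable[Record], window: timedelta = timedelta(minutes=5), threshold: int = 50) -> set[str]:
    """Principals that made >= threshold discovery calls inside any sliding window."""
    recent: dict[str, deque] = defaultdict(deque)
    flagged: set[str] = set()
    for r in sorted(records, key=_ts):
        if not r.get("eventName", "").startswith(DISCOVERY_PREFIXES):
            continue
        who, t = r.get("userIdentity", {}).get("arn", "unknown"), _ts(r)
        q = recent[who]
        q.append(t)
        while t - q[0] > window:  # drop calls that fell out of the window
            q.popleft()
        if len(q) >= threshold:
            flagged.add(who)
    return flagged


def opens_bucket_to_public(r: Record) -> bool:
    """True if a PutBucketPolicy grants Allow to any principal ("*"). Ignores Condition blocks."""
    if r.get("eventName") != "PutBucketPolicy":
        return False
    policy = (r.get("requestParameters") or {}).get("bucketPolicy") or {}
    if isinstance(policy, str):
        policy = json.loads(policy)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    for s in statements:
        principal = s.get("Principal")
        aws = principal.get("AWS") if isinstance(principal, dict) else None
        if s.get("Effect") == "Allow" and (principal == "*" or aws == "*" or aws == ["*"]):
            return True
    return False
```

Run together, the two checks express the paired hypothesis. The principal that shows up in `recon_bursts` and then issues a policy that `opens_bucket_to_public` flags is the one to investigate first.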
Hunt hypothesis, telemetry, and signal
Every hunt reduces to three columns: the hypothesis you are testing, the telemetry that holds the answer, and the specific signal that confirms it. The table below turns the five hunts above into that working form.
| Hunt hypothesis | Cloud telemetry to pull | Signal that confirms it |
|---|---|---|
| A stolen credential is assuming a role from outside its baseline (T1078.004) | CloudTrail / Azure sign-in logs for the principal | AssumeRole from a new IP, region, or country, or by a principal that never used the role |
| An attacker added a second credential for persistence (T1098.001) | CloudTrail / identity provider audit logs | CreateAccessKey or new secret created by an unexpected principal, or on a normally interactive account |
| Resources are running in a region the business never uses (T1535) | Multi-region CloudTrail, resource inventory | RunInstances or other creation calls in an out-of-footprint region |
| An attacker is blinding the audit trail (T1562.008) | CloudTrail management events, SIEM ingestion gaps | StopLogging, DeleteTrail, trail config change, or a sudden stop in event flow |
| A principal is enumerating and exposing resources (T1580, T1530) | CloudTrail data and management events, config history | Burst of Describe/List calls, then a bucket policy opened to anonymous or public access |
The point of the table is reuse. A hypothesis with no telemetry behind it cannot be hunted, and telemetry with no defined signal is just data. Fill all three columns before you start, and the hunt has a clear pass or fail.
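One way to enforce the three columns is to make a hunt impossible to define without them. The sketch below is a hypothetical `Hunt` record in Python. It holds the hypothesis and ATT&CK ID as text, the telemetry column as a scoping filter, and the signal column as a predicate, so every run returns an explicit "signal found" or "ruled out". The example instance encodes the logging-tamper row of the table. The structure is an illustration, not a standard format.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

Record = dict[str, Any]


@dataclass(frozen=True)
class Hunt:
    hypothesis: str                       # column 1, in words
    attack_id: str                        # the ATT&CK technique it tests
    telemetry: Callable[[Record], bool]   # column 2: which records are in scope
    signal: Callable[[Record], bool]      # column 3: what confirms the hypothesis

    def __post_init__(self) -> None:
        if not self.hypothesis.strip() or not self.attack_id.strip():
            raise ValueError("a hunt needs a written hypothesis and a technique ID")

    def run(self, records: Iterable[Record]) -> tuple[str, list[Record]]:
        scoped = [r for r in records if self.telemetry(r)]
        hits = [r for r in scoped if self.signal(r)]
        return ("signal found" if hits else "ruled out", hits)


trail_tamper = Hunt(
    hypothesis="An attacker is blinding the audit trail",
    attack_id="T1562.008",
    telemetry=lambda r: r.get("eventSource") == "cloudtrail.amazonaws.com",
    signal=lambda r: r.get("eventName") in {"StopLogging", "DeleteTrail"},
)
```

A "signal found" result starts the confirm-or-respond step rather than ending the hunt. A "ruled out" result, saved alongside the hunt definition, is the documented negative the loop asks for.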
Tooling and process
The tools follow the telemetry. None of this is exotic, and most of it is the cloud logging you already pay for.
Cloud-native logs and queries. The audit logs are the foundation: CloudTrail, Azure activity and sign-in logs, Google Cloud audit logs, plus flow logs and config history. You can query them directly, with CloudTrail Lake, Azure Log Analytics (KQL), or BigQuery via a log sink, before anything is centralized. A hunter who can write a precise query against raw audit logs is not blocked waiting on a platform.
SIEM. Most hunting happens after logs are centralized into a SIEM (Splunk, Elastic, Sentinel, Chronicle), where cloud telemetry sits alongside the rest of the estate and a single query can correlate an identity's behavior across sources. The SIEM is also where a successful hunt becomes a saved detection.
Cloud detection tooling. Purpose-built tooling adds runtime and behavioral context that raw logs lack. This is the space of cloud detection and response, which monitors cloud workloads and the control plane for threats and gives hunters a higher-level view of identity and runtime activity. It does not replace the hunt; it gives the hunt better starting data.
Process. Tooling without process produces noise. The discipline is the loop: hypothesis, scoped telemetry pull, analysis against a known baseline, confirm or rule out, then operationalize. Document each hunt (the hypothesis, the query, the result) so the next hunter does not repeat it, and so a "nothing found" still leaves the baseline better understood.
Skills the work demands
Cloud threat hunting sits at the intersection of three skill sets, and a hunter weak in any one of them will miss things.
First, cloud platform fluency. You cannot hunt what you do not understand. A hunter has to know what sts:AssumeRole does, what a normal IAM policy change looks like, which API calls are routine for a CI/CD pipeline versus a human, and how the provider's services actually behave. Without that, every anomaly looks the same and the false-positive rate buries the real signal.
Second, query and data skill. The hunt is executed in a query language against logs that can run to billions of events. Comfort with the SIEM's query syntax, with KQL or SQL against cloud logs, and with the patience to baseline before alerting, is the difference between a hunt that finds the needle and one that drowns.
Third, an attacker's mindset. The good hypotheses come from knowing how attackers operate in the cloud: that they prefer valid credentials to malware, that they reconnoiter before they act, that they blind logging when they can, and that they hide in the permissions an account already holds. ATT&CK for cloud is the structured version of that mindset, and walking its techniques is the most reliable way to generate hunts that matter.
The bottom line
Cloud threat hunting is the proactive, hypothesis-driven search for threats that slipped past automated cloud detection. It exists because cloud attacks hide in valid activity: real credentials, allowed API calls, granted permissions, none of which trip a signature. Hunting differs from on-prem work because the evidence is API-shaped and lives in control-plane audit logs, the assets are ephemeral so the log is often the only record left, and the fight is on identity and the control plane rather than on hosts.
The work runs as a loop: form a hypothesis, pull the cloud telemetry that would prove it, analyze against a baseline, confirm or rule it out, then operationalize what you learned into a detection. Anchor the hypotheses in ATT&CK for cloud (anomalous role assumption, new credentials, unused regions, disabled logging, public exposure), and each hunt has a named technique, a defined log signal, and a clear result. Done consistently, it turns the gap between what your alerts catch and what actually happened into the place your detections get better.
Frequently asked questions
What is cloud threat hunting?
<p>Cloud threat hunting is the proactive, hypothesis-driven search for attacker activity in cloud environments that automated detection did not catch. Instead of waiting for an alert, a hunter assumes a threat may already be present, forms a specific hypothesis about how an attacker would operate, pulls the relevant cloud telemetry, and tests it. The output is either a confirmed incident or a ruled-out threat plus a new detection rule.</p>
How does cloud threat hunting differ from traditional threat hunting?
<p>Traditional hunting works across host telemetry: process trees, endpoint logs, network captures. Cloud hunting works across API and identity telemetry: control-plane audit logs like CloudTrail, sign-in logs, and flow logs. Cloud assets are often ephemeral, so the log is frequently the only durable evidence, and attacks center on the control plane and identity rather than on processes running on a host.</p>
What data sources does cloud threat hunting use?
<p>The core source is the control-plane audit log: AWS CloudTrail, Azure activity and sign-in logs, or Google Cloud audit logs. These record every API call with the identity, source IP, region, and parameters. Add identity provider logs, runtime telemetry from workloads, network flow logs, and configuration history. Most hunting queries these through a SIEM, though they can also be queried directly at the source.</p>
How does MITRE ATT&CK apply to cloud threat hunting?
<p>The MITRE ATT&CK Cloud matrices catalog the techniques attackers use against cloud environments, which makes them a hypothesis generator. For each technique, a hunter asks what it would look like in their logs and builds a hunt around that signal. Common examples are anomalous role assumption (T1078.004), adding credentials for persistence (T1098.001), and disabling cloud logging (T1562.008).</p>
What are common cloud threat hunts?
<p>Recurring hunts include anomalous <code>AssumeRole</code> calls from new IPs or regions, new access keys created on existing accounts, resource activity in regions the business never uses, audit logging being disabled, and bursts of reconnaissance API calls followed by a resource being made public. Each maps to a specific ATT&CK technique and a specific log signal.</p>
Does cloud threat hunting replace automated detection?
<p>No. They cover different ground. Automated detection handles known, high-volume, and rule-matchable threats efficiently. Hunting handles the unknown and the quiet, the attacker who stays inside granted permissions or uses a technique no rule covers yet. A mature program runs both, and feeds every successful hunt back into detection as a new rule.</p>