What Is Credential Harvesting? How It Works

12 min read · Updated June 2026 · Credential-theft · Data Breach · Blue Team

A finance employee gets an email that looks like a Microsoft 365 password-expiry notice. The link goes to a login page that is a pixel-perfect copy of the real one. They type their username and password, the page shows a spinner, then redirects them to the genuine portal. Nothing felt wrong. But the credentials they just entered went to an attacker-controlled server, and within minutes someone is logging into their mailbox from another country using their exact password. No malware ran. No exploit fired. The attacker just asked for the password, and got it.

That is credential harvesting: collecting valid usernames and passwords at scale so an attacker can log in as a legitimate user instead of breaking in. It is the quiet front end of most modern intrusions. The stolen password is not the goal; it is the key that opens everything behind it.

This guide covers what credential harvesting is, how the attack works, the techniques attackers use, why it keeps growing, why it is so hard to detect, and how individuals and organizations defend against it. It is written for blue teamers who have to spot a valid login that should not be happening.

What is credential harvesting?

Credential harvesting is the large-scale collection of authentication data (usernames, email addresses, passwords, password hashes, session tokens, and API keys) so that an attacker can authenticate as someone they are not. The defining word is scale. A targeted theft of one password is just theft. Harvesting is the systematic gathering of many credentials, often thousands at a time, to be used, sold, or stockpiled.

It sits at the entry of the attack lifecycle. Most intrusions need an initial way in, and a valid credential is the cleanest one there is. Verizon's 2025 Data Breach Investigations Report found that the use of stolen credentials was the initial access vector in 22% of breaches, and that 88% of attacks against basic web applications involved stolen credentials. Credentials are the most common key in the attacker's keyring.

The reason it is so effective is that a stolen credential turns an attack into a login. The attacker is not exploiting a vulnerability or dropping a payload that antivirus can flag. They are authenticating with a real password to a real service. To the system, and to most logs, they look exactly like the employee whose credential they hold.

How credential harvesting works

[Figure: The credential harvesting pipeline. 1. Lure or capture (phishing page, infostealer malware, or interception in transit) → 2. Collect at scale (thousands of credentials flow back to a drop server or logs) → 3. Validate (automated checks find which credentials still work) → 4. Use or sell (a valid login is initial access). Defender's window: no malware on the authentication server and no exploit, only a valid login that does not fit the baseline, such as the wrong country, a new device, or an odd hour.]

Credential harvesting is not a single trick but a pipeline: collect, validate, then use or sell.

Lure or capture. The attacker first gets hold of the credential. That might be through a phishing page that asks the user to type it, malware that reads it off the machine, or interception in transit. The method varies; the goal is always to capture what the user knows or holds.
Collect at scale. The harvested credentials flow back to attacker infrastructure: a drop server behind the fake login page, a command-and-control channel for an infostealer, or a log file that gets bundled and shipped. One campaign can collect credentials from thousands of victims.
Validate. Raw credentials are noisy. Attackers test them automatically against the target service, or against many services, to find which still work. A confirmed-live credential is worth far more than an untested one.
Use or sell. Validated credentials are either used directly to log in and pursue the objective, or packaged and sold on criminal markets. A single corporate login to an email or VPN account is a sellable commodity.

The output of this pipeline is initial access. Once inside with a valid account, the attacker pursues the same goals as any intrusion: reach more systems, escalate privileges, plant backdoors, and get to the data or deploy ransomware. Harvesting is the on-ramp, not the destination.

Common credential harvesting techniques

Most harvesting falls into a handful of methods. They differ in how the credential is captured, but all end with the attacker holding a working login.

Technique	What the attacker does	What the defender sees
Phishing pages	Fake login page captures the password the user types	User submits credentials to a lookalike domain
Infostealer malware	Malware reads saved passwords, cookies, and tokens off the host	Process accessing browser credential stores
Keylogging	Records keystrokes to capture passwords as they are typed	Unexpected process hooking keyboard input
Man-in-the-middle	Intercepts credentials or session tokens in transit	Traffic relayed through an attacker proxy
Domain spoofing / typosquatting	Lookalike domain (paypa1.com) collects credentials from mistyped or linked visits	Logins or DNS lookups to near-miss domains
Adversary-in-the-middle (AiTM)	Proxy relays the real login in real time, stealing the session token after MFA	Valid login from an unexpected location or token reuse

A few patterns connect them. Phishing is the volume play: a single email campaign with a convincing fake login page can harvest credentials from a large group at once, which is why it is the most common entry point. Infostealers are the scale play: commodity stealer malware vacuums up every saved password, cookie, and token on a machine and ships them off as logs that are sold in bulk. AiTM is the MFA bypass: by proxying the real authentication, an adversary-in-the-middle kit steals the session token after the user passes multi-factor, defeating the most common MFA setups. Harvesting also feeds directly into what comes next: once an attacker has a working password, credential stuffing tests it against other services to exploit password reuse.
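The typosquatting row in the table can be made concrete with a small check. The sketch below is one possible approach, not a production detector: it flags a domain when it sits within one edit of a protected domain, or becomes identical to it after common digit-for-letter swaps are undone. The PROTECTED list, the HOMOGLYPHS map, and the function names are all hypothetical and chosen for illustration.

```python
from typing import Optional

# Hypothetical watch list: replace with the domains your users actually log in to.
PROTECTED = ["paypal.com", "microsoftonline.com"]

# Common digit-for-letter swaps seen in typosquats (illustrative, not exhaustive).
HOMOGLYPHS = str.maketrans({"1": "l", "0": "o", "3": "e", "5": "s"})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lookalike_of(domain: str, max_distance: int = 1) -> Optional[str]:
    """Return the protected domain that `domain` appears to imitate, or None."""
    candidate = domain.lower().rstrip(".")
    if candidate in PROTECTED:
        return None  # an exact match is the real site, not a lookalike
    normalized = candidate.translate(HOMOGLYPHS)
    for real in PROTECTED:
        if normalized == real or edit_distance(candidate, real) <= max_distance:
            return real
    return None


# Example: the lookalike from the table above.
suspect = lookalike_of("paypa1.com")
```

A check like this could run over newly observed DNS lookups or proxy logs. A distance threshold of one keeps false positives down but will miss longer variations, so treat it as a starting point rather than full coverage.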

Why credential harvesting is hard to detect

The core problem is the same one that makes it effective: a harvested credential turns the attack into a legitimate login. There is no malicious file on the authentication server and no exploit signature in the request. There is only a valid username and password being used by the wrong person.

That defeats the controls built for other phases. Signature antivirus sees nothing on the server, because no malware touches it. The perimeter firewall sees nothing wrong, because an authenticated session to a web app or VPN is exactly what that port exists to carry. Even multi-factor authentication, the standard answer, is bypassed by adversary-in-the-middle kits that steal the session token after the user completes the second factor. Detecting harvested-credential abuse means recognizing that a successful login is anomalous in context: this account does not usually sign in from that country, on that device, at that hour. It is a problem of behavior and baseline, not bad signatures.
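A minimal sketch of that idea, assuming a per-account baseline is already being collected somewhere: compare each successful login against the countries, devices, and hours previously seen for the account, and report what does not fit. The Baseline shape and the anomaly_reasons function are hypothetical names for illustration, not part of any product.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Baseline:
    """What 'normal' has looked like for one account (hypothetical shape)."""
    countries: Set[str] = field(default_factory=set)
    devices: Set[str] = field(default_factory=set)
    hours: Set[int] = field(default_factory=set)  # UTC hours (0-23) seen before


def anomaly_reasons(baseline: Baseline, country: str, device_id: str, hour: int) -> List[str]:
    """List the ways a successful login departs from the account's baseline."""
    reasons = []
    if country not in baseline.countries:
        reasons.append("new country")
    if device_id not in baseline.devices:
        reasons.append("new device")
    if hour not in baseline.hours:
        reasons.append("unusual hour")
    return reasons


# Example: an account that normally signs in from the US on one laptop in office hours.
finance_user = Baseline(countries={"US"}, devices={"laptop-123"}, hours=set(range(8, 19)))
reasons = anomaly_reasons(finance_user, country="RO", device_id="unknown-device", hour=3)
```

Any single reason is weak evidence on its own, since people travel and replace laptops. Several reasons on one login are a much stronger signal.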

How to detect credential harvesting

Because the malicious login looks valid, detection depends on the context around it rather than the credential itself.

Authentication anomalies. Watch for impossible travel (logins to one account from two distant locations only minutes apart), sign-ins from new devices or unusual countries, and a spike in failed-then-successful logins that signals validation. Centralize authentication logs in a SIEM so they can be correlated across applications.
The phishing front end. Many harvesting campaigns are catchable before the credential is used: newly registered lookalike domains, DNS lookups to typosquatted names, and email gateway hits on credential-phishing links. Catching the lure beats chasing the login.
Endpoint telemetry for infostealers. Watch for processes reading browser credential stores or, on Windows, accessing the memory of the Local Security Authority Subsystem Service (LSASS), the behavior that defines credential-stealing malware.
Token and session monitoring. Session tokens reused from a new IP or impossible location are the tell for an AiTM theft that slipped past MFA. Tie sessions to device and location, and treat a moved session as suspect.
Exposure monitoring. Credentials harvested elsewhere surface in infostealer logs and breach dumps. Monitoring those for your domains gives an early warning that an account is exposed before it is abused.
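
The impossible-travel signal from the list above can be sketched as a speed check between two consecutive logins for the same account. This sketch assumes each login has already been geolocated to a latitude and longitude. The 900 km/h ceiling and the 50 km jitter allowance are illustrative assumptions, and the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumption: roughly airliner cruising speed
MIN_DISTANCE_KM = 50.0     # assumption: ignore small geo-IP jitter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_kmh=MAX_PLAUSIBLE_KMH):
    """prev and curr are (timestamp, lat, lon) tuples for the same account."""
    (t1, lat1, lon1), (t2, lat2, lon2) = prev, curr
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if distance < MIN_DISTANCE_KM:
        return False
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        return True  # two distant places at the same instant
    return distance / hours > max_kmh


# Example: a London login followed ten minutes later by one from New York.
first = (datetime(2026, 1, 1, 9, 0), 51.5074, -0.1278)
second = (first[0] + timedelta(minutes=10), 40.7128, -74.0060)
flagged = is_impossible_travel(first, second)
```

VPNs and mobile carriers regularly make geolocation jump, so a rule like this usually works better as one input to a risk score than as a standalone alert.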

The unifying skill is knowing what a normal login looks like for each account, so the one that does not fit stands out. That baseline is what turns a flood of ordinary authentication events into a detection.
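
The failed-then-successful pattern that signals validation can be sketched in the same spirit. The version below assumes authentication events arrive sorted by time as (timestamp, account, source IP, succeeded) tuples, and it flags a success that follows a burst of failures from the same source. The threshold of five failures, the ten-minute window, and the function name are assumptions to tune against your own logs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def failed_then_success(events, min_failures=5, window=timedelta(minutes=10)):
    """Flag successful logins preceded by a burst of failures from the same IP.

    events: iterable of (timestamp, account, source_ip, succeeded), sorted by time.
    Returns a list of (source_ip, account, timestamp) for each flagged success.
    """
    failures = defaultdict(deque)  # source_ip -> timestamps of recent failures
    hits = []
    for ts, account, source_ip, succeeded in events:
        recent = failures[source_ip]
        # Drop failures that have aged out of the sliding window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if succeeded:
            if len(recent) >= min_failures:
                hits.append((source_ip, account, ts))
        else:
            recent.append(ts)
    return hits


# Example: five failures across accounts from one address, then a success.
start = datetime(2026, 1, 1, 12, 0)
sample = [(start + timedelta(seconds=10 * i), f"user{i}", "203.0.113.7", False) for i in range(5)]
sample.append((start + timedelta(seconds=60), "user9", "203.0.113.7", True))
flagged = failed_then_success(sample)
```

Keying on source IP catches one host testing many accounts. Attackers who spread validation across many addresses would need a different grouping, such as by user agent or by target account.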

Why credential harvesting keeps growing

Three forces push it. First, identity is now the perimeter. As organizations moved to cloud and SaaS, the thing that protects a system is increasingly the login, not the network boundary, so the credential became the highest-value target. Second, the criminal economy industrialized it. Infostealer malware is sold as a service, harvested logs trade in bulk, and access brokers specialize in selling working credentials to whoever wants in, including ransomware crews. The 2025 DBIR found that 54% of ransomware victims had their domains show up in infostealer logs before the attack. Third, MFA adoption pushed attackers to better harvesting, and adversary-in-the-middle kits that steal post-MFA session tokens are now commodity tooling.

The result is that stolen credentials remain the most reliable way in, and the supply chain that produces them is mature and cheap.

How to prevent credential harvesting

You cannot stop every lure, but you can make a harvested credential worth far less and the attempt far noisier. Defense runs on two levels.

Steps individuals can take

Use a password manager with unique passwords. Unique per-site passwords mean a credential harvested from one place cannot be reused anywhere else, which kills credential stuffing.
Turn on multi-factor authentication, preferably phishing-resistant. MFA stops most reused-credential attacks. Use FIDO2 or passkeys where available, since they resist the adversary-in-the-middle kits that beat one-time codes.
Check the domain before you type a password. Most phishing depends on a lookalike domain. Verifying the address, or letting a password manager refuse to autofill on the wrong one, defeats it.
Keep the browser and software updated. Patched software closes the holes infostealers use to land and read saved credentials.
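
The reason a password manager defeats lookalike domains can be shown in a few lines. This is a simplified sketch of exact-origin matching, not how any particular password manager is implemented: autofill is offered only when the page's scheme, host, and port match the saved entry exactly. The should_autofill name is hypothetical.

```python
from urllib.parse import urlsplit


def should_autofill(saved_origin: str, page_url: str) -> bool:
    """Offer a saved credential only on the exact HTTPS origin it was saved for."""
    saved = urlsplit(saved_origin)
    page = urlsplit(page_url)
    return (
        page.scheme == "https"
        and saved.scheme == page.scheme
        and saved.hostname == page.hostname
        and saved.port == page.port
    )


# Example: the real login host versus a lookalike that merely starts with it.
real = should_autofill("https://login.microsoftonline.com",
                       "https://login.microsoftonline.com/common/oauth2")
fake = should_autofill("https://login.microsoftonline.com",
                       "https://login.microsoftonline.com.evil.example/common/oauth2")
```

A person skimming the address bar can miss the extra suffix. A string comparison cannot, which is the whole advantage.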

Steps organizations can take

Enforce phishing-resistant MFA and conditional access. Require MFA everywhere, prefer FIDO2/passkeys, and add conditional access that weighs device, location, and risk so a stolen credential alone is not enough.
Scope and manage privilege. Least privilege and privileged access management limit what a single harvested credential can reach, so one compromised account is not a path to everything.
Train and test against phishing. Security awareness training plus realistic simulations measurably lower the click rate on the lures that feed harvesting.
Monitor identity and hunt for abuse. Feed authentication telemetry into detection, watch for exposed credentials in infostealer logs, and run proactive threat hunting for the anomalous logins that signal a harvested credential in use.

None of these stop harvesting alone. Layered, they turn a stolen password from an open door into a single obstacle that still has to clear MFA, match a known device, and avoid tripping a behavioral alert.
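
As a rough illustration of that layering, the sketch below models a conditional access decision in which a correct password is only the first of several checks. The SignIn fields, the ALLOWED_COUNTRIES set, the 0.8 risk threshold, and the decide function are all hypothetical. Real identity providers express these rules as policy configuration rather than application code.

```python
from dataclasses import dataclass

ALLOWED_COUNTRIES = {"US", "CA"}  # hypothetical: where this organization operates
HIGH_RISK = 0.8                   # hypothetical threshold on a 0.0-1.0 risk score


@dataclass(frozen=True)
class SignIn:
    password_ok: bool
    mfa_passed: bool
    managed_device: bool
    country: str
    risk_score: float  # from a hypothetical behavioral risk engine


def decide(s: SignIn) -> str:
    """Return 'deny', 'require_mfa', or 'allow' for one sign-in attempt."""
    if not s.password_ok:
        return "deny"
    if s.risk_score >= HIGH_RISK:
        return "deny"  # the behavioral alert trips regardless of valid credentials
    if not s.managed_device and s.country not in ALLOWED_COUNTRIES:
        return "deny"  # unknown device and unexpected location together
    if not s.mfa_passed:
        return "require_mfa"
    return "allow"


# Example: a stolen password used from an unmanaged device abroad.
attacker = SignIn(password_ok=True, mfa_passed=False, managed_device=False,
                  country="RO", risk_score=0.4)
outcome = decide(attacker)
```

The ordering reflects the point above: the password check comes first, but passing it grants nothing on its own.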

Frequently Asked Questions

What is credential harvesting in simple terms?

Credential harvesting is the large-scale theft of usernames and passwords so an attacker can log in as a legitimate user. Instead of breaking into a system, the attacker collects working credentials, through fake login pages, malware, or interception, and then simply signs in. The stolen password is the key, and logging in with it looks like normal activity.

What is the difference between credential harvesting and phishing?

Phishing is one method used to harvest credentials, not a synonym for it. Phishing tricks a user into handing over a password, usually through a fake login page. Credential harvesting is the broader goal of collecting credentials at scale, which can also be done with infostealer malware, keyloggers, man-in-the-middle interception, or buying breach dumps. Phishing is a common front end; harvesting is the objective.

How do attackers use harvested credentials?

They validate the credentials to find which still work, then use them to log in as the legitimate user. From there they impersonate the account owner, test the password against other services to exploit reuse, access sensitive systems, establish footholds for lateral movement and privilege escalation, plant backdoors for persistence, or hand the access to a ransomware crew. Many credentials are also sold in bulk on criminal markets.

Can multi-factor authentication stop credential harvesting?

MFA stops most attacks that rely on a reused or guessed password, so it is essential, but it is not absolute. Adversary-in-the-middle phishing kits proxy the real login and steal the session token after the user passes MFA, bypassing one-time codes. Phishing-resistant methods like FIDO2 and passkeys resist this, which is why they are the stronger choice.

How do you detect credential harvesting?

You detect the abuse of the credential rather than the theft itself, by watching authentication for anomalies: impossible travel, logins from new devices or unusual locations, and failed-then-successful patterns that signal validation. Catching the front end helps too, by spotting lookalike domains and phishing links, monitoring endpoints for infostealer behavior, and watching breach dumps and infostealer logs for exposed company credentials.

The bottom line

Credential harvesting is how attackers collect valid logins at scale so they can walk in as legitimate users instead of breaking down the door. They lure or capture the credential, validate it, then use it or sell it, and a working password is the cleanest initial access there is. That is why stolen credentials sit behind so many breaches and why the criminal supply chain that produces them keeps growing.

It is also why the defense is identity-centric. The attacker has to use the credential somewhere, and that login is the chance to catch them: phishing-resistant MFA to raise the cost, least privilege to limit the blast radius, and behavioral detection to spot the valid login that does not fit the baseline. Stopping harvesting is less about walls and more about knowing what a normal login looks like, and noticing the one that does not.
