What Is Penetration Testing? A Practitioner's Guide

12 min read · Updated June 2026 · Red Team, Cybersecurity, Blue Team

A vulnerability scanner reports 4,000 findings on a network and flags 280 as critical. A penetration tester takes one of them, a forgotten Jenkins server with default credentials, chains it to an over-permissioned service account, moves to the file server, and walks out with the HR database in an afternoon. Same network, two very different answers. The scanner says "you have 280 critical vulnerabilities." The pentest says "here is the one path an attacker actually uses to reach your crown jewels, and here is the screenshot."

That gap is the whole point of penetration testing. A scanner tells you what could be wrong in theory. A penetration test proves what is exploitable in practice, by having a skilled human attack the system the way a real adversary would, with permission and a scope agreed in advance. The deliverable is not a list of CVEs. It is a demonstrated attack path, ranked by the damage it caused, with the evidence to back it up.

This guide is for the people on the receiving end of that report: SOC analysts who will tune detections against the findings, incident responders who will recognize the same techniques in a real breach, and defenders deciding what to fix first. It covers what a penetration test is, how it differs from a vulnerability scan, the testing methods (black, white, and gray box), the phases of an engagement, the common test types, the standards and tools that govern the work, and how to read the report when it lands.

What is penetration testing?

Penetration testing is an authorized simulated cyberattack against a system, network, or application, performed by a security professional to find and exploit weaknesses before a real attacker does. The word that matters is authorized. The same actions without written permission are a crime; with a signed scope and rules of engagement, they are a controlled security assessment. Practitioners also call it pen testing or ethical hacking.

The goal is not to find every flaw. It is to prove which flaws are exploitable and what an attacker gains by chaining them. A single low-severity misconfiguration is noise on its own. The same misconfiguration that lets a tester pivot from a public web server to an internal domain controller is a critical finding, because the test demonstrated the path. Severity in a pentest is measured by impact achieved, not by a scanner's score in isolation.

A penetration test answers questions a scan cannot. Can an outsider reach the internal network? If they land on one machine, how far can they spread? Will the SOC notice, and how fast? Are the controls you paid for actually catching the techniques they were bought to catch? The output is a report that names the attack paths, rates them by real-world impact, includes the evidence (screenshots, captured data, command logs), and gives the defenders concrete remediation. It feeds directly into vulnerability management by validating which of the thousands of theoretical findings are the ones worth fixing now.

Penetration testing vs. vulnerability scanning

These two get conflated constantly, and the difference decides where your security budget goes. A vulnerability scan is automated, broad, and shallow: a tool checks systems against a database of known flaws and returns a list. A penetration test is manual, narrow, and deep: a human exploits the flaws, chains them together, and proves impact. You need both, and they answer different questions.

Dimension	Vulnerability scanning	Penetration testing
Who runs it	Automated tool, scheduled	Skilled human tester
Question answered	What weaknesses might exist?	What can an attacker actually do?
Method	Match systems against a known-flaw database	Exploit and chain weaknesses by hand
Output	A list of findings, scored	Demonstrated attack paths with evidence
False positives	Common, needs validation	Low, every finding is proven
Depth	Broad and shallow, whole estate	Narrow and deep, agreed scope
Frequency	Continuous or weekly	Periodic (quarterly, annually, on change)
Cost and effort	Low, runs unattended	High, days to weeks of expert time

The relationship is sequential, not competitive. A scan finds the candidates; a pentest proves which ones matter. Running a pentest without first scanning wastes expensive human hours rediscovering known flaws. Running scans without ever testing leaves you with a backlog of 4,000 findings and no idea which one is the open door. The scanner reduces the haystack; the tester finds the needle that is actually sharp.

There is also a depth difference scanners cannot close. A scanner sees each host in isolation, so it never reports the combination: the medium-severity flaw on host A plus the weak service account plus the flat network segment that, together, equal full domain compromise. Attackers think in chains. Scanners think in rows. The pentest is where the chain gets built and walked.
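The chain-versus-rows distinction can be sketched as a graph search. This is a minimal illustration, not a tool: the node names, the edges, and the `find_attack_path` helper are all hypothetical, and the edges stand in for relationships a tester would discover by hand.

```python
from collections import deque

# Hypothetical environment: each edge means "from this position an attacker
# can reach that one", labeled with the weakness that makes the hop possible.
EDGES = {
    "internet": [("host_a", "medium-severity flaw on host A")],
    "host_a": [("service_account", "weak service account")],
    "service_account": [("domain_controller", "flat network segment")],
    "domain_controller": [],
}


def find_attack_path(edges, start, goal):
    """Breadth-first search. Returns the weaknesses chained from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for next_node, weakness in edges.get(node, []):
            if next_node not in seen:
                seen.add(next_node)
                queue.append((next_node, path + [weakness]))
    return None


chain = find_attack_path(EDGES, "internet", "domain_controller")
for step, weakness in enumerate(chain, start=1):
    print(step, weakness)
```

A scanner would report each of the three weaknesses as a separate row. The search only succeeds because the edges connect them, and those connections are exactly what the tester supplies.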

Black box, white box, and gray box testing

How much the tester knows going in defines the method, and each setting answers a different question. The three are points on a spectrum from zero knowledge to full disclosure.

Black box. The tester gets almost nothing beyond the company name or a target range, mimicking an external attacker with no inside information. It is the most realistic simulation of an opportunistic outsider, but the most time-consuming, because the tester burns hours on reconnaissance that a knowledgeable attacker might already have. It is also blind to whatever the tester never finds in the time allowed.
White box. The tester gets full information up front: network diagrams, source code, credentials, architecture docs. This is the most thorough, because no time is wasted on discovery and the tester can review code and config directly. It models an insider, or an attacker who has already done extensive homework, and it finds the deep flaws a black box run would never reach in time.
Gray box. A middle ground. The tester gets limited knowledge, often a standard user account and some documentation. It simulates an attacker who has gained a foothold (a phished employee, a compromised low-privilege account) and focuses the effort on what that position can reach. It is the most common choice, because it balances realism against efficient use of the tester's time.

There is no single best option. Black box answers "what can a stranger do from the outside." White box answers "what is actually wrong with this thing, holding nothing back." Gray box answers "what happens after the first compromise," which is the scenario most breaches actually follow. The right choice depends on the threat you are modeling, not on which sounds toughest.

The phases of a penetration test

Figure: The penetration testing engagement lifecycle, from signed scope to proven attack path. Seven phases: (1) planning, (2) reconnaissance, (3) scanning, (4) exploitation, (5) post-exploitation, (6) reporting, (7) re-test. A vulnerability scan covers phases 2 and 3 automatically and stops; the human adds value in phases 4 and 5, where a theoretical flaw becomes a proven path.

A professional engagement follows a repeatable lifecycle. The exact labels vary by methodology, but the arc is consistent: agree the rules, learn the target, break in, see how far it goes, then write it up. These phases map closely to the standard kill-chain stages a real attacker uses, which is the point.

Planning and scoping. Define what is in scope, what is off limits, the rules of engagement, the testing window, and emergency contacts. This is the legal and operational foundation. Without a signed authorization, the rest is a crime.
Reconnaissance. Gather information about the target: domains, IP ranges, employees, exposed services, technologies. Passive recon uses public sources (OSINT) and touches nothing; active recon interacts with the target directly.
Scanning and enumeration. Use tools to map live hosts, open ports, running services, and known vulnerabilities. This turns the broad picture from recon into a concrete list of candidate weaknesses to attack.
Exploitation (gaining access). Attempt to exploit the weaknesses found, to gain a foothold. This is where a theoretical flaw becomes a proven one: a default credential, an unpatched service, an injection flaw that returns a shell.
Post-exploitation. Once inside, see how far the access goes. This is where the tester attempts privilege escalation, moves laterally to other systems, hunts for sensitive data, and establishes persistence to model a real intrusion. This phase produces the impact statement: not "a flaw exists" but "this flaw led to the customer database."
Reporting. Document every finding with its attack path, evidence, business impact, and concrete remediation, ranked so the defenders know what to fix first. The report is the deliverable; the rest of the engagement exists to produce it.
Remediation and re-testing. The organization fixes the findings, and the tester verifies the fixes actually closed the holes rather than moving them. A finding is not done until the re-test confirms it.
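The scoping step is often backed by a simple guard in the tester's own tooling, so that nothing outside the signed scope gets touched by accident. The sketch below assumes a scope expressed as IP ranges; the ranges and the `is_in_scope` helper are hypothetical, and a real engagement would load them from the agreed rules of engagement.

```python
import ipaddress

# Hypothetical scope, as it might be transcribed from a signed authorization.
IN_SCOPE = [ipaddress.ip_network("10.20.0.0/16")]
# Hypothetical exclusion, for example a fragile production segment.
OUT_OF_SCOPE = [ipaddress.ip_network("10.20.99.0/24")]


def is_in_scope(target: str) -> bool:
    """Return True only if the address is in an allowed range and not excluded."""
    address = ipaddress.ip_address(target)
    if any(address in network for network in OUT_OF_SCOPE):
        return False  # explicit exclusions always win
    return any(address in network for network in IN_SCOPE)


for candidate in ["10.20.1.5", "10.20.99.7", "192.168.1.1"]:
    print(candidate, is_in_scope(candidate))
```

Checking exclusions first is a deliberate choice here: an address that appears in both lists is treated as off limits.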

The early phases (recon, scanning) overlap with what a vulnerability scan does. The value a human adds is everything after: exploitation, the post-exploitation chain, and the judgment to know which path matters. Automation can find the door; it takes a tester to walk through it and see what is on the other side.

Types of penetration tests

"Penetration test" is a category, not a single thing. Engagements are scoped to a target, and the skills and tooling differ sharply between them. The common types:

Network penetration testing. The classic engagement. Test the internal and external network for exploitable services, misconfigurations, and weak segmentation. External tests model an outside attacker; internal tests model one who already has a foothold.
Web application testing. Target web apps and APIs for flaws like injection, broken authentication, and access-control gaps. Usually framed around the OWASP Top 10. It is one of the most commonly requested tests, because web apps are typically the most exposed attack surface an organization has.
Wireless testing. Attack Wi-Fi networks: weak encryption, rogue access points, guest-to-corporate network bleed, and the controls separating them.
Social engineering. Test the human layer with phishing, pretext calls, or physical impersonation. Often the fastest way in, because it sidesteps the technical controls entirely.
Physical penetration testing. Attempt to physically breach a facility: tailgating, badge cloning, lock bypass, reaching a network jack or an unlocked workstation inside the building.
Cloud penetration testing. Target cloud environments (AWS, Azure, GCP) for misconfigured storage, over-permissioned identities, exposed management interfaces, and insecure deployment pipelines. Scope must respect the cloud provider's rules.
Mobile application testing. Assess iOS and Android apps for insecure storage, weak transport security, and flawed authentication on the device and its backend.

Red teaming is the next step up: a goal-driven, stealthy, multi-vector campaign that combines several of these test types to test detection and response, not just to find flaws. A standard pentest aims to find as much as possible in the time available; a red team aims to reach a specific objective without being caught, which exercises the SOC directly.

Methodologies, standards, and tools

Penetration testing is not improvised. Mature engagements follow published methodologies so the work is repeatable and the coverage is defensible.

PTES (Penetration Testing Execution Standard) defines a seven-section engagement lifecycle, from pre-engagement interactions through reporting. Its section names differ somewhat from the phase labels used in this guide.
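OWASP Web Security Testing Guide is the reference for web application and API testing, organized by test category (for example authentication, authorization, session management, and input validation) and commonly used alongside the OWASP Top 10.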
NIST SP 800-115, the Technical Guide to Information Security Testing and Assessment, is the US government reference methodology.
OSSTMM (Open Source Security Testing Methodology Manual) is a peer-reviewed methodology covering operational security testing across channels.

The tooling is mostly open source and well known to defenders, because the same tools show up in real intrusions. Reconnaissance and scanning lean on Nmap. Vulnerability identification uses scanners like Nessus and OpenVAS. Web testing runs through Burp Suite and OWASP ZAP. Exploitation centers on the Metasploit Framework. Credential attacks use Hashcat and John the Ripper. Traffic analysis uses Wireshark. Knowing this toolset matters for defenders too: a detection that fires on default Nmap scan patterns or Metasploit's default behaviors catches both the tester and the attacker who never bothered to change the defaults.
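As a rough illustration of what a detection for scan-like behavior can look like, the sketch below flags any source that touches many distinct ports in a short window. It is a generic heuristic, not a reproduction of any specific Nmap fingerprint, and the event format, the thresholds, and the `detect_port_scans` helper are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed thresholds; real values depend on the environment's normal traffic.
PORT_THRESHOLD = 20   # distinct destination ports
WINDOW_SECONDS = 60   # sliding window length


def detect_port_scans(events, threshold=PORT_THRESHOLD, window=WINDOW_SECONDS):
    """events: iterable of (timestamp_seconds, source_ip, destination_port).

    Returns the set of sources that hit at least `threshold` distinct ports
    inside any `window`-second span. Written for clarity, not performance.
    """
    hits_by_source = defaultdict(list)
    for timestamp, source, port in events:
        hits_by_source[source].append((timestamp, port))

    flagged = set()
    for source, hits in hits_by_source.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Slide the left edge forward until the span fits in the window.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            distinct_ports = {port for _, port in hits[start:end + 1]}
            if len(distinct_ports) >= threshold:
                flagged.add(source)
                break
    return flagged
```

A scan slowed down to a few ports per minute would pass under these thresholds, which is one reason the tester's report on what the SOC did and did not see is worth more than the rule alone.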

How to read a penetration test report

The report is the product, and most of its value is lost if the defenders read it as a to-do list and stop there. Read it the way the tester wrote it: as a story of how your defenses actually held up.

Start with the attack narrative, not the findings table. The narrative shows the chain: where the tester got in, how they escalated, where they moved, and what they reached. That chain is what a real attacker would follow, and it tells you which single fix breaks the whole path. Cutting one link in the chain is often worth more than patching ten unrelated findings.

Then weigh the findings by demonstrated impact, not by the raw severity label. A "critical" that the tester could not actually exploit in your environment is lower priority than a "medium" that turned out to be the pivot to the domain. The pentest exists precisely to make that distinction, so honor it.
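One way to make that ordering concrete is to sort findings on what the test demonstrated before looking at the label. The sketch below is illustrative only: the `Finding` fields, the sample titles, and the ordering rule are assumptions, not a standard scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    label: str            # severity label from the report: critical/high/medium/low
    exploited: bool       # did the tester actually exploit it?
    on_attack_path: bool  # was it a link in a demonstrated chain?


LABEL_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}


def prioritize(findings):
    """Order by demonstrated impact first, severity label last."""
    return sorted(
        findings,
        key=lambda f: (f.on_attack_path, f.exploited, LABEL_RANK[f.label]),
        reverse=True,
    )


# Hypothetical report entries.
report = [
    Finding("Unpatched service, not exploitable here", "critical", False, False),
    Finding("Over-permissioned service account", "medium", True, True),
]
for finding in prioritize(report):
    print(finding.label, finding.title)
```

With this rule the "medium" that served as the pivot sorts above the "critical" that could not be exploited; the label only breaks ties between findings with the same demonstrated impact.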

Finally, ask the detection question the report can answer better than any tool: what did the SOC see? If the tester walked from external recon to domain admin and your alerts stayed quiet, the finding is not just the exploited flaw. It is the blind spot in your monitoring. The most useful output of a pentest for a blue team is the list of techniques that should have fired an alert and did not, because that is your detection-engineering backlog, written by someone who just proved it matters.
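That backlog is, in effect, a set difference between what the tester did and what the SOC saw. A minimal sketch, assuming both lists can be expressed with the same technique names; the names and the `detection_gaps` helper are hypothetical.

```python
def detection_gaps(techniques_used, techniques_alerted):
    """Return techniques the tester used that raised no alert, in the order used."""
    alerted = set(techniques_alerted)
    return [technique for technique in techniques_used if technique not in alerted]


# Hypothetical inputs: the first list comes from the attack narrative,
# the second from the SOC's alert history for the testing window.
used = [
    "external port scan",
    "default credential login",
    "service account abuse",
    "lateral movement to file server",
]
alerted = ["external port scan"]

backlog = detection_gaps(used, alerted)
for technique in backlog:
    print("no alert for:", technique)
```

Keeping the tester's order matters: the earliest undetected step in the chain is usually the cheapest place to catch the next intrusion.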

Frequently Asked Questions

What is penetration testing in simple terms?

Penetration testing is a controlled, authorized attack on your own systems, run by a security professional to find and prove which weaknesses a real attacker could exploit. Instead of just listing possible flaws like a scanner does, a pentester actually breaks in, sees how far they can get, and writes up the attack path with evidence so you can fix what matters most.

What is the difference between penetration testing and vulnerability scanning?

A vulnerability scan is automated and broad: a tool checks your systems against a database of known flaws and returns a scored list, which often contains false positives. A penetration test is manual and deep: a human exploits those flaws, chains them together, and proves real-world impact. The scan finds candidates; the pentest proves which ones an attacker can actually use. You run both, the scan continuously and the pentest periodically.

What are the types of penetration testing?

By scope, the common types are network, web application, wireless, social engineering, physical, cloud, and mobile application testing. By tester knowledge, engagements are black box (no inside information), white box (full information and source code), or gray box (limited knowledge, often a standard user account). The scope and method are chosen to match the specific threat being modeled.

What are the phases of a penetration test?

The standard lifecycle is planning and scoping, reconnaissance, scanning and enumeration, exploitation (gaining access), post-exploitation (privilege escalation, lateral movement, data access), reporting, and remediation with re-testing. The early phases overlap with automated scanning; the human value is in exploitation, the post-exploitation chain, and the judgment about which attack path actually matters.

Is penetration testing legal?

Yes, when it is authorized. A penetration test requires written permission from the system owner, a defined scope, and agreed rules of engagement before any testing begins. The exact same techniques performed without that authorization are illegal under computer-misuse laws. The signed authorization is what separates a security assessment from a crime, which is why scoping is the first phase.

How often should you do a penetration test?

Most organizations test at least annually, and many test quarterly or after any significant change: a new application, a major infrastructure shift, or a merger. Compliance frameworks like PCI DSS mandate regular testing. The cadence depends on how fast your environment changes and your risk tolerance, but a pentest is a point-in-time snapshot, so pairing it with continuous vulnerability scanning between engagements is standard practice.

What is the difference between penetration testing and red teaming?

A penetration test aims to find as many exploitable weaknesses as possible within an agreed scope and time, and it usually does not hide from the defenders. A red team engagement is goal-driven and stealthy: it pursues a specific objective (reach the payroll system, exfiltrate a target dataset) across multiple vectors while trying to avoid detection, which tests the SOC's ability to detect and respond, not just the presence of flaws.

The bottom line

A vulnerability scanner tells you what might be wrong. A penetration test proves what an attacker can actually do, by having a skilled professional break in with permission and document the path. The deliverable is not a list of CVEs; it is a demonstrated attack chain, ranked by the impact it achieved, with the evidence and the fix.

Use both, in order: scan continuously to find the candidates, then test periodically to prove which ones are the real open doors and to see whether your SOC notices. Read the report for the attack narrative first, weigh findings by demonstrated impact rather than raw scores, and mine it for the detections that should have fired and did not. The flaws will get patched. The lasting value of a good pentest is the proof of which path mattered, and the blind spots in your monitoring that someone just walked through to find it.
