What Is Incident Response? Process and Frameworks
What Is Incident Response? Process and Frameworks
02:14. An EDR alert fires: a host is encrypting files in bulk. Ninety seconds later, a second host starts. In the organization that prepared, the on-call analyst declares an incident, opens the runbook, isolates both hosts from the network, pages the incident lead, and starts preserving memory before anything reboots. In the organization that did not, someone is googling "ransomware what do I do" at 2 a.m. while a third host lights up. Same attack. Two outcomes, decided months earlier by whether a plan existed.
Incident response is the organized process a team follows to detect, contain, eradicate, and recover from a cyberattack, with the goal of limiting damage, cost, and downtime. It is a capability, not a product: people, a written plan, and a repeatable process. The tooling helps you execute it. It does not replace it.
This guide covers what incident response is, why speed decides the cost, the NIST and SANS frameworks (including NIST's 2025 overhaul), the six-step process, what goes in an incident response plan, who sits on the team, the tools, the metrics that measure it, and how to build the skill. It is written for blue teamers: SOC analysts, incident responders, and DFIR practitioners who run the playbook when it is real.
What is incident response?
Incident response (IR) is how an organization handles the aftermath of a security breach or attack. The job is to manage the incident from detection through recovery in a way that limits the harm and shortens the time to get back to normal.
A security event is not the same as a security incident. An event is anything observable: a login, a connection, an alert. An incident is an event, or a chain of them, that actually threatens the confidentiality, integrity, or availability of systems or data. Most events are noise. Incident response begins the moment triage decides an event is a real incident.
The work splits into two halves that are easy to confuse. The reactive half is the response itself: containing and eradicating an active threat. The proactive half is everything you do before, the plan, the team, the tooling, the rehearsals, that determines how well the reactive half goes. The opening scene is the whole point: the response at 2 a.m. is only as good as the preparation that preceded it.
Why incident response matters: speed decides the cost
The case for incident response is measured in money and time. According to IBM's Cost of a Data Breach 2025 report, the global average breach cost USD 4.44 million. The same report put the average breach lifecycle, the mean time to identify and contain a breach, at 241 days, the lowest in nine years.
That lifecycle is where IR earns its budget. Breaches with a lifecycle under 200 days cost an average of USD 3.61 million; those that ran past 200 days cost USD 5.49 million. The gap, roughly USD 1.88 million, is the price of being slow. Incident response is the discipline that compresses those numbers.
Speed does not come from heroics during the incident. It comes from preparation before it. The team that contained both hosts in the opening scene moved fast because the decisions, who declares an incident, who has authority to pull a host off the network, where the runbook lives, were made in advance. The unprepared team lost the most valuable hours of the response improvising answers to questions a plan should have already settled.
Incident response frameworks: NIST and SANS
Two frameworks dominate. Most IR programs are built on one or a blend of both.
| NIST SP 800-61 (Rev 2) | SANS (PICERL) | |
|---|---|---|
| Phases | 4 | 6 |
| Model | Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity | Preparation; Identification; Containment; Eradication; Recovery; Lessons Learned |
| Strength | Concise, widely referenced, maps to compliance | Granular, operations-focused, splits the response steps out |
The two are more alike than different. SANS breaks NIST's combined "Containment, Eradication & Recovery" phase into three distinct steps, which is why operators often prefer it: in a real incident, containment, eradication, and recovery are separate decisions with separate owners. NIST's four-phase model is the one auditors and compliance frameworks most often reference. Pick one as your backbone and most teams quietly use SANS-style granularity inside a NIST-shaped plan.
What changed in 2025: NIST SP 800-61 Revision 3
The classic four-phase NIST lifecycle came from Revision 2, published in 2012. In April 2025, NIST released Revision 3, and it is a genuine rethink, not a refresh. Revision 3 retires the standalone four-phase lifecycle and instead maps incident response onto the six functions of the NIST Cybersecurity Framework (CSF) 2.0: Govern, Identify, Protect, Detect, Respond, and Recover.

The reasoning matters for practitioners. The old model framed incident response as a discrete set of tasks bounded by the few days around an incident. Revision 3 treats it as part of continuous cybersecurity risk management, because breaches are now more frequent and take far longer to recover from. In practice it pulls preparation (Govern, Identify, Protect) and detection (Detect) into the same lifecycle as the response itself (Respond, Recover), which is exactly the proactive-plus-reactive split this guide keeps returning to. The four-phase and six-step models are still useful operational shorthand. The Revision 3 view is the one that reflects how mature programs actually run.
The incident response process, step by step
The SANS six-step model (PICERL) is the most practical operational spine, so the process below follows it. Each step is a set of decisions with an owner, not a box to tick.

1. Preparation
Everything that makes the other five steps possible. Write the incident response plan, build and train the team, define what counts as an incident and how severity is rated, stand up the tooling, and assemble the jump kit: contact lists, credentials, forensic tools, and clean systems ready to go. Run tabletop exercises so the first time the team executes the plan is not during a real breach. This is the step that pays for all the others.
2. Identification
Determine whether an event is an incident, and if so, what kind and how severe. Triage the alert, gather corroborating evidence from logs and endpoints, confirm the scope, and classify the severity. Document from the first minute: a clean timeline started here saves hours later. The output of this step is a declared incident with an assigned severity and an owner.
3. Containment
Stop the bleeding without destroying the evidence. Containment usually runs in two stages. Short-term containment isolates affected systems immediately: pull the host off the network, disable the compromised account, block the malicious IP. Long-term containment applies more durable fixes, like rebuilding a clean system or applying a patch, while the investigation continues. Preserve forensic images before you wipe anything, because eradication destroys evidence you may need.
4. Eradication
Remove the threat from the environment entirely. Delete malware, close the initial access vector, remove persistence mechanisms the attacker planted, and reset compromised credentials. The trap here is treating the symptom and missing the root cause: if the attacker got in through an unpatched VPN and you only remove the malware, they walk back in through the same door.
5. Recovery
Bring systems back to production safely. Restore from clean backups, validate that systems are fully cleaned before reconnecting them, and monitor closely for signs the attacker returns. Recovery is a judgment call about confidence: you reconnect when you can prove the environment is clean, not when leadership is impatient.
6. Lessons learned
The step teams skip, and the one that compounds. Within a week or two of closing the incident, run an honest post-incident review: what happened, how the response went, what slowed it down, and what to change. Feed the findings back into detection rules, the plan, and the next tabletop. An incident you do not learn from is one you have agreed to suffer again.
The incident response plan
The incident response plan (IRP) is the document the team executes under pressure. A good one removes decisions from the worst possible moment to make them. The components that matter:
- Roles and authority. Who declares an incident, who leads it, who can take systems offline, and who speaks for the organization. Decided in advance, in writing.
- Severity classification. A simple, agreed scale so the response matches the threat. For example:
| Severity | Example | Response |
|---|---|---|
| SEV-1 (Critical) | Active ransomware, confirmed data exfiltration | Full team, executive and legal notified, all hands |
| SEV-2 (High) | Single compromised host, contained malware | IR team engaged, scoped investigation |
| SEV-3 (Low) | Isolated policy violation, blocked attempt | Analyst handles, documented, no escalation |
- Playbooks. Step-by-step runbooks for the incident types you actually face: ransomware, business email compromise, data exfiltration, account takeover. Generic plans stall; specific playbooks move.
- Communication plan. Internal escalation paths and, critically, external communication: when and how to notify regulators, customers, law enforcement, and the public. Regulatory clocks (breach-notification deadlines) start whether or not you are ready.
- Contact list and tooling. Who to call at 2 a.m., and where the tools, credentials, and clean systems live.
The references most teams learn from gloss over two of these: severity classification and external or regulatory communication. Both are where unprepared responses fall apart.
The incident response team
Incident response is a team sport, and the team has a name: the CSIRT (Computer Security Incident Response Team), sometimes called a CIRT or CERT. In smaller organizations it overlaps heavily with the SOC; in larger ones it is a distinct function the SOC escalates into.
Core roles:
- Incident commander / IR lead. Owns the incident end to end, makes the containment and recovery calls, and coordinates everyone else. Not necessarily the most technical person, but the one who can run the response.
- Security analysts and responders. Do the hands-on triage, investigation, and containment work.
- Forensic analysts. Preserve and analyze evidence, reconstruct the timeline, and determine root cause and scope.
- Threat intelligence. Supplies context on the adversary and their techniques, often mapped to MITRE ATT&CK.
For a major incident, the team extends beyond security: legal, communications and PR, HR, executive leadership, and sometimes outside IR retainers and law enforcement. The plan should name these people before the incident, not scramble for them during it.
Incident response tools
No tool runs the response for you, but the right stack makes a fast response possible. The core categories:
| Tool | Role in incident response |
|---|---|
| SIEM | Centralized logs and detection; where most incidents are first spotted and investigated |
| EDR / XDR | Endpoint visibility plus the ability to isolate a host and kill a process remotely |
| SOAR | Automates repetitive response steps with playbooks; cuts containment time |
| Forensics suite | Disk and memory capture and analysis for root-cause and scope |
| Threat intelligence | Enriches indicators and ties activity to known adversaries |
| Case management | Tracks the incident, the timeline, and the evidence through to closure |
A SIEM is usually where an incident surfaces and where the investigation runs, since it holds the correlated history across sources. EDR gives responders the reach to contain an endpoint without walking to it. SOAR is what shortens the response: per IBM's 2025 report, organizations using AI and automation extensively saved an average of USD 1.9 million and cut the breach lifecycle by 68 days. The tools matter most when they are integrated and the team has drilled with them.
Incident response metrics
If you cannot measure the response, you cannot improve it or prove its value. The metrics that count:
- MTTD (Mean Time to Detect). From the start of the attack to detection. The first half of the breach lifecycle.
- MTTR (Mean Time to Respond / Recover). From detection to containment or full recovery. The second half.
- Dwell time. How long the attacker operated before being caught. The number breaches are ultimately judged on.
- Containment time. Specifically how long from detection to stopping the spread. This is the metric the cost data ties to most directly.
These are not vanity numbers. The IBM figures, a 241-day average lifecycle and a USD 1.88 million penalty for crossing 200 days, are MTTD and containment time expressed in dollars. Every improvement to detection and containment time is a direct cut to the cost of the next breach.
Building an incident response capability
Standing up IR, or sharpening it, comes down to a few disciplines done consistently.
- Write the plan before you need it. A plan drafted during an incident is not a plan. Define roles, severity, playbooks, and the communication tree while it is quiet.
- Pick a framework and map to it. NIST or SANS as the backbone, mapped to the incident types you actually face. Consistency beats cleverness under pressure.
- Classify severity and pre-decide escalation. The response to a blocked phishing attempt and an active ransomware event should not look the same, and nobody should be debating which it is at 2 a.m.
- Rehearse with tabletop exercises. Run the plan against realistic scenarios before the real one. Tabletops surface the gaps cheaply.
- Integrate detection and intel. The faster identification happens, the cheaper the whole incident. Good detection content and current threat intelligence shorten the front half of the lifecycle.
- Run honest post-incident reviews. Every incident feeds the next plan, the next detection rule, and the next tabletop. This is how the program matures.
The constraint, as with every blue team discipline, is skill. A plan is only as good as the responders executing it, and that judgment is built by doing the work on real attacks. Practicing investigations and response on realistic intrusion data, the kind in CyberDefenders blue team labs, builds the triage, forensics, and containment instincts a document cannot teach. Those are the skills the CCD certification track validates for incident response roles.