Malware Analysis

What Is Incident Response? Process, Frameworks & plan

14 min read·Updated June 2026·DFIRincident responseSOC

02:14. An EDR alert fires: a host is encrypting files in bulk. Ninety seconds later, a second host starts. In the organization that prepared, the on-call analyst declares an incident, opens the runbook, isolates both hosts from the network, pages the incident lead, and starts preserving memory before anything reboots. In the organization that did not, someone is googling "ransomware what do I do" at 2 a.m. while a third host lights up. Same attack. Two outcomes, decided months earlier by whether a plan existed.

Incident response is the organized process a team follows to detect, contain, eradicate, and recover from a cyberattack, with the goal of limiting damage, cost, and downtime. It is a capability, not a product: people, a written plan, and a repeatable process. The tooling helps you execute it. It does not replace it.

This guide covers what incident response is, why speed decides the cost, the NIST and SANS frameworks (including NIST's 2025 overhaul), the six-step process, what goes in an incident response plan, who sits on the team, the tools, the metrics that measure it, and how to build the skill. It is written for blue teamers: SOC analysts, incident responders, and DFIR practitioners who run the playbook when it is real.

What is incident response?

Incident response (IR) is how an organization handles the aftermath of a security breach or attack. The job is to manage the incident from detection through recovery in a way that limits the harm and shortens the time to get back to normal.

A security event is not the same as a security incident. An event is anything observable: a login, a connection, an alert. An incident is an event, or a chain of them, that actually threatens the confidentiality, integrity, or availability of systems or data. Most events are noise. Incident response begins the moment triage decides an event is a real incident.

The work splits into two halves that are easy to confuse. The reactive half is the response itself: containing and eradicating an active threat. The proactive half is everything you do before, the plan, the team, the tooling, the rehearsals, that determines how well the reactive half goes. The opening scene is the whole point: the response at 2 a.m. is only as good as the preparation that preceded it.

Why incident response matters: speed decides the cost

The case for incident response is measured in money and time. According to IBM's Cost of a Data Breach 2025 report, the global average breach cost USD 4.44 million. The same report put the average breach lifecycle, the mean time to identify and contain a breach, at 241 days, the lowest in nine years.

That lifecycle is where IR earns its budget. Breaches with a lifecycle under 200 days cost an average of USD 3.61 million; those that ran past 200 days cost USD 5.49 million. The gap, roughly USD 1.88 million, is the price of being slow. Incident response is the discipline that compresses those numbers.

Speed does not come from heroics during the incident. It comes from preparation before it. The team that contained both hosts in the opening scene moved fast because the decisions, who declares an incident, who has authority to pull a host off the network, where the runbook lives, were made in advance. The unprepared team lost the most valuable hours of the response improvising answers to questions a plan should have already settled.

Incident response frameworks: NIST and SANS

Two frameworks dominate. Most IR programs are built on one or a blend of both.

	NIST SP 800-61 (Rev 2)	SANS (PICERL)
Phases	4	6
Model	Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity	Preparation; Identification; Containment; Eradication; Recovery; Lessons Learned
Strength	Concise, widely referenced, maps to compliance	Granular, operations-focused, splits the response steps out

The two are more alike than different. SANS breaks NIST's combined "Containment, Eradication & Recovery" phase into three distinct steps, which is why operators often prefer it: in a real incident, containment, eradication, and recovery are separate decisions with separate owners. NIST's four-phase model is the one auditors and compliance frameworks most often reference. Pick one as your backbone and most teams quietly use SANS-style granularity inside a NIST-shaped plan.

What changed in 2025: NIST SP 800-61 Revision 3

The classic four-phase NIST lifecycle came from Revision 2, published in 2012. In April 2025, NIST released Revision 3, and it is a genuine rethink, not a refresh. Revision 3 retires the standalone four-phase lifecycle and instead maps incident response onto the six functions of the NIST Cybersecurity Framework (CSF) 2.0: Govern, Identify, Protect, Detect, Respond, and Recover.

what is incident response comparison

The reasoning matters for practitioners. The old model framed incident response as a discrete set of tasks bounded by the few days around an incident. Revision 3 treats it as part of continuous cybersecurity risk management, because breaches are now more frequent and take far longer to recover from. In practice it pulls preparation (Govern, Identify, Protect) and detection (Detect) into the same lifecycle as the response itself (Respond, Recover), which is exactly the proactive-plus-reactive split this guide keeps returning to. The four-phase and six-step models are still useful operational shorthand. The Revision 3 view is the one that reflects how mature programs actually run.

The incident response process, step by step

The SANS six-step model (PICERL) is the most practical operational spine, so the process below follows it. Each step is a set of decisions with an owner, not a box to tick.

what is incident response lifecycles

1. Preparation

Everything that makes the other five steps possible. Write the incident response plan, build and train the team, define what counts as an incident and how severity is rated, stand up the tooling, and assemble the jump kit: contact lists, credentials, forensic tools, and clean systems ready to go. Run tabletop exercises so the first time the team executes the plan is not during a real breach. This is the step that pays for all the others.

2. Identification

Determine whether an event is an incident, and if so, what kind and how severe. Triage the alert, gather corroborating evidence from logs and endpoints, confirm the scope, and classify the severity. Document from the first minute: a clean timeline started here saves hours later. The output of this step is a declared incident with an assigned severity and an owner.

3. Containment

Stop the bleeding without destroying the evidence. Containment usually runs in two stages. Short-term containment isolates affected systems immediately: pull the host off the network, disable the compromised account, block the malicious IP. Long-term containment applies more durable fixes, like rebuilding a clean system or applying a patch, while the investigation continues. Preserve forensic images before you wipe anything, because eradication destroys evidence you may need.

4. Eradication

Remove the threat from the environment entirely. Delete malware, close the initial access vector, remove persistence mechanisms the attacker planted, and reset compromised credentials. The trap here is treating the symptom and missing the root cause: if the attacker got in through an unpatched VPN and you only remove the malware, they walk back in through the same door.

5. Recovery

Bring systems back to production safely. Restore from clean backups, validate that systems are fully cleaned before reconnecting them, and monitor closely for signs the attacker returns. Recovery is a judgment call about confidence: you reconnect when you can prove the environment is clean, not when leadership is impatient.

6. Lessons learned

The step teams skip, and the one that compounds. Within a week or two of closing the incident, run an honest post-incident review: what happened, how the response went, what slowed it down, and what to change. Feed the findings back into detection rules, the plan, and the next tabletop. An incident you do not learn from is one you have agreed to suffer again.

The incident response plan

The incident response plan (IRP) is the document the team executes under pressure. A good one removes decisions from the worst possible moment to make them. The components that matter:

Roles and authority. Who declares an incident, who leads it, who can take systems offline, and who speaks for the organization. Decided in advance, in writing.
Severity classification. A simple, agreed scale so the response matches the threat. For example:

Severity	Example	Response
SEV-1 (Critical)	Active ransomware, confirmed data exfiltration	Full team, executive and legal notified, all hands
SEV-2 (High)	Single compromised host, contained malware	IR team engaged, scoped investigation
SEV-3 (Low)	Isolated policy violation, blocked attempt	Analyst handles, documented, no escalation

Playbooks. Step-by-step runbooks for the incident types you actually face: ransomware, business email compromise, data exfiltration, account takeover. Generic plans stall; specific playbooks move.
Communication plan. Internal escalation paths and, critically, external communication: when and how to notify regulators, customers, law enforcement, and the public. Regulatory clocks (breach-notification deadlines) start whether or not you are ready.
Contact list and tooling. Who to call at 2 a.m., and where the tools, credentials, and clean systems live.

The references most teams learn from gloss over two of these: severity classification and external or regulatory communication. Both are where unprepared responses fall apart.

The incident response team

Incident response is a team sport, and the team has a name: the CSIRT (Computer Security Incident Response Team), sometimes called a CIRT or CERT. In smaller organizations it overlaps heavily with the SOC; in larger ones it is a distinct function the SOC escalates into.

Core roles:

Incident commander / IR lead. Owns the incident end to end, makes the containment and recovery calls, and coordinates everyone else. Not necessarily the most technical person, but the one who can run the response.
Security analysts and responders. Do the hands-on triage, investigation, and containment work.
Forensic analysts. Preserve and analyze evidence, reconstruct the timeline, and determine root cause and scope.
Threat intelligence. Supplies context on the adversary and their techniques, often mapped to MITRE ATT&CK.

For a major incident, the team extends beyond security: legal, communications and PR, HR, executive leadership, and sometimes outside IR retainers and law enforcement. The plan should name these people before the incident, not scramble for them during it.

Incident response tools

No tool runs the response for you, but the right stack makes a fast response possible. The core categories:

Tool	Role in incident response
SIEM	Centralized logs and detection; where most incidents are first spotted and investigated
EDR / XDR	Endpoint visibility plus the ability to isolate a host and kill a process remotely
SOAR	Automates repetitive response steps with playbooks; cuts containment time
Forensics suite	Disk and memory capture and analysis for root-cause and scope
Threat intelligence	Enriches indicators and ties activity to known adversaries
Case management	Tracks the incident, the timeline, and the evidence through to closure

A SIEM is usually where an incident surfaces and where the investigation runs, since it holds the correlated history across sources. EDR gives responders the reach to contain an endpoint without walking to it. SOAR is what shortens the response: per IBM's 2025 report, organizations using AI and automation extensively saved an average of USD 1.9 million and cut the breach lifecycle by 68 days. The tools matter most when they are integrated and the team has drilled with them.

Incident response metrics

If you cannot measure the response, you cannot improve it or prove its value. The metrics that count:

MTTD (Mean Time to Detect). From the start of the attack to detection. The first half of the breach lifecycle.
MTTR (Mean Time to Respond / Recover). From detection to containment or full recovery. The second half.
Dwell time. How long the attacker operated before being caught. The number breaches are ultimately judged on.
Containment time. Specifically how long from detection to stopping the spread. This is the metric the cost data ties to most directly.

These are not vanity numbers. The IBM figures, a 241-day average lifecycle and a USD 1.88 million penalty for crossing 200 days, are MTTD and containment time expressed in dollars. Every improvement to detection and containment time is a direct cut to the cost of the next breach.

Building an incident response capability

Standing up IR, or sharpening it, comes down to a few disciplines done consistently.

Write the plan before you need it. A plan drafted during an incident is not a plan. Define roles, severity, playbooks, and the communication tree while it is quiet.
Pick a framework and map to it. NIST or SANS as the backbone, mapped to the incident types you actually face. Consistency beats cleverness under pressure.
Classify severity and pre-decide escalation. The response to a blocked phishing attempt and an active ransomware event should not look the same, and nobody should be debating which it is at 2 a.m.
Rehearse with tabletop exercises. Run the plan against realistic scenarios before the real one. Tabletops surface the gaps cheaply.
Integrate detection and intel. The faster identification happens, the cheaper the whole incident. Good detection content and current threat intelligence shorten the front half of the lifecycle.
Run honest post-incident reviews. Every incident feeds the next plan, the next detection rule, and the next tabletop. This is how the program matures.

The constraint, as with every blue team discipline, is skill. A plan is only as good as the responders executing it, and that judgment is built by doing the work on real attacks.

Frequently asked questions

What is incident response in simple terms?

Incident response is the organized process a team uses to detect, contain, and recover from a cyberattack while limiting the damage. It combines a written plan, a trained team, and a repeatable set of steps so that when a breach happens, the response is controlled rather than improvised.

What are the steps of the incident response process?

The widely used SANS model has six steps: preparation, identification, containment, eradication, recovery, and lessons learned. NIST's model groups them into four phases: preparation; detection and analysis; containment, eradication, and recovery; and post-incident activity. Both describe the same lifecycle at different levels of detail.

What is the difference between the NIST and SANS incident response frameworks?

The SANS framework uses six steps and the older NIST SP 800-61 model uses four phases, but they cover the same ground. SANS splits containment, eradication, and recovery into separate steps for operational clarity, while NIST combines them. NIST's 2025 Revision 3 reframes incident response entirely around the Cybersecurity Framework 2.0 functions.

What is an incident response plan?

An incident response plan is the document a team executes during a security incident. It defines roles and authority, a severity classification scale, step-by-step playbooks for common incident types, and a communication plan covering internal escalation and external notification to regulators and customers. Its purpose is to make decisions in advance so they are not made under pressure.

Who is on an incident response team?

The core incident response team (CSIRT) includes an incident commander who leads the response, security analysts and responders who do the hands-on work, forensic analysts who preserve evidence and find root cause, and threat intelligence support. Major incidents also pull in legal, communications, executives, and sometimes outside responders.

How is incident response measured?

The key metrics are mean time to detect (MTTD), mean time to respond or recover (MTTR), dwell time, and containment time. These map directly to cost: IBM's 2025 data shows breaches with a lifecycle under 200 days cost roughly USD 1.88 million less than slower ones, so faster detection and containment translate straight into reduced breach cost.