Logging Best Practices: A Defender's Guide

Logging best practices are the standards that make logs useful for security: capturing high-value events including failures, structuring them consistently, retaining them long enough, securing them against tampering, and centralizing them into a SIEM for correlation.

The first thing that goes wrong in an incident is not the alert. It is the log that was never written, or written in a format nobody can parse, or rotated off disk a week before the breach you are now investigating. By the time you reach for the evidence, the logging decisions were made months ago, by someone who was thinking about disk space and not about the question you are now trying to answer: what did the attacker touch, when, and from where.

Good logging is a defensive control, not a housekeeping chore. A structured, complete, centralized log lets a SOC analyst pivot from one alert to the full scope of an intrusion in minutes. A pile of inconsistent, unsecured, half-retained text files turns the same investigation into a week of guesswork. This guide covers the logging best practices that decide which of those two you end up with: what to log, how to structure it, how long to keep it, how to protect it, and how to get it into a place where it is actually queryable. It is written for the people who pay the price when logging is done badly: the SOC analysts, threat hunters, and DFIR responders who have to read whatever was captured.

Why logging best practices matter

A log is only useful if it survives, parses, and answers a question. Each of those is a property you have to design for, not one you get for free.

Survives means the relevant log existed, was written, and was still on disk when you went looking. Parses means a machine can split the line into fields without a bespoke regex per source. Answers means the fields you need (who, what, when, from where, and the outcome) are actually in the record. Most logging failures are a failure of one of these three, and they compound: an unparseable log that is also gone in seven days is worthless twice over.

The payoff for getting it right is concrete. Logging is the raw material of detection engineering: detection rules fire on fields, so consistent fields mean detections that work across sources instead of one brittle rule per log type. Investigations move at the speed of correlation, and correlation needs a common schema and a common clock. Compliance regimes, PCI DSS among them, set explicit log retention floors, so retention is not only an operational choice but a legal one. And log analysis, the whole discipline of turning raw records into findings, only scales when the inputs are clean. Bad logging caps how good your detection and response can ever be.

What to log (and what to leave out)

The first decision is scope. Log too little and the evidence is not there. Log everything at full verbosity and you drown the signal, blow the budget, and slow every query. The goal is high-value events captured consistently, not maximum volume.

Capture the events that answer security questions: authentication (success and, critically, failure), authorization changes, privilege use, account and group changes, process creation, network connections, file and object access on sensitive data, configuration changes, and application errors. On Windows, that means turning on the audit policies that produce the 4624 / 4625 logon events and 4688 process-creation events rather than relying on defaults. On Linux, it means auditd rules and the auth logs behind SSH and sudo. Failure events matter as much as successes: a burst of failed logons is often the reconnaissance that precedes the one that works.

Be deliberate about what you do not log, and about what must never appear in a log at all. Passwords, full payment card numbers, API keys, session tokens, and other secrets do not belong in plaintext log lines, both because logs are widely readable and because regulations and standards (PCI DSS for cardholder data, for example) restrict storing them. Mask, tokenize, or hash these fields before they are written. The same applies to bulk personal data: log the fact that a record was accessed and by whom, not the contents of the record.
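As a rough illustration of masking before the write, the sketch below redacts a few sensitive keys and masks card-like digit runs in string values. The key names in SENSITIVE_KEYS and the card pattern are assumptions made for this example, not a complete or standard list; the pattern has no checksum validation, so it can also mask other long numeric IDs, and only top-level keys are inspected.

```python
import re

# Hypothetical field names; adjust to your own schema.
SENSITIVE_KEYS = {"password", "api_key", "session_token", "authorization"}

# Runs of 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def mask_card(match: re.Match) -> str:
    """Keep only the last four digits of a card-like number."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def redact(event: dict) -> dict:
    """Return a copy of the event that is safer to write to a log."""
    clean = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_KEYS:
            # Drop the secret entirely rather than trying to mask part of it.
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            clean[key] = CARD_PATTERN.sub(mask_card, value)
        else:
            clean[key] = value
    return clean


safe = redact(
    {
        "user": "alice",
        "password": "hunter2",
        "note": "card 4111 1111 1111 1111 declined",
    }
)
print(safe)
```

Running redaction inside the logging call path, rather than as a later cleanup job, means the secret never reaches disk in the first place.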

Structure logs for machines, then humans

A log line a human can read but a machine cannot parse is a line you will grep by hand at 2 a.m. The single highest-leverage logging practice is to emit structured logs in a consistent, machine-parseable format.

Structured logging means each event is a set of explicit key-value fields rather than a freeform sentence. JSON is the common choice because nearly every log pipeline ingests it, but the format matters less than the discipline: the same field name for the same thing, every time, across every service. src_ip is always src_ip, never source, client, or remoteAddr depending on which developer wrote that module. A consistent schema is what lets one detection rule and one search work across dozens of sources.
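One way this can look in practice is a small JSON formatter on top of Python's standard logging module. This is a minimal sketch: the JsonFormatter class, the build_logger helper, the "fields" key passed through extra=, and the example field names are all illustrative choices for this example rather than a standard.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            # record.created is a Unix timestamp; render it as UTC ISO 8601.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
            .isoformat(timespec="seconds")
            .replace("+00:00", "Z"),
            "level": record.levelname,  # Python spells WARN as "WARNING"
            "event": record.getMessage(),
        }
        # A dict passed as extra={"fields": {...}} becomes record.fields.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = build_logger("auth-service")
log.warning(
    "login_failed",
    extra={"fields": {"src_ip": "203.0.113.7", "user": "alice", "outcome": "failure"}},
)
```

The message is a short event name and everything else is a field, so a search for src_ip works the same way against every service that uses the formatter.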

Use severity levels consistently so they can drive filtering and alerting. The common ladder, from most to least severe, is FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. FATAL and ERROR are events demanding attention; WARN flags a condition that may become a problem; INFO is normal operation; DEBUG and TRACE are verbose diagnostics you generally keep out of production volume except when actively troubleshooting. Levels are only useful if applied consistently: an ERROR in one service should mean the same class of thing as an ERROR in another.
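A shared ladder is easiest to keep consistent when it lives in one definition that every service imports. The sketch below encodes the six levels as an ordered enum; the two thresholds (keep INFO and above in production, alert at ERROR and above) are assumed policy values for illustration, not a recommendation.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The ladder from the text, ordered so comparisons work."""

    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARN = 3
    ERROR = 4
    FATAL = 5


# Assumed policy values for this example.
PRODUCTION_FLOOR = Severity.INFO
ALERT_FLOOR = Severity.ERROR


def should_keep(level: str) -> bool:
    """True if an event at this level belongs in production volume."""
    return Severity[level.upper()] >= PRODUCTION_FLOOR


def should_alert(level: str) -> bool:
    """True if an event at this level should reach alerting."""
    return Severity[level.upper()] >= ALERT_FLOOR


print(should_keep("DEBUG"), should_alert("ERROR"))
```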

Make every log line carry context

A structured line is only as good as the fields in it. Every security-relevant event should be able to answer five questions on its own: who (the user or service identity), what (the action and its outcome), when (a precise timestamp), where (source host and IP), and which (the resource or object touched). Enrich at the source where you can: a request ID or trace ID that ties related events together is worth more than another paragraph of prose in the message.
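The five questions can be turned into a check that runs in tests or at the ingestion edge. In this sketch the mapping from question to field name (user, action, outcome, timestamp, src_host, src_ip, resource) is a hypothetical schema chosen for the example.

```python
# Hypothetical schema: which field answers which question.
REQUIRED_CONTEXT = {
    "who": "user",
    "what": "action",
    "what_outcome": "outcome",
    "when": "timestamp",
    "where_host": "src_host",
    "where_ip": "src_ip",
    "which": "resource",
}


def missing_context(event: dict) -> list[str]:
    """Return the required fields that are absent or empty in the event."""
    return [
        field
        for field in REQUIRED_CONTEXT.values()
        if event.get(field) in (None, "")
    ]


event = {
    "user": "alice",
    "action": "file_read",
    "outcome": "success",
    "timestamp": "2026-06-20T02:14:07Z",
    "src_host": "ws-042",
    "src_ip": "203.0.113.7",
}
print(missing_context(event))
```

Here the event names no resource, so the check reports that the "which" question cannot be answered from this line alone.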

Two fields are non-negotiable across every source. First, timestamps must be precise and in a single zone; UTC in ISO 8601 (2026-06-20T02:14:07Z) is the practice that saves you when you correlate logs from machines in different regions. A timeline built from logs in three local time zones is a timeline you cannot trust. Second, every event needs a stable identity for who or what caused it, so you can group an attacker's activity across sources by one pivot rather than reconciling three different username formats by hand.
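Converting to UTC at write time takes only a few lines. The helper below is a sketch with a hypothetical name, to_utc_iso; it refuses timestamps with no zone information, on the assumption that guessing the source zone is worse than failing loudly.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(ts: datetime) -> str:
    """Render an aware datetime as UTC ISO 8601 with a trailing Z."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: the source zone is unknown")
    return (
        ts.astimezone(timezone.utc)
        .isoformat(timespec="seconds")
        .replace("+00:00", "Z")
    )


# 22:14:07 on a host at UTC-04:00 is the same instant as 02:14:07Z the next day.
local = datetime(2026, 6, 19, 22, 14, 7, tzinfo=timezone(timedelta(hours=-4)))
print(to_utc_iso(local))  # 2026-06-20T02:14:07Z
```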

Decide retention before you need it

Retention is where logging meets the budget, and where investigations quietly die. Keep logs too briefly and the breach you discover today happened in a window you can no longer see, because dwell time on intrusions is routinely measured in weeks. Keep everything hot forever and the storage bill becomes the reason logging gets cut.

The workable answer is tiered retention. Keep recent logs hot, indexed and instantly searchable, for the active detection and triage window. Roll older logs to warm or cold storage that is cheaper and slower but still retrievable for an investigation or audit. Set the floor by the longest of three constraints: your compliance obligations (PCI DSS, for example, requires retaining audit log history for at least 12 months, with the most recent three months immediately available for analysis), your typical detection-to-investigation lag, and the realistic dwell time of an intrusion. The right number is a deliberate policy, not whatever the default rotation config happened to be.

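| Tier           | Typical window           | Storage                 | What it is for                                 |
|----------------|--------------------------|-------------------------|------------------------------------------------|
| Hot            | Recent (days to weeks)   | Indexed, fast, costly   | Live detection, alert triage, active hunts     |
| Warm           | Medium (weeks to months) | Slower, cheaper         | Investigation lookback, recent correlation     |
| Cold / archive | Long (months to years)   | Cheapest, slow retrieval | Compliance retention, late-discovered breaches |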
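The "longest of three constraints" rule and the tiering above are simple enough to write down as policy code. In this sketch every number is an illustrative placeholder, not a recommendation, and the names RetentionPolicy and retention_floor are hypothetical.

```python
from dataclasses import dataclass


def retention_floor(
    compliance_days: int, investigation_lag_days: int, dwell_time_days: int
) -> int:
    """The minimum retention is the longest of the three constraints."""
    return max(compliance_days, investigation_lag_days, dwell_time_days)


@dataclass(frozen=True)
class RetentionPolicy:
    hot_until_days: int     # indexed and instantly searchable up to this age
    warm_until_days: int    # slower, cheaper storage up to this age
    delete_after_days: int  # cold archive up to this age, then expiry

    def tier_for(self, age_days: int) -> str:
        if age_days <= self.hot_until_days:
            return "hot"
        if age_days <= self.warm_until_days:
            return "warm"
        if age_days <= self.delete_after_days:
            return "cold"
        return "expired"


# Placeholder inputs: 365-day compliance floor, 30-day lag, 90-day dwell time.
floor = retention_floor(365, 30, 90)
policy = RetentionPolicy(hot_until_days=30, warm_until_days=90, delete_after_days=floor)
print(floor, policy.tier_for(60))
```

Keeping the floor as a computed value makes the reasoning auditable: anyone reading the policy can see which constraint set the number.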

Secure the logs themselves

Logs are evidence, and evidence that an attacker can read, alter, or delete is not evidence. Treating the log store as a security boundary in its own right is a best practice that is easy to skip and expensive to skip.

Three controls do most of the work. First, restrict access: logs frequently contain sensitive operational detail and sometimes regulated data, so read and especially write or delete access belongs to a small set of identities, governed by tight access control. Second, protect integrity: ship logs off the originating host quickly and store them where the source system, and therefore an attacker who compromised it, cannot rewrite history. Append-only or write-once storage and integrity hashing matter here, because clearing or editing logs to cover tracks is a standard step in the attacker playbook. Third, protect confidentiality: encrypt logs in transit and at rest. The audit trail of who accessed the logs is itself a log worth keeping.
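Integrity hashing can be sketched as a hash chain, where each entry's hash covers the previous entry's hash, so editing or removing any earlier record breaks verification of everything after it. This is a toy in-memory illustration with hypothetical function names, not a substitute for write-once storage: an attacker who can rewrite the whole chain can also recompute it, which is why the hashes need to be anchored somewhere the source host cannot reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # starting value for the first entry's "previous hash"


def chain_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, event: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "hash": chain_hash(prev, event)})


def verify(chain: list) -> bool:
    """Recompute every hash in order; any edit or deletion shows up as a mismatch."""
    prev = GENESIS
    for entry in chain:
        if entry["hash"] != chain_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


chain: list = []
append(chain, {"user": "alice", "action": "login", "outcome": "success"})
append(chain, {"user": "alice", "action": "sudo", "outcome": "success"})
append(chain, {"user": "alice", "action": "logout", "outcome": "success"})
print(verify(chain))
```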

Centralize and feed the SIEM

Figure: Logging pipeline, raw event to detection. Every best practice is one stage of this pipeline. A scattered event becomes a correlatable detection only if each stage is done right; skip one and the SIEM does less with more tuning.

  • 1 · Capture (high-value events): auth, privilege, process, network. Failures too. Secrets out.
  • 2 · Structure (JSON, UTC, identity): consistent fields, ISO 8601 timestamp, stable who/what.
  • 3 · Secure + ship (off-host, encrypted): a copy the compromised host cannot rewrite or delete.
  • 4 · Centralize (SIEM normalize): many sources, one schema, one clock, queryable at scale.
  • 5 · Detect (correlate + alert): a failed-login burst, then a success, from one source becomes one alert.

The practices stack: structured fields and a UTC clock at stage 2 are exactly what make normalization cheap at stage 4 and correlation accurate at stage 5. Get the early stages wrong and you write per-source parsers instead of catching intrusions.

Individually, logs are scattered text on hundreds of hosts, each with its own clock, format, and retention. The events that reveal an attack rarely sit in one place: a failed-login burst on a domain controller, an anomalous process on a workstation, and an outbound connection on a firewall are one intrusion told across three logs. You cannot see it until they are in one place, normalized, and on one clock.

Centralizing means shipping logs off their source hosts into a central platform, parsing and normalizing them to a common schema, and making the result queryable and correlatable. This is the job of a security information and event management (SIEM) platform: aggregate from many sources, normalize the fields, and run detection logic across the combined stream. Centralization also hardens the logs, because a copy off the host survives an attacker who wipes the local file. The practices stack here: structured logs with consistent fields and UTC timestamps are exactly what make normalization cheap and correlation accurate. Get the earlier practices right and the SIEM does more with less tuning; get them wrong and you spend your time writing per-source parsers instead of catching intrusions.
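When sources disagree on field names, normalization often starts as an alias table. The sketch below maps the variants mentioned earlier (source, client, remoteAddr) onto src_ip; the username and account aliases are additional hypothetical examples, and a real SIEM would do this in its own parsing layer rather than in application code.

```python
# Alias table: per-source field names on the left, common schema on the right.
FIELD_ALIASES = {
    "source": "src_ip",
    "client": "src_ip",
    "remoteAddr": "src_ip",
    "username": "user",   # hypothetical alias
    "account": "user",    # hypothetical alias
}


def normalize(raw: dict) -> dict:
    """Rename known aliases to the common schema; leave other fields alone.

    If two aliases for the same field appear in one event, the later one wins.
    """
    return {FIELD_ALIASES.get(key, key): value for key, value in raw.items()}


firewall_event = {"remoteAddr": "198.51.100.4", "username": "bob", "outcome": "failure"}
print(normalize(firewall_event))
```

The shorter this table has to be, the more of the earlier practices were followed at the source.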

Once logs are centralized and normalized, lean on real-time analysis. Alert on the patterns that matter (a failed-login burst followed by a success, a new process spawning from a web server, privilege changes outside a change window) and route the response. The point of feeding the SIEM is not storage; it is turning a stream of events into detections fast enough to act on.
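The failed-login burst followed by a success is a good first correlation to reason about in code. This is a simplified sketch, not how any particular SIEM implements it: it assumes events are already normalized, sorted by time, and carry timestamp, src_ip, user, and outcome fields, and the five-minute window and threshold of five failures are arbitrary example values.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # assumed lookback window
THRESHOLD = 5                  # assumed minimum failures before a success


def detect_burst_then_success(events: list) -> list:
    """Alert when a success follows THRESHOLD or more recent failures from one source."""
    failures = defaultdict(deque)  # src_ip -> timestamps of recent failures
    alerts = []
    for event in events:
        recent = failures[event["src_ip"]]
        # Drop failures that have aged out of the window.
        while recent and event["timestamp"] - recent[0] > WINDOW:
            recent.popleft()
        if event["outcome"] == "failure":
            recent.append(event["timestamp"])
        elif event["outcome"] == "success":
            if len(recent) >= THRESHOLD:
                alerts.append(
                    {
                        "src_ip": event["src_ip"],
                        "user": event["user"],
                        "failed_attempts": len(recent),
                        "timestamp": event["timestamp"],
                    }
                )
            recent.clear()
    return alerts


base = datetime(2026, 6, 20, 2, 14, 7, tzinfo=timezone.utc)
events = [
    {"timestamp": base + timedelta(seconds=10 * i), "src_ip": "203.0.113.7",
     "user": "alice", "outcome": "failure"}
    for i in range(5)
]
events.append({"timestamp": base + timedelta(seconds=60), "src_ip": "203.0.113.7",
               "user": "alice", "outcome": "success"})
alerts = detect_burst_then_success(events)
print(len(alerts))
```

The rule only works because every source reports the same src_ip field on the same clock, which is the dependency on structure and UTC timestamps described above.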

A logging best practices checklist

The practices above condense to a short list you can audit a source against:

  • Log high-value security events, including failures, and keep secrets and bulk PII out of the log entirely.
  • Emit structured logs (JSON or equivalent) with a consistent field schema across sources.
  • Stamp every event in UTC, ISO 8601, with a stable identity for who or what acted.
  • Use severity levels (FATAL through TRACE) consistently so they can drive filtering and alerts.
  • Set retention by the longest of compliance, investigation lag, and dwell time, tiered hot to cold.
  • Restrict access to logs, ship them off-host quickly, and encrypt them in transit and at rest.
  • Centralize into a SIEM, normalize to a common schema, and alert in real time on the patterns that matter.

Frequently Asked Questions

What are logging best practices?

Logging best practices are the standards that make logs useful for security: capturing high-value events including failures, emitting them in a structured and consistent format, stamping each with a precise UTC timestamp and a stable identity, retaining them long enough to cover real investigation and dwell-time windows, securing them against tampering, and centralizing them into a SIEM for correlation. The goal is logs that survive, parse, and answer the questions an investigation asks.

What should you log for security?

Log authentication successes and failures, authorization and privilege changes, account and group modifications, process creation, network connections, access to sensitive files and objects, configuration changes, and application errors. Failure events are as important as successes, since failed logons and denied access are often the reconnaissance that precedes a breach. Never log secrets such as passwords, tokens, and API keys, or bulk personal data, in plaintext.

Why is structured logging important?

Structured logging records each event as explicit key-value fields rather than freeform text, so a machine can parse it without a custom regex per source. A consistent field schema lets one detection rule and one search work across many log sources, and it makes normalization in a SIEM cheap and accurate. Unstructured logs force manual parsing and brittle, source-specific detections.

How long should you retain logs?

Retain logs for the longest of three constraints: your compliance obligations, the typical lag between an incident occurring and being investigated, and the realistic dwell time of an intrusion, which is often weeks or longer. A tiered approach keeps recent logs hot and searchable, rolls older logs to cheaper warm and cold storage, and preserves an archive for compliance and late-discovered breaches. Many regimes such as PCI DSS set explicit retention floors.

Why centralize logs into a SIEM?

Centralizing ships logs off their source hosts into one platform that normalizes them to a common schema and makes them queryable and correlatable. The events that reveal an attack usually span multiple systems, and you cannot correlate them until they share one place and one clock. Centralization also protects the logs, because an off-host copy survives an attacker who clears the local file.

How do you protect logs from tampering?

Restrict read, write, and delete access to a small set of identities, ship logs off the originating host quickly so a compromised system cannot rewrite its own history, use append-only or write-once storage with integrity hashing, and encrypt logs in transit and at rest. Clearing or editing logs is a standard attacker step to cover tracks, so the log store should be treated as a security boundary in its own right.

The bottom line

Logging is a control you configure long before the incident and rely on entirely during it. The practices that matter are not exotic: log the high-value events including failures, keep secrets out, structure every line with consistent fields and a UTC timestamp, retain by the longest of compliance, investigation lag, and dwell time, secure the log store against tampering, and centralize into a SIEM so the records can be correlated and acted on.

Every one of these is a decision made in advance. The analyst working an alert at 2 a.m. cannot retroactively turn on auditing, re-parse an inconsistent format, or recover a log that rotated off disk last week. The only logging that helps is the logging that was already right when the attacker arrived.
