What Is a Log File? Anatomy, Types, and Sources
A log file is a record of events that a system, application, or device writes as those events occur, one entry per event, each stamped with the time it happened.
Every system you defend is writing down what it does, one line at a time. A failed SSH login appends a line to /var/log/auth.log. A user opening a file generates a Windows Security event. A web request lands in an access log. A firewall logs the packet it dropped. None of these systems is trying to help an investigator. They write the line because that is how the software was built to record its own behavior. The log file is the byproduct, and it is also the single most complete record of what happened on a host that you will ever get.
A log file is a record of events that a system, application, or device writes as those events occur, one entry per event, each stamped with the time it happened. It is the raw material of detection, forensics, and threat hunting. This guide covers what a log file actually is, what a single log entry contains, the main types of logs and where they come from, the difference between structured and unstructured logs, and why a defender treats log files as the first source of truth. The sibling articles cover the formats those files use, how to manage them at scale, and how to analyze them; this one is about the artifact itself.
What is a log file?
A log file is a file to which a program appends a line, or a structured record, every time a noteworthy event happens. The word "log" comes from the ship's log: a sequential, timestamped account of events in the order they occurred. A software log keeps the same discipline. Each entry describes one event, the file grows by appending to the end, and the order of the lines is the order things happened.
The unit is the event, not the session or the user. A single user action can produce many log entries across many files: a logon writes an authentication record, opening an application writes an application event, that application reading a file may write its own line, and the network stack records the connection. Read together, those scattered entries reconstruct what one person did. Read alone, each is a fragment.
Three properties make a log file what it is. It is sequential: entries are appended in time order, so the file is a timeline by construction. It is timestamped: every entry carries when the event happened, which is what lets you correlate one log against another. And it is append-only by intent: a healthy log is added to, never edited, which is exactly why a log that shows signs of being truncated or rewritten is itself an indicator of compromise. An attacker who clears the Windows Security log or trims /var/log/auth.log is not hiding; they are leaving a different, louder trace. Clearing the Security log, for example, itself writes event ID 1102 ("The audit log was cleared").
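The sequential and append-only properties can be checked mechanically. The sketch below is one simple way to do that, assuming the timestamps have already been parsed into `datetime` objects; the function name `find_anomalies`, the one-hour `max_gap` default, and the sample timestamps are all hypothetical. A gap or an out-of-order entry is only a lead, since quiet periods, log rotation, and clock corrections produce the same pattern.

```python
from datetime import datetime, timedelta

def find_anomalies(timestamps, max_gap=timedelta(hours=1)):
    """Return (index, kind) pairs where the sequence runs backwards
    or where the silence before an entry exceeds max_gap."""
    findings = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta < timedelta(0):
            # An entry older than its predecessor breaks the append-only order
            findings.append((i, "out_of_order"))
        elif delta > max_gap:
            # A long silence may mean entries were removed (or nothing happened)
            findings.append((i, "gap"))
    return findings

# Hypothetical timestamps for illustration
stamps = [
    datetime(2026, 6, 20, 2, 0, 0),
    datetime(2026, 6, 20, 2, 5, 0),
    datetime(2026, 6, 20, 6, 30, 0),  # more than an hour after the previous entry
    datetime(2026, 6, 20, 6, 29, 0),  # earlier than the previous entry
]
findings = find_anomalies(stamps)
print(findings)
```

The threshold is a tuning choice: a busy authentication log may justify minutes, a rarely used service may need days.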
Anatomy of a log entry
The file is a stack of entries. The entry is where the information lives. Most log entries, whatever the system, carry the same core elements.
| Element | What it records | Why a defender cares |
|---|---|---|
| Timestamp | When the event happened, ideally with a timezone or in UTC | The backbone of correlation; lets you line up one log against another and build the timeline |
| Source | The host, service, or component that emitted the entry | Tells you where to look next and which system to trust |
| Severity / level | INFO, WARN, ERROR, DEBUG, or a numeric priority | Lets you filter noise from signal; a burst of ERROR is worth a look |
| Event identity | An event ID, message, or action name | What actually happened: a logon, a denied connection, a process start |
| Actor | The user, account, or process responsible | Ties the event to a who; the pivot from "what" to "whom" |
| Detail / payload | The specifics: source IP, file path, command line, status code | The evidence itself, where the investigation actually happens |
A Linux auth log line shows the shape directly:
Jun 20 02:14:07 web01 sshd[2841]: Failed password for invalid user admin from 203.0.113.5 port 49210 ssh2
Read it field by field. Jun 20 02:14:07 is the timestamp. web01 is the source host. sshd[2841] is the service and process ID. The message is Failed password for invalid user admin, and from 203.0.113.5 port 49210 is the payload that gives you an IP to pivot on. One line, and you have when, where, who was targeted, and from where. Note what the timestamp leaves out: this traditional syslog format carries no year and no timezone, so you need to know the host's clock settings before you line the entry up against another source. A hundred of these from the same IP in a minute is a brute-force attempt, and the log file is where it is written down.
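The same field-by-field reading can be expressed as a pattern. This is a minimal sketch that handles the exact line shape shown above, not a general syslog parser; the names `LINE_RE`, `FAILED_RE`, and `parse_auth_line`, and the field labels, are chosen for illustration.

```python
import re

# Header of the traditional syslog-style line: timestamp, host, service[pid]: message
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<service>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

# Payload of the sshd failed-password message specifically
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\S+) port (?P<src_port>\d+)"
)

def parse_auth_line(line):
    """Split one auth log line into named fields, or return None if it does not match."""
    header = LINE_RE.match(line)
    if header is None:
        return None
    entry = header.groupdict()
    detail = FAILED_RE.search(entry["message"])
    if detail:
        # Add the targeted user and the source address as their own fields
        entry.update(detail.groupdict())
    return entry

line = ("Jun 20 02:14:07 web01 sshd[2841]: Failed password for invalid user "
        "admin from 203.0.113.5 port 49210 ssh2")
entry = parse_auth_line(line)
print(entry["timestamp"], entry["host"], entry["user"], entry["src_ip"])
```

All values come back as strings; converting the timestamp to a real date still requires supplying the year and timezone from outside the line.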
Where log files come from
Almost everything in an environment produces logs. For investigation purposes, the sources group into a few broad categories, and a real case usually pulls from several at once.
Operating systems. The OS logs its own activity. Linux writes to /var/log (auth.log or secure for authentication, syslog or messages for general system events, kern.log for the kernel). Windows uses the Event Log, split into channels: Security (logons, privilege use, object access), System (drivers and services), and Application. These answer who logged on, what ran, and what the OS itself did.
Applications. Web servers, databases, mail servers, and custom software each write their own logs: what the application did, the errors it threw, the transactions it processed. A web server's access log records every request; a database logs queries and connection attempts; an application's own log records its internal state.
Network and security devices. Firewalls log allowed and denied connections. Proxies and DNS servers log the destinations clients reached. IDS/IPS sensors log the traffic they flagged. Endpoint agents log process creation, file writes, and registry changes. This category is often the richest source for a hunt because it sees activity the host's own logs may not record.
Cloud and infrastructure. Cloud providers log API calls (AWS CloudTrail records who called which API and when), resource access, and platform events. Containers and orchestrators emit their own logs, frequently to stdout, where a collector picks them up. Cloud logging is often a feature you must turn on; if nobody enabled it before the incident, the record for that window does not exist.
The practical consequence: no single log file tells the whole story. The web access log shows the request, the OS log shows the logon it led to, the endpoint log shows the process that ran, and the firewall log shows the connection back out. Correlating across sources is the job, and it depends on every one of those files existing and sharing a consistent clock.
The main types of logs
Logs are usually grouped by what they record, not by which product wrote them. The same categories recur across vendors.
- Event logs record discrete system events: a service starting, a user logging on, a device connecting. The Windows event log is the canonical example, with each entry keyed to an event ID.
- System logs record operating-system-level activity: kernel messages, driver loads, hardware events, the daemons behind the OS. On Linux this is the domain of syslog.
- Authentication and access logs record who connected and who logged in, successful and failed: SSH sessions, sudo use, Windows logon events, web requests to a resource.
- Application logs record what an application did internally: transactions processed, exceptions thrown, jobs run. They are the first place to look when a specific service misbehaves.
- Audit logs record security-relevant changes: a permission grant, a configuration edit, an account creation. They answer "who changed what," which the other logs often do not.
- Change logs record modifications to files, configurations, or data over time, in chronological order.
- Security and threat logs record what security tooling flagged: a firewall denial, an IDS alert, an antivirus detection, an endpoint behavioral hit.
The categories overlap by design. A single Windows Security event can be both an event log entry and an authentication record. What matters for an investigation is not the taxonomy but knowing which source records which kind of activity, so you know where to look when you need to answer a specific question.
Structured, semi-structured, and unstructured logs
Log files differ not just in what they record but in how the entry is written, and that shape decides how hard the file is to work with.
Unstructured logs are free-form text. The message in the Linux sshd line above is unstructured: a human can read it, but a machine has to parse it with patterns to pull out the user, the IP, and the port. Most traditional system and application logs carry free-form messages like this, which is why log parsing exists as a discipline.
Semi-structured logs impose a loose, repeating shape without a rigid schema. Syslog is the classic example: a defined priority, timestamp, and host, followed by a free-text message. There is enough structure to route and filter, but the message body is still prose.
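The "defined priority" in syslog is a single number that packs two fields: facility multiplied by 8, plus severity. The sketch below decodes it, assuming a message that still carries its `<PRI>` prefix as it does on the wire. The `<38>` prefix on the sample line is an assumption for illustration (files under /var/log usually do not store it), and `decode_pri` is a hypothetical helper name.

```python
import re

# Severity names in numeric order 0..7, as defined by the syslog standard
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

def decode_pri(raw):
    """Split a leading <PRI> value into facility and severity.
    Returns None when the message has no priority prefix."""
    m = re.match(r"<(\d{1,3})>", raw)
    if m is None:
        return None
    facility, severity = divmod(int(m.group(1)), 8)
    return {
        "facility": facility,
        "severity": severity,
        "severity_name": SEVERITIES[severity],
        "rest": raw[m.end():],  # header and free-text message, untouched
    }

# The <38> prefix is assumed for this example: facility 4 (auth), severity 6 (info)
decoded = decode_pri(
    "<38>Jun 20 02:14:07 web01 sshd[2841]: Failed password for invalid user "
    "admin from 203.0.113.5 port 49210 ssh2"
)
print(decoded["facility"], decoded["severity_name"])
```

This is the part of the entry a collector can route and filter on without reading the prose that follows it.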
Structured logs encode each entry as fields, commonly JSON or key-value pairs, so every field is named and machine-readable without parsing. {"ts":"2026-06-20T02:14:07Z","event":"login_failed","user":"admin","src_ip":"203.0.113.5"} carries the same information as the sshd line, but a tool can query src_ip directly. Modern applications and cloud services increasingly log this way.
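Querying a field directly looks like this in practice. The sketch uses the JSON entry from the paragraph above plus a second, hypothetical entry added only so the filter has something to exclude; `raw_lines` and `failed_sources` are illustrative names.

```python
import json

raw_lines = [
    # The entry from the text
    '{"ts":"2026-06-20T02:14:07Z","event":"login_failed","user":"admin","src_ip":"203.0.113.5"}',
    # A hypothetical second entry, for contrast
    '{"ts":"2026-06-20T02:14:09Z","event":"login_ok","user":"deploy","src_ip":"198.51.100.7"}',
]

# One json.loads call per line replaces the pattern matching an unstructured line needs
entries = [json.loads(line) for line in raw_lines]

# Fields are addressed by name
failed_sources = [e["src_ip"] for e in entries if e["event"] == "login_failed"]
print(failed_sources)
```

No regular expression is involved, and adding a new field to the log does not break the query.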
The shape matters because raw logs are scattered, inconsistent, and individually low-signal. A single failed login means little. The value appears when logs are collected, normalized into a common form, and correlated, which is the work of log analysis and, at scale, the reason a SIEM exists. Structured logs make that pipeline cheaper; unstructured logs make parsing a prerequisite. Either way, the log file is the input, and nothing downstream can analyze an event that was never written.
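Normalization can be sketched with the two example entries from this guide: the unstructured sshd line and the JSON record. The common schema (`ts`, `event`, `user`, `src_ip`) and the function names are assumptions made for illustration, as are the year and the UTC timezone supplied for the sshd line, which the line itself does not carry. Month-name parsing with `%b` also assumes an English locale.

```python
import json
import re
from datetime import datetime, timezone

SSHD_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) sshd\[\d+\]: "
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<src_ip>\S+) port \d+"
)

def normalize_sshd(line, year):
    """Map an sshd failed-password line onto the common schema.
    The year is supplied by the caller; UTC is assumed."""
    m = SSHD_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    return {"ts": ts.replace(tzinfo=timezone.utc), "event": "login_failed",
            "user": m["user"], "src_ip": m["src_ip"]}

def normalize_json(line):
    """Map a structured JSON entry onto the same schema."""
    rec = json.loads(line)
    # Replace the trailing Z so older Python versions can parse the timestamp
    ts = datetime.fromisoformat(rec["ts"].replace("Z", "+00:00"))
    return {"ts": ts, "event": rec["event"], "user": rec["user"], "src_ip": rec["src_ip"]}

from_text = normalize_sshd(
    "Jun 20 02:14:07 web01 sshd[2841]: Failed password for invalid user "
    "admin from 203.0.113.5 port 49210 ssh2", year=2026)
from_json = normalize_json(
    '{"ts":"2026-06-20T02:14:07Z","event":"login_failed","user":"admin","src_ip":"203.0.113.5"}')
print(from_text == from_json)
```

Once both sources produce the same record shape, everything downstream can treat them as one stream. The structured entry needed one parsing call; the unstructured one needed a pattern and two outside assumptions.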
Why log files matter to a defender
Log files are the primary evidence in nearly every security investigation, for one reason: they are the record of what actually happened, written by the systems themselves as it happened.
In detection, patterns across log files are the signal. A brute-force attempt is a run of failed-login entries from one source. Lateral movement is a logon to a host an account never touched before. Data exfiltration is an unusually large transfer in a firewall or proxy log. The events were always being logged; detection is reading them in time to act.
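The brute-force pattern, a run of failed logins from one source, can be sketched as a sliding-window count. This assumes the failed-login events have already been extracted as (timestamp, source IP) pairs sorted by time; the function name, the threshold of 5, the one-minute window, and the sample events are all hypothetical values to tune for a real environment.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def brute_force_sources(events, threshold=5, window=timedelta(minutes=1)):
    """events: (timestamp, src_ip) pairs for failed logins, sorted by time.
    Returns the set of sources with at least `threshold` failures inside `window`."""
    recent = defaultdict(deque)  # src_ip -> timestamps still inside the window
    flagged = set()
    for ts, src_ip in events:
        q = recent[src_ip]
        q.append(ts)
        # Drop failures that have aged out of the window
        while ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            flagged.add(src_ip)
    return flagged

# Hypothetical events: one source failing every 5 seconds, another every 10 minutes
base = datetime(2026, 6, 20, 2, 14, 0)
events = [(base + timedelta(seconds=5 * i), "203.0.113.5") for i in range(6)]
events += [(base + timedelta(minutes=10 * i), "198.51.100.7") for i in range(6)]
events.sort()

flagged = brute_force_sources(events)
print(flagged)
```

Both sources produced six failures; only the rate separates the attack from the user who mistypes a password now and then.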
In forensics and incident response, the log file is how you reconstruct an intrusion. You take a known indicator (an attacker IP, a compromised account, a malicious process name) and pull every log entry that touches it, across hosts, to build the timeline: where they got in, what they ran, what they reached, and what they took. Because logs are sequential and timestamped, the timeline largely assembles itself once you have the entries, provided the clocks on the systems that wrote them agree.
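Building a timeline from an indicator can be sketched as a merge of time-sorted sources followed by a filter. The example assumes each source is already parsed, sorted, and normalized to UTC; the entry shape, the `build_timeline` name, and every sample entry are hypothetical.

```python
import heapq
from datetime import datetime, timezone

def at(hour, minute, second):
    """Helper for the sample data: a UTC timestamp on one fixed day."""
    return datetime(2026, 6, 20, hour, minute, second, tzinfo=timezone.utc)

def build_timeline(indicator, *sources):
    """Merge already-sorted sources by timestamp and keep entries mentioning the indicator.
    Plain substring matching is used, so 203.0.113.5 would also match 203.0.113.50."""
    merged = heapq.merge(*sources, key=lambda e: e["ts"])
    return [e for e in merged if indicator in e["detail"]]

# Hypothetical entries from three sources
web_access = [
    {"ts": at(2, 14, 7), "source": "web_access", "detail": "POST /login from 203.0.113.5"},
    {"ts": at(2, 20, 0), "source": "web_access", "detail": "GET /index.html from 198.51.100.7"},
]
auth = [
    {"ts": at(2, 15, 30), "source": "auth", "detail": "Accepted password for deploy from 203.0.113.5"},
]
firewall = [
    {"ts": at(2, 16, 2), "source": "firewall", "detail": "ALLOW web01 -> 203.0.113.5:4444"},
]

timeline = build_timeline("203.0.113.5", web_access, auth, firewall)
for e in timeline:
    print(e["ts"].isoformat(), e["source"], e["detail"])
```

`heapq.merge` relies on each input being sorted, which is the sequential property of a log file doing the work. It also relies on the timestamps being comparable, which is why clock consistency across sources matters.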
In threat hunting, log files over time establish the baseline of normal, and the hunt is a search for the entry that deviates: a service account logging on interactively, a process spawning from an unexpected parent, a connection to a destination no host has used before.
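One of those deviations, a connection to a destination no host has used before, reduces to a set difference against the baseline. This is a minimal sketch assuming connection events have been reduced to (host, destination) pairs; the function name and all sample values are hypothetical.

```python
def new_destinations(baseline_events, current_events):
    """Return (host, destination) pairs from the current period whose destination
    never appears in the baseline, for any host."""
    known = {dst for _host, dst in baseline_events}
    return sorted({(host, dst) for host, dst in current_events if dst not in known})

# Hypothetical baseline built from earlier logs, and a current period to hunt in
baseline = [("web01", "10.0.0.5"), ("db01", "10.0.0.9"), ("web01", "10.0.0.9")]
current = [("web01", "10.0.0.5"), ("web01", "203.0.113.77"), ("db01", "10.0.0.9")]

hits = new_destinations(baseline, current)
print(hits)
```

The quality of the result depends on the baseline: too short a history and routine destinations look new, which is one more reason retention matters.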
There is a precondition under all three. The log has to exist, has to be retained long enough to matter, and has to survive the attacker. Logging that was never enabled, logs rotated away before anyone looked, and logs cleared during the intrusion are the three ways the evidence is gone before you need it. The log file is only the first source of truth when someone made sure it was written and kept.
Frequently Asked Questions
What is a log file?
A log file is a record of events that a system, application, or device writes as those events happen, one entry per event, each stamped with the time it occurred. Files are appended to in time order, so a log file is effectively a timeline of what a system did. Operating systems, applications, network devices, and cloud services all produce them, and they are the primary evidence in security investigations.
What does a log entry contain?
A typical log entry contains a timestamp (when the event happened), a source (the host or component that emitted it), a severity or level, an event identity such as an event ID or message, the actor responsible (a user or process), and a detail payload with the specifics like a source IP, file path, or status code. Not every log includes every element, but the timestamp and the event description are the constants.
What are the main types of log files?
The common categories are event logs (discrete system events), system logs (OS-level activity), authentication and access logs (who connected and logged in), application logs (what an application did internally), audit logs (security-relevant changes), change logs (modifications over time), and security or threat logs (what security tools flagged). The categories overlap, and a single entry can belong to more than one.
What is the difference between structured and unstructured logs?
Unstructured logs are free-form text that a machine must parse with patterns to extract fields, like the message in a traditional Linux syslog line. Structured logs encode each entry as named fields, commonly JSON or key-value pairs, so tools can query a field directly without parsing. Semi-structured logs sit between the two, with a loose repeating shape and a free-text message. Structured logs are easier to collect and analyze at scale.
Where are log files stored?
On Linux, most logs live under /var/log (for example auth.log, syslog, kern.log). On Windows, the Event Log holds the Security, System, and Application channels. Applications write to their own log directories, network and security devices keep their own logs, and cloud services deliver logs to storage you configure. In larger environments, logs are forwarded from these sources to a central system such as a SIEM.
Why are log files important for security?
Log files are the record of what actually happened on a system, written by the systems themselves, which makes them the primary evidence in detection, forensics, and threat hunting. They reveal brute-force attempts, lateral movement, data exfiltration, and the full path of an intrusion. Their value depends on the logs being enabled, retained long enough, and protected from tampering, since a log that was never written or was cleared by an attacker cannot help.
The bottom line
A log file is a sequential, timestamped record of events a system writes as they happen, one entry per event. Each entry carries when, where, who, and what, and the file is a timeline by construction. Operating systems, applications, network devices, and cloud platforms all produce them, in shapes that range from free-form text to structured JSON, and no single file holds the whole story.
For a defender, the log file is the first source of truth: the evidence of an intrusion is almost always sitting in plain text across these files, waiting to be correlated. The only failure modes that matter are the log that was never enabled, the one rotated away too soon, and the one an attacker cleared. Confirm logging is on, retained, and protected before you need it, because the investigation is only as good as the entries that survived to be read.