
What Are Access Logs? Formats, Fields, and Sources

An access log is a file that records every request made to a resource, one line per request, capturing who asked, what they asked for, when, and how the server responded.

Pull the access log off a compromised web server and the intrusion is usually sitting in plain text. One source IP hitting /wp-login.php four hundred times in a minute, then a single 200 where every other attempt returned 401. A GET for /etc/passwd with a path-traversal string in the URI. A scanner's user-agent walking the whole site at 3 a.m. The access log does not interpret any of this. It records each request as a line and leaves the reading to you. That is exactly why it is the first artifact a responder reaches for.

An access log is a record of every request made to a resource: who asked, what they asked for, when, and how the system answered. Web servers write them, operating systems write them, cloud services write them. The lines look different across sources, but they answer the same questions. This guide covers what an access log is, the standard formats and the fields inside them, where access logs come from, what defenders do with them, and the common fields you will parse in an investigation. It is written for the people who live in these files: SOC analysts triaging an alert, threat hunters building a baseline, DFIR responders reconstructing what an attacker touched.

What is an access log?

An access log is a file to which a server appends one line per request it receives, recording the request and its outcome. The unit is the request, not the session and not the user. A single page load can produce a dozen lines, one for the HTML and one for each image, script, and stylesheet the browser fetches. Each line is written after the server finishes handling the request, so the log is an after-the-fact record, ordered by when each request completed rather than when it arrived.

The defining trait of an access log is that it captures requests regardless of whether they succeeded. A request that returned 404 Not Found, a request blocked with 403 Forbidden, a malformed request the server rejected, all of them land in the log next to the requests that worked. That is what separates an access log from an application log, which records what the application itself did internally (a payment processed, an exception thrown), and from an audit log, which records security-relevant changes (a permission grant, a config edit). The access log sits at the front door and writes down everyone who knocked, whether or not the door opened.

For a defender, that property is the whole point. An attacker probing for a vulnerable endpoint generates a trail of 404s and 403s before the one request that works. The failures are evidence. A log that recorded only successful access would erase the reconnaissance and leave you with just the breach.

The standard access log formats

Figure: Access log, anatomy of one combined-format line. A failed path-traversal probe (203.0.113.5, 02:14:07 +0000, GET /admin/../../etc/passwd, 404 with 0 bytes, python-requests/2.31.0), annotated by source IP, timestamp, request line, status and bytes, and user-agent.

Most web access logs follow one of two formats that originated with the NCSA HTTPd server, that Apache HTTP Server carried forward, and that the rest of the ecosystem adopted. Knowing the two is enough to read the majority of web logs you will ever open.

Common Log Format

The Common Log Format (CLF) is the older and leaner of the two. Apache's LogFormat directive defines it as:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

Each percent directive maps to one field. A real CLF line, the example from Apache's own documentation, looks like this:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Read left to right: 127.0.0.1 is the client IP (%h). The first - is the RFC 1413 identity (%l), almost always a dash because nobody runs identd. frank is the userid from HTTP authentication (%u), a dash when the request is unauthenticated. [10/Oct/2000:13:55:36 -0700] is the timestamp (%t). "GET /apache_pb.gif HTTP/1.0" is the request line (%r): method, resource, protocol. 200 is the status code the server returned (%>s). 2326 is the response size in bytes, excluding headers (%b).

Seven fields, one request. That is the floor for a useful access log.
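To make the field mapping concrete, here is a minimal Python sketch that splits a CLF line into its seven fields. The regular expression and the function name parse_clf are illustrative assumptions, not part of Apache. The sketch assumes the default common format and does not handle escaped quotes inside the request line.

```python
import re
from datetime import datetime
from typing import Optional

# One named group per CLF directive, in LogFormat order:
# %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line: str) -> Optional[dict]:
    """Return the seven CLF fields as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # %t has the form day/month/year:hour:minute:second zone.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # %b writes "-" instead of 0 when no body bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


entry = parse_clf(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
)
print(entry["host"], entry["user"], entry["status"], entry["size"])
```

Returning None for a non-matching line is a deliberate choice: malformed lines are worth counting in an investigation rather than silently crashing the parser.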

Combined Log Format

The Combined Log Format is the Common Log Format plus two fields, and it is the default most administrators actually run because the two extra fields carry the context CLF drops. Apache defines it as:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

The same frank line in Combined Format adds the referer and user-agent on the end:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

%{Referer}i is the page the client came from. %{User-agent}i is the client's self-reported browser or tool string. Both are attacker-controlled and both are useful anyway: the user-agent is where automated scanners announce themselves (sqlmap, nikto, python-requests, a curl version), and the referer can reveal where a malicious link was hosted. Nginx ships a predefined combined format with the same layout. IIS defaults to the W3C Extended format, which carries similar fields under different names and in a different order. Any of these formats can be customized field by field, which is why parsing should key on the configured format rather than assuming the defaults.
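A short Python sketch of the same idea for the Combined Log Format: parse the nine fields, then check the user-agent against the tool names mentioned above. The pattern, the function names, and the TOOL_MARKERS list are assumptions for illustration. The list is not a complete signature set, and a careful attacker can set any user-agent they like.

```python
import re
from typing import Optional

# Combined = the seven CLF fields plus quoted referer and user-agent.
COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Illustrative markers taken from the tools named in the text.
TOOL_MARKERS = ("sqlmap", "nikto", "python-requests", "curl")


def parse_combined(line: str) -> Optional[dict]:
    """Return the nine Combined fields as strings, or None on no match."""
    match = COMBINED_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def scripted_client(agent: str) -> Optional[str]:
    """Return the first tool marker found in the user-agent, or None."""
    lowered = agent.lower()
    for marker in TOOL_MARKERS:
        if marker in lowered:
            return marker
    return None


entry = parse_combined(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
method, path, protocol = entry["request"].split(" ", 2)
print(method, path, entry["referer"], scripted_client(entry["agent"]))
```

If the server's LogFormat has been customized, the pattern has to be rebuilt from that directive. A parser that assumes the default will mislabel fields without raising an error.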

Common access log fields

Across web servers, cloud load balancers, and OS access records, the same handful of fields recur. These are the ones an investigation leans on.

| Field | What it records | Why a defender cares |
| --- | --- | --- |
| Source IP | The client address that made the request | Pivot point: group activity by IP, match against threat intel, spot one host hammering an endpoint |
| Timestamp | When the request was received or answered | Builds the timeline; correlates with other logs; reveals off-hours activity |
| Request line / method | The HTTP verb, resource path, and protocol | Shows what was targeted; POST to a login, GET with traversal strings, unusual paths |
| Status code | The server's response code | 2xx succeeded, 3xx redirected, 4xx client error, 5xx server error; bursts of 401/403/404 are recon |
| Bytes sent | Size of the response | Large responses on a normally small endpoint can flag data exfiltration |
| User-agent | Client's self-reported software | Identifies scanners and scripted tools; anomalous agents stand out from real browsers |
| Referer | The page the request came from | Context on how a resource was reached; can expose a malicious origin |
| Username | Authenticated user, when present | Ties a request to an account; a key field once authentication is in play |

The status code field carries more weight than its three digits suggest. RFC 9110, the current HTTP semantics specification, groups codes into five classes by their first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client error, and 5xx server error. In a hunt, the distribution matters as much as any single code. A normal endpoint returns mostly 2xx. A run of 4xx from one IP is someone trying things that do not exist or that they are not allowed to reach. A spike in 5xx can mean an attacker found an input that crashes the application, often the first sign of a working exploit attempt.
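The distribution idea can be sketched in a few lines of Python: bucket each status code by its first digit and count the classes per source IP. The function names and the sample events are hypothetical. In practice the (IP, status) pairs would come from parsed log lines.

```python
from collections import Counter, defaultdict


def status_class(code: int) -> str:
    """Map a status code to its RFC 9110 class, e.g. 404 -> '4xx'."""
    return f"{code // 100}xx"


def class_distribution(events):
    """events: iterable of (source_ip, status_code). Returns {ip: Counter of classes}."""
    dist = defaultdict(Counter)
    for ip, code in events:
        dist[ip][status_class(code)] += 1
    return dist


def class_ratio(counter: Counter, cls: str = "4xx") -> float:
    """Share of one status class among all of an IP's requests."""
    total = sum(counter.values())
    return counter[cls] / total if total else 0.0


# Hypothetical sample: one noisy source and one ordinary visitor.
events = [
    ("203.0.113.5", 404), ("203.0.113.5", 404), ("203.0.113.5", 403),
    ("203.0.113.5", 200),
    ("198.51.100.7", 200), ("198.51.100.7", 200), ("198.51.100.7", 304),
]
dist = class_distribution(events)
for ip, counts in dist.items():
    print(ip, dict(counts), round(class_ratio(counts), 2))
```

A ratio is only a starting point. What counts as a suspicious 4xx share depends on the endpoint's own baseline.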

Where access logs come from

Access logs are not a web-server-only artifact. Three broad sources produce them, and a real investigation usually pulls from more than one.

Web and proxy servers. This is the canonical access log. By default, Apache writes to access_log under /var/log/httpd on RHEL-family systems and to access.log under /var/log/apache2 on Debian and Ubuntu. Nginx writes access.log under /var/log/nginx. IIS writes W3C-format logs under %SystemDrive%\inetpub\logs\LogFiles on Windows, in one W3SVC folder per site (the folder name ends in the site ID). Reverse proxies, CDNs, and API gateways produce the same kind of record for traffic passing through them.

Operating systems. The OS keeps its own access records of who connected and who logged in. On Linux that includes the authentication records behind last and lastlog (successful and recent logins) and the auth logs that record SSH and sudo access. On Windows, the Security event log records successful and failed logons (event IDs 4624 and 4625), logoffs (4634), and object-access events. These answer "who got onto the box," where the web access log answers "who hit the service."

Cloud services. Cloud providers expose access logging as a configurable feature, and the defaults matter. Amazon S3 server access logging records each request to a bucket as a single space-delimited line with the bucket owner, bucket name, request time, requester, remote IP, operation, response status, and error code, among other fields. Application Load Balancer access logs capture each request to the load balancer, including requests that never reached a target, with fields like the request timestamp in ISO 8601, the client IP and port, the target it was routed to, the load balancer status code, the target status code, bytes received and sent, the request line, and the user-agent. One detail to internalize: ALB access logs are an optional feature that is disabled by default, and when enabled they are written as compressed files to an S3 bucket every five minutes per load balancer node. If nobody turned them on before the incident, they do not exist for the window you need. Misconfigured or disabled logging is itself one of the most common AWS misconfigurations a responder runs into.
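As a rough illustration of reading an Application Load Balancer entry, the Python sketch below splits a line on spaces while respecting quoted fields, then names the leading fields. The field order in ALB_FIELDS reflects my understanding of the AWS documentation and should be checked against the current docs before use. The sample line is hypothetical, built only to match that layout, and real entries carry further fields after the user-agent.

```python
import shlex

# Leading fields of an ALB access log entry. Assumption: this order follows
# the AWS documentation; verify it before relying on the mapping.
ALB_FIELDS = [
    "type", "time", "elb", "client", "target",
    "request_processing_time", "target_processing_time", "response_processing_time",
    "elb_status_code", "target_status_code", "received_bytes", "sent_bytes",
    "request", "user_agent",
]


def parse_alb(line: str) -> dict:
    """Name the leading space-delimited fields; quoted fields stay whole."""
    tokens = shlex.split(line)
    return dict(zip(ALB_FIELDS, tokens))


# Hypothetical entry for a request the load balancer rejected before
# routing it, so the target fields are placeholders.
sample = (
    'https 2026-06-18T02:14:07.123456Z app/example-lb/0123456789abcdef '
    '203.0.113.5:51234 - -1 -1 -1 403 - 120 250 '
    '"GET https://www.example.com:443/admin HTTP/1.1" "python-requests/2.31.0"'
)
entry = parse_alb(sample)
client_ip, client_port = entry["client"].rsplit(":", 1)
print(client_ip, entry["elb_status_code"], entry["target_status_code"], entry["user_agent"])
```

Keeping the load balancer status and the target status as separate fields matters: a request that never reached a target still leaves a line, which is the property the text describes.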

What defenders do with access logs

Access logs earn their keep in four jobs, and most SOC and DFIR work touches all four.

Detection. Patterns in access logs are signal. Brute-force login attempts show as many POSTs to an auth endpoint from one IP with a string of 401s. Web scanning shows as one source walking hundreds of paths, most returning 404. SQL injection and path traversal show as suspicious strings in the request URI. Credential stuffing shows as login attempts spread across many usernames, often from many IPs, with the occasional success from an IP and user-agent the account has never used before. None of these need a new tool to see; they need the log to exist and someone to query it.
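The web-scanning pattern is easy to express in code. This is a minimal Python sketch, assuming the log has already been parsed into (ip, path, status) tuples. The function name, the threshold, and the sample events are hypothetical.

```python
from collections import defaultdict


def find_scanners(events, min_paths=3):
    """events: iterable of (ip, path, status).

    Returns {ip: number of distinct paths that returned 404} for every
    source at or above min_paths.
    """
    missing = defaultdict(set)
    for ip, path, status in events:
        if status == 404:
            missing[ip].add(path)
    return {ip: len(paths) for ip, paths in missing.items() if len(paths) >= min_paths}


# Hypothetical sample: one source walking paths, one visitor with a stray 404.
events = [
    ("203.0.113.5", "/wp-login.php", 404),
    ("203.0.113.5", "/phpmyadmin/", 404),
    ("203.0.113.5", "/.env", 404),
    ("203.0.113.5", "/.env", 404),  # repeat of the same path, counted once
    ("198.51.100.7", "/favicon.ico", 404),
    ("198.51.100.7", "/index.html", 200),
]
print(find_scanners(events, min_paths=3))
```

Counting distinct paths rather than raw 404s keeps a browser retrying one broken image from looking like a scanner. A real threshold would be far higher than 3 and tuned to the site.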

Forensics and incident response. After a confirmed intrusion, the access log is how you scope it. You take the attacker's known IP or user-agent and pull every request it made, which reconstructs the path: the endpoint they probed, the one that worked, the files they pulled, the size of each response. Because the log records failures too, you also see the reconnaissance that preceded the breach, which often reveals what else they tried and where else to look.
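Scoping by a known attacker IP amounts to a filter and a sort. The Python sketch below assumes entries are already parsed into dicts with host, time, request, status, and size keys. Those key names, the function, and the sample data are assumptions for illustration.

```python
from datetime import datetime, timezone


def scope_requests(entries, attacker_ip):
    """Return the attacker's requests in time order, plus a short summary."""
    hits = sorted(
        (e for e in entries if e["host"] == attacker_ip),
        key=lambda e: e["time"],
    )
    summary = {
        "first_seen": hits[0]["time"] if hits else None,
        "last_seen": hits[-1]["time"] if hits else None,
        # Requests that succeeded are the ones that need the closest look.
        "succeeded": [e["request"] for e in hits if 200 <= e["status"] < 300],
        "bytes_sent": sum(e["size"] for e in hits),
    }
    return hits, summary


def at(minute, second):
    """Helper for building hypothetical timestamps on 18 Jun 2026, 02:xx UTC."""
    return datetime(2026, 6, 18, 2, minute, second, tzinfo=timezone.utc)


# Hypothetical parsed entries, deliberately out of order.
entries = [
    {"host": "203.0.113.5", "time": at(14, 7), "request": "GET /admin/../../etc/passwd HTTP/1.1", "status": 404, "size": 0},
    {"host": "198.51.100.7", "time": at(14, 9), "request": "GET /index.html HTTP/1.1", "status": 200, "size": 5120},
    {"host": "203.0.113.5", "time": at(15, 30), "request": "GET /backup.zip HTTP/1.1", "status": 200, "size": 5242880},
    {"host": "203.0.113.5", "time": at(14, 40), "request": "GET /admin/ HTTP/1.1", "status": 403, "size": 199},
]
timeline, summary = scope_requests(entries, "203.0.113.5")
for e in timeline:
    print(e["time"].isoformat(), e["status"], e["size"], e["request"])
```

The same function works with a user-agent as the pivot by changing the filter key, which is useful when the attacker rotates IPs but not tooling.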

Baselining and threat hunting. Before you can spot the anomaly you have to know normal. Access logs over time establish the baseline: which endpoints get traffic, from which networks, at which hours, with which user-agents. A hunt is then a search for deviation, a new admin endpoint suddenly receiving requests, a geography that never appeared before, a user-agent that belongs to no real client.
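One simple form of this hunt is a set difference: collect the values seen during the baseline period, then list anything in the current window that was never seen before. The Python sketch below does that for user-agents. The function name and both sample lists are hypothetical, and the same function applies to paths or source networks.

```python
def new_values(baseline, current):
    """Return values in current that never appeared in baseline, in first-seen order."""
    seen = set(baseline)
    fresh = []
    for value in current:
        if value not in seen:
            fresh.append(value)
            seen.add(value)  # report each new value only once
    return fresh


# Hypothetical user-agents from a baseline period and from today.
baseline_agents = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]
today_agents = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "python-requests/2.31.0",
    "python-requests/2.31.0",
    "sqlmap/1.7",
]
print(new_values(baseline_agents, today_agents))
```

A new value is a lead, not a verdict. Browser updates produce new user-agents every few weeks, so the output still needs a human or a second filter.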

Feeding the SIEM. Individually these logs are text files on scattered hosts. Collected, parsed, and normalized into a SIEM, they become queryable at scale and correlatable with other telemetry. The access log entry that means little alone (200 on a login from a new IP) becomes an alert when the SIEM ties it to a failed-login burst minutes earlier from the same source. Turning raw access logs into answers is the core of log analysis, and access logs are among the highest-value inputs to it.
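The correlation described here, a success preceded by a burst of failures from the same source, can be sketched in Python as a sliding window per IP. This is not how any particular SIEM implements it. The function name, the thresholds, and the sample events are assumptions, and the input is assumed to be sorted by time.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def success_after_failures(events, min_failures=5, window=timedelta(minutes=5)):
    """events: time-ordered iterable of (timestamp, ip, status).

    Returns [(ip, timestamp)] for each 200 that follows at least
    min_failures 401s from the same IP within the window.
    """
    recent_failures = defaultdict(deque)
    alerts = []
    for ts, ip, status in events:
        failures = recent_failures[ip]
        # Drop failures that have aged out of the window.
        while failures and ts - failures[0] > window:
            failures.popleft()
        if status == 401:
            failures.append(ts)
        elif status == 200 and len(failures) >= min_failures:
            alerts.append((ip, ts))
            failures.clear()
    return alerts


# Hypothetical sample: six failures ten seconds apart, then a success.
base = datetime(2026, 6, 18, 2, 0, 0)
events = [(base + timedelta(seconds=10 * i), "203.0.113.5", 401) for i in range(6)]
events.append((base + timedelta(seconds=70), "203.0.113.5", 200))
events.append((base + timedelta(seconds=5), "198.51.100.7", 200))
events.sort()
alerts = success_after_failures(events)
print(alerts)
```

On its own, the 200 from 198.51.100.7 raises nothing. It is the preceding burst that turns the 200 from 203.0.113.5 into an alert.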

Reading an access log line in practice

Put the pieces together on one line. Take a Combined Format entry like this:

203.0.113.5 - - [18/Jun/2026:02:14:07 +0000] "GET /admin/../../etc/passwd HTTP/1.1" 404 0 "-" "python-requests/2.31.0"

Field by field: the source is 203.0.113.5. There is no identd and no authenticated user (both dashes). The timestamp is just after 2 a.m. UTC. The request is a GET whose URI contains ../../, a path-traversal attempt reaching for /etc/passwd. The server answered 404 and sent 0 bytes, so the traversal did not work this time. The referer is empty. The user-agent is python-requests/2.31.0, a scripted client, not a browser.

One line, and you already have an IP to pivot on, an attack technique to name, a confirmation it failed, and a tool fingerprint to hunt for elsewhere. Now imagine the next line from the same IP returns 200 with a five-megabyte response. That is the difference between an attempt and a breach, and the access log is where you see it.
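Spotting the traversal string programmatically takes one extra step beyond a substring search, because attackers often percent-encode the dots and slashes. The Python sketch below decodes the path twice before checking. The function name is hypothetical, and the check is a simple heuristic rather than a complete traversal detector.

```python
from urllib.parse import unquote


def has_traversal(path: str) -> bool:
    """Heuristic check for ../ sequences, including percent-encoded ones."""
    # Decode twice so double-encoded input such as %252e%252e is also caught.
    decoded = unquote(unquote(path))
    # Treat backslashes like forward slashes, as some servers do.
    normalized = decoded.replace("\\", "/")
    return "../" in normalized or normalized.endswith("/..")


request = "GET /admin/../../etc/passwd HTTP/1.1"
method, path, protocol = request.split(" ", 2)
print(method, path, has_traversal(path))
```

Run against the path field of every line, a check like this finds the same technique from other sources, which is the "tool fingerprint to hunt for elsewhere" step applied to the request line instead of the user-agent.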

The bottom line

An access log records every request to a resource, one line each, including the requests that failed. Web servers write them in the Common or Combined Log Format, operating systems write logon and connection records, and cloud services write them when you enable the feature. The fields are consistent enough to read anywhere: source IP, timestamp, request line, status code, response size, and, in the richer formats, user-agent and referer.

For a defender the value is in what the log does not filter out. The reconnaissance, the failed probes, the blocked attempts, and the one request that finally worked all sit in the same file, in order. That is what turns an access log from a server housekeeping artifact into the first thing you reach for when you need to know who hit the service, what they were after, and whether they got it. The only failure mode that matters is the log that was never written, so confirm logging is on before you need it.

Frequently asked questions

What is an access log?

An access log is a file that records every request made to a resource, one line per request, capturing who made the request, what they asked for, when, and the server's response. Web servers, operating systems, and cloud services all produce them. Crucially, an access log records failed and blocked requests alongside successful ones, which is what makes it valuable for spotting reconnaissance and attacks.

What is the difference between Common Log Format and Combined Log Format?

The Common Log Format (CLF) records seven fields: client IP, identity, userid, timestamp, request line, status code, and response size. The Combined Log Format adds two more fields on the end, the referer and the user-agent. Combined is the format most administrators run because the user-agent and referer carry context that is useful for spotting scanners and tracing where requests came from.

What fields are in an access log?

The recurring fields are the source IP, the timestamp, the request line (HTTP method, resource path, and protocol), the status code, and the response size in bytes. The Combined Log Format and most cloud access logs add the user-agent and referer, and authenticated requests include a username. Cloud load balancer logs extend this with fields like target address, processing times, and TLS details.

What is the difference between an access log and an audit log?

An access log records requests to a resource and how the server responded, including failed and blocked attempts. An audit log records security-relevant events and changes, such as permission grants, configuration edits, or account modifications. Access logs answer "who tried to reach what," while audit logs answer "who changed what." Both matter in an investigation, but they capture different things.

Where are access logs stored?

On Linux, Apache writes to /var/log/httpd on RHEL-family systems and /var/log/apache2 on Debian and Ubuntu, and Nginx writes to /var/log/nginx. On Windows, IIS writes logs under %SystemDrive%\inetpub\logs\LogFiles\W3SVC. Cloud access logs are delivered to storage you configure, such as an Amazon S3 bucket for S3 server access logs and Application Load Balancer logs.

Are cloud access logs enabled by default?

Often not. Amazon S3 server access logging and Application Load Balancer access logs are both optional features that are disabled by default and must be explicitly enabled. If logging was never turned on, no record exists for the time window you need during an incident, which is why disabled or misconfigured logging is one of the most common and costly gaps a responder finds.
