What is SIEM? A SOC Analyst's Reference

CyberDefenders

โฑ 14 min read ยท Updated May 2025 ยท CCDL1 / CCDL2

Key points

  1. SIEM (Security Information and Event Management) is a platform that collects, normalizes, and correlates log data from across an environment to detect threats and support incident investigation.
  2. It works by ingesting logs from endpoints, network devices, identity systems, and cloud platforms, then applying correlation rules and behavioral analytics to surface anomalous activity.
  3. For SOC analysts, the SIEM is the primary investigation surface: alert triage, threat hunting, and evidence collection all happen inside it.
  4. Leading platforms include Splunk Enterprise Security, Microsoft Sentinel, and Elastic Security, all used directly in CyberDefenders labs.

What is SIEM?

A SIEM (Security Information and Event Management) platform is the centralized detection and investigation hub of a SOC. It ingests log data from endpoints, network devices, identity systems, and cloud services, normalizes that data into a common format, and applies correlation rules to surface suspicious activity that no single log source could reveal alone. Gartner coined the term in 2005 by combining Security Information Management (SIM) and Security Event Management (SEM) into a single category.

For a defender, the SIEM is where alerts are born and investigations begin. An L1 analyst opens a case, runs queries against raw logs, pivots across data sources, and either closes the alert as a false positive or escalates with evidence. Without a functioning SIEM, none of that workflow exists: analysts are blind to correlated attack patterns that span multiple systems over hours or days.

This article covers how a SIEM works technically, what log sources matter, how to write and evaluate detection logic, and where SIEM fits against SOAR and XDR.

Why SIEM matters

A single endpoint generating a suspicious process does not tell you much. That same process, correlated with a failed authentication 30 minutes earlier and an outbound connection to a known C2 IP, tells you everything. That correlation is only possible inside a SIEM.

According to IBM's Cost of a Data Breach Report 2024, the average cost of a data breach reached USD 4.88 million in 2024. Organizations with high-level security skills shortages paid USD 5.74 million on average, significantly more than those with properly trained teams operating structured detection workflows. The SIEM is the platform those workflows run on.

Alert fatigue is the operational failure mode. A misconfigured SIEM generates thousands of low-fidelity alerts per day, burning analyst capacity on false positives. The IBM X-Force 2025 Threat Intelligence Index found that identity-based attacks account for 30% of all intrusions, a category that only surfaces through SIEM correlation of Windows Event Log authentication data, not through endpoint alerts alone.

SIEM also provides the log retention and search capability required for forensic investigation after a breach, and the compliance reporting required by PCI DSS, HIPAA, and SOC 2.

History and evolution of SIEM

Gartner analysts Mark Nicolett and Amrit Williams coined the term SIEM in 2005, describing a convergence of two existing product categories: Security Information Management (log collection and storage) and Security Event Management (real-time monitoring and alerting). Early SIEM platforms like ArcSight and Eniq were primarily compliance tools: organizations deployed them to satisfy PCI DSS log retention requirements, not to hunt threats.

The 2010s forced a shift. APT campaigns like Aurora (2010) and the RSA SecurID breach (2011) demonstrated that signature-based detection failed against patient, low-and-slow attackers. SIEM vendors responded by adding behavioral analytics and threat intelligence feeds. The term UEBA (User and Entity Behavior Analytics) emerged around 2015 as a distinct capability increasingly folded into SIEM platforms.

Cloud migration broke the perimeter model that early SIEMs depended on. By 2020, cloud-native SIEMs such as Microsoft Sentinel (2019) and Chronicle (2019) had emerged as purpose-built alternatives to on-premises platforms, consuming cloud audit logs natively and scaling without hardware constraints. The current generation adds AI-assisted detection, risk-based alerting, and native SOAR integration, moving SIEM from a passive log store toward an active response platform.

How SIEM works

A SIEM processes data through five sequential stages:

  1. Data collection: Agents, syslog forwarders, or API connectors pull logs from endpoints (Windows Event Logs, Linux syslog), network devices (firewall logs, DNS logs, NetFlow), identity systems (Active Directory, Okta, Entra ID), and cloud platforms (AWS CloudTrail, Azure Monitor, GCP Audit Logs).
  2. Normalization: Raw logs arrive in dozens of different formats. The SIEM parses each log source and maps fields to a common schema (e.g., CIM in Splunk, ECS in Elastic). A Windows Event ID 4624 logon event and a Linux PAM authentication event both get mapped to a common authentication event type.
  3. Correlation: The SIEM applies rules that match patterns across multiple events. A rule might fire when three failed logons (Event ID 4625) from the same source IP are followed within 60 seconds by a successful logon (Event ID 4624), a classic brute-force pattern. More advanced correlation spans multiple data sources: a new scheduled task created on a host (Event ID 4698) correlated with an anomalous outbound connection from that same host within the same time window.
  4. Alerting: When a correlation rule fires, the SIEM creates an alert and routes it to the analyst queue. Risk-based alerting aggregates multiple low-fidelity observations into a single high-priority incident rather than generating individual alerts for each match.
  5. Investigation and storage: Analysts query the underlying log data to build a timeline, pivot across data sources, and determine scope. Logs are retained for a defined period (typically 90 days hot, 1 year cold) to support forensic investigation and compliance audits.
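
The normalization stage can be sketched in a few lines of Python. This is an illustration only, not any vendor's parser: the simplified raw event shapes, the `normalize_windows` and `normalize_pam` helpers, and the common field names (`action`, `outcome`, `user`, `src_ip`) are all assumptions made for the example, loosely following the idea behind schemas such as CIM or ECS.

```python
from typing import Dict

# Hypothetical common schema: every authentication event ends up with the
# same four keys, whatever the source format looked like.
WINDOWS_OUTCOMES = {4624: "success", 4625: "failure"}


def normalize_windows(raw: Dict) -> Dict:
    """Map a simplified Windows Security event to the common schema."""
    return {
        "action": "authentication",
        "outcome": WINDOWS_OUTCOMES[raw["EventID"]],
        "user": raw["TargetUserName"].lower(),
        "src_ip": raw["IpAddress"],
    }


def normalize_pam(raw: Dict) -> Dict:
    """Map a simplified, already-parsed Linux PAM event to the common schema."""
    return {
        "action": "authentication",
        "outcome": "success" if raw["result"] == "accepted" else "failure",
        "user": raw["user"].lower(),
        "src_ip": raw["rhost"],
    }


win = normalize_windows(
    {"EventID": 4625, "TargetUserName": "JSmith", "IpAddress": "10.0.0.5"}
)
pam = normalize_pam({"result": "failed", "user": "jsmith", "rhost": "10.0.0.5"})

# After normalization the two events can be compared field by field.
same_actor = (win["user"], win["src_ip"]) == (pam["user"], pam["src_ip"])
print(same_actor)
```

Once both sources share field names, a single correlation rule can match on `user` and `src_ip` without knowing which platform produced the event.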

A practical scenario: an EDR alert fires on mshta.exe execution. The analyst pivots to the SIEM, queries authentication logs for that hostname in the prior 24 hours, finds a successful Event ID 4624 Logon Type 3 with AuthPackage: NTLM from an unusual source IP, and expands the investigation to lateral movement.

Key components of SIEM

| Component | What it does for the defender |
| --- | --- |
| Log collector / forwarder | Pulls logs from source systems and ships them to the SIEM ingestion pipeline; covers agents (Splunk UF, Elastic Agent) and agentless syslog |
| Normalization engine | Parses raw log formats and maps fields to a common schema, making cross-source correlation possible |
| Correlation engine | Applies detection rules across normalized events in real time; fires alerts when rule conditions are met |
| Alert queue / case manager | Organizes fired alerts into analyst workqueues with severity, status, and assignment tracking |
| Search and query interface | Lets analysts query raw and normalized log data using platform-specific languages (SPL, KQL, EQL, Lucene) |
| Threat intelligence integration | Enriches events with external IOC context (known malicious IPs, domains, hashes) from feeds like MISP, VirusTotal, or commercial TI platforms |
| Dashboards and reporting | Provides real-time security posture views and generates compliance reports for PCI DSS, HIPAA, SOC 2 |
| UEBA module | Establishes behavioral baselines per user and entity; surfaces deviations that rule-based detection misses |

Key features of SIEM

| Feature | What it enables the analyst to do |
| --- | --- |
| Multi-source log ingestion | Query a single platform instead of logging into 10 different systems to trace an attack chain |
| Real-time correlation rules | Get alerted on attack patterns (brute force, lateral movement, C2 beaconing) as they happen, not hours later |
| Risk-based alerting | Receive fewer, higher-fidelity alerts by aggregating related low-score observations into a single incident |
| Threat intelligence enrichment | Instantly see whether an IP, domain, or file hash in an alert matches a known threat actor or campaign |
| Behavioral analytics (UEBA) | Detect insider threats and account compromise that evade signature rules by deviating from established baselines |
| Historical search | Reconstruct attack timelines days or weeks after initial compromise using retained log data |
| MITRE ATT&CK mapping | Understand which tactic and technique a fired alert corresponds to, enabling faster triage decisions |
| Compliance reporting | Generate audit-ready reports for PCI DSS Requirement 10, HIPAA §164.312, and SOC 2 CC7 without manual log review |

SIEM use cases

Detecting lateral movement

An analyst receives a medium-severity alert for an unusual net use command execution. The SIEM correlates that event with a subsequent Event ID 4624 Logon Type 3 on a domain controller from the same source host 90 seconds later. The correlation surfaces a lateral movement chain that neither alert alone would have flagged.
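
The correlation in this scenario amounts to a time-bounded join between two event streams. The sketch below assumes hypothetical, already-normalized tuples for process and logon events and a 120-second window; the function name and field layout are invented for illustration, and resolving the logon's source address to a hostname is assumed to have happened upstream.

```python
from typing import List, Tuple

ProcessEvent = Tuple[float, str, str]      # (epoch_seconds, host, command_line)
LogonEvent = Tuple[float, str, str, int]   # (epoch_seconds, src_host, target_host, logon_type)


def correlate_lateral_movement(
    process_events: List[ProcessEvent],
    logon_events: List[LogonEvent],
    window: float = 120.0,
) -> List[Tuple[str, str, float]]:
    """Pair each `net use` execution with network logons (Logon Type 3)
    that the same host makes within `window` seconds afterwards."""
    hits = []
    for p_ts, host, command_line in process_events:
        if "net use" not in command_line.lower():
            continue
        for l_ts, src_host, target_host, logon_type in logon_events:
            delay = l_ts - p_ts
            if src_host == host and logon_type == 3 and 0 <= delay <= window:
                hits.append((host, target_host, delay))
    return hits


processes = [(1000.0, "ws-14", r"net use \\dc01\c$ /user:corp\admin")]
logons = [
    (1090.0, "ws-14", "dc01", 3),   # 90 s after the net use: correlated
    (5000.0, "ws-14", "fs02", 3),   # same host, far outside the window
    (1050.0, "ws-22", "dc01", 3),   # inside the window, different source
]
print(correlate_lateral_movement(processes, logons))
```

The nested loop is fine for a sketch; a real correlation engine would index events by host and time rather than compare every pair.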

Investigating phishing-triggered credential theft

A user reports a suspicious email. The analyst queries the SIEM for that user's authentication activity over the prior 6 hours, finds a successful Okta login from an unrecognized ASN, and correlates it with a file download from SharePoint minutes later. The SIEM's retention of identity and cloud audit logs makes the full scope visible in a single query session.

Threat hunting for C2 beaconing

A threat hunter writes a query looking for hosts making outbound connections to the same external IP at regular intervals (every 60 seconds, +/- 5 seconds) over a 4-hour window. The SIEM's historical search across DNS and proxy logs returns two hosts matching the pattern; neither had generated an alert from existing rules.
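
The logic of such a hunt can be sketched outside any query language. The example below assumes connection records have been exported as `(host, dest_ip, epoch_seconds)` tuples; the `find_beacons` function, its parameters, and the `min_connections` cutoff are hypothetical choices for illustration, not a platform feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_beacons(
    events: Iterable[Tuple[str, str, float]],
    interval: float = 60.0,
    tolerance: float = 5.0,
    min_connections: int = 10,
) -> Set[Tuple[str, str]]:
    """Return (host, dest_ip) pairs whose connection gaps all fall within
    interval +/- tolerance seconds.

    events: (host, dest_ip, epoch_seconds) tuples, in any order.
    """
    times: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for host, dest_ip, ts in events:
        times[(host, dest_ip)].append(ts)

    beacons = set()
    for pair, stamps in times.items():
        if len(stamps) < min_connections:
            continue  # too few samples to call the traffic periodic
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if all(abs(gap - interval) <= tolerance for gap in gaps):
            beacons.add(pair)
    return beacons


# Synthetic data: one host beacons roughly every 60 s, another browses irregularly.
beacon = [("ws-01", "203.0.113.7", 60.0 * i + (i % 3)) for i in range(20)]
browse = [
    ("ws-02", "198.51.100.9", float(t))
    for t in (0, 4, 9, 300, 302, 900, 905, 2000, 2001, 4000)
]
print(find_beacons(beacon + browse))
```

Requiring every gap to fit the interval is deliberately strict. A production hunt would likely tolerate a share of missed or delayed check-ins, for example by testing the median gap or the variance instead.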

Privileged account abuse detection

A UEBA rule fires on a service account that has never logged in interactively suddenly generating an interactive session (Event ID 4624 Logon Type 2) outside business hours. The SIEM enriches the alert with the account's historical baseline, the source workstation's risk score, and a threat intel hit on the destination IP the session later contacted.
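
A heavily simplified version of that baseline check might look like the following. Commercial UEBA modules use far richer statistical models; here the baseline is just the set of logon types an account has used before, and the event fields, function names, and business-hours boundaries are assumptions for the sketch.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set


def build_baseline(history: Iterable[dict]) -> Dict[str, Set[int]]:
    """Record which logon types each account has used historically."""
    baseline: Dict[str, Set[int]] = defaultdict(set)
    for event in history:
        baseline[event["user"]].add(event["logon_type"])
    return baseline


def is_anomalous(event: dict, baseline: Dict[str, Set[int]],
                 day_start: int = 8, day_end: int = 18) -> bool:
    """Flag a logon type the account has never used, outside business hours.

    An account with no history at all is treated as having an empty
    baseline, so any off-hours logon from it is flagged.
    """
    seen_types = baseline.get(event["user"], set())
    first_time = event["logon_type"] not in seen_types
    off_hours = not (day_start <= event["time"].hour < day_end)
    return first_time and off_hours


history = [
    {"user": "svc_backup", "logon_type": 5},  # service logon
    {"user": "svc_backup", "logon_type": 3},  # network logon
]
baseline = build_baseline(history)

interactive = {"user": "svc_backup", "logon_type": 2,
               "time": datetime(2025, 5, 3, 2, 15)}
routine = {"user": "svc_backup", "logon_type": 5,
           "time": datetime(2025, 5, 3, 2, 15)}
print(is_anomalous(interactive, baseline), is_anomalous(routine, baseline))
```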

SIEM vs SOAR vs XDR

| Dimension | SIEM | SOAR | XDR |
| --- | --- | --- | --- |
| Primary function | Log aggregation, correlation, and detection | Workflow automation and orchestration of response actions | Unified detection and response across endpoints, network, and identity |
| Data scope | Any and all log sources | Consumes alerts from SIEM and other tools | Curated telemetry from integrated security controls |
| Long-term log storage | Yes (core capability) | No | No |
| Response automation | Limited (alert routing) | Core capability (playbooks, ticketing, enrichment) | Built-in response actions (isolate host, block hash) |
| Detection language | SPL, KQL, EQL, Lucene | N/A (acts on existing alerts) | Platform-specific, often abstracted |
| Primary user | SOC analyst, detection engineer | SOC analyst, IR engineer | SOC analyst, L1/L2 |
| Compliance reporting | Yes | No | No |
| Typical deployment | Standalone or with SOAR integration | Paired with SIEM | Standalone or alongside SIEM |

How to detect with SIEM

The SIEM is the detection platform, but detection quality depends entirely on what logs are onboarded and how rules are written.

Log sources to onboard first (priority order):

  • Windows Security Event Log (Event ID 4624, 4625, 4648, 4672, 4698, 4720, 4732)
  • Windows Sysmon (process creation, network connections, registry changes)
  • DNS logs (internal resolver query logs, not just firewall DNS)
  • Firewall/proxy logs (outbound connections with full URL and bytes)
  • Active Directory / Entra ID authentication logs
  • EDR telemetry (process tree, parent-child relationships)
  • Cloud audit logs (AWS CloudTrail, Azure Activity Log, GCP Audit)
  • Email gateway logs (sender, recipient, attachment hashes, click events)

Key detection patterns and indicators:

  • Brute force: 5+ Event ID 4625 from one source IP within 60 seconds, followed by Event ID 4624
  • Pass-the-Hash: Event ID 4624 Logon Type 3, AuthPackage: NTLM, source IP is a workstation (not a server or DC)
  • Kerberoasting: Event ID 4769 with TicketEncryptionType: 0x17 (RC4) requested for a service account
  • Scheduled task persistence: Event ID 4698 (task created) correlated with a new network connection from the same host
  • C2 beaconing: repeated outbound DNS queries to the same domain at fixed intervals with low TTL responses
  • Living-off-the-land: mshta.exe, wscript.exe, or certutil.exe spawned by winword.exe or outlook.exe
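
As one worked instance of the patterns above, the brute-force indicator can be expressed as a sliding-window count per source IP. The sketch assumes events arrive time-sorted as `(epoch_seconds, src_ip, event_id)` tuples; `detect_brute_force` and its parameters are hypothetical names, and it interprets "followed by" as a success arriving while at least `threshold` failures are still inside the window.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def detect_brute_force(
    events: Iterable[Tuple[float, str, int]],
    threshold: int = 5,
    window: float = 60.0,
) -> List[Tuple[str, float]]:
    """Return (src_ip, success_time) for each successful logon (4624) that
    arrives while `threshold` or more failed logons (4625) from the same
    source IP are still inside the trailing `window` seconds.

    events must be sorted by timestamp.
    """
    failures: Dict[str, Deque[float]] = defaultdict(deque)
    hits = []
    for ts, src_ip, event_id in events:
        recent = failures[src_ip]
        # Expire failures that have aged out of the window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if event_id == 4625:
            recent.append(ts)
        elif event_id == 4624:
            if len(recent) >= threshold:
                hits.append((src_ip, ts))
            recent.clear()  # a success resets the failure streak
    return hits


sample = [
    (0.0, "10.0.0.9", 4625),
    (5.0, "10.0.0.9", 4625),
    (10.0, "10.0.0.9", 4625),
    (15.0, "10.0.0.9", 4625),
    (20.0, "10.0.0.9", 4625),
    (25.0, "10.0.0.9", 4624),   # success after five failures in 25 s
    (30.0, "10.0.0.8", 4625),
    (40.0, "10.0.0.8", 4624),   # one mistyped password, then success
]
print(detect_brute_force(sample))
```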

Detection logic structure (Sigma-compatible):

A well-formed SIEM detection rule contains: a log source definition, a filter on specific field values, a time window if correlation is required, and a threshold. For example, a Pass-the-Hash detection targets Security log, filters on EventID: 4624, LogonType: 3, AuthenticationPackageName: NTLM, and excludes known machine account patterns ($ suffix in username) and expected server-to-server sources.
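
The Pass-the-Hash example can be read as a conjunction of field conditions plus exclusions. The Python predicate below mirrors that field combination for illustration; it is not Sigma syntax, and the `is_pth_candidate` name, the dictionary event shape, and the `expected_sources` allowlist are assumptions.

```python
from typing import Collection


def is_pth_candidate(event: dict, expected_sources: Collection[str] = ()) -> bool:
    """Return True when a logon event matches every condition of the filter."""
    user = event.get("TargetUserName", "")
    return (
        event.get("EventID") == 4624
        and event.get("LogonType") == 3
        and event.get("AuthenticationPackageName") == "NTLM"
        and not user.endswith("$")                          # skip machine accounts
        and event.get("IpAddress") not in expected_sources  # skip known server-to-server traffic
    )


suspicious = {
    "EventID": 4624, "LogonType": 3, "AuthenticationPackageName": "NTLM",
    "TargetUserName": "admin.jdoe", "IpAddress": "10.1.4.23",
}
machine = dict(suspicious, TargetUserName="FS01$")
kerberos = dict(suspicious, AuthenticationPackageName="Kerberos")

print(is_pth_candidate(suspicious),
      is_pth_candidate(machine),
      is_pth_candidate(kerberos),
      is_pth_candidate(suspicious, expected_sources={"10.1.4.23"}))
```

Each condition on its own matches large volumes of legitimate traffic; only the combination narrows the result to something worth an analyst's time.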

Avoid writing rules that fire on any single field in isolation โ€” most attacker behaviors are only distinguishable from legitimate activity through field combination and context.

SIEM challenges

Onboarding the wrong log sources first. Teams that start with perimeter firewall logs but skip Windows Security Event Logs miss the majority of post-compromise detection signals. Identity-based attacks (30% of all intrusions) are only visible in authentication logs.

Writing rules without a baseline. A rule for "unusual outbound port 443 connections" fires thousands of times per day in any organization with SaaS usage. Rules written without a behavioral baseline for what normal looks like on a given host or user generate noise that trains analysts to ignore alerts.

Skipping field normalization validation. Different log sources often map the same concept to different field names. If normalization is not validated end-to-end, a cross-source correlation rule silently fails: no alert fires even when the attack pattern matches.
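
One lightweight way to catch this is to sample normalized events per source and check that the fields correlation rules depend on are actually present. The sketch below is a hypothetical check, not a feature of any SIEM; `REQUIRED_FIELDS`, the source names, and the event shapes are assumptions.

```python
from typing import Dict, List, Set

# Fields that cross-source correlation rules are assumed to rely on.
REQUIRED_FIELDS = {"user", "src_ip", "outcome"}


def missing_fields(samples_by_source: Dict[str, List[dict]]) -> Dict[str, Set[str]]:
    """Return, per source, the required fields absent from at least one sample."""
    gaps: Dict[str, Set[str]] = {}
    for source, samples in samples_by_source.items():
        missing: Set[str] = set()
        for event in samples:
            missing |= REQUIRED_FIELDS - set(event)
        if missing:
            gaps[source] = missing
    return gaps


samples = {
    "windows_security": [{"user": "jdoe", "src_ip": "10.0.0.5", "outcome": "failure"}],
    # This source was parsed but its address field kept a vendor-specific name.
    "vpn_gateway": [{"user": "jdoe", "SourceAddress": "10.0.0.5", "outcome": "success"}],
}
print(missing_fields(samples))
```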

Alert queue depth without triage SLAs. A SIEM with no alert age or severity SLA accumulates thousands of unreviewed alerts. Attackers who trigger a low-severity alert on day one and escalate activity on day three are invisible if the day-one alert is never reviewed.

Treating SIEM as a compliance tool only. Organizations that configure SIEM exclusively for log retention and compliance reporting, without active detection rules or analyst workflows, have a storage system, not a detection platform.

SIEM best practices

  1. Onboard Windows Security Event Logs and Sysmon before any other source: they provide the process execution, authentication, and network telemetry that most detection rules depend on.
  2. Enable Audit Process Creation via Group Policy (Computer Configuration > Windows Settings > Security Settings > Advanced Audit Policy Configuration > Detailed Tracking), and turn on command-line capture with the separate "Include command line in process creation events" setting (Computer Configuration > Administrative Templates > System > Audit Process Creation). Without command-line arguments, Event ID 4688 is nearly useless for detection.
  3. Map every detection rule to a MITRE ATT&CK technique. Unmapped rules cannot be evaluated for coverage gaps and create audit noise with no investigative value.
  4. Implement risk-based alerting: assign risk scores to individual observations and fire high-priority alerts only when a host or user accumulates a score threshold, not on every individual match.
  5. Write Sigma rules as your canonical detection format and compile them to platform-specific syntax (SPL, KQL, EQL). This keeps detections portable and version-controlled in Git.
  6. Set a maximum alert age SLA: any alert older than 24 hours without analyst action should auto-escalate or auto-close with a documented reason.
  7. Validate log source coverage monthly using a MITRE ATT&CK data source heatmap. Identify which techniques have zero log coverage and prioritize onboarding the missing sources.
  8. Test detection rules against known-good attack simulations (Atomic Red Team, Caldera) before promoting to production. A rule that never fired in testing will not fire reliably in production.
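
The risk-based alerting practice in the list above reduces to accumulating scores per entity and alerting only past a threshold. A minimal sketch follows, with made-up rule names, scores, and a threshold of 100; real implementations also bound the accumulation to a time window, which is omitted here for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def risk_alerts(
    observations: Iterable[Tuple[str, str, int]],
    threshold: int = 100,
) -> Dict[str, int]:
    """Sum risk scores per entity and return those at or above the threshold.

    observations: (entity, rule_name, score) tuples, one per rule match.
    """
    totals: Dict[str, int] = defaultdict(int)
    for entity, _rule_name, score in observations:
        totals[entity] += score
    return {entity: total for entity, total in totals.items() if total >= threshold}


observations = [
    ("ws-14", "office_app_spawns_script_host", 40),
    ("ws-14", "new_scheduled_task", 30),
    ("ws-14", "rare_outbound_destination", 35),
    ("ws-22", "new_scheduled_task", 30),   # a single low-score match stays quiet
]
print(risk_alerts(observations))
```

Three individually unremarkable matches on one host produce a single incident, while the isolated match on the second host generates nothing for the queue.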

SIEM frameworks and standards

| Framework | Reference | Relevance |
| --- | --- | --- |
| MITRE ATT&CK | MITRE, continuously updated | Maps detection rules to specific adversary techniques; ATT&CK data sources define which log types are required per technique |
| NIST SP 800-92 | NIST, 2006 (under revision) | Guide to computer security log management; defines log retention, collection, and protection requirements |
| CIS Control 8 | CIS Controls v8, 2021 | Audit Log Management control; covers what to log, how long to retain, and how to review |
| PCI DSS Requirement 10 | PCI SSC, v4.0 2022 | Mandates log collection, 12-month retention (3 months immediately available), and daily log review for cardholder environments |
| Sigma | Florian Roth / SigmaHQ | Open detection rule format for writing portable SIEM detection logic independent of platform |

The future of SIEM

AI-assisted detection is moving from marketing claim to functional capability. Platforms like Microsoft Sentinel and Splunk ES are integrating LLM-based natural language querying, letting analysts describe what they are looking for in plain English rather than writing SPL or KQL. This lowers the floor for L1 analysts but does not replace detection engineering: someone still has to validate what the model surfaces.

The boundary between SIEM and SOAR is dissolving. Microsoft Sentinel includes native playbook automation; Splunk SOAR ships bundled with Splunk ES. The practical consequence for analysts is a unified triage-to-response workflow without context switching between platforms.

Cloud-native SIEM adoption is accelerating as on-premises Splunk and ArcSight deployments reach end-of-support milestones. Chronicle and Sentinel are ingesting cloud audit logs as primary data sources, not afterthoughts, shifting the detection engineering skillset toward KQL and YARA-L rather than SPL.

Identity-centric detection is becoming a dedicated SIEM use case. The emergence of ITDR (Identity Threat Detection and Response) as a category reflects the volume of Entra ID and Okta attack paths that SIEM rules now need to cover. Detection engineers are building entire rule libraries around sign-in logs, conditional access failures, and OAuth token abuse patterns.

SIEM FAQ

What is SIEM used for in a SOC?

In daily SOC operations, a SIEM is the primary platform for alert triage, log investigation, and threat hunting. An L1 analyst receives alerts generated by the SIEM's correlation rules, queries the underlying log data to determine whether the alert represents a real threat, and either closes or escalates the case. An L2 analyst uses the SIEM's historical search to reconstruct attack timelines across multiple data sources.

What logs should I onboard into my SIEM first?

Start with Windows Security Event Logs and Sysmon: they cover authentication, process execution, and network connections on endpoints, which is where most post-compromise activity is visible. Add DNS logs and firewall/proxy logs second, then cloud audit logs (AWS CloudTrail, Azure Activity Log) third. Do not start with perimeter firewall logs alone: they provide network context but miss the identity and endpoint signals that most attacks leave behind.

What is the difference between a SIEM rule and a SIEM alert?

A rule is the detection logic: it defines the conditions (field values, thresholds, time windows) that must be true for a match. An alert is the output: it is created when a rule fires and is routed to the analyst queue. One rule can generate thousands of alerts if it is poorly tuned. Risk-based alerting reduces alert volume by aggregating multiple rule matches for the same entity into a single high-priority incident.

Does a SIEM replace an EDR?

No. An EDR provides deep endpoint telemetry (process trees, memory analysis, file system changes) that a SIEM does not generate on its own. The SIEM consumes EDR telemetry as one of many log sources and correlates it with network, identity, and cloud data. EDR is a data source and response tool; SIEM is the correlation and investigation platform. Most mature SOCs run both.

How can I practice SIEM investigation skills hands-on?

CyberDefenders labs are built around real SIEM platforms (Splunk, Elastic, and Microsoft Sentinel) with pre-loaded log datasets from actual attack scenarios. Working through a lab gives you the query experience, pivot technique, and alert triage workflow that reading about SIEM cannot build.

Tags: SIEM, SOC, CCDL1, Detection Engineering, Splunk, Elastic, CCDL2