The average enterprise SOC receives thousands of security alerts per day. A typical analyst team can thoroughly investigate only 20-30% of them. The rest sit in the queue, aging out or getting bulk-dismissed, which means a meaningful fraction of real threats goes uninvestigated every single day. The problem is almost never the number of threats — it is the ratio of signal to noise. When every correlation search fires hundreds of times per day on benign activity, analysts stop trusting the alerts. When analysts stop trusting alerts, they stop investigating them with urgency. That is how organizations get breached by activity that was technically detected.
This article is about fixing that problem systematically. Not by disabling detections, but by engineering them properly — using Risk-Based Alerting to accumulate context before firing, writing correlation searches that enrich before they alert, putting detection content under version control, and measuring quality with the same rigor you would apply to any production software. Splunk Enterprise Security 8.0+ provides native tooling for all of this. Most deployments use a fraction of it.
What Makes a Good Correlation Search
A correlation search that fires on a single event is almost always wrong. A single failed logon is noise. A single PowerShell execution is noise. A single lateral movement attempt might be a pentest. The events that matter are patterns — and patterns require context, timing, and thresholds.
Before writing a new correlation search, ask these questions:
- What is the base rate? How often does this event occur in your environment normally? If it occurs 500 times per day on legitimate activity, firing an alert on every occurrence is not detection — it is noise generation.
- What makes the event suspicious, not just present? Process creation is not suspicious. certutil.exe -decode downloading a file to %TEMP% on a domain controller at 2 AM from a user account that has never run certutil before — that is suspicious. The detection must encode that specificity.
- What context is needed to triage it? An alert that fires without user context, host criticality, or threat intel enrichment forces the analyst to go gather that information manually. That is 5-10 minutes of work per alert that should be done automatically before the alert is created.
- What is the expected false positive rate after tuning? If you cannot get a correlation search below 10-15% false positives after a 2-week baseline period, either the detection logic is too broad or the data quality is insufficient.
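The base-rate question is worth answering with arithmetic before any SPL is written. A quick sketch in Python (the event counts and triage times below are illustrative assumptions, not benchmarks):

```python
# Back-of-the-envelope base-rate check for a proposed detection.
# All numbers here are illustrative assumptions, not benchmarks.

def projected_alert_load(daily_event_count: int,
                         alert_fraction: float,
                         triage_minutes_per_alert: float) -> dict:
    """Estimate daily alert volume and analyst triage cost for a detection."""
    alerts_per_day = daily_event_count * alert_fraction
    triage_hours = alerts_per_day * triage_minutes_per_alert / 60
    return {"alerts_per_day": alerts_per_day, "triage_hours_per_day": triage_hours}

# Firing on every occurrence of an event seen 500x/day on benign activity:
naive = projected_alert_load(500, 1.0, 7.5)
# After encoding specificity (rare binary + odd path + off-hours), the same
# event population yields a small fraction of candidate alerts:
tuned = projected_alert_load(500, 0.01, 7.5)

print(naive)
print(tuned)
```

If the naive version costs more analyst-hours per day than the team has, the detection logic is not specific enough to ship.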
Risk-Based Alerting: The Architecture Shift That Changes Everything
Traditional alerting is binary: an event matches a rule, a notable is created, an analyst investigates. Risk-Based Alerting (RBA) in Splunk Enterprise Security replaces this with a risk accumulation model. Individual correlation searches no longer create notables directly — they assign risk scores to entities (users, hosts, IP addresses). Only when an entity's accumulated risk score crosses a threshold over a defined time window does a single high-fidelity notable get created.
The mechanics: each correlation search is configured as a Risk Rule rather than a Notable Rule. When it fires, it writes a risk event to the risk index with:
- The risk object (user, host, or IP)
- A risk score (typically 10-80 based on technique severity)
- MITRE ATT&CK technique annotations
- The raw event context
Two built-in detection searches then monitor the risk index: "ATT&CK Tactic Threshold Exceeded" (multiple attack techniques against one entity in 7 days) and "Risk Threshold Exceeded" (cumulative risk score over a configurable window). Analysts only see a notable when a risk story has developed — not when an individual suspicious event fires.
The practical impact: instead of 200 individual notables per day from the same 15 noisy correlation searches, an RBA deployment might generate 20-30 notables per day, each representing an entity that has accumulated risk from multiple independent detections. The true positive rate increases dramatically because the notable requires corroboration from multiple signals before being created.
A Concrete RBA Implementation Example
Consider a lateral movement scenario. Three separate correlation searches each fire individually and could be noise on their own:
- Remote service creation via sc.exe (Risk: 40, ATT&CK: T1021)
- SMB access to Admin$ share from a non-admin workstation (Risk: 30, ATT&CK: T1021.002)
- LSASS memory access by a non-standard process (Risk: 60, ATT&CK: T1003.001)
Under traditional alerting, each fires separately and the analyst may dismiss them individually. Under RBA, if all three fire against the same host within 24 hours, the host accumulates a risk score of 130+ against two distinct ATT&CK techniques, triggering a single high-fidelity notable labeled with all three contributing events. Context is pre-assembled. Investigation starts from a complete picture.
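The accumulation logic behind this scenario can be modeled in a few lines. A minimal Python sketch, assuming a 100-point threshold and a flat event structure (real thresholds and event schemas live in the ES risk index and its threshold detections, not in this code):

```python
from collections import defaultdict

# Minimal model of risk-based alerting: risk rules append scored events to a
# per-entity ledger, and a notable fires only when accumulated risk inside the
# window crosses a threshold. Threshold and event shape are assumptions.
RISK_THRESHOLD = 100
WINDOW_HOURS = 24

def accumulated_risk(events, entity, now_h):
    """Sum risk for one entity over the trailing window (timestamps in hours)."""
    return sum(e["score"] for e in events
               if e["entity"] == entity and now_h - e["time_h"] <= WINDOW_HOURS)

risk_index = [
    {"entity": "WKSTN-042", "score": 40, "time_h": 1.0, "technique": "T1021"},
    {"entity": "WKSTN-042", "score": 30, "time_h": 5.5, "technique": "T1021.002"},
    {"entity": "WKSTN-042", "score": 60, "time_h": 9.0, "technique": "T1003.001"},
]

score = accumulated_risk(risk_index, "WKSTN-042", now_h=10.0)
techniques = {e["technique"] for e in risk_index if e["entity"] == "WKSTN-042"}
if score >= RISK_THRESHOLD:
    print(f"NOTABLE: WKSTN-042 risk={score}, techniques={sorted(techniques)}")
```

No single event crosses the threshold; the corroboration of all three within the window is what produces the notable.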
Detection-as-Code: SPL in Git
Correlation searches are code. They should be treated like code — stored in version control, reviewed before deployment, tested against known-good and known-bad data, and deployed through a pipeline rather than directly in the Splunk UI. Most organizations do none of this. Changes are made directly in Splunk's saved searches, there is no audit trail of what changed or why, and rolling back a broken detection means manually remembering what the SPL looked like before.
The architecture for detection-as-code in Splunk:
- Store SPL in Git: Each correlation search lives as a YAML or conf file in a Git repository. The file contains the SPL, scheduling, risk annotations, and metadata (MITRE technique, severity, data sources required).
- Use Sigma as an intermediate format: Sigma rules are vendor-agnostic detection definitions that can be converted to Splunk SPL using sigma-cli with the Splunk backend. This lets detection engineers write once and deploy across SIEM platforms. Sigma rules convert to savedsearches.conf stanzas for Splunk.
- CI/CD pipeline for deployment: A GitLab CI or GitHub Actions pipeline validates syntax, runs the converted SPL against test data (using Splunk's Attack Range or a separate test environment), and deploys approved searches to production via the Splunk REST API or a packaged Splunk app (TA).
- Detection versioning in ES 8.0+: Splunk Enterprise Security 8.0 introduced native detection versioning, allowing engineers to save new versions of a detection without overwriting the prior version and roll back with a single click. Enable this feature for all customer-owned detections.
Example savedsearches.conf stanza committed to Git:
[Detect Certutil Download - Red Hound]
search = index=windows EventCode=4688 \
(CommandLine="*certutil*-decode*" OR CommandLine="*certutil*-urlcache*") \
| lookup asset_lookup ip as dest_ip OUTPUT asset_criticality \
| lookup threat_intel_lookup src_ip OUTPUT is_known_bad \
| eval risk_score=if(asset_criticality="critical",80,50) \
| table _time, host, user, CommandLine, ParentImage, risk_score, asset_criticality, is_known_bad
dispatch.earliest_time = -15m
dispatch.latest_time = now
cron_schedule = */15 * * * *
action.risk = 1
action.risk.param._risk_score = 50
action.risk.param._risk_object = user
action.risk.param._risk_object_type = user
action.risk.param.mitre_technique_id = T1140
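The deploy stage of such a pipeline can be sketched in Python. This is a simplified illustration: the detection dict's field names (name, spl, cron) are this example's own convention, authentication and error handling are minimal, and the app context in the endpoint path will vary per deployment. The /servicesNS/.../saved/searches path is Splunk's saved-search REST endpoint:

```python
import urllib.parse
import urllib.request

# Sketch of a CI deploy step: map a reviewed detection definition from Git
# onto Splunk's saved-search REST API. Field names are this sketch's own
# convention; swap in your pipeline's schema.

def to_rest_payload(detection: dict) -> dict:
    """Translate a detection definition into saved-search REST parameters."""
    return {
        "name": detection["name"],
        "search": detection["spl"],
        "cron_schedule": detection["cron"],
        "dispatch.earliest_time": detection.get("earliest", "-15m"),
        "dispatch.latest_time": detection.get("latest", "now"),
        "is_scheduled": "1",
    }

def deploy(detection: dict, base_url: str, token: str) -> None:
    """Create the saved search in the target app. Runs only in the CI job."""
    data = urllib.parse.urlencode(to_rest_payload(detection)).encode()
    req = urllib.request.Request(
        f"{base_url}/servicesNS/nobody/search/saved/searches",
        data=data, headers={"Authorization": f"Bearer {token}"})
    urllib.request.urlopen(req)  # raises on HTTP error

detection = {
    "name": "Detect Certutil Download - Red Hound",
    "spl": 'index=windows EventCode=4688 CommandLine="*certutil*-decode*"',
    "cron": "*/15 * * * *",
}
payload = to_rest_payload(detection)
```

A production pipeline would add idempotent create-or-update logic and promote through a test environment first, but the translation step above is the core of it.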
The key principle: every change to a detection goes through a pull request, gets reviewed by a second engineer, and is tested before it reaches production. This eliminates the "someone broke the detection at 3 PM on a Friday" scenario that is endemic to environments where searches are edited directly in the UI.
Starting Point: Splunk Security Essentials and ESCU
If you are building a detection library, start with Splunk Security Essentials and the Enterprise Security Content Update (ESCU) rather than writing everything from scratch. Splunk ES 8.0 ships with over 1,700 detections organized into 225 analytic stories. Not all of them will apply to your environment, but a substantial subset will, and they are written to RBA standards with MITRE ATT&CK annotations pre-populated.
The workflow for adopting ESCU detections:
- Enable a detection in "Development" mode — it runs but generates no alerts, only risk events in a staging risk index.
- Run for 7-14 days and review the risk events generated. What percentage are against known-good activity? What assets are triggering it?
- Use the built-in filter macro pattern (<detection_name>_filter) to add exceptions for confirmed false positive patterns without modifying the core SPL.
- Promote to "Production" after the false positive rate is acceptable.
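The review in step 2 lends itself to a small script. A sketch, assuming risk events exported as dicts and a locally maintained allowlist of known-good service accounts (both are assumptions of this example, not an ES export format):

```python
from collections import Counter

# Summarize a staging period: what fraction of risk events hit known-good
# entities, and which entities dominate? Event shape and allowlist are
# illustrative assumptions.
known_good = {"svc_backup", "svc_patchmgmt"}

staged_events = [
    {"user": "svc_backup",    "host": "SRV-01"},
    {"user": "svc_backup",    "host": "SRV-02"},
    {"user": "jdoe",          "host": "WKSTN-042"},
    {"user": "svc_patchmgmt", "host": "SRV-03"},
]

benign = [e for e in staged_events if e["user"] in known_good]
fp_rate = len(benign) / len(staged_events)
top_offenders = Counter(e["user"] for e in benign).most_common(3)
print(f"{fp_rate:.0%} of staged risk events hit allowlisted accounts")
print("candidates for the filter macro:", top_offenders)
```

The accounts that dominate the benign bucket are exactly what goes into the filter macro in step 3.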
Every ESCU detection includes a filter macro stub. To exclude a specific user or process from a detection:
# In macros.conf — adds exclusion to the detection filter without modifying SPL
[detect_certutil_download_filter]
definition = search NOT (user="svc_backup" OR CommandLine="*certutil* -verify*")
Three Example SPL Queries Worth Building On
1. Detecting Living-off-the-Land Binary Abuse
index=windows EventCode=4688
| where match(CommandLine, "(?i)(certutil|mshta|wscript|cscript|regsvr32|rundll32)")
AND match(CommandLine, "(?i)(http|ftp|\\\\|base64|-decode|-urlcache)")
| lookup asset_lookup ip as dest_ip OUTPUT asset_criticality, business_unit
| where asset_criticality="high" OR asset_criticality="critical"
| stats count, values(CommandLine) as commands, dc(host) as host_count by user
| where count > 3 OR host_count > 1
| sort -count
2. Detecting Anomalous Authentication Patterns
index=windows EventCode=4624 Logon_Type=3
| bucket _time span=1h
| stats dc(src_ip) as unique_sources, count as logon_count by user, _time
| eventstats avg(unique_sources) as avg_sources, stdev(unique_sources) as stdev_sources by user
| where unique_sources > (avg_sources + (3 * stdev_sources))
| lookup identity_lookup user OUTPUT department, manager, account_type
| where account_type != "service"
| table _time, user, unique_sources, avg_sources, department, manager
3. Detecting Bulk Mailbox Access Post-Authentication
index=o365 sourcetype=o365:management:activity Operation=MailItemsAccessed
| bucket _time span=5m
| stats dc(ClientIPAddress) as src_count, count as item_count by UserId, _time
| where item_count > 100 AND src_count > 1
| lookup entra_risky_users_lookup UserId OUTPUT risk_level, risk_reason
| eval is_aitm_indicator=if(risk_level="high" AND src_count > 2, "yes", "no")
| where is_aitm_indicator="yes"
| table _time, UserId, item_count, src_count, risk_level, risk_reason
Measuring Detection Quality
A detection library without quality metrics is a guess. You need to measure performance to improve it. The three metrics that matter most:
- True Positive Rate (TPR): Of all notables created, what percentage resulted in a confirmed security incident or policy violation? Track this by analyst disposition in the ES incident review workflow. A well-tuned RBA deployment should sustain 60-80% TPR on high-severity notables.
- Mean Time to Detect (MTTD): From the time a malicious action occurs to the time a notable is created. Measure this during red team exercises and tabletop simulations. If your correlation searches have a 15-minute scheduling window, your minimum MTTD for any detection is 15 minutes — which has real implications for fast-moving attacks.
- Coverage Gaps: Map your active detections against MITRE ATT&CK. Splunk Security Essentials provides a native ATT&CK heatmap visualization that shows which techniques you have coverage for. Techniques with no coverage and high threat actor usage in your sector are your highest-priority gaps.
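TPR, the first metric above, reduces to a small calculation over closed-notable dispositions. A sketch (the disposition labels and counts here are illustrative; map them to your own incident review disposition values):

```python
from collections import Counter

# Compute true positive rate from closed-notable dispositions. The label set
# is an assumption; substitute your ES incident review disposition values.
closed_notables = (
    ["true_positive"] * 42 + ["benign_positive"] * 11 + ["false_positive"] * 9
)

counts = Counter(closed_notables)
investigated = sum(counts.values())
tpr = counts["true_positive"] / investigated
print(f"TPR: {tpr:.1%} over {investigated} closed notables")
```

Run this per detection, not just in aggregate: a healthy overall TPR can hide one or two detections that contribute nearly all of the false positives.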
Review these metrics monthly. Detections that have not fired in 90 days but are scheduled and consuming search concurrency should be reviewed — either the behavior they detect never occurs in your environment (possibly a coverage gap worth validating), or the data source they depend on is no longer being ingested.
The Tuning Workflow
New detections always start loud. The tuning workflow is not optional — it is the process by which a noisy search becomes a reliable one.
- Baseline period (7-14 days): Run the search in shadow mode (risk events only, no notables) and collect all results.
- Cluster false positives by pattern: Group false positive results by user, host, process, or other attributes. Patterns are usually obvious — a specific IT admin account, a patch management tool, a known-good service.
- Add exceptions via filter macros: Never modify the core detection SPL to add exceptions. Use the filter macro pattern so exceptions are tracked separately and can be reviewed independently.
- Threshold adjustment: If the detection is behavioral (e.g., "more than N failed logons"), set the initial threshold high and walk it down as you understand the distribution of legitimate activity.
- Promote and monitor: Move to production and monitor the false positive rate weekly for the first 30 days.
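Step 4, threshold selection, can be grounded in the baseline data itself rather than guessed. A sketch of nearest-rank percentile thresholding, assuming per-user daily counts exported from the baseline period (the numbers below are invented):

```python
# Pick an initial behavioral threshold from baseline data: start near the top
# of the observed legitimate distribution and walk it down as confidence grows.
# The per-user daily failed-logon counts below are illustrative, not real data.
baseline = sorted([0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 8, 12, 15])

p95 = baseline[int(0.95 * (len(baseline) - 1))]  # nearest-rank 95th percentile
initial_threshold = 2 * p95  # deliberately high starting point

print(f"95th percentile of legitimate activity: {p95} failed logons/day")
print(f"initial alert threshold: more than {initial_threshold} failed logons/day")
```

Starting at twice the observed 95th percentile keeps the first production weeks quiet; the weekly monitoring in step 5 tells you when it is safe to walk the threshold down.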
The difference between a SOC that is overwhelmed by alerts and one that operates efficiently is almost entirely attributable to detection engineering discipline. The tooling in Splunk Enterprise Security exists to support this workflow. Building a library that analysts trust — and that catches real attacks — is an engineering effort, not a deployment step.
Is your detection library generating signal or noise?
We build and tune Splunk Enterprise Security detection libraries for organizations that are drowning in false positives — designing Risk-Based Alerting implementations that give your analysts something they can actually act on. Book a session with our team.