SIEM

Splunk on a Budget: How to Cut Log Volume by 60% Without Losing Visibility

Splunk licenses are priced by daily ingestion volume. If you are paying for a 100 GB/day license and ingesting 100 GB of data, the question is not how much data you have — it is whether all of it is doing useful work. A commonly cited estimate is that 70-80% of the data organizations send to Splunk is never queried by a search, never triggers a detection, and never appears in a dashboard. It just burns license quota and generates storage costs. You are essentially paying enterprise SIEM prices to archive noise.

This article is about fixing that. Not by degrading your security visibility — by being precise about what belongs in Splunk and what does not. The goal is not to delete logs; it is to route them intelligently, compress them efficiently, and store expensive data only where it earns its keep. Organizations that implement these techniques regularly achieve 40-60% reductions in daily ingestion volume without any reduction in detection fidelity.

Step 1: Find Where Your License Is Going

Before you filter anything, you need to know what is actually consuming your license. Splunk's built-in License Usage Report (Settings > Licensing > Usage Report) shows you ingestion volume by index, source type, host, and source for the current day and the previous 30 days. Start there.

Run this search to get a ranked breakdown of ingestion by source type over the last 7 days:

index=_internal source=*license_usage.log type=Usage
| stats sum(b) as bytes by st
| sort -bytes
| eval GB = round(bytes/1024/1024/1024, 2)
| rename st as sourcetype
| table sourcetype GB
| head 25

Do the same by host to identify individual machines that are outliers:

index=_internal source=*license_usage.log type=Usage
| stats sum(b) as bytes by h
| sort -bytes
| eval GB = round(bytes/1024/1024/1024, 2)
| rename h as host
| table host GB
| head 25

What you are looking for are the sources that consume disproportionate volume relative to their security value. Common culprits you will find in almost every environment:

  • Windows Verbose Event Logs — Event IDs 4688 (process creation, which is high-volume but of limited value when command-line auditing is not enabled), 5156/5158 (Windows Filtering Platform allow events), and 4634/4647 (logoff events) can constitute 30-40% of total Windows event ingestion with minimal detection value on their own.
  • DNS Query Logs — Full DNS query logging generates enormous volume. A busy environment can push 20+ GB/day from DNS alone. The vast majority is legitimate recursive lookups with no threat relevance.
  • Firewall Allow Traffic — Permitted east-west traffic between internal segments is rarely useful for detection but often forwarded in its entirety. Firewall deny logs are far more valuable per gigabyte.
  • Application DEBUG and TRACE logs — Dev teams forward application logs and forget to filter verbosity levels. DEBUG-level application logs can be 10x the volume of ERROR and WARN logs, with essentially zero security value.
  • Infrastructure health checks and heartbeats — Load balancers, monitoring agents, and health probes generate constant low-value traffic that adds up quickly.
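To quantify how much of your Windows Security volume the verbose event IDs above actually represent, a search along these lines works (the index and sourcetype names are placeholders — adjust them to your environment):

index=windows sourcetype=WinEventLog:Security
| eval event_bytes = len(_raw)
| stats sum(event_bytes) as bytes, count by EventCode
| eventstats sum(bytes) as total_bytes
| eval pct_of_volume = round(100 * bytes / total_bytes, 1)
| sort -bytes
| head 15

Event counts alone understate the problem; summing len(_raw) approximates the actual license impact per event code, since large events cost more than small ones.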

Step 2: Filter at Index Time — Not at Search Time

The cardinal rule of Splunk cost optimization: if you filter data during a search, you already paid for it. If you filter data before it reaches the indexer, you never pay for it at all. Data that is dropped via nullQueue at index time does not count against your license.

Index-time filtering uses props.conf to match events and transforms.conf to route them. Here is the pattern:

In props.conf (match on source type):

[WinEventLog:Security]
TRANSFORMS-drop-low-value = drop-4634-logoff, drop-5156-fw-allow

In transforms.conf (define the drop rules):

[drop-4634-logoff]
REGEX = EventCode=4634
DEST_KEY = queue
FORMAT = nullQueue

[drop-5156-fw-allow]
REGEX = EventCode=5156
DEST_KEY = queue
FORMAT = nullQueue

For DNS, you likely want to keep NXDOMAIN responses (useful for detecting DGA and C2 beaconing) but drop successful recursive resolutions for known-good internal domains:

[drop-dns-internal-success]
# Adjust the pattern to your DNS log format; the leading negative
# lookahead excludes NXDOMAIN responses so they are kept
REGEX = ^(?!.*NXDOMAIN).*QueryType=A.*\.(corp\.internal|corp\.local)
DEST_KEY = queue
FORMAT = nullQueue

For application logs, drop DEBUG- and TRACE-level events outright with a transform, and use SEDCMD in props.conf to strip verbose content from the events you do keep, reducing event size before indexing:

[source::*/app/logs/*.log]
TRANSFORMS-drop-debug = drop-app-debug-lines
# Example only: trim a verbose payload field from kept events;
# adjust the field name and pattern to your log format
SEDCMD-trim-payload = s/ payload=.*$//

[drop-app-debug-lines]
REGEX = \b(DEBUG|TRACE)\b
DEST_KEY = queue
FORMAT = nullQueue

Alternatively, use the modern Ingest Actions feature (available in Splunk Enterprise 9.0+ and most Splunk Cloud tiers), which provides a GUI for authoring these rules. Ingest Actions supports filter, mask, and route rules with a preview panel that shows you exactly how many events would be affected before you deploy. It is particularly useful for teams that do not want to hand-edit conf files in production.
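Whichever mechanism you use, verify the effect after deployment. A before-and-after trend for the filtered source type (here WinEventLog:Security, matching the example above) makes the savings visible:

index=_internal source=*license_usage.log type=Usage st="WinEventLog:Security" earliest=-14d
| timechart span=1d sum(b) as bytes
| eval GB = round(bytes/1024/1024/1024, 2)
| fields _time GB

Expect a visible step down on the day the nullQueue rules were deployed; if the line does not move, the REGEX is probably not matching your events.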

Step 3: Route vs. Drop — Know the Difference

Not every low-value event should be deleted permanently. Some data has compliance or forensic value even if it has no daily operational value. For this category, the right answer is routing to cheaper storage rather than dropping entirely.

Splunk's SmartStore (available in Splunk Enterprise) lets you configure indexes to use Amazon S3, Azure Blob Storage, or Google Cloud Storage as the storage backend for warm and cold buckets, which then live in object storage at a fraction of on-premises disk cost. Federated Search for Amazon S3 (a Splunk Cloud Platform capability) goes further: it lets you query data in S3 directly without ingesting it into Splunk at all — zero license consumption for that data.

The practical architecture: define a tiered data strategy for each source type.

  • Hot (full indexing, fast search): Security-critical event types, authentication logs, EDR telemetry, IDS/IPS alerts, Active Directory events — everything your detections run against daily.
  • Warm/Cold (SmartStore, cheaper storage): Firewall allow logs, DNS successful resolutions, application INFO logs, VPN usage logs — available for investigation but not actively searched.
  • External (S3 with Federated Search or no ingestion): Compliance archives, audit trails, historical firewall data older than 90 days — searchable on demand, never counted against your license.
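As a sketch of the Warm/Cold tier, a SmartStore-backed index in indexes.conf looks roughly like this (the bucket name and index name are placeholders):

# indexes.conf
[volume:remote_store]
storageType = remote
path = s3://example-splunk-smartstore-bucket/indexes

[firewall_allow]
remotePath = volume:remote_store/firewall_allow
homePath = $SPLUNK_DB/firewall_allow/db
coldPath = $SPLUNK_DB/firewall_allow/colddb
thawedPath = $SPLUNK_DB/firewall_allow/thaweddb

With SmartStore, warm buckets are uploaded to object storage and fetched into a local cache on demand, so local disk is sized for the cache rather than for full retention.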

Step 4: Use Summary Indexing for Expensive Scheduled Searches

If you have scheduled searches that run every hour against large time windows to produce metrics or trends, those searches consume significant concurrent search slots and CPU. Summary indexing pre-computes the results of expensive searches and stores aggregated metrics in a dedicated summary index. Future searches run against the summary index (small, fast) rather than the raw event index (large, slow).

Example: instead of a scheduled search that re-scans 7 days of authentication logs every hour to count failed logons per user, run a small search once per hour over just the previous hour and store the count in a summary index:

index=windows EventCode=4625 earliest=-1h@h latest=@h
| bin _time span=1h
| stats count as failed_logons by user, _time
| collect index=summary source="failed_logon_hourly"

Your dashboard or detection then queries index=summary source="failed_logon_hourly" — millisecond response times against a tiny dataset rather than multi-minute scans of raw logs.

Summary indexing is also the right approach for computing expensive baselines: average number of processes per endpoint per hour, typical data transfer volumes per user, standard login times by account. These baselines feed behavioral detections without requiring real-time full-scan searches.
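As an example of such a baseline, a nightly scheduled search like this (the index, sourcetype, and field names are placeholders for your EDR data) stores per-host hourly process-diversity counts:

index=edr sourcetype=process_events earliest=-1d@d latest=@d
| bin _time span=1h
| stats dc(process_name) as distinct_processes by host, _time
| collect index=summary source="process_baseline_hourly"

A behavioral detection can then compare the current hour against the stored distribution with a cheap search over index=summary, instead of re-scanning raw EDR telemetry.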

Step 5: Accelerate Data Models, Scope Them Tightly

If you use Splunk Enterprise Security or any app built on the Common Information Model (CIM), data model acceleration is running in your environment. By default, accelerated data models search all indexes (index=* OR index=_*). This is almost always wrong for your environment and wastes significant compute resources.

Scope each data model to only the indexes that contain relevant data. Use the CIM Configuration app in Splunk ES to edit the cim_<datamodel>_indexes macros:

# Example: limit Authentication data model to only relevant indexes
# Edit in: Splunk ES CIM Configuration > Data Model Setup
# Macro: cim_Authentication_indexes
# Change from: index=* OR index=_*
# Change to: (index=windows OR index=linux OR index=okta OR index=entra)

Also set the backfill time for data model acceleration to 4-12 hours rather than the default (which backfills the entire summary range). This limits how many index buckets the acceleration process scans on each cycle, reducing background CPU consumption significantly without affecting search performance for recent data.
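Summary range and backfill are set per data model in datamodels.conf (or through the ES UI); a sketch for the Authentication model:

# datamodels.conf
[Authentication]
acceleration = true
acceleration.earliest_time = -7d
acceleration.backfill_time = -12h

acceleration.earliest_time controls how far back the accelerated summary extends; acceleration.backfill_time limits how far back each acceleration run rebuilds.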

Step 6: Leverage Metrics Indexes for Infrastructure Telemetry

CPU utilization, memory usage, network throughput, and disk I/O from infrastructure monitoring tools (Prometheus, Datadog, Telegraf) should live in Splunk metrics indexes, not event indexes. Metrics indexes use a compressed columnar storage format that is 10-50x more space-efficient than event indexes for this type of data. Searches against metrics indexes using the mstats command are also significantly faster.

# Example: query CPU utilization from a metrics index
| mstats avg(cpu.percent) as avg_cpu WHERE index=infrastructure_metrics
  BY host span=5m
| timechart avg(avg_cpu) BY host

Converting infrastructure telemetry from event to metrics ingestion alone commonly saves 20-30% of total license volume in environments with heavy monitoring agent deployment.
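If the telemetry currently arrives as events, Splunk's log-to-metrics conversion can perform the transformation at ingest. A sketch, assuming a sourcetype whose measure fields are already extracted (the stanza and field names are placeholders):

# props.conf
[telegraf_cpu]
METRIC-SCHEMA-TRANSFORMS = metric-schema:extract_cpu_metrics

# transforms.conf
[metric-schema:extract_cpu_metrics]
METRIC-SCHEMA-MEASURES = cpu_percent, cpu_idle

Fields listed in METRIC-SCHEMA-MEASURES become metric data points; the remaining fields become dimensions. Route the input to a metrics index and the data never touches event storage at all.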

Step 7: Build a Data Quality Scorecard

Log optimization is not a one-time project. Sources change, new applications get onboarded, and old filters become stale. You need a persistent mechanism to track data quality and catch license creep before it becomes a budget problem.

Build a simple data quality scorecard dashboard with the following panels:

  • Top 20 sources by ingestion volume (rolling 7 days) — Any source that moves into the top 20 unexpectedly should trigger a review.
  • Sources with active detections vs. sources with no associated searches — Sources that consume license but have no correlation searches, dashboards, or alerts referencing them are candidates for tiering or removal.
  • License usage trend (30-day): Daily GB by index, with 80% threshold alerts configured. Five license warnings in a rolling 30-day window trigger a Splunk license violation.
  • New source types (last 7 days) — Catch undocumented sources being added without review.

# New source types added in the last 7 days (search at least 30 days back
# so previously seen source types are not reported as new)
index=_internal source=*license_usage.log type=Usage earliest=-30d
| stats min(_time) as first_seen by st
| where first_seen > relative_time(now(), "-7d@d")
| convert ctime(first_seen)
| rename st as sourcetype
| sort first_seen

Schedule this search to run weekly and send results to a designated Slack channel or email distribution list. When a new source type appears, the team reviews it against your data classification policy before it becomes an established (and unreviewed) budget item.
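For the 80% threshold alert in the scorecard, a scheduled search along these lines works (the 100 GB divisor is a placeholder for your actual license size):

index=_internal source=*license_usage.log type=Usage earliest=-1d@d latest=@d
| stats sum(b) as bytes
| eval GB = round(bytes/1024/1024/1024, 2)
| eval pct_of_license = round(100 * GB / 100, 1)
| where pct_of_license > 80

Run it shortly after midnight against the previous day and alert whenever it returns a row.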

Index Lifecycle Management

Define explicit retention policies for every index. Splunk's default is to retain data until disk space runs out, which is not a policy — it is an accident waiting to happen. Set retention by index based on business need:

  • Security critical events (authentication, EDR, AD): 365 days hot/warm, frozen to cold storage for 2+ years
  • Firewall and network logs: 90 days hot/warm, frozen for 1 year
  • Application logs: 30 days hot, frozen for 90 days, then deleted
  • Infrastructure metrics: 90 days, then deleted

Configure frozen bucket handling to copy to SmartStore or an external path rather than simply deleting. That gives you a recovery path for forensic investigations without the cost of keeping everything in searchable warm storage.
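In indexes.conf, the application-log tier above translates to settings like these (the index name and archive path are placeholders):

# indexes.conf -- 30 days searchable, then archive instead of delete
[app_logs]
frozenTimePeriodInSecs = 2592000
coldToFrozenDir = /mnt/frozen_archive/app_logs

frozenTimePeriodInSecs evicts buckets from searchable storage after 30 days; coldToFrozenDir copies each frozen bucket to the archive path instead of deleting it. Restoring later means copying a bucket back into the index's thaweddb directory and rebuilding it.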

Putting It Together

The 60% log volume reduction is achievable and common, but it requires treating data ingestion as an engineering discipline rather than a set-and-forget configuration. Start with the license usage report to find your biggest consumers. Apply index-time filtering for confirmed low-value event types. Tier compliance data to object storage. Convert infrastructure telemetry to metrics indexes. Scope data model acceleration to relevant indexes only. Build a scorecard to monitor the results and catch drift over time.

The outcome is not just cost savings — it is a faster, more responsive Splunk environment where searches run against smaller, cleaner datasets, detection engineers spend less time tuning noise, and your SOC can actually find the signal in what remains.

Splunk license costs eating your security budget?

We audit Splunk deployments for data ingestion efficiency, detection coverage gaps, and log quality — and deliver a concrete roadmap for reducing costs without compromising visibility. Book a session with our team.