AI Automation

AI Agents in the SOC: Automating Repetitive Security Operations Without Losing Control

The math in most SOCs does not work. The average enterprise generates thousands of security alerts per day. Industry research consistently shows that SOC analysts can thoroughly investigate only around 20-25% of them. The rest sit uninvestigated, time out, or get bulk-dismissed. When 75% of alerts go uninvestigated, you are not running a security operations center — you are running a very expensive noise filter. The threat actors know this. They count on it.

AI agents offer a genuine solution to the volume problem, but only if they are deployed with clarity about what they should and should not do autonomously. The organizations getting value from AI in their SOC right now are not the ones that replaced analysts with AI — they are the ones that delegated specific, well-defined, repetitive tasks to AI agents so analysts can spend their time on the investigations that actually require human judgment. That is the frame this article operates from.

The SOC Automation Maturity Model

Before deploying any AI capability in a SOC, establish where you are and where you are trying to go. There are three meaningful maturity levels:

Level 1: Assisted (AI as Co-Pilot)

The AI agent presents information and recommendations to the analyst, who retains all decision authority. The agent enriches an alert with threat intel, summarizes context, and suggests a severity rating — but the analyst reads that output and decides what to do. No automated actions occur without explicit human approval. This is the right starting point for every organization. It builds trust in the agent's output before expanding its authority.

Level 2: Semi-Autonomous (AI with Approval Gates)

The agent executes defined investigative steps automatically — querying SIEM, enriching IOCs, running playbook procedures — but requires analyst approval before any action that modifies the environment. Isolating an endpoint, disabling an account, or blocking an IP requires a human to review the agent's reasoning and click approve. The analyst's role shifts from investigation to decision review. This is where most mature SOC AI deployments should operate.

Level 3: Autonomous with Guardrails

The agent executes a defined subset of responses autonomously — containment actions within pre-approved playbooks, for a defined set of high-confidence scenarios, within business hours or with on-call approval for after-hours actions. This level is appropriate only after Level 2 has operated reliably for a meaningful period (typically 6-12 months) and the agent's confidence scoring has been validated against analyst review records. Very few organizations should operate at full Level 3 across all use cases.

The critical mistake organizations make is skipping directly to Level 3 before establishing trust in the agent's reasoning. An automated containment action based on a hallucinated finding or a misclassified alert can cause more damage than the original threat.

Use Case 1: Alert Triage and Initial Investigation

Alert triage is the highest-value, lowest-risk starting point for SOC AI agents. The agent receives an alert from the SIEM, enriches it with relevant context, and produces a structured triage summary that an analyst can act on in minutes rather than starting from scratch.

What the agent does automatically:

  • Queries SIEM for historical context on the involved entities (user, host, IP address) — prior alerts, authentication patterns, recent changes
  • Looks up involved IP addresses against VirusTotal, AbuseIPDB, and Shodan
  • Retrieves the user's identity context from Entra ID or Active Directory — job role, department, manager, recent login patterns
  • Checks the host's asset inventory record — business criticality, owner, patch status
  • Reviews related alerts for the same entities in the past 72 hours
  • Generates a severity recommendation with explicit reasoning

The output the analyst receives is not a raw SIEM alert — it is a structured investigation summary that a competent Level 1 analyst would have taken 25-35 minutes to assemble. The agent does it in under 2 minutes, consistently, for every single alert. The analyst reviews the summary and decides: escalate, close as false positive, or continue investigating.
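To make the idea concrete, here is one way the structured triage summary might be shaped. This is a hedged sketch: the field names and example values are illustrative, not tied to any specific SIEM or vendor schema.

```python
from dataclasses import dataclass, field

# Hypothetical triage output structure; all field names are illustrative.
@dataclass
class TriageSummary:
    alert_id: str
    severity_recommendation: str   # e.g. "high", "medium", "low"
    reasoning: str                 # the agent's explicit justification
    entity_history: dict           # prior alerts, auth patterns per entity
    threat_intel: dict             # per-source IOC lookup results
    asset_context: dict            # criticality, owner, patch status
    related_alerts: list = field(default_factory=list)  # same entities, past 72h

summary = TriageSummary(
    alert_id="ALERT-1042",
    severity_recommendation="high",
    reasoning="Impossible-travel login followed by new outbound connection",
    entity_history={"user": "3 prior alerts in 30 days"},
    threat_intel={"virustotal": "12/90 vendors flag destination IP"},
    asset_context={"criticality": "high", "owner": "finance"},
)
```

The point of a fixed structure is that every alert arrives at the analyst in the same shape, which makes the review step fast and comparable across shifts.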

Google's Gemini in Security (announced at RSAC 2025) implements exactly this pattern as an alert triage agent that autonomously performs dynamic investigations and provides verdicts in Google Security Operations. Dropzone AI, one of the commercial AI SOC platforms, reports investigation times of 3-10 minutes compared to 30-40 minutes for manual analysis — and critically, covers 100% of alerts rather than the 22% industry average for manual-only SOCs.

Use Case 2: IOC Enrichment

IOC enrichment is the most straightforward automation candidate in a SOC workflow. Given an IP address, domain, file hash, or URL, the agent queries a fixed set of threat intelligence sources and returns a consolidated verdict. No judgment required — just API calls and result normalization.

A basic IOC enrichment agent tool set:

ENRICHMENT_TOOLS = [
    {
        "name": "check_virustotal",
        "description": "Query VirusTotal for IP, domain, URL, or file hash reputation",
        "parameters": {
            "ioc_value": "string",
            "ioc_type": "enum: [ip, domain, url, file_hash]"
        }
    },
    {
        "name": "check_abuseipdb",
        "description": "Query AbuseIPDB for IP address abuse reports and confidence score",
        "parameters": {"ip_address": "string"}
    },
    {
        "name": "check_shodan",
        "description": "Query Shodan for open ports, services, and historical data for an IP",
        "parameters": {"ip_address": "string"}
    },
    {
        "name": "query_internal_threat_intel",
        "description": "Check internal MISP instance or threat intel platform for IOC matches",
        "parameters": {"ioc_value": "string"}
    },
    {
        "name": "check_passive_dns",
        "description": "Query passive DNS for historical domain-to-IP resolution records",
        "parameters": {"domain": "string"}
    }
]

The agent calls each relevant tool, receives the results, and generates a consolidated verdict: Malicious (multiple sources confirm, high confidence), Suspicious (one or more sources flag, low confidence), or Clean (no flags across all sources). The verdict includes the specific findings from each source and a confidence score. The analyst sees one structured output instead of having to query five different portals manually.
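The consolidation logic described above can be sketched as a simple aggregation over per-source results. The thresholds and source names here are assumptions for illustration, not a vendor-defined scoring scheme.

```python
# Illustrative verdict aggregation; thresholds (2+ flags = Malicious) are
# assumptions and would need tuning against your own false positive data.
def consolidate_verdict(source_results: dict) -> tuple:
    """source_results maps source name -> True if that source flagged the IOC."""
    flags = sum(1 for flagged in source_results.values() if flagged)
    total = len(source_results)
    if flags >= 2:
        return "Malicious", flags / total    # multiple sources confirm
    elif flags == 1:
        return "Suspicious", flags / total   # single source, low confidence
    return "Clean", 1.0                      # no flags across all sources

verdict, confidence = consolidate_verdict({
    "virustotal": True, "abuseipdb": True, "shodan": False,
    "internal_misp": False, "passive_dns": False,
})
```

In practice a real scheme would also weight sources differently (an internal MISP hit usually means more than a single AbuseIPDB report), but the structure is the same: normalize, count, and attach the evidence.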

Important: the enrichment agent should never auto-block based on IOC lookups alone. VirusTotal flags produce false positives. AbuseIPDB scores fluctuate. A malicious verdict from an enrichment agent should flow to an analyst queue for review before any blocking action is taken. Auto-blocking based on threat intel lookups is a Level 3 decision that requires extensive false positive validation first.

Use Case 3: Playbook Execution Assistance

Incident response playbooks describe what to do when a specific alert type fires. In a manual SOC, an analyst reads the playbook, executes each step, records the results, and moves to the next step. Most of those steps are mechanical: query the SIEM for related events, check the endpoint's process list, look up the user's recent activity, dump the email headers.

An AI agent can execute mechanical playbook steps automatically and present the results in sequence. The analyst supervises the execution, reviews each step's output, and makes the judgment calls — does this process list look normal? Does this email header indicate spoofing? Should this account be disabled? The agent handles the execution; the analyst handles the interpretation and authorization.

Integration with SOAR platforms makes this practical. Splunk SOAR supports over 300 third-party tool integrations and 2,800+ automated actions, all accessible via API. Cortex XSOAR provides similar breadth. Splunk's DSDL (Data Science and Deep Learning) framework allows LLM function calling directly from SOAR playbooks, enabling agentic workflows where the LLM decides which SOAR actions to invoke based on intermediate results.

A phishing investigation playbook executed by an AI agent might proceed as follows:

  1. Extract all IOCs from the email (URLs, sender domain, attachment hashes) — automated
  2. Enrich all IOCs via the enrichment tools — automated
  3. Check all recipients' mailboxes for similar emails in the past 7 days — automated
  4. Determine if any recipients clicked links or opened attachments — automated
  5. Check clicked recipients for post-click activity (new processes, outbound connections) — automated
  6. [HUMAN REVIEW GATE]: Present findings and recommend: monitor, quarantine emails, or escalate to incident
  7. Execute analyst-approved action — semi-automated with approval

Steps 1-5 take a skilled analyst 20-30 minutes. An agent completes them in 2-3 minutes, every time, without variance. The analyst reviews a complete picture and makes the decision in step 6.
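The seven steps above can be sketched as a single workflow function with the human review gate made explicit. The step functions here are placeholders for SOAR actions and their names are hypothetical.

```python
# Sketch of the phishing playbook above; each step function stands in for a
# SOAR action and is passed in so the orchestration logic stays testable.
def run_phishing_playbook(email, extract_iocs, enrich, search_mailboxes,
                          check_clicks, check_post_click, present_for_review):
    iocs = extract_iocs(email)                 # step 1: automated
    enrichment = [enrich(i) for i in iocs]     # step 2: automated
    similar = search_mailboxes(iocs, days=7)   # step 3: automated
    clicked = check_clicks(similar)            # step 4: automated
    post_click = check_post_click(clicked)     # step 5: automated
    # step 6: human review gate -- the agent recommends, the analyst decides
    return present_for_review({
        "iocs": iocs, "enrichment": enrichment,
        "similar_emails": similar, "clicked": clicked,
        "post_click_activity": post_click,
    })
```

Step 7 (executing the approved action) deliberately lives outside this function: the agent's job ends at presenting a complete picture, and the approved action flows through the SOAR approval queue.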

Use Case 4: Report Generation

Incident reports, shift handoff summaries, and executive briefings are time-consuming to write. They draw on the same structured data that was collected during investigation — event timelines, affected systems, actions taken, remediation status. This is exactly the kind of structured-to-narrative conversion that LLMs do well.

An agent given the structured incident record can draft a technically accurate incident summary, an executive briefing for non-technical leadership, and a lessons-learned document in minutes. These drafts require analyst review and approval before distribution — the agent is drafting, not authoring. But reducing a 45-minute report writing task to a 5-minute review task is real analyst time recovered.
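A minimal sketch of the structured-to-narrative step: assemble the incident record into a grounded prompt that tells the model to draft only from the supplied facts. The record fields and prompt wording here are assumptions, not a specific product's format.

```python
# Illustrative prompt assembly for drafting an incident report from a
# structured record; field names are hypothetical.
def build_report_prompt(incident: dict, audience: str = "technical") -> str:
    timeline = "\n".join(f"- {t}: {e}" for t, e in incident["timeline"])
    return (
        f"Draft a {audience} incident summary. Do not invent details; "
        f"use only the facts below.\n"
        f"Incident: {incident['title']}\n"
        f"Affected systems: {', '.join(incident['affected_systems'])}\n"
        f"Timeline:\n{timeline}\n"
        f"Actions taken: {'; '.join(incident['actions_taken'])}\n"
        f"Status: {incident['remediation_status']}"
    )
```

The "do not invent details" constraint matters: the draft is grounded in the investigation record, which is what keeps report generation in the low-risk category.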

For ongoing threat intel, a daily digest agent can query configured threat intelligence feeds (TAXII servers, commercial TI APIs, vendor blog RSS feeds), summarize new developments relevant to your sector and technology stack, and deliver a morning briefing to the team. The analyst reads a 2-minute digest instead of spending 30 minutes reading raw feeds.

Building Guardrails: The Non-Negotiable Requirements

Every AI agent deployed in a SOC needs four guardrail categories implemented before it touches any production environment:

1. Human-in-the-Loop Approval for Destructive Actions

Define explicitly which actions are destructive and require human approval: account disablement, endpoint isolation, firewall rule changes, email quarantine, process termination. These actions can cause disruption if they are wrong. No AI agent should execute them without explicit analyst approval, regardless of the agent's confidence score. Implement approval queues in your SOAR platform with a clear audit trail — who approved, when, and based on what agent output.

2. Confidence Scoring with Fallback

Every AI agent decision should include a confidence score. When the agent's confidence falls below a defined threshold (typically 70-75%), the case routes directly to a human analyst with the agent's partial work attached. Do not let low-confidence agent outputs drive automated actions. Implement this as an explicit conditional in every agentic workflow.

# Confidence-gated action execution.
# APPROVED_AUTONOMOUS_ACTIONS, audit_log, approval_queue, analyst_queue, and
# execute_action are provided by the surrounding SOAR integration layer.
def execute_with_approval_gate(action: str, reasoning: str, confidence: float):
    if confidence >= 0.85 and action in APPROVED_AUTONOMOUS_ACTIONS:
        # Execute autonomously and log
        result = execute_action(action)
        audit_log.record(action, reasoning, confidence, "autonomous", result)
        return result
    elif confidence >= 0.70:
        # Queue for analyst approval
        approval_queue.submit(action, reasoning, confidence)
        audit_log.record(action, reasoning, confidence, "queued_for_approval")
        return "PENDING_APPROVAL"
    else:
        # Route directly to analyst with full context
        analyst_queue.escalate(action, reasoning, confidence)
        audit_log.record(action, reasoning, confidence, "escalated_low_confidence")
        return "ESCALATED"

3. Audit Logging of Agent Decisions

Every agent action must be logged with: the triggering alert, the tool calls made and their results, the reasoning chain, the confidence score, the recommendation made, the action taken (or queued), and the identity of any human who approved an action. This is not optional. When something goes wrong — and occasionally it will — you need to reconstruct exactly what the agent did and why. Without a complete audit trail, post-incident analysis is impossible.
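One way to capture the fields listed above is a single serialized record per agent decision. The field names below are illustrative, not a standard schema; the requirement is completeness, not this particular shape.

```python
import json
import datetime

# Hypothetical audit record builder; every field listed in the guardrail
# requirements above maps to one key here.
def audit_record(alert_id, tool_calls, reasoning, confidence,
                 recommendation, action_taken, approved_by=None):
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "alert_id": alert_id,
        "tool_calls": tool_calls,      # each call made and its result
        "reasoning": reasoning,        # the agent's reasoning chain
        "confidence": confidence,
        "recommendation": recommendation,
        "action_taken": action_taken,  # executed, queued, or escalated
        "approved_by": approved_by,    # None unless a human approved
    })
```

Serializing the full tool-call history alongside the reasoning chain is what makes post-incident reconstruction possible: you can replay exactly what evidence the agent saw before it recommended anything.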

4. Scope Boundaries

The agent's service account should have read-only access to most systems and write access only to the specific systems it needs to take approved actions. Principle of least privilege applies to AI agents exactly as it applies to human users. An alert triage agent does not need write access to Active Directory. An IOC enrichment agent does not need access to your source code repository. Scope the agent's permissions to the minimum required for its defined function.
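A scope policy like this can be made explicit and checked before every tool call. The agent names, system names, and access levels below are illustrative, not tied to any specific IAM product.

```python
# Hypothetical per-agent permission map; default is deny for anything unlisted.
AGENT_SCOPES = {
    "alert_triage_agent": {
        "siem": "read", "entra_id": "read", "asset_inventory": "read",
    },
    "ioc_enrichment_agent": {
        "threat_intel": "read", "passive_dns": "read",
    },
    "containment_agent": {
        "edr": "write",   # only the system it needs for approved actions
        "siem": "read",
    },
}

def is_allowed(agent: str, system: str, access: str) -> bool:
    granted = AGENT_SCOPES.get(agent, {}).get(system)
    # "write" grants imply read; anything unlisted is denied by default
    return granted == access or (granted == "write" and access == "read")
```

Enforcing this at the tool-call layer, in addition to the service account's actual IAM permissions, gives you defense in depth: even a hallucinated tool call against an out-of-scope system fails closed.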

Data Privacy and LLM API Considerations

If you are sending alert data to a cloud LLM API (OpenAI, Anthropic, etc.), you are potentially sending PII, internal system names, IP addresses, and incident details to a third-party service. Evaluate this against your data classification policy and regulatory requirements before deployment.

Options to manage the risk:

  • Sanitization before API calls: Strip or hash identifiers that are not needed for the LLM's reasoning. A triage agent analyzing process behavior does not need the real username — an anonymized identifier is sufficient for reasoning about the pattern.
  • Azure OpenAI Service: Microsoft's hosted version of OpenAI models processes data within your Azure tenant, providing data residency controls and the same enterprise security commitments as other Azure services. This is the preferred option for regulated industries.
  • Local/on-premises models: Smaller LLMs (Llama 3, Mistral, Phi-4) can run on-premises without any data leaving your environment. Performance is lower than GPT-4o or Claude 3.5 for complex reasoning tasks, but for structured triage and enrichment workflows with well-defined tool calls, they are often sufficient.
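The sanitization option can be sketched as a pseudonymization pass over alert text before it leaves your environment. This is a minimal sketch with assumptions: the regex covers only the 10.0.0.0/8 range, and a production version would handle all RFC 1918 ranges, hostnames, and email addresses, with a properly managed salt.

```python
import hashlib
import re

# Stable pseudonym: the same input maps to the same token, so the model can
# still reason about repeated entities without seeing the real identifier.
def pseudonymize(value: str, salt: str = "rotate-me") -> str:
    return "ent_" + hashlib.sha256((salt + value).encode()).hexdigest()[:8]

def sanitize_alert_text(text: str, usernames: list) -> str:
    for name in usernames:
        text = text.replace(name, pseudonymize(name))
    # Mask internal addresses; 10.x.x.x only here -- extend for 172.16/12
    # and 192.168/16 in a real implementation.
    return re.sub(r"\b10(?:\.\d{1,3}){3}\b",
                  lambda m: pseudonymize(m.group()), text)
```

Keeping the mapping deterministic (rather than replacing with random tokens) preserves the pattern the triage agent needs: "the same user logged in from three hosts" survives sanitization intact.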

Hallucination in the Security Context

Hallucination rates for leading models on factual tasks have improved substantially — independent benchmarks show GPT-4o at approximately 1.5% and Claude 3.5 Sonnet at approximately 4.6% on standardized factual assessments. Retrieval-augmented generation (grounding the model in verified data before it responds) reduces hallucination rates by up to 71% in well-implemented systems.

In practice, the most common hallucination failure mode in security agents is schema drift — the model generates a query or tool call using field names that do not exist in your actual data model. An agent trained on general Splunk SPL knowledge might generate SPL that references src_ip when your data model uses source_ip, or reference a field from a different CIM data model than the one available. Mitigate this by providing the agent with your actual data schema in the system prompt, and validating tool call arguments against that schema before execution.
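The schema-validation guard described above is straightforward to implement: check every field the agent references against the schema you actually have before the query runs. The field set below is an example, not a standard CIM model.

```python
# Fields that actually exist in this (hypothetical) data model.
KNOWN_FIELDS = {"source_ip", "dest_ip", "user", "host", "event_time"}

def validate_query_fields(requested_fields: list) -> list:
    """Return unknown fields so the tool call can be rejected and retried."""
    return [f for f in requested_fields if f not in KNOWN_FIELDS]

# An agent hallucinating "src_ip" is caught before the query executes:
unknown = validate_query_fields(["src_ip", "user"])  # → ["src_ip"]
```

On a validation failure, return the unknown fields and the correct schema to the agent and let it retry; in practice one corrective round trip resolves most schema-drift errors.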

The organizations making AI work in security operations are not waiting for perfect models. They are building architectures where imperfect models are safe to deploy: deterministic evidence gathering, structured tool calls, confidence scoring, human review for consequential actions, and complete audit trails. The AI provides throughput; the architecture provides reliability.

Starting Your AI SOC Deployment

If you are starting from zero, the sequence that works:

  1. Week 1-2: Deploy a read-only alert triage agent in assisted mode (Level 1). It enriches alerts and presents summaries. Analysts review every output. Measure how often the agent's severity recommendation matches the analyst's final disposition.
  2. Week 3-6: Measure the accuracy rate. If it exceeds 80% agreement, move to Level 2 for low-severity alert categories. High-severity alerts remain at Level 1 (analyst reviews every one).
  3. Month 2-3: Expand to IOC enrichment and playbook step execution for defined alert types. Build the approval queue for semi-autonomous actions.
  4. Month 3-6: Evaluate Level 3 for a single, well-defined, high-confidence scenario (e.g., auto-quarantine phishing emails that score above 95% malicious across all enrichment sources). Require leadership approval before enabling any Level 3 autonomy.
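The agreement measurement in steps 1-2 can be as simple as a match rate over reviewed alerts, comparing the agent's severity recommendation against the analyst's final disposition:

```python
# Simple agent-analyst agreement rate over Level 1 review records.
def agreement_rate(records: list) -> float:
    """records: (agent_severity, analyst_final_severity) pairs."""
    if not records:
        return 0.0
    matches = sum(1 for agent, analyst in records if agent == analyst)
    return matches / len(records)

rate = agreement_rate([("high", "high"), ("low", "low"),
                       ("medium", "high"), ("low", "low")])  # → 0.75
```

Track this per alert category, not just in aggregate: an agent can clear the 80% bar overall while performing poorly on exactly the category you are about to promote to Level 2.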

IBM research estimates that organizations using AI-powered security see nearly $2 million in reduced breach costs and 80 days faster response times. Those numbers come from real deployments. But they also come from deployments that were built carefully, with appropriate guardrails, and expanded incrementally. The SOC AI deployments that fail are the ones that skip from Level 1 to Level 3 before the foundation is established.

Your analysts are not your cost problem — your alert volume and your analyst time allocation are. AI agents solve the volume problem. The human judgment problem remains yours to solve through hiring, training, and operational design. Use AI to give your analysts back the time they need to do the work that actually requires them.

Ready to bring AI to your security operations?

We design AI agent architectures for SOC automation — from alert triage and IOC enrichment to SOAR integration and guardrail frameworks — built for the real-world constraints of enterprise security operations. Book a session with our team.