AI Agent Detection: Your SOC's Blind Spot

Dark cyberpunk illustration of a cyan radar light sweeping a field of dark servers while one node glows orange and unseen in the unlit gap behind the beam.

A coding agent with write access to your repository reads a dependency's README, follows an instruction buried three paragraphs into the file, and opens a pull request that quietly adds an attacker's deploy key. Your web application firewall logged nothing. Your endpoint agent logged nothing. The model did the one thing models do: it read text and acted on it. Somewhere in an application log, in JSON nobody is watching, there is a perfect record of the whole thing.

That distance between what AI agents do and what the security operations center can see is the defining detection problem of 2026. The OWASP Top 10 for Agentic Applications now maps prompt injection to six of its ten categories, which makes it the single most common weakness across the agent stack. And these agents are not a pilot anymore. Of 53 agentic projects that analysis tracked, 28 were coding agents, with Claude Code, Gemini CLI, Codex, Cline, and Aider among the fastest growing. Your developers are running these against real repositories right now, with or without your sign-off.

Who this is for, and who can skip it. If any team in your business has wired a model to a tool - a coding assistant that can write files or run commands, a support bot that queries your database, a retrieval assistant that reads internal documents, an MCP server connected to email or code - this is your problem, and your SOC almost certainly cannot see it today. If your only use of AI is staff pasting text into a consumer chat window with no tool access, no API agents, and no internal data wired in, the attack described here is not yours yet; your exposure is data leakage through shadow AI, which is a different article. The line is tools. The moment a model can take an action on your behalf, its tool calls become a log source you have to defend.

Why your perimeter stack is blind to this

Traditional controls fail here for a structural reason: the attack arrives as content the system is supposed to process. A WAF inspects requests for malformed input and known signatures. An EDR watches process and file behavior on the host. Prompt injection is neither of those things. The malicious instruction is ordinary prose sitting inside a PDF, a Jira ticket, a web search result, or a knowledge-base page that the model was told to read and summarize. There is no exploit string to match and no anomalous binary to flag. The payload is the legitimate input.

This is why "we have an AI gateway" or "the model has guardrails" does not close the gap. Input filters catch the crude attempts that show up in the user's own message. The dangerous ones are indirect: the instruction never appears in the user's prompt at all; it rides in on data the agent fetches later, and the model carries it forward with the privileges of whatever session is running. Detecting that requires watching the agent's own behavior - what it retrieved, what tools it called, how its reasoning shifted - and that telemetry lives in application logs your SOC does not ingest. The work ahead is detection engineering applied to three surfaces most teams have never logged.

Surface one: the retrieval pipeline

Start where the cheapest win is. Any agent that does retrieval - the "R" in RAG - pulls documents from somewhere before it answers. That is the entry point for indirect injection: an attacker plants instructions in a document they know the model will read, and the model runs them with the access of the retrieval session. As the team at Cybersecurity Insiders documented, this is the pattern that perimeter tools cannot see, because the telemetry source is the RAG pipeline, not the network. Most deployments log none of it.

Instrument the retrieval step to emit, for every fetch, the source identifier (URL, file path, or document ID) and a content hash. With that one stream you can do two things you cannot do today: rebuild exactly which documents fed a bad answer, and alert on retrieved content carrying high-density imperative-verb structure measured against a set of known injection signatures. You will not catch everything on day one. The first month of that log is the baseline you tune against, and you cannot build a baseline for a source you never recorded.

Surface two: the tool-call boundary

The highest-value surface is where the agent stops reasoning and acts. Frameworks like LangChain, LangGraph, and AutoGen emit a tool-call event every time the model invokes a function: the tool name, the input, and the output it received. Second-order injection lives here. The agent calls a legitimate tool - a web search, a database query - receives attacker-controlled text in the result, and then carries an instruction from that result into its next step. The tool call was authorized. The content that came back was not.

The detection that works is a correlation: flag a tool output that contains instruction-formatted text when it lands within a few turns of a sensitive action. Define "sensitive" concretely for your environment - credential retrieval, a file write, an outbound API call, a shell command. The cheapest way to feed that detection is a callback that ships every tool call to your log pipeline as structured JSON:

import json, hashlib, time
from langchain_core.callbacks import BaseCallbackHandler

class SIEMToolLogger(BaseCallbackHandler):
    """One structured event per tool call, so the SOC - not just the
    app log - sees what the agent actually did."""
    SENSITIVE = {"shell", "write_file", "http_request", "get_secret"}

    def on_tool_start(self, serialized, input_str, **kw):
        self._t0 = time.time()
        self._tool = serialized.get("name", "unknown")

    def on_tool_end(self, output, **kw):
        text = str(output)
        event = {
            "ts": time.time(),
            "source": "agent.tool_call",
            "tool": self._tool,
            "sensitive": self._tool in self.SENSITIVE,
            "output_sha256": hashlib.sha256(text.encode()).hexdigest(),
            "imperative_density": imperative_density(text),
            "turn": next_turn_index(),
        }
        print(json.dumps(event))   # -> log shipper -> SIEM

What a usable tool-call event needs

the tool name, and whether it is on your sensitive list;
a hash of the output, so you can correlate without storing customer data;
a cheap signal for instruction-like content (imperative-verb density works as a first pass);
a turn index, so a correlation rule can measure "within three turns".

Once those events reach the SIEM, the rule is ordinary detection engineering. This Sigma correlation fires when instruction-dense tool output precedes a sensitive call inside the same short window:

title: Instruction-formatted tool output before a sensitive agent action
logsource:
  product: ai_agent
  service: tool_call
detection:
  injected:
    imperative_density|gte: 0.15
  sensitive:
    sensitive: true
  condition: injected | near sensitive within 3 turns
level: high
falsepositives:
  - legitimate tasks that summarize instructions then act

This is deliberately simple. It will be noisy at first, the same way any new detection is, and you will spend a couple of weeks raising the imperative-density floor and refining what counts as sensitive. That tuning is the work. The point is that the event exists at all - today, for most teams, it never leaves the application.

Surface three: the conversation session

The patient attacks do not inject a single payload. They poison the conversation. An attacker uses a run of benign, on-topic turns to shift the session's context, so that by the time the harmful instruction arrives it reads as consistent with everything before it. No single message looks anomalous; the shape of the whole session does.

Catch it with behavioral analysis over a rolling window of roughly ten turns. The signals that hold up: a sharp inversion in the ratio of user-message length to assistant-message length, topic entropy collapsing toward zero as the attacker narrows the model onto one objective, and system-role language ("ignore previous instructions", "you are now...") appearing in turns that are supposed to be user content. These are session-metadata detections. They work even when each individual turn reads as harmless, and they are most mature today in financial-services deployments where agents drive customer workflows.

Scope what the agent can reach

Detection tells you when something went wrong. Scoping decides how bad it gets. Every one of these attacks ends at a tool call, so the controls that cap the damage are the unglamorous ones: give each agent its own short-lived, narrowly scoped credentials instead of a shared service account; gate file writes, outbound network calls, and secret access behind allowlists; and put egress filtering on the runners and servers where agents execute. An agent that cannot reach your secrets store cannot exfiltrate it, no matter how convincingly it is talked into trying.

This matters more every month because the surface is expanding fast. Internet scans this year counted over 12,000 reachable Model Context Protocol servers, and security researchers found that roughly 40 percent of remote servers expose their tools with no authentication at all. Each unauthenticated server is a set of actions an attacker can drive directly, before any prompt injection is even needed. Treat an MCP server you expose the way you would treat any other internet-facing application that can run code: authenticated, network-segmented, and logged.

Put the agent's tool calls in front of your SOC

The offensive side of this is automating quickly - one major model provider reported that two-thirds of the threat-actor accounts it banned were using AI to build their tooling - while the defensive side still treats agent logs as application debug output. Close that gap this quarter, and you do not need a new platform to do it. Pick the single agent in your environment with the most access, usually a coding assistant wired to your repositories and CI, and route its retrieval and tool-call events into the SIEM you already run. Write the two correlation rules above against them. Give the agent its own scoped credentials. That is about a week of work, and at the end of it your SOC can finally watch the layer of your business that is now making decisions and taking actions on its own. Right now, almost no one can.

Drowning in alerts? We can help.

We help security teams build detection pipelines that cover the AI agents now running in their environment, not just the network and the endpoints. Book a session to talk through your SIEM and where your agent telemetry is going today.

Book a Session

Your AI Agents Emit Security Events. Your SIEM Never Sees Them.