First In-the-Wild LLM-Agent Intrusion


Less than an hour. That is how long it took an intruder to go from an exposed Python notebook to a complete copy of an internal Postgres database, and a human never typed most of the commands in between. On May 10, 2026, the Sysdig Threat Research Team watched a large language model agent run the post-exploitation phase of a live intrusion - reading the host, pulling credentials, pivoting through the cloud, and exfiltrating the data - composing each step in real time rather than replaying a script. Sysdig describes it as the first AI-agent-driven intrusion its team has captured in the wild, and published the full breakdown on May 30. The exploit was ordinary. The operator was not.

The way in was CVE-2026-39987, an unauthenticated remote code execution flaw in marimo, a widely used open-source Python notebook. A WebSocket endpoint, /terminal/ws, skipped the authentication check that the notebook's other endpoints enforced, so anyone who could reach the port got a full interactive shell - in the documented case a root shell - with no token required even when authentication was switched on. marimo's security advisory (GHSA-2679-6mx9-h9xc) rates it CVSS 9.3, and version 0.23.0 closed the gap; public reporting put exploitation within hours of disclosure. If that were the whole story, this would be a patch-and-move-on note. What earns it your attention is who - or what - showed up after the shell opened.

Here is the honest scope. If you run internet-reachable developer tooling - notebooks, dashboards, internal apps stood up "just for the team" - on hosts that carry cloud credentials, and you keep secrets in a managed vault like AWS Secrets Manager, the chain in this report reads like a survey of your own environment. If you have no cloud footprint and nothing exposed past a patched perimeter, the marimo flaw is not on your list this week, and you can skip the patch section. The piece that should hold everyone's attention is the tempo. The whole intrusion moved at the speed of an automated tool loop, and most detection programs are still tuned for an adversary who works at the speed of a person.

From one shell to a database dump in four pivots

Strip the novelty away and the kill chain is a textbook cloud-pivot, executed with unusual discipline. The agent landed on the marimo host through the RCE, then worked outward in four moves, each one short and purposeful.

The chain Sysdig reconstructed

1. Harvested two cloud credentials directly from the compromised notebook host.
2. Replayed those credentials through a fanned-out egress pool built on Cloudflare Workers - twelve cloud API calls spread across eleven distinct source IPs in twenty-two seconds, which defeats any detection keyed to a single source address.
3. Used that access to pull an SSH private key out of AWS Secrets Manager.
4. Drove eight short SSH sessions against a downstream bastion host with the stolen key, then dumped the target Postgres schema and its full contents in under two minutes.

Every credential it used was legitimate. Every API call was a call the environment was built to allow. There was no exotic zero-day after the front door; there was an operator who already understood that the fastest route to the data ran through the cloud control plane and the secrets vault, and who took it without a wasted step. The Hacker News coverage of the report frames the same sequence: initial access to exfiltrated database in well under sixty minutes, post-exploitation included.

How Sysdig knew a model was driving

Attribution to an agent rather than a fast human is the part of the report worth reading twice, because it tells you what to look for in your own logs. Sysdig points to several properties of the bastion transcript that line up with real-time composition by an LLM agent rather than a person at a keyboard or a canned script.

The tell that travels best: the operator bundled six independent SELECT statements into a single psql invocation. A human running reconnaissance against an unfamiliar database tends to poke one query at a time, read the result, and decide the next step. A pre-written script tends to carry hard-coded table names from a prior run. An agent does neither. It wants the whole answer returned in one tool call so it can reason over the result in a single turn, so it composes one fat query that grabs everything at once. Add the absence of typos, the lack of idle "thinking" gaps, the lack of exploratory dead ends, and the machine-orchestrated egress fan-out, and the behavioral fingerprint points away from a manual operator.
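
If your bastion records command lines (from auditd, a session recorder, or shell history), the bundled-query tell can be hunted for directly. The sketch below is a heuristic of my own, not anything from Sysdig's report: the function name is hypothetical, and it assumes each command line arrives as a plain string and that the SQL was passed with psql's -c/--command option.

```python
import shlex


def count_bundled_selects(command_line: str) -> int:
    """Count SELECT statements handed to one psql invocation via -c/--command.

    Heuristic only: it splits on ';', so a semicolon inside a string literal
    would be miscounted, and SQL fed through stdin or -f is not seen at all.
    """
    try:
        argv = shlex.split(command_line)
    except ValueError:
        return 0  # unbalanced quotes: skip the line rather than guess

    # Accept "psql" or a path ending in "/psql" anywhere (e.g. after sudo).
    if not any(arg == "psql" or arg.endswith("/psql") for arg in argv):
        return 0

    total = 0
    for i, arg in enumerate(argv):
        if arg in ("-c", "--command") and i + 1 < len(argv):
            sql = argv[i + 1]
        elif arg.startswith("--command="):
            sql = arg.split("=", 1)[1]
        else:
            continue
        statements = [s.strip() for s in sql.split(";") if s.strip()]
        total += sum(1 for s in statements if s.upper().startswith("SELECT"))
    return total
```

A count of four or more on a host where humans normally run one query at a time is worth a look. That threshold is an assumption to tune, and a legitimate reporting job can trip it just as easily as an agent.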

I read that fingerprint as the durable signal here. Indicators of compromise from this campaign will rotate within the week. The behavior - efficiency without hesitation, parallelism a human hand cannot produce, reconnaissance collapsed into one shot - is far harder for the attacker to disguise, and it is what your detection logic can anchor on.

The dwell time you were counting on just shrank

Most response plans I review at small and mid-size shops quietly assume a window. Initial access on day one, discovery and lateral movement over the following days, exfiltration later still - enough time for an alert to fire, an analyst to notice, and someone to start pulling plugs. That assumption is load-bearing, and an agent operator knocks it out. When discovery, credential theft, lateral movement, and exfiltration all happen inside one hour, the gap your process needs simply is not there.

Two common controls degrade in particular. Rate limits and geo-velocity rules keyed to a single source IP lose to an egress pool that sprays twelve calls across eleven addresses in twenty-two seconds. And the comfortable plan of "we will catch it during lateral movement" depends on lateral movement being slow enough to catch. The facts support one position: stop building your last line of defense around catching a human mid-session, and move it onto the cloud control plane, where an agent has to make noise no matter how fast it goes. Reads from a secrets vault, brand-new principals exercising old credentials, and impossible-velocity API patterns are signals an agent cannot avoid generating on its way to the data. This CloudWatch Logs Insights query over CloudTrail flags the egress-pool tell - one identity pulling secrets from several source IPs in a tight window:

fields @timestamp, sourceIPAddress, userIdentity.arn, requestParameters.secretId
| filter eventSource = "secretsmanager.amazonaws.com"
| filter eventName = "GetSecretValue"
| stats count_distinct(sourceIPAddress) as src_ips,
        count(*) as calls,
        earliest(@timestamp) as first_seen,
        latest(@timestamp) as last_seen
        by userIdentity.arn, bin(5m)
| filter src_ips >= 3
| sort calls desc

Tune the src_ips threshold to your environment, but the shape is the point: a single principal reaching into Secrets Manager from three or more addresses inside five minutes is not how your application servers behave on a normal Tuesday.
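
One caveat with fixed five-minute bins: a twenty-two-second burst that straddles a bin boundary can be split into two bins that each stay under the threshold. If you export CloudTrail records for offline review, a sliding window avoids that. The sketch below is a minimal illustration, assuming each record is a dict carrying the standard CloudTrail fields eventTime, sourceIPAddress, and userIdentity.arn; the function name and defaults are mine.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def fanout_principals(events, window=timedelta(minutes=5), min_ips=3):
    """Return ARNs seen from >= min_ips source IPs inside any sliding window.

    Assumes every record has userIdentity.arn; filter out identity types
    that lack it before calling.
    """
    by_arn = defaultdict(list)
    for ev in events:
        ts = datetime.strptime(ev["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        by_arn[ev["userIdentity"]["arn"]].append((ts, ev["sourceIPAddress"]))

    flagged = set()
    for arn, calls in by_arn.items():
        calls.sort()
        recent = deque()  # calls no older than `window` relative to the newest
        for ts, ip in calls:
            recent.append((ts, ip))
            while ts - recent[0][0] > window:
                recent.popleft()
            if len({addr for _, addr in recent}) >= min_ips:
                flagged.add(arn)
                break
    return flagged
```

Feed it only the GetSecretValue records if you want the same scope as the query, or every event for a principal if you want the broader fan-out signal.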

Controls that survive a machine-speed operator

None of the fixes here are new. What changes is the order you put them in and how little time you now have to rely on the slow ones. Work down this list in the next two weeks.

1. Get developer tooling off the public internet, then patch it

Update marimo to 0.23.0 or later, then ask the harder question: why was a notebook reachable from the internet at all. Notebooks, dashboards, and internal apps belong behind SSO, a VPN, or a ZTNA gateway, not on a public port. Inventory what is exposed before you assume the answer is "nothing." marimo defaults to port 2718:

# List hosts with marimo's default port (2718) open; swap in your own ranges
nmap -p 2718 --open -oG - 10.0.0.0/8 \
  | awk '/2718\/open/ {print $2}'
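
An open port tells you where to look, not whether the install is fixed. On each host you find, a version check along these lines can answer the second question. It is a sketch under stated assumptions: it must run in the same Python environment that serves the notebook, the helper names are mine, and the parser deliberately ignores pre-release suffixes, so a release candidate of 0.23.0 would be reported as patched.

```python
from importlib import metadata

PATCHED = (0, 23, 0)  # first fixed release, per the advisory cited above


def parse_version(text: str) -> tuple:
    """Turn '0.22.4' into (0, 22, 4), dropping any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text: str) -> bool:
    """True when the version is at or above the first fixed release."""
    return parse_version(version_text) >= PATCHED


def installed_marimo_is_patched():
    """True or False for the local install, None if marimo is not installed."""
    try:
        return is_patched(metadata.version("marimo"))
    except metadata.PackageNotFoundError:
        return None
```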

2. Stop parking standing cloud credentials on exposed hosts

The intrusion turned a single RCE into a cloud foothold because long-lived credentials were sitting on the notebook host. Replace them with short-lived instance roles scoped to exactly what the host needs, so a shell on that box yields minutes of narrow access instead of a reusable key to the account.
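
A quick way to find the worst offenders is to look for long-term access keys in the shared credentials file on each exposed host. Long-term IAM user keys start with AKIA, while temporary STS credentials start with ASIA and come with a session token. The sketch below leans on that convention; the function name is hypothetical, and it only covers the shared credentials file, not environment variables, .env files, or keys pasted into notebooks.

```python
import configparser


def long_lived_profiles(credentials_text: str) -> list:
    """List profiles in an AWS shared credentials file holding long-term keys."""
    # interpolation=None so a stray '%' in a value is not treated as syntax
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)

    risky = []
    for profile in parser.sections():
        key_id = parser.get(profile, "aws_access_key_id", fallback="")
        has_token = parser.has_option(profile, "aws_session_token")
        if key_id.startswith("AKIA") and not has_token:
            risky.append(profile)
    return risky
```

In practice you would pass it the contents of ~/.aws/credentials for every account that can log in to the host. An empty list is necessary, not sufficient.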

3. Alarm on the secrets vault by principal and source

A web or application host pulling an SSH private key out of Secrets Manager is the inflection point in this whole story, and it is observable. Alert when a principal reads a secret it has never read before, or reads one from an unexpected source. The query above is the starting detection; wire it to a real alarm, not a dashboard nobody watches.
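
The "never read before" rule is simple enough to sketch. The version below assumes you keep a baseline set of (principal ARN, secret ID) pairs built from a quiet period of CloudTrail history and that each record is a dict with the standard eventSource, eventName, userIdentity.arn, and requestParameters.secretId fields. The function name and the idea of an in-memory baseline are illustrative; a real deployment would persist it.

```python
def new_secret_reads(events, baseline):
    """Return (arn, secret_id) pairs that are absent from the baseline.

    Each new pair is reported once, in the order it first appears.
    """
    seen = set(baseline)
    alerts = []
    for ev in events:
        if ev.get("eventSource") != "secretsmanager.amazonaws.com":
            continue
        if ev.get("eventName") != "GetSecretValue":
            continue
        pair = (ev["userIdentity"]["arn"], ev["requestParameters"]["secretId"])
        if pair not in seen:
            seen.add(pair)
            alerts.append(pair)
    return alerts
```

A notebook host's role reading a bastion SSH key for the first time would surface here on the very first call, before any fan-out pattern has time to form.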

4. Centralize bastion session logs and watch for bursts

Eight rapid SSH sessions from one key against a bastion is a pattern, but only if those logs leave the bastion. Ship session records to your SIEM or log store and alert on burst behavior - many short sessions from a single key in a small window - because by the time an agent is on the bastion, the database is minutes away.
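
Once the logs are centralized, the burst rule is a few lines. The sketch below assumes OpenSSH's standard "Accepted publickey ... SHA256:..." auth message and assumes your pipeline has already attached a timestamp to each login. It counts logins per key rather than measuring session length, and the ten-minute window and threshold of five are placeholders, not numbers from the report.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the sshd message logged on a successful public-key login.
ACCEPTED = re.compile(
    r"Accepted publickey for (?P<user>\S+) from (?P<ip>\S+) port \d+ ssh2: "
    r"\S+ (?P<fingerprint>SHA256:\S+)"
)


def key_fingerprint(message: str):
    """Extract the key fingerprint from an sshd auth message, or None."""
    match = ACCEPTED.search(message)
    return match.group("fingerprint") if match else None


def bursting_keys(logins, window=timedelta(minutes=10), threshold=5):
    """logins: iterable of (datetime, fingerprint) pairs.

    Returns fingerprints with at least `threshold` logins inside any window.
    """
    by_key = defaultdict(list)
    for ts, fingerprint in logins:
        by_key[fingerprint].append(ts)

    flagged = set()
    for fingerprint, times in by_key.items():
        times.sort()
        start = 0
        for end, ts in enumerate(times):
            while ts - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(fingerprint)
                break
    return flagged
```

Keying on the fingerprint rather than the source IP matters here, for the same reason it matters in the cloud logs: the address is the cheapest thing for the operator to rotate.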

5. Automate the first containment step

Against an operator that finishes inside an hour, a runbook that pages a human to revoke a key by hand is too slow. Pre-build the automation: on a high-confidence secrets-abuse alert, revoke the key, disable the principal, and kill active sessions without waiting for someone to wake up. You can always re-enable; you cannot un-exfiltrate.
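
As a starting point, the two IAM-side actions can be pre-built as small functions that your alert handler calls. This is a sketch, not a finished responder: it assumes a boto3 IAM client (boto3.client("iam")) is passed in, the policy name is a placeholder, and killing live SSH sessions on the bastion is left out because it depends on your session tooling. The role function uses the inline deny-by-token-issue-time pattern that AWS applies when you revoke a role's active sessions from the console.

```python
import json
from datetime import datetime, timezone


def deactivate_access_key(iam, user_name: str, access_key_id: str) -> None:
    """Mark an IAM user's access key Inactive (reversible via Status='Active')."""
    iam.update_access_key(
        UserName=user_name, AccessKeyId=access_key_id, Status="Inactive"
    )


def revoke_role_sessions(iam, role_name: str, now=None) -> str:
    """Deny all sessions of a role whose credentials were issued before `now`.

    Returns the cutoff timestamp written into the policy.
    """
    cutoff = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
        }],
    }
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="RevokeOlderSessions",  # placeholder name, pick your own
        PolicyDocument=json.dumps(policy),
    )
    return cutoff
```

Both actions are reversible: set the key back to Active, or delete the inline policy. Test them against a non-production principal before wiring them to an alert, since the role that runs the automation needs permission to make these calls and must not be one of the roles it revokes.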

Put a tripwire on the secrets vault this week

The marimo bug will be patched and forgotten. The operating model behind it will not be. Treat this report as the floor, not the ceiling: assume your next intruder reaches the database faster than your on-call can read the first alert, and design backwards from that. Concretely, before Friday, put one real alarm on Secrets Manager access by principal and source, and confirm a burst of bastion logins would actually page someone. Those two tripwires sit exactly where a fast agent has to step. If you want a second set of eyes on whether your detection and response can keep pace with automated post-exploitation, that is the kind of pressure-test we run.

