Copy Fail (CVE-2026-31431): A 732-Byte Python Script Roots Linux

The Linux local privilege escalation playbook used to come with caveats. Build the right kernel offsets. Win the right race condition. Hope the heap layout cooperates. Hope SMAP and KASLR are not in the way. The honest practitioner's expectation, on a hardened modern distro, was that local privilege escalation took some work and might not pay off on the first try.

Copy Fail breaks that frame. CVE-2026-31431, disclosed by the Xint Code research team on April 29, 2026, is a deterministic logic flaw in the Linux kernel's cryptographic subsystem. The published proof of concept is a single 732-byte Python script using only standard library modules. It works unmodified on Ubuntu 24.04 LTS, Amazon Linux 2023, Red Hat Enterprise Linux 10.1, SUSE 16, Debian, Fedora, and Arch. It does not rely on race conditions or kernel offsets. It fires on the first try. CISA added it to the Known Exploited Vulnerabilities catalog on April 30, one day after public disclosure. Microsoft Defender, Unit 42, ExtraHop, and every major distribution published advisories within the same week.

If your environment runs Linux — servers, cloud VMs, Kubernetes nodes, CI runners, developer laptops, IoT devices — you have until your next patch window to remove a primitive that turns any foothold into root and, on container hosts, into a node-level compromise.

The shape of the bug

The flaw lives in the algif_aead module of the AF_ALG socket interface — the userspace crypto API that lets unprivileged programs perform Authenticated Encryption with Associated Data operations through kernel-managed cryptographic primitives. The bug is not a single mistake. According to Unit 42's analysis, it is the cumulative result of three independent kernel updates that, taken together, removed an invariant the original code relied on.

The mechanic, simplified: when an unprivileged user submits an AEAD operation through AF_ALG and uses the splice() system call to feed in source data, the kernel performs an in-place optimization that reuses source memory as the destination. A logic error in the way associated-length and copy-length parameters interact lets the attacker direct a controlled four-byte overwrite into the kernel's page cache.

Four bytes in the wrong place is enough. The page cache is the in-memory copy of file contents that the kernel reads when it executes a binary. The attacker targets the cached page that backs a setuid-root binary like /usr/bin/su, /usr/bin/sudo, or /usr/bin/passwd, overwrites four bytes of its .text section with shellcode, and then executes the binary normally. The cached, modified copy runs. The on-disk file is untouched. The user's privileges become root. The attacker exits the shell. The page eventually gets evicted from cache. The on-disk file reloads clean. There is no forensic trace on disk.

Three properties make this exceptionally bad:

  • Deterministic. No race condition. No information leak prerequisite. No KASLR break. The exploit fires on the first attempt with 100% reliability.
  • Portable. The same 732-byte Python script works across distributions and kernel versions because the targeted logic flaw is upstream: every kernel released since the 2017 in-place optimization landed is vulnerable until it carries the fix.
  • Container-aware. The page cache is shared between containers and the host. An unprivileged process inside a container, with no special capabilities, no kernel modules, no network access, can corrupt the host's view of /usr/bin/su. The vulnerability functions as a container escape.

Microsoft Defender's writeup is blunt about the implication: "When chained with an initial access vector such as SSH access, malicious CI job execution, or container footholds, the vulnerability allows an unprivileged user to break the system's privilege boundary." The footholds attackers already have — a leaked CI runner credential, a low-privilege SSH user, a compromised container — were previously hard-stuck at user level. Now they are root.

Where the impact actually lands

Most LPE coverage frames the risk in terms of "an attacker on the box can become root." That framing is correct but understates the operational impact. The real exposure has three concentric rings.

The first ring is shared-tenancy systems. Any system where multiple low-privilege users share a Linux host is now a system where any of those users can become root. That includes university research clusters, shared CI runners (GitHub Actions self-hosted runners, GitLab runners, Jenkins agents), shared developer jump boxes, multi-tenant SaaS architectures that run customer code on shared workers, and virtual private server environments. If the threat model assumed that user isolation kept tenants separate, the threat model needs an update.

The second ring is container hosts. Kubernetes nodes, Docker hosts, ECS/Fargate-style worker fleets — anywhere containers run. Because the page cache is shared between the host and every container on the host, a single compromised container becomes a full node compromise. The attacker does not need CAP_SYS_ADMIN. They do not need the host's /proc, /sys, or any file system mount. They need an unprivileged shell inside a single container. From there, they own the node, every other container on the node, and the credentials those containers were given.

The third ring is the long tail of "Linux somewhere we forgot about." Embedded devices that boot a 2020-vintage kernel and never receive updates. Network appliances. Legacy build servers running 2018 RHEL and pinned to that kernel because something downstream broke when last upgraded. The 2017 lower bound means that, going by upstream version numbers, anything on a kernel newer than 4.14 and older than the patched build is exposed. Distribution kernels carry backports in both directions, so confirm against the vendor advisory rather than trusting the version string alone.

Berkeley's information security office summarized the worst-case in one sentence: "A local logged-in user can run a single command and gain root access to the system, including breaking out of a container." That is the right operational summary.

What to do this week

Layer 1: Patch every Linux kernel

Upstream and the major distributions pushed fixed kernels between April 30 and May 4. Apply them. The work is a conventional package update plus a reboot, but coverage matters more than speed.

# Ubuntu (on Debian the kernel metapackage is named per architecture, e.g. linux-image-amd64)
apt update && apt install --only-upgrade linux-image-generic linux-headers-generic
# Reboot afterward, then verify uname -r matches the patched build
reboot

# Red Hat / Rocky / AlmaLinux / Oracle Linux
dnf update kernel kernel-core kernel-modules
reboot

# SUSE
zypper update kernel-default
reboot

# Amazon Linux 2023
dnf update kernel
reboot

# Verify the running kernel includes the fix.
# Compare against your distribution's advisory:
# - Ubuntu: USN-7XXX series, May 2026
# - RHEL:   RHSA-2026:XXXX
# - SUSE:   SUSE-SU-2026:XXXX
# - Amazon: ALAS2023-2026-XXX
uname -r
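
Installing the package does not change the running kernel until the host reboots, so a host can look patched in the package database while still running the vulnerable build. A minimal sketch of that check follows. It assumes each installed kernel has a directory under /lib/modules (common, but not universal) and that sort supports -V (GNU coreutils). The function names are illustrative, and "newest installed" is not the same as "patched": the advisory comparison above still applies.

```shell
#!/bin/sh
# Sketch: flag a host whose running kernel is not the newest installed one.
# Assumptions: one directory per installed kernel under /lib/modules, and a
# sort that supports -V (GNU coreutils). Function names are illustrative.

# Print the highest-versioned entry in a modules directory.
newest_installed_kernel() {
    ls -1 "${1:-/lib/modules}" | sort -V | tail -n 1
}

# Compare a running release string against the newest installed one.
reboot_status() {
    running=$1
    newest=$2
    if [ "$running" = "$newest" ]; then
        echo "ok: running newest installed kernel ($running)"
    else
        echo "reboot pending: running $running, newest installed is $newest"
    fi
}

# Run the check only when invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    reboot_status "$(uname -r)" "$(newest_installed_kernel /lib/modules)"
fi
```

Saved as a script and invoked with the argument run, it prints one line per host, which makes it easy to collect over SSH or a configuration management tool.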

Coverage is the hard part. The right inventory is "every Linux instance," not "every production server." Build runners, dev VMs, vendor appliances, the Raspberry Pi running a dashboard in the office, the OpenWrt router that someone added opkg to last year — all of it. Run the inventory before declaring victory.
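
Once the inventory exists, triage is a sorting problem. The sketch below classifies "host kernel-release" pairs against a version window. It is a rough first pass under stated assumptions: the input format is hypothetical, the lower bound is the 4.14 figure quoted in this article, and the fixed release must come from the vendor advisory for one distribution stream at a time. Distribution backports mean a version string is a hint, not proof.

```shell
#!/bin/sh
# Sketch: classify "host kernel-release" lines against a version window.
# Assumptions: input lines look like "web01 5.15.0-105-generic"; the fixed
# release passed to classify_hosts comes from ONE vendor's advisory; sort
# supports -V (GNU coreutils). Names and input format are illustrative.

LOWER="4.14"   # lower bound quoted in this article

# True when $1 sorts strictly before $2 in version order.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Read "host release" lines on stdin; print one verdict per host.
classify_hosts() {
    fixed=$1
    while read -r host release; do
        [ -n "$host" ] || continue
        if version_lt "$release" "$LOWER"; then
            echo "$host $release predates-window"
        elif version_lt "$release" "$fixed"; then
            echo "$host $release EXPOSED"
        else
            echo "$host $release at-or-after-fix"
        fi
    done
}
```

A typical use would be to build the input with a loop that runs uname -r on each host, then pipe it through classify_hosts with the advisory's fixed release as the only argument. Hosts that never answered the inventory sweep are the ones to worry about most.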

Layer 2: Mitigate while patching is in flight

If you cannot reboot every system today, the recommended workaround from the Berkeley advisory and several distribution advisories is to disable the algif_aead kernel module. This removes the vulnerable code path. The caveat is that on RHEL 9 the module is built into the kernel and cannot be unloaded; on most other distributions it is loadable.

# Check whether algif_aead is loaded
lsmod | grep algif

# Unload (works on most distros, not RHEL 9)
modprobe -r algif_aead

# Block the module from being autoloaded on boot
cat > /etc/modprobe.d/disable-algif-aead.conf <<'EOF'
install algif_aead /bin/true
blacklist algif_aead
EOF

# Verify the install rule applies
modprobe -n -v algif_aead

# RHEL 9 (cannot unload — patch is the only option)
# Check whether the module is built-in:
grep -E "CONFIG_CRYPTO_USER_API_AEAD" /boot/config-$(uname -r)
# =y means built-in; =m means loadable module

Confirm with application owners that nothing legitimate uses AF_ALG AEAD on the host. Most workloads do not. A small set — certain crypto-accelerated VPN configurations, some kernel-mode IPsec setups, hardware security module integrations — do. If anything breaks after disabling the module, that is the signal to prioritize patching that host instead of relying on the workaround.
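
The built-in versus loadable distinction decides whether the workaround is even available, so it is worth scripting across a fleet. A minimal sketch, assuming the kernel build config is readable at the path you pass in (commonly /boot/config-$(uname -r); some systems only expose /proc/config.gz, which this sketch does not handle). The function name and output labels are illustrative.

```shell
#!/bin/sh
# Sketch: report how the AF_ALG AEAD interface was built into this kernel.
# Assumption: the kernel build config is a readable plain-text file.

aead_build_mode() {
    config=$1
    if [ ! -r "$config" ]; then
        echo "unknown"            # no config file to inspect
        return 0
    fi
    case "$(grep '^CONFIG_CRYPTO_USER_API_AEAD=' "$config")" in
        *=y) echo "builtin" ;;    # cannot be unloaded; patching is the only option
        *=m) echo "module" ;;     # the modprobe workaround can apply
        *)   echo "not-set" ;;    # option disabled or missing from the config
    esac
}

# Run the check only when invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    aead_build_mode "/boot/config-$(uname -r)"
fi
```

Hosts that report builtin go to the front of the patch queue. Hosts that report module can take the workaround first.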

Layer 3: Detection

The exploit is stealthy by design. The disk forensic trail is empty. The runtime trail is not. Three signals are practical to monitor, the first with nothing more than auditd:

AF_ALG socket creation by unexpected users. Legitimate AF_ALG usage on a typical Linux server is rare. Most application stacks never touch it. An auditd rule on socket creation gives you a coarse but useful tripwire:

# Add to /etc/audit/rules.d/copy-fail.rules
-a always,exit -F arch=b64 -S socket -F a0=38 -k af_alg_socket
-a always,exit -F arch=b32 -S socket -F a0=38 -k af_alg_socket

# Reload rules
augenrules --load
auditctl -l | grep af_alg

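# Hunt: AF_ALG socket creation by non-system users in the last 7 days
# (match the exact uid= field; a bare "uid=0" pattern also hits auid=, euid=, and uid=9990)
ausearch -k af_alg_socket --start week-ago | grep -Ev '(^| )uid=(0|999)( |$)'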
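
A raw list of events gets noisy fast. Counting hits per user and command makes the outliers obvious. The sketch below assumes raw auditd SYSCALL records, one per line, such as those printed by ausearch with the --raw option. It excludes only root; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: count AF_ALG socket() audit hits per (uid, comm) pair.
# Assumption: stdin carries raw auditd records, one per line, e.g. from
#   ausearch -k af_alg_socket --start week-ago --raw

summarize_af_alg() {
    awk '
        /type=SYSCALL/ {
            uid = ""; comm = ""
            for (i = 1; i <= NF; i++) {
                # Match the exact uid= field, not auid=, euid=, suid= or fsuid=.
                if ($i ~ /^uid=/)  uid  = substr($i, 5)
                if ($i ~ /^comm=/) comm = substr($i, 6)
            }
            # Skip root; count everything else.
            if (uid != "" && uid != "0") count[uid " " comm]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

Piping the ausearch output through summarize_af_alg prints lines of the form count, uid, command, busiest first. On most servers the expected result is an empty list, which is what makes any entry worth a look.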

Sysmon for Linux / Falco rules on suspicious page cache modification. The Falco community has been publishing detection rules for Copy Fail since May 1. The basic pattern: an unprivileged process opens an AF_ALG socket, calls splice(), and immediately afterward a setuid binary is invoked by that user with elevated privileges. The full rule lives in the Falco rules repository under copy_fail_lpe.

# Falco rule excerpt (socket-creation stage only; system_users is a list defined elsewhere in the rules file)
- rule: AF_ALG AEAD socket abuse (Copy Fail)
  desc: Detect unprivileged AF_ALG AEAD socket creation followed by splice
  condition: >
    evt.type = socket and
    evt.arg.domain = AF_ALG and
    not user.uid in (system_users)
  output: >
    AF_ALG socket created by unprivileged user
    (user=%user.name pid=%proc.pid command=%proc.cmdline)
  priority: WARNING
  tags: [cve-2026-31431, copy-fail, lpe]

Page cache integrity sweep. Where the attacker has succeeded, the cached copy of the binary differs from the on-disk file. An ordinary read of /usr/bin/su, /usr/bin/sudo, or /usr/bin/passwd is served from the page cache, so hashing each file, forcing a cache flush, and hashing again compares the in-memory contents with what is on disk. Running sync and then echo 1 > /proc/sys/vm/drop_caches as root drops clean page-cache pages; a divergence between the two hashes is a strong signal. A match is weaker evidence, because a tampered page may already have been evicted.
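
The sweep is a few lines of shell. The sketch below is one way to do it, not a vetted tool. It assumes sha256sum is available and that the script runs as root, since writing to drop_caches requires it. The function names are illustrative. Dropping caches costs some I/O performance while the cache refills, and a package update landing between the two reads will also produce a mismatch.

```shell
#!/bin/sh
# Sketch: hash files via the page cache, drop clean cache pages, hash again.
# Assumptions: sha256sum is installed; drop_page_cache runs as root.

# Flush dirty data, then ask the kernel to drop clean page-cache pages.
drop_page_cache() {
    sync
    echo 1 > /proc/sys/vm/drop_caches
}

# Hash all given files before and after the drop; return 1 on any change.
sweep() {
    before=$(sha256sum "$@")
    drop_page_cache
    after=$(sha256sum "$@")
    if [ "$before" = "$after" ]; then
        echo "ok: hashes unchanged across cache drop"
        return 0
    fi
    echo "MISMATCH: cached and on-disk contents differ"
    echo "--- before drop (page cache)"
    echo "$before"
    echo "--- after drop (disk)"
    echo "$after"
    return 1
}

# Run the sweep only when invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    sweep /usr/bin/su /usr/bin/sudo /usr/bin/passwd
fi
```

A nonzero exit status is the alert condition. Treat a mismatch on a host with no recent package activity as an incident, and keep both hash lists as evidence.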

Layer 4: Reduce the value of "any foothold"

Copy Fail is a teachable moment about defense in depth. The vulnerability turns "low-privilege user on the box" into "root on the box." The defensive answer is to make the first half of that chain harder.

  • SSH discipline. No shared SSH credentials. No long-lived bastion accounts. Tie SSH access to identity-aware proxies (Teleport, BeyondCorp-style brokers) so that becoming a "low-privilege user on the box" requires per-user authentication, which raises the attacker's cost.
  • CI runner isolation. Self-hosted GitHub Actions and GitLab runners that execute customer or contributor code should run on ephemeral, single-job-per-host VMs — not on shared persistent runners. The container-escape primitive turns shared runners into shared compromises.
  • Container security baselines. Read-only root file systems, dropped capabilities, seccomp profiles, AppArmor or SELinux mandatory access controls. None of these defeat Copy Fail directly. All of them slow the post-root activity and improve detection.
  • Egress filtering on Linux fleets. Once the attacker is root, the next move is to call home. Default-deny egress with explicit allowlists removes the exfiltration channel even when the privilege boundary fails.

The AI-assisted-discovery footnote

One detail in Unit 42's writeup deserves attention beyond the immediate response. The Xint Code researchers credit AI-assisted analysis with surfacing the bug in roughly an hour — by feeding the kernel's crypto subsystem source into a model and asking it to look for inconsistencies in how the in-place optimization interacted with surrounding code. The researchers did the verification and exploitation work; the model surfaced the candidate.

This is not a one-off. The same workflow is generating findings against ffmpeg, OpenSSL, and the userspace half of every kernel system call interface. Defenders should expect that the rate of "deterministic, broadly portable, easy-to-weaponize" vulnerabilities will increase. The defensive posture that handles that reality is not "patch each one." It is "make any single vulnerability worth less" — by reducing footholds, segmenting blast radius, and treating every Linux host as a potentially hostile environment for the workloads on top of it.

For the next 72 hours, though, patch.

The patch is the table stakes. The harder question is which Linux hosts you do not have inventoried.

Red Hound's internal recon engagements use open-source tooling — naabu and nuclei for unauthenticated banner sweeps, custom scripts for kernel version inventory across cloud accounts, and Falco rule packs for post-exploitation detection on Kubernetes nodes. The tooling is public; we run it on a schedule and produce a written report you can hand to your engineering leadership. Book a 30-minute walkthrough on what your Linux footprint actually looks like and where Copy Fail-class exposure remains.