CVE-2026-45311: CodeWhale: run_tests Tool Enables RCE via Malicious Repository Without Approval
CodeWhale is a DeepSeek + MiMo coding agent for the terminal. In versions 0.3.0 up to, but not including, 0.8.23, the run_tests tool executes cargo test in the workspace with ApprovalRequirement::Auto, meaning it runs without any user approval prompt. cargo test compiles and executes arbitrary code: test binaries, build.rs build scripts, and proc macros. Auto-approving test execution is a deliberate design choice, but it creates an inconsistency in the security boundary: in a malicious repository, test code can execute arbitrary shell commands, exfiltrate credentials, or establish persistence with zero approval. The attack is amplified by AGENTS.md (auto-loaded into the system prompt), which can instruct the model to run tests proactively at session start. This vulnerability is fixed in 0.8.23.
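To make the boundary inconsistency concrete, the following is a minimal sketch of an approval policy that treats code-executing tools differently in untrusted workspaces. It is not CodeWhale's implementation: only the name `ApprovalRequirement::Auto` comes from the advisory, while the `Prompt` variant, the `ToolCall` struct, and the `approval_for` function are hypothetical names introduced for illustration.

```rust
/// Approval levels. Only `Auto` is named in the advisory;
/// `Prompt` is an assumed variant for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ApprovalRequirement {
    Auto,
    Prompt,
}

/// Hypothetical description of a tool invocation.
pub struct ToolCall<'a> {
    pub name: &'a str,
    /// True when the tool compiles or runs code taken from the workspace,
    /// as `cargo test` does for tests, build scripts, and proc macros.
    pub executes_workspace_code: bool,
}

/// Decide the approval level for a call, given whether the user has
/// explicitly marked the workspace as trusted (an assumed concept).
pub fn approval_for(call: &ToolCall, workspace_trusted: bool) -> ApprovalRequirement {
    if call.executes_workspace_code && !workspace_trusted {
        // Untrusted code is about to run: ask the user first.
        ApprovalRequirement::Prompt
    } else {
        ApprovalRequirement::Auto
    }
}
```

Under this assumed policy, a `run_tests` call in a freshly cloned repository would prompt, while the same call in a workspace the user has marked as trusted would keep the auto-approved behavior.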
HarborGuard Analysis
Synopsis
A remote code execution vulnerability in CodeWhale, a terminal-based AI coding agent, affects versions 0.3.0 up to, but not including, 0.8.23. The run_tests tool executes cargo test with automatic approval (no user prompt), so a malicious repository can embed arbitrary code in test binaries, build scripts, or proc macros that runs as soon as the agent invokes the tool on the repo. A victim must open the malicious repository with CodeWhale, but once they do, an attacker achieves full code execution on the host with no further barriers. The CVE description lists 0.8.23 as the fixed version, but HarborGuard's advisory data does not yet record a fix version; HarborGuard is tracking the advisory for patch availability.
HarborGuard Coverage
Detection of CVE-2026-45311 is available across every HarborGuard environment: the CVE is ingested from upstream advisory feeds within minutes of publication and matched against all customer images, including custom-built images that bundle the CodeWhale binary. Any image containing a CodeWhale version in the affected range (>= 0.3.0, < 0.8.23) is flagged immediately.
Status: Available
Triage is available with the full CVSS v3.1 score of 9.6 (Critical), and each finding can be weighted against per-environment compliance policies to determine priority and routing. Alerts are directed to the appropriate team inbox within each customer organization based on configured ownership rules.
Status: Available
Because no fix version is recorded in HarborGuard's advisory data yet, HarborGuard re-checks the advisory on every ingest cycle and will make a patched-image rebuild available the moment a fix version is recorded. In the interim, customers can apply compensating controls through HarborGuard policy rules, such as flagging or blocking deployment of any image containing affected CodeWhale versions.
Status: Pending upstream
Exploit Conditions
- Network reachability: Required
The attacker delivers the malicious repository over the network; the victim must fetch or open a remote repo, exposing the attack surface to any network-accessible source.
- Authentication: Not required
No credentials or account privileges are needed; any unauthenticated party can publish or share a malicious repository.
- Victim interaction: Required
The victim must open the malicious repository with CodeWhale, making social engineering (e.g., sharing a repo link) the required delivery mechanism.
- Attack complexity: Low
Exploitation is reliable and condition-free once the repository is opened; no race conditions, memory layout dependencies, or special environmental factors apply.
Blast Radius
- Arbitrary shell commands execute on the victim's host inside the terminal session with the victim's user privileges.
- Test code or build scripts read and exfiltrate credentials, SSH keys, API tokens, or environment variables stored on the machine.
- An attacker establishes persistence by writing files, installing cron jobs, or modifying shell profiles on the host.
- The AGENTS.md mechanism allows the malicious repo to instruct the AI model to trigger test execution automatically at session start, removing even the implicit friction of a manual command.
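Because AGENTS.md is loaded into the system prompt, one way to audit a repository before opening it is to scan that file for instructions that steer the agent toward running tests. The sketch below assumes a small, hand-picked phrase list; the list and the function name `flag_test_instructions` are illustrative assumptions, not an official or exhaustive detection rule.

```rust
/// Phrases that suggest a file is steering the agent toward running tests.
/// This list is an assumption for illustration and is easy to evade.
const SUSPICIOUS_PHRASES: [&str; 4] = ["cargo test", "run_tests", "run tests", "run the tests"];

/// Return (1-based line number, line text) for every line that contains
/// one of the phrases above, compared case-insensitively.
pub fn flag_test_instructions(contents: &str) -> Vec<(usize, String)> {
    contents
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            let lower = line.to_lowercase();
            SUSPICIOUS_PHRASES.iter().any(|p| lower.contains(*p))
        })
        .map(|(i, line)| (i + 1, line.to_string()))
        .collect()
}
```

The function takes the file contents as a string, so it can be fed from `std::fs::read_to_string` on each AGENTS.md found in a checkout. A match is only a prompt for human review: legitimate projects also ask contributors to run tests.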
How HarborGuard Handles This
Available on HarborGuard: affected images containing CodeWhale 0.3.0 up to, but not including, 0.8.23 are identified automatically as each customer's registry and pipeline images are scanned. Because no fix version is recorded in HarborGuard's advisory data at this time, HarborGuard monitors the advisory on every ingest cycle and will surface a patched-image rebuild opportunity the moment a fix version is recorded. For customers with auto-remediation enabled, the rebuild, regression run, and PR flow will trigger automatically against affected workloads once a fix is available. In the interim, recommended compensating controls include blocking deployment of images containing affected CodeWhale versions via HarborGuard policy rules, restricting the agent to trusted internal repositories through network egress filtering, and auditing any AGENTS.md files in repositories the agent is permitted to access.
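The affected-range check that underlies this identification can be sketched as a half-open version comparison (>= 0.3.0, < 0.8.23). This is a minimal illustration, not HarborGuard's matching logic: the function names are hypothetical, and the parser assumes plain `major.minor.patch` strings without pre-release or build suffixes.

```rust
/// Parse a "major.minor.patch" string into a tuple that compares in
/// version order. Returns None unless there are exactly three numeric parts.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Some(true) if the version is in the affected range, Some(false) if it is
/// outside it, and None if the string could not be parsed.
pub fn is_affected(version: &str) -> Option<bool> {
    let v = parse_version(version)?;
    // Tuples compare lexicographically, which matches version ordering here.
    Some(v >= (0, 3, 0) && v < (0, 8, 23))
}
```

Returning `None` for unparseable input, rather than `false`, keeps an unknown version from being silently treated as safe.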
Metrics
- CVSS v3.1: 9.6
- Severity: CRITICAL
- Fixed in: none listed
- Affected Products: 1
- Hmbown / CodeWhale: >= 0.3.0, < 0.8.23
- Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
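As a cross-check of the 9.6 score against the vector above, the sketch below applies the CVSS v3.1 base score equations for the Scope: Changed case. The metric weights are the published specification values for AV:N (0.85), AC:L (0.77), PR:N (0.85), UI:R (0.62), and High impact (0.56); the function names are chosen here for illustration.

```rust
/// CVSS v3.1 Roundup: the smallest value with one decimal place that is
/// greater than or equal to the input, using the integer-based procedure
/// from the specification to avoid floating-point surprises.
fn roundup(x: f64) -> f64 {
    let i = (x * 100_000.0).round() as i64;
    if i % 10_000 == 0 {
        i as f64 / 100_000.0
    } else {
        ((i / 10_000) + 1) as f64 / 10.0
    }
}

/// Base score for a vector with Scope: Changed, given numeric metric weights.
pub fn base_score_scope_changed(av: f64, ac: f64, pr: f64, ui: f64, c: f64, i: f64, a: f64) -> f64 {
    // Impact sub-score (ISS) combines the three impact weights.
    let iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    // Impact formula for Scope: Changed.
    let impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02).powi(15);
    let exploitability = 8.22 * av * ac * pr * ui;
    if impact <= 0.0 {
        0.0
    } else {
        roundup((1.08 * (impact + exploitability)).min(10.0))
    }
}
```

Calling `base_score_scope_changed(0.85, 0.77, 0.85, 0.62, 0.56, 0.56, 0.56)` reproduces the listed score: the impact term is about 6.05, exploitability about 2.84, and 1.08 times their sum is about 9.59, which rounds up to 9.6.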