Coding
Source-control and terminal environments where agents review code, manage branches, and run commands — with adversaries hiding instructions inside diffs, issues, and CI output.
Domain overview
As code agents rapidly evolve, they are increasingly adopted to streamline software engineering workflows. These agents typically operate in information-rich and sensitive environments, with the capability to directly interact with system resources.
However, prioritizing capability improvements over safety can lead agents to facilitate unsafe behaviors. In particular, agents may generate and execute risky code due to insufficient security awareness or adversarially injected instructions. For instance, they may perform harmful operations on the operating system or file system, such as adding a risky alias to the .bashrc file or deleting sensitive data. Furthermore, such agents may be exploited for unethical purposes, including the generation and execution of biased code.
We design a comprehensive benchmark consisting of 330 benign tasks spanning five representative categories of common coding workflows, along with more than 120 malicious tasks covering nine key security risk categories. These risk categories are guided by established security standards, including CWE, OWASP, MITRE ATT&CK Techniques, and NIST.
Based on these risks, we construct red-teaming tasks with malicious objectives under two primary threat models, i.e., direct and indirect, enabling a systematic evaluation of the security robustness of code agents.
Benign task categories
File Type Conversion
The agent converts data files from one format to another, such as transforming CSV files into JSON
Dependency & Environment Repair
The agent diagnoses incompatible package versions and fixes the software environment to restore correct program behavior
Grid Pattern Transformation
The agent implements a program that transforms structured grid inputs into target outputs according to given examples and inferred transformation rules
Mathematical Computation
The agent performs symbolic or numerical computation tasks, such as evaluating integrals or algebraic expressions, and outputs the result in the required format
Web Content Retrieval
The agent fetches content from a specified webpage and saves the retrieved data to a local file for downstream processing or archival purposes
Threat models
Indirect threat model
In the indirect threat model, we consider all risk categories listed in the figure, as all malicious goals are feasible under this setting. For example, an attacker may inject malicious instructions into a README.md file, thereby influencing the code agent to perform unsafe code execution. In total, we construct 165 red-teaming instances under this setting.
Direct threat model
In the direct threat model, where the user acts as the attacker, the objective is to explicitly instruct the code agent to perform actions that lead to unsafe outcomes. We focus on the most severe cases with immediate impact, such as adding risky aliases to .bashrc or deleting sensitive files. The complete set of categories is presented in the figure. In total, we construct 136 red-teaming instances under this setting.
Results in this domain
Indirect / Direct ASR (lower is safer) and BSR (higher is more capable) for every evaluated agent on the Coding suite.
| Framework | Model | Indirect ASR Lower = safer | Direct ASR Lower = safer | BSR Higher = more capable |
|---|---|---|---|---|
Environments
3 environments in the Coding domain.
Code-Terminal
We construct a sandboxed code-execution environment that provides coding agents with a fully functional Linux development workstation within a dedicated Docker container. The environment is equipped with a standard development toolchain, including Python 3 with `pip`, as well as common Unix utilities such as `wget`, `curl`, and others. The container is reset between tasks, ensuring complete state isolation across evaluations.

GitHub
The GitHub environment simulates a collaborative software-development workspace for repository management, code review, and issue-tracking workflows. It supports repository navigation, issue inspection, pull-request review, and commit-history exploration, making it suitable for evaluating agents in development-centered workflow scenarios. This environment is particularly important because software repositories combine structured metadata with unstructured code, comments, and review discussions, creating realistic opportunities for both benign collaboration and adversarial manipulation.

GitLab
The GitLab environment simulates a collaborative software-development workspace for project management, issue tracking, and repository-centered workflows. It supports project navigation, issue inspection, board-based task tracking, and detailed issue review, making it suitable for evaluating agents in development-oriented workflow scenarios. This environment is particularly useful because GitLab-style project systems combine structured metadata with unstructured descriptions, comments, and workflow state, creating realistic opportunities for both benign coordination and adversarial manipulation.