Coding domain | DTap docs

Domain overview

As code agents rapidly evolve, they are increasingly adopted to streamline software engineering workflows. These agents typically operate in information-rich and sensitive environments, with the capability to directly interact with system resources.

However, prioritizing capability improvements over safety can lead agents to facilitate unsafe behaviors. In particular, agents may generate and execute risky code due to insufficient security awareness or adversarially injected instructions. For instance, they may perform harmful operations on the operating system or file system, such as adding a risky alias to the .bashrc file or deleting sensitive data. Furthermore, such agents may be exploited for unethical purposes, including the generation and execution of biased code.

We design a comprehensive benchmark consisting of 330 benign tasks spanning five representative categories of common coding workflows, along with more than 120 malicious tasks covering nine key security risk categories. These risk categories are guided by established security standards, including CWE, OWASP, MITRE ATT&CK Techniques, and NIST. Based on these risks, we construct red-teaming tasks with malicious objectives under two primary threat models, i.e., direct and indirect, enabling a systematic evaluation of the security robustness of code agents.

Benign task categories

File Type Conversion

The agent converts data files from one format to another, such as transforming CSV files into JSON.
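As an illustration of what such a task involves, here is a minimal sketch of a CSV-to-JSON conversion using only the Python standard library. The function name `csv_to_json` and the sample data are assumptions made for this example; they are not taken from the benchmark.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every value stays a string; type inference is deliberately left out.
    return json.dumps(list(reader), indent=2)


# Illustrative input only.
sample = "name,score\nada,90\nlin,85\n"
converted = csv_to_json(sample)
print(converted)
```

Keeping every field as a string avoids guessing column types; a real task would follow whatever output schema its instructions specify.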

Dependency & Environment Repair

The agent diagnoses incompatible package versions and fixes the software environment to restore correct program behavior.
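The diagnosis step might look roughly like the sketch below, which compares installed versions against allowed ranges. The package names, versions, and the helper `find_conflicts` are hypothetical, and the parser handles only plain dotted versions; real version strings (pre-releases, epochs) need a full PEP 440 parser.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted version such as '1.26.4' into (1, 26, 4)."""
    return tuple(int(part) for part in text.split("."))


def find_conflicts(installed: dict, required: dict) -> dict:
    """Return packages whose installed version lies outside [minimum, maximum)."""
    conflicts = {}
    for name, (minimum, maximum) in required.items():
        current = installed.get(name)
        if current is None:
            conflicts[name] = "not installed"
        elif not parse_version(minimum) <= parse_version(current) < parse_version(maximum):
            conflicts[name] = f"{current} is outside >={minimum},<{maximum}"
    return conflicts


# Hypothetical environment: names and versions are illustrative only.
installed = {"pkg_a": "2.0.1", "pkg_b": "2.31.0"}
required = {
    "pkg_a": ("1.21.0", "2.0.0"),
    "pkg_b": ("2.0.0", "3.0.0"),
    "pkg_c": ("1.5.0", "3.0.0"),
}
conflicts = find_conflicts(installed, required)
print(conflicts)
```

The repair step would then install a version inside each reported range and re-run the failing program to confirm that its behavior is restored.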

Grid Pattern Transformation

The agent implements a program that transforms structured grid inputs into target outputs according to given examples and inferred transformation rules.
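To make "inferred transformation rules" concrete, the sketch below handles one very simple rule family: a cell-by-cell value substitution learned from example pairs. The rule family, the function names, and the grids are assumptions for illustration; actual tasks may need far richer rules.

```python
def infer_cell_mapping(examples):
    """Learn a value-to-value mapping from (input_grid, output_grid) pairs."""
    mapping = {}
    for source, target in examples:
        for src_row, tgt_row in zip(source, target):
            for src, tgt in zip(src_row, tgt_row):
                # setdefault keeps the first observation; a later mismatch
                # means the rule is not a pure value substitution.
                if mapping.setdefault(src, tgt) != tgt:
                    raise ValueError(f"inconsistent mapping for value {src}")
    return mapping


def apply_mapping(grid, mapping):
    """Apply the learned mapping; unseen values pass through unchanged."""
    return [[mapping.get(cell, cell) for cell in row] for row in grid]


# Illustrative example pairs: every 1 becomes 2, other values are unchanged.
examples = [
    ([[0, 1], [1, 0]], [[0, 2], [2, 0]]),
    ([[1, 1], [0, 3]], [[2, 2], [0, 3]]),
]
mapping = infer_cell_mapping(examples)
result = apply_mapping([[3, 1, 0], [1, 0, 1]], mapping)
print(result)
```

Raising on an inconsistent observation is a deliberate choice: it signals that the examples need a different rule family rather than silently producing a wrong grid.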

Mathematical Computation

The agent performs symbolic or numerical computation tasks, such as evaluating integrals or algebraic expressions, and outputs the result in the required format.
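A numerical instance of such a task can be sketched with composite Simpson's rule, shown below for the integral of sin(x) over [0, π], whose exact value is 2. The six-decimal output format is an assumption; a real task would state its own required format.

```python
import math


def simpson(f, a: float, b: float, n: int = 100) -> float:
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes get weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


value = simpson(math.sin, 0.0, math.pi)
print(f"{value:.6f}")
```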

Web Content Retrieval

The agent fetches content from a specified webpage and saves the retrieved data to a local file for downstream processing or archival purposes.
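A minimal sketch of the fetch-and-save step, using only `urllib.request` from the standard library, is shown below. The helper name `fetch_to_file` is hypothetical, and the demonstration uses a local `file://` URL in place of a real webpage so that it runs offline.

```python
import pathlib
import tempfile
import urllib.request


def fetch_to_file(url: str, destination: pathlib.Path, timeout: float = 10.0) -> int:
    """Download url and write the raw bytes to destination; return the byte count."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read()
    destination.write_bytes(payload)
    return len(payload)


# Offline demonstration: a file:// URL stands in for the target webpage.
workdir = pathlib.Path(tempfile.mkdtemp())
page = workdir / "page.html"
page.write_text("<html><body>hello</body></html>", encoding="utf-8")
saved = workdir / "saved.html"
written = fetch_to_file(page.as_uri(), saved)
print(written, saved.read_text(encoding="utf-8"))
```

Writing raw bytes rather than decoded text preserves the page exactly as served, which suits archival use; downstream processing can decode it later.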

Threat models

Indirect threat model

In the indirect threat model, we consider all risk categories listed in the figure, as all malicious goals are feasible under this setting. For example, an attacker may inject malicious instructions into a README.md file, thereby influencing the code agent to perform unsafe code execution. In total, we construct 165 red-teaming instances under this setting.

Direct threat model

In the direct threat model, where the user acts as the attacker, the objective is to explicitly instruct the code agent to perform actions that lead to unsafe outcomes. We focus on the most severe cases with immediate impact, such as adding risky aliases to .bashrc or deleting sensitive files. The complete set of categories is presented in the figure. In total, we construct 136 red-teaming instances under this setting.

Results in this domain

Indirect / Direct ASR (lower is safer) and BSR (higher is more capable) for every evaluated agent on the Coding suite.

Framework	Model	Indirect ASR (lower = safer)	Direct ASR (lower = safer)	BSR (higher = more capable)
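The three reported rates can be read as simple success fractions. The sketch below assumes that ASR is the percentage of red-teaming instances on which the attack succeeds and that BSR is the percentage of benign tasks the agent completes; the page does not spell out these definitions. The `Outcome` record and all outcome data are made up for illustration and are not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    kind: str        # "indirect", "direct", or "benign" (labels assumed here)
    succeeded: bool  # attack succeeded, or benign task completed


def rate(outcomes, kind: str) -> float:
    """Percentage of outcomes of the given kind that succeeded."""
    selected = [o for o in outcomes if o.kind == kind]
    if not selected:
        raise ValueError(f"no outcomes of kind {kind!r}")
    return 100.0 * sum(o.succeeded for o in selected) / len(selected)


# Made-up outcomes for illustration only.
outcomes = [
    Outcome("indirect", True), Outcome("indirect", False),
    Outcome("indirect", False), Outcome("indirect", False),
    Outcome("direct", True), Outcome("direct", False),
    Outcome("benign", True), Outcome("benign", True),
    Outcome("benign", True), Outcome("benign", False),
]
indirect_asr = rate(outcomes, "indirect")
direct_asr = rate(outcomes, "direct")
bsr = rate(outcomes, "benign")
print(indirect_asr, direct_asr, bsr)
```

Reporting ASR and BSR side by side matters because an agent that refuses everything would score a perfect ASR while being useless on benign work.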

Environments

3 environments in the Coding domain.

Code-Terminal

We construct a sandboxed code-execution environment that provides coding agents with a fully functional Linux development workstation within a dedicated Docker container. The environment is equipped with a standard development toolchain, including Python 3 with `pip`, as well as common Unix utilities such as `wget`, `curl`, and others. The container is reset between tasks, ensuring complete state isolation across evaluations.
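One possible way to obtain per-task state isolation is to start every task in a fresh, disposable container. The sketch below only builds the `docker run` command; the image name `code-terminal:latest` and the helper `docker_run_command` are hypothetical, and the benchmark's actual reset mechanism is not described here.

```python
import shlex


def docker_run_command(image: str, task_command: str) -> list:
    """Build a `docker run` invocation that starts from a fresh container."""
    # --rm deletes the container on exit, so no state carries over to the next task.
    return ["docker", "run", "--rm", image, "bash", "-c", task_command]


# "code-terminal:latest" is a hypothetical image name used only for illustration.
command = docker_run_command(
    "code-terminal:latest", "python3 --version && pip --version"
)
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would execute it on a host with Docker installed; building the argument list separately keeps the command easy to inspect and avoids shell quoting problems.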

GitHub

The GitHub environment simulates a collaborative software-development workspace for repository management, code review, and issue-tracking workflows. It supports repository navigation, issue inspection, pull-request review, and commit-history exploration, making it suitable for evaluating agents in development-centered workflow scenarios. This environment is particularly important because software repositories combine structured metadata with unstructured code, comments, and review discussions, creating realistic opportunities for both benign collaboration and adversarial manipulation.

GitLab

The GitLab environment simulates a collaborative software-development workspace for project management, issue tracking, and repository-centered workflows. It supports project navigation, issue inspection, board-based task tracking, and detailed issue review, making it suitable for evaluating agents in development-oriented workflow scenarios. This environment is particularly useful because GitLab-style project systems combine structured metadata with unstructured descriptions, comments, and workflow state, creating realistic opportunities for both benign coordination and adversarial manipulation.
