v1.0 Released

DecodingTrust Agent Platform

A Controllable and Interactive Red-Teaming Platform for AI Agents

The first dynamic red-teaming framework against AI Agents across 14 domains and 50+ sandboxed environments.

Covers diverse indirect injections in environments, tools, and skills, as well as direct prompt injections.

Paper available on arXiv

Agents Ranked by Security Vulnerability

Ranked by average attack success rate (higher = more vulnerable).

Leaderboard columns:
Vulnerability Rank
Agent and Model
Indirect ASR (lower = safer)
Direct ASR (lower = safer)
BSR (higher = more capable)
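
The ranking rule can be illustrated with a minimal Python sketch. It assumes that "average attack success rate" means the unweighted mean of Indirect ASR and Direct ASR, which the page does not state explicitly. The function name, agent names, and numbers are hypothetical and are not leaderboard results.

```python
def rank_by_vulnerability(rows):
    """Sort agents from most to least vulnerable.

    Each row is (name, indirect_asr, direct_asr), with rates in percent.
    Assumption: the ranking score is the plain mean of the two ASR values.
    """
    scored = [(name, (indirect + direct) / 2) for name, indirect, direct in rows]
    # Higher average ASR means more vulnerable, so sort in descending order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative values only, not taken from the leaderboard.
rows = [
    ("agent-a", 40.0, 60.0),
    ("agent-b", 20.0, 30.0),
    ("agent-c", 30.0, 80.0),
]
ranking = rank_by_vulnerability(rows)
for rank, (name, avg_asr) in enumerate(ranking, start=1):
    print(rank, name, avg_asr)
```

If the platform weights the two attack types differently, only the scoring expression inside the list comprehension would need to change.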

The Safety–Capability Trade-off

Each point is an agent. The bottom-right corner — high benign success, low attack success — is the ideal frontier. The dashed line traces the Pareto-optimal agents: nothing else is both safer and more capable.
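
The Pareto-optimal set described above could be computed along the following lines. This is a sketch, not the platform's implementation: the `AgentScore` type, the helper names, and the sample values are hypothetical. An agent is on the frontier when no other agent has at least as high a BSR and at least as low an Indirect ASR while being strictly better on one of the two.

```python
from typing import List, NamedTuple


class AgentScore(NamedTuple):
    name: str
    bsr: float  # benign success rate (%), higher is better
    asr: float  # indirect attack success rate (%), lower is better


def dominates(a: AgentScore, b: AgentScore) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.bsr >= b.bsr and a.asr <= b.asr
    strictly_better = a.bsr > b.bsr or a.asr < b.asr
    return no_worse and strictly_better


def pareto_front(agents: List[AgentScore]) -> List[AgentScore]:
    """Return the non-dominated agents, ordered by increasing BSR."""
    front = [a for a in agents if not any(dominates(o, a) for o in agents)]
    return sorted(front, key=lambda a: a.bsr)


# Illustrative values only, not taken from the leaderboard.
agents = [
    AgentScore("agent-a", 80.0, 40.0),
    AgentScore("agent-b", 60.0, 20.0),
    AgentScore("agent-c", 50.0, 30.0),  # dominated by agent-b
    AgentScore("agent-d", 30.0, 10.0),
]
front_names = [a.name for a in pareto_front(agents)]
print(front_names)
```

The pairwise check is quadratic in the number of agents, which is fine for a leaderboard-sized list; a sort-and-sweep approach would suit much larger sets.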

[Figure: scatter plot of agents. x-axis: BSR, Benign Success Rate (%), 0 to 100. y-axis: Indirect ASR (%), 0 to 100. Annotated regions: "Worst: vulnerable & weak" and "Ideal: capable & safe". Legend: Champion, Pareto-optimal, Optimal frontier.]

Citation

If you use DecodingTrust-Agent in your research, please cite our paper.

@article{chen2026decodingtrust,
  title={DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents},
  author={Chen, Zhaorun and Liu, Xun and Tong, Haibo and Guo, Chengquan and Nie, Yuzhou and Zhang, Jiawei and Kang, Mintong and Xu, Chejian and Liu, Qichang and Liu, Xiaogeng and others},
  journal={arXiv preprint arXiv:2605.04808},
  year={2026}
}