Getting Started

Quickstart Guide

Install the SDK, set your provider API keys, pull the benchmark dataset, and run an evaluation. Docker sandboxes are pulled on demand by the first task that needs them.

1. Install the SDK
Includes the dtap CLI and every supported agent framework
pip install decodingtrust-agent-sdk
2. Set provider API keys
Only the providers you'll actually use are required
# Export the key(s) for the providers you'll evaluate against
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
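Before moving on, it can help to confirm which keys are actually visible to your shell. The following is a minimal sketch, not part of the SDK: the function name check_provider_keys is hypothetical, and it only inspects the three environment variables listed above.

```shell
# Hypothetical helper (not part of the SDK): report which provider keys are set.
# Returns success if at least one of the three variables is non-empty.
check_provider_keys() {
  found=0
  for name in OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY; do
    # Look up the value of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "set:     $name"
      found=$((found + 1))
    else
      echo "missing: $name"
    fi
  done
  [ "$found" -gt 0 ]
}

# Usage:
#   check_provider_keys || echo "export at least one provider key first" >&2
```

The helper never prints the key values themselves, so its output is safe to paste into a bug report.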
3. Download the dataset from Hugging Face
Per-task config.yaml / judge.py / setup.sh files used by the evaluator
# Pull the dataset/ folder from Hugging Face (~216 MB)
hf download AI-Secure/DecodingTrust-Agent-Platform \
  --repo-type dataset \
  --local-dir dataset
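To sanity-check the download, you can count the per-task config.yaml files that landed under dataset/. This is a rough sketch: the function name count_task_configs is hypothetical, and the exact directory layout inside the dataset is an assumption, so the helper simply searches the whole tree.

```shell
# Hypothetical helper (not part of the SDK): count config.yaml files
# anywhere under a dataset directory (default: ./dataset).
count_task_configs() {
  dir="${1:-dataset}"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # tr strips the padding that some wc implementations add
  find "$dir" -type f -name config.yaml | wc -l | tr -d ' '
}

# Usage:
#   count_task_configs dataset
```

A count of zero most likely means the download did not complete or was written to a different directory.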
4. Run an evaluation
Single benign CRM task on the OpenAI Agents SDK backbone
dtap eval \
  --task-list benchmark/crm/benign.jsonl \
  --agent-type openaisdk \
  --model gpt-4o \
  --max-parallel 4
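If the evaluation fails to start, two common causes are a CLI that is not on PATH and a task-list path that does not exist relative to the current directory. The sketch below checks both before launching; the function name preflight is hypothetical, and it relies only on standard shell built-ins rather than any dtap feature.

```shell
# Hypothetical helper (not part of the SDK): confirm a command is on PATH
# and a task-list file is readable before starting a run.
preflight() {
  cmd="$1"
  task_list="$2"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "command not found: $cmd (is the SDK installed in this environment?)" >&2
    return 1
  fi
  if [ ! -r "$task_list" ]; then
    echo "task list not readable: $task_list" >&2
    return 1
  fi
  echo "ok: $cmd found, $task_list readable"
}

# Usage, reusing the path from the command above:
#   preflight dtap benchmark/crm/benign.jsonl
```

Running the check from the same directory as the evaluation matters, because the task-list path is resolved relative to the current working directory.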