Getting Started

Quickstart Guide

Install the SDK, set your provider key, pull the benchmark dataset, and run an evaluation. Docker sandboxes are pulled on demand by the first task that needs them.

1. Install the SDK

Includes the dtap CLI and every supported agent framework

pip install decodingtrust-agent-sdk
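
To confirm the install put the dtap CLI on your PATH, a small check like the one below can help. This is a minimal sketch: check_install is a hypothetical helper name, not part of the SDK, and it only uses the shell builtin command -v, so it makes no assumptions about dtap's own flags.

```shell
# Hypothetical helper (not part of the SDK): report whether the dtap
# executable is reachable on PATH.
check_install() {
  if command -v dtap >/dev/null 2>&1; then
    echo "dtap found at $(command -v dtap)"
  else
    echo "dtap not found on PATH" >&2
    return 1
  fi
}

# Usage: check_install
```

If the check fails inside a virtual environment, the environment may not be activated in the current shell.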

2. Set provider API keys

Only the keys for the providers you'll actually use are required

# Export the key(s) for the providers you'll evaluate against
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
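
Before starting a run, it can be useful to see which of the keys above are visible to the current shell. The sketch below assumes only the three variable names shown in this step; check_provider_keys is a hypothetical helper name, and it never prints the key values themselves.

```shell
# Hypothetical helper: report which provider keys are set, without
# echoing their values. Returns non-zero if none are set.
check_provider_keys() {
  found=0
  for name in OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY; do
    # Indirect lookup via eval keeps this portable to POSIX sh.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name is set"
      found=$((found + 1))
    else
      echo "$name is not set"
    fi
  done
  [ "$found" -gt 0 ]
}

# Usage: check_provider_keys || echo "Export at least one provider key" >&2
```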

3. Download the dataset from Hugging Face

Per-task config.yaml / judge.py / setup.sh files used by the evaluator

# Pull the dataset/ folder from HuggingFace (~216 MB)
hf download AI-Secure/DecodingTrust-Agent-Platform \
  --repo-type dataset \
  --local-dir dataset
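
A quick way to sanity-check the download is to count the per-task files mentioned above. This sketch assumes the files land somewhere under the dataset directory with exactly those file names; the directory layout beneath it is not specified here, so the search is recursive. count_task_files is a hypothetical helper name.

```shell
# Hypothetical helper: count per-task files under a dataset directory.
# Assumes the file names config.yaml, judge.py, and setup.sh.
count_task_files() {
  dir="${1:-dataset}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  for name in config.yaml judge.py setup.sh; do
    # tr strips the padding some wc implementations add.
    n=$(find "$dir" -type f -name "$name" | wc -l | tr -d ' ')
    echo "$name: $n"
  done
}

# Usage: count_task_files dataset
```

A count of zero for all three names suggests the download did not complete or went to a different directory.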

4. Run an evaluation

Single benign CRM task on the OpenAI Agents SDK backbone

dtap eval \
  --task-list benchmark/crm/benign.jsonl \
  --agent-type openaisdk \
  --model gpt-4o \
  --max-parallel 4
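
To see how many entries a task list contains before launching it, you can count its non-empty lines, since a .jsonl file holds one JSON record per line. This is a sketch only: count_tasks is a hypothetical helper name, and it assumes each non-empty line of the file passed to --task-list corresponds to one task.

```shell
# Hypothetical helper: count non-empty lines in a JSONL task list.
# Assumes one task per non-empty line.
count_tasks() {
  file="$1"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  awk 'NF { n++ } END { print n + 0 }' "$file"
}

# Usage: count_tasks benchmark/crm/benign.jsonl
```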

Next Steps

Full SDK installation guide
dtap eval reference and flag-by-flag walkthrough
Wrap your own agent with build_agent
Per-environment Docker setup (advanced)