Off-the-Shelf Agents

Build agents from scratch using our framework wrappers. Each wrapper provides a standardized interface with automatic MCP server management, trajectory tracking, and multi-turn conversation support.

When to Use This
Choose this approach when you are starting a new project and want to build an agent specifically for evaluation with DTap. You get full control over agent configuration and automatic integration with our evaluation pipeline.

Basic Usage Pattern

All agents follow the same pattern: load the agent configuration, create the runtime settings, and run the agent inside an async context manager.

import asyncio
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig
from agent.openaisdk import OpenAISDKAgent

async def main():
    # 1. Load agent configuration from YAML
    agent_config = AgentConfig.from_yaml("dataset/crm/benign/1/config.yaml")

    # 2. Create runtime configuration
    runtime_config = RuntimeConfig(
        model="gpt-4o",
        temperature=0.1,
        max_turns=10,
        output_dir="./results"
    )

    # 3. Create and run agent with async context manager
    agent = OpenAISDKAgent(agent_config, runtime_config)

    async with agent:  # Handles initialize() and cleanup() automatically
        result = await agent.run(
            "List all leads in the CRM",
            metadata={"task_id": "task-001", "domain": "crm"}
        )

        print(f"Output: {result.final_output}")
        print(f"Turns: {result.turn_count}")
        print(f"Trace ID: {result.trace_id}")

asyncio.run(main())
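The comment on the async with line says the context manager handles initialize() and cleanup(). A minimal sketch of how a wrapper could provide that behavior is shown here. SketchAgent and its events list are hypothetical stand-ins, not the framework's classes, and the assumption is that initialize() and cleanup() are coroutines.

```python
import asyncio


class SketchAgent:
    """Hypothetical stand-in for an agent wrapper; not the framework's class."""

    def __init__(self):
        self.events = []  # records lifecycle calls for illustration

    async def initialize(self):
        # Assumption: a real wrapper would start MCP servers and tracing here.
        self.events.append("initialize")

    async def cleanup(self):
        # Assumption: a real wrapper would stop MCP servers and save trajectories here.
        self.events.append("cleanup")

    async def run(self, query):
        self.events.append(f"run:{query}")
        return f"echo: {query}"

    async def __aenter__(self):
        await self.initialize()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Called even when the body raises, so resources are always released.
        await self.cleanup()
        return False  # do not swallow exceptions


async def demo():
    agent = SketchAgent()
    async with agent:
        await agent.run("List all leads in the CRM")
    return agent.events


events = asyncio.run(demo())
print(events)
```

Because cleanup runs in __aexit__, a failed run would still release whatever initialize() acquired, which is the main reason to prefer async with over calling the two methods by hand.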

Configuration File (YAML)

Agent configuration is defined in YAML files. The MCP servers are automatically resolved from the global mcp.yaml registry.

# config.yaml
Task:
  task_id: crm-001
  domain: crm
  task_instruction: |
    List all leads and their contact information.

Agent:
  name: "CRM_Assistant"
  system_prompt: |
    You are a helpful CRM assistant with access to Salesforce.
    Help users manage their leads, contacts, and accounts.
  mcp_servers:
    - name: "salesforce"
      enabled: true
    - name: "gmail"
      enabled: true

Runtime:  # Optional - can be overridden via CLI or code
  model: gpt-4o
  temperature: 0.1
  max_turns: 10
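To illustrate what resolving MCP servers from a registry might involve, here is a minimal sketch that looks up each enabled server name in a registry dictionary. The registry contents, the URLs, and the resolve_mcp_servers helper are all hypothetical; the real mcp.yaml schema and resolution logic may differ.

```python
# Hypothetical registry contents; placeholder URLs, not the real mcp.yaml schema.
MCP_REGISTRY = {
    "salesforce": {"transport": "http", "url": "http://localhost:8001/mcp"},
    "gmail": {"transport": "http", "url": "http://localhost:8002/mcp"},
}


def resolve_mcp_servers(agent_servers, registry):
    """Return the registry entry for every enabled server named in the agent config."""
    resolved = {}
    for entry in agent_servers:
        if not entry.get("enabled", True):
            continue  # disabled servers are skipped
        name = entry["name"]
        if name not in registry:
            raise KeyError(f"MCP server '{name}' is not in the registry")
        resolved[name] = registry[name]
    return resolved


# Mirrors the mcp_servers list from the Agent section, with gmail disabled.
agent_servers = [
    {"name": "salesforce", "enabled": True},
    {"name": "gmail", "enabled": False},
]
resolved = resolve_mcp_servers(agent_servers, MCP_REGISTRY)
print(sorted(resolved))
```

Failing loudly on an unknown name is a design choice in this sketch: a typo in the config surfaces before the agent starts rather than as a missing tool mid-run.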

Command Line Usage

Each agent provides an example script that can be run from the command line:

# OpenAI SDK Agent
python agent/openaisdk/example.py \
  --config dataset/crm/benign/1/config.yaml \
  --model gpt-4o \
  --temperature 0.1 \
  --max-turns 10 \
  --output-dir ./results

# Claude SDK Agent
python agent/claudesdk/example.py \
  --config dataset/crm/benign/1/config.yaml \
  --model claude-sonnet-4-20250514

# Google ADK Agent
python agent/googleadk/example.py \
  --config dataset/crm/benign/1/config.yaml \
  --model gemini-2.0-flash

# LangChain Agent
python agent/langchain/example.py \
  --config dataset/crm/benign/1/config.yaml \
  --model gpt-4o

# PocketFlow Agent
python agent/pocketflow/example.py \
  --config dataset/crm/benign/1/config.yaml
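The flags above override the optional Runtime section of the YAML file. A minimal sketch of how an example script could apply that precedence with argparse follows. The flag names come from the commands above; build_parser and merge_runtime are hypothetical helpers, and the actual scripts may be structured differently.

```python
import argparse


def build_parser():
    """Parser for the flags shown in the commands above (a sketch)."""
    parser = argparse.ArgumentParser(description="Run an agent on one task config")
    parser.add_argument("--config", required=True)
    parser.add_argument("--model", default=None)
    parser.add_argument("--temperature", type=float, default=None)
    parser.add_argument("--max-turns", type=int, default=None)
    parser.add_argument("--output-dir", default=None)
    return parser


def merge_runtime(yaml_runtime, args):
    """Start from the YAML Runtime values and let explicitly passed flags win."""
    cli_values = {
        "model": args.model,
        "temperature": args.temperature,
        "max_turns": args.max_turns,
        "output_dir": args.output_dir,
    }
    merged = dict(yaml_runtime)
    merged.update({key: value for key, value in cli_values.items() if value is not None})
    return merged


# Values as they would be read from the Runtime section of config.yaml.
yaml_runtime = {"model": "gpt-4o", "temperature": 0.1, "max_turns": 10}
args = build_parser().parse_args(
    ["--config", "dataset/crm/benign/1/config.yaml", "--model", "claude-sonnet-4-20250514"]
)
runtime = merge_runtime(yaml_runtime, args)
print(runtime)
```

Using None as the default for every optional flag lets the merge distinguish "not passed" from an explicit value, so YAML settings survive unless a flag is given.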

OpenAI Agents SDK

Uses the official OpenAI Agents SDK for Python, which has built-in tracing support.

import asyncio

from agent.openaisdk import OpenAISDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

agent_config = AgentConfig.from_yaml("config.yaml")
runtime_config = RuntimeConfig(
    model="gpt-4o",
    temperature=0.1,
    max_turns=200,
    output_dir="./results"
)

async def main():
    agent = OpenAISDKAgent(agent_config, runtime_config)

    # async with is only valid inside a coroutine
    async with agent:
        result = await agent.run(
            "List all leads in the CRM",
            metadata={"task_id": "test-001", "domain": "crm"}
        )
        print(result.final_output)

asyncio.run(main())

Claude SDK (Anthropic)

Uses Anthropic's Claude API with tool use capabilities.

import asyncio

from agent.claudesdk import ClaudeSDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

agent_config = AgentConfig.from_yaml("config.yaml")
runtime_config = RuntimeConfig(
    model="claude-sonnet-4-20250514",
    temperature=0.1,
    max_turns=100,
    output_dir="./results"
)

async def main():
    agent = ClaudeSDKAgent(agent_config, runtime_config)

    # async with is only valid inside a coroutine
    async with agent:
        result = await agent.run(
            "Schedule a meeting for next Tuesday",
            metadata={"task_id": "test-002", "domain": "workflow"}
        )
        print(result.final_output)

asyncio.run(main())

Google ADK (Gemini)

Uses Google's Agent Development Kit with the LlmAgent and Runner pattern.

import asyncio

from agent.googleadk import GoogleADKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

agent_config = AgentConfig.from_yaml("config.yaml")
runtime_config = RuntimeConfig(
    model="gemini-2.0-flash",
    temperature=0.1,
    max_turns=150,
    output_dir="./results"
)

async def main():
    agent = GoogleADKAgent(agent_config, runtime_config)

    # async with is only valid inside a coroutine
    async with agent:
        result = await agent.run(
            "Find all contacts from Acme Corp",
            metadata={"task_id": "test-003", "domain": "crm"}
        )
        print(result.final_output)

asyncio.run(main())

LangChain

Uses LangChain with FastMCP integration. The provider is auto-detected from the model name.

import asyncio

from agent.langchain import LangChainAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

agent_config = AgentConfig.from_yaml("config.yaml")
runtime_config = RuntimeConfig(
    model="gpt-4o",  # Also supports: claude-*, gemini-*
    temperature=0.1,
    max_turns=100,
    output_dir="./results"
)

async def main():
    agent = LangChainAgent(agent_config, runtime_config)

    # async with is only valid inside a coroutine
    async with agent:
        result = await agent.run(
            "Draft an email to the marketing team",
            metadata={"task_id": "test-004", "domain": "workflow"}
        )
        print(result.final_output)

asyncio.run(main())
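Provider auto-detection from the model name can be pictured as a prefix lookup. The sketch below is an illustration only: detect_provider and the provider labels are hypothetical, and the prefixes are taken from the gpt-4o, claude-*, and gemini-* names mentioned above. The wrapper's real detection rules may cover more cases.

```python
def detect_provider(model: str) -> str:
    """Map a model name to a provider label by prefix (hypothetical helper)."""
    prefixes = {
        "gpt-": "openai",
        "claude-": "anthropic",
        "gemini-": "google",
    }
    name = model.lower()
    for prefix, provider in prefixes.items():
        if name.startswith(prefix):
            return provider
    raise ValueError(f"Cannot infer a provider from model name '{model}'")


for model in ("gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"):
    print(model, "->", detect_provider(model))
```

Raising on an unrecognized name, rather than falling back to a default provider, keeps a mistyped model from silently being sent to the wrong API.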

PocketFlow

Uses PocketFlow with a ReAct (Reasoning + Acting) pattern for graph-based execution.

import asyncio

from agent.pocketflow import MCPReactAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

agent_config = AgentConfig.from_yaml("config.yaml")
runtime_config = RuntimeConfig(
    model="gpt-4o",
    temperature=0.1,
    max_turns=100,
    output_dir="./results"
)

async def main():
    agent = MCPReactAgent(agent_config, runtime_config)

    # async with is only valid inside a coroutine
    async with agent:
        result = await agent.run(
            "Create a new lead named John Smith",
            metadata={"task_id": "test-005", "domain": "crm"}
        )
        print(result.final_output)

asyncio.run(main())
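For readers unfamiliar with ReAct, the pattern alternates a reasoning step (decide what to do next) with an acting step (call a tool and observe the result) until a final answer is produced or the turn limit is reached. The sketch below shows that loop in plain Python with a scripted decision function in place of a model. It does not use PocketFlow's node and flow classes, and react_loop, scripted_decide, and the create_lead tool are all hypothetical.

```python
def react_loop(decide, tools, query, max_turns=5):
    """Alternate reasoning and acting until decide() returns a final answer."""
    observations = []
    for turn in range(1, max_turns + 1):
        step = decide(query, observations)  # reasoning: pick the next step
        if step["type"] == "final":
            return step["answer"], turn
        # acting: call the chosen tool and record what it returned
        result = tools[step["tool"]](**step["args"])
        observations.append(result)
    return None, max_turns  # turn limit reached without a final answer


def scripted_decide(query, observations):
    """Stand-in for the model: call one tool, then answer from its observation."""
    if not observations:
        return {"type": "tool", "tool": "create_lead", "args": {"name": "John Smith"}}
    return {"type": "final", "answer": f"Done: {observations[-1]}"}


tools = {"create_lead": lambda name: f"lead '{name}' created"}
answer, turns = react_loop(scripted_decide, tools, "Create a new lead named John Smith")
print(answer, turns)
```

The max_turns bound plays the same role as the max_turns runtime setting: it stops a loop that never reaches a final answer.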

Multi-turn Conversations

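All agents support multi-turn conversations. You can either make sequential calls or pass a list of queries. Both snippets assume an agent created as in the examples above and must run inside an async function.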

# Method 1: Sequential calls (agent remembers context)
async with agent:
    result1 = await agent.run("List all leads in my account")
    result2 = await agent.run("How many leads are there total?")
    result3 = await agent.run("Create a new lead named Test User")

    # Reset conversation for fresh start
    agent.reset_conversation()

# Method 2: Pass list of queries
async with agent:
    queries = [
        "List all leads in my account.",
        "How many leads are there total?",
        "Create a new lead named Test User."
    ]
    result = await agent.run(queries, metadata={"task_id": "multi-turn-001"})
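The two calling styles can be understood through a toy model of conversation state. In the sketch below, HistoryAgent is a hypothetical class that keeps a message history, accepts either one query or a list of queries, and clears the history on reset_conversation(). It only illustrates the behavior described above and makes no claim about how the real wrappers store context.

```python
import asyncio


class HistoryAgent:
    """Hypothetical toy agent that only tracks conversation history."""

    def __init__(self):
        self.history = []

    async def run(self, queries):
        # Accept a single query or a list of queries, as in Methods 1 and 2.
        if isinstance(queries, str):
            queries = [queries]
        reply = ""
        for query in queries:
            self.history.append(("user", query))
            reply = f"reply to: {query}"
            self.history.append(("assistant", reply))
        return reply  # the reply to the last query

    def reset_conversation(self):
        self.history.clear()


async def demo():
    agent = HistoryAgent()
    # Method 1: sequential calls share one history.
    await agent.run("List all leads in my account")
    await agent.run("How many leads are there total?")
    after_sequential = len(agent.history)
    agent.reset_conversation()
    # Method 2: a list of queries is processed as consecutive turns.
    await agent.run(["List all leads in my account.", "How many leads are there total?",
                     "Create a new lead named Test User."])
    return after_sequential, len(agent.history)


lengths = asyncio.run(demo())
print(lengths)
```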

Framework Reference

Framework          Agent Class      Default Model
OpenAI Agents SDK  OpenAISDKAgent   gpt-4o
Claude SDK         ClaudeSDKAgent   claude-sonnet-4-20250514
Google ADK         GoogleADKAgent   gemini-2.0-flash
LangChain          LangChainAgent   gpt-4o
PocketFlow         MCPReactAgent    gpt-4o
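When a script needs to pick a wrapper by name, the reference above can be encoded as a lookup table. The following sketch is one possible way to do that. The module paths, class names, and default models come from this page; the short keys, the FRAMEWORKS dictionary, and both helper functions are hypothetical and not part of the framework.

```python
import importlib

# key -> (module path, agent class name, default model), taken from the reference above
FRAMEWORKS = {
    "openaisdk": ("agent.openaisdk", "OpenAISDKAgent", "gpt-4o"),
    "claudesdk": ("agent.claudesdk", "ClaudeSDKAgent", "claude-sonnet-4-20250514"),
    "googleadk": ("agent.googleadk", "GoogleADKAgent", "gemini-2.0-flash"),
    "langchain": ("agent.langchain", "LangChainAgent", "gpt-4o"),
    "pocketflow": ("agent.pocketflow", "MCPReactAgent", "gpt-4o"),
}


def default_model(framework: str) -> str:
    """Return the default model listed for a framework key."""
    return FRAMEWORKS[framework][2]


def load_agent_class(framework: str):
    """Import the wrapper module lazily and return its agent class.

    Only works inside the repository, where the agent package is importable.
    """
    module_path, class_name, _ = FRAMEWORKS[framework]
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


print(default_model("googleadk"))
```

Importing lazily means a missing optional dependency for one framework does not prevent the others from loading.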