Use Custom Models

Want to use a different LLM provider, local model, or custom endpoint? Configure our existing agent wrappers to use your custom model while keeping all the evaluation infrastructure.

Flexibility

Our agent wrappers accept model configuration through RuntimeConfig. You can specify any model string supported by the underlying SDK.

OpenAI-Compatible Endpoints

Use any OpenAI-compatible API (vLLM, Ollama, Together AI, Groq, etc.) with our OpenAI SDK agent:

import asyncio
import os

from agent.openaisdk import OpenAISDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

# Set custom base URL for OpenAI-compatible endpoint
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"  # vLLM
# Or: "http://localhost:11434/v1"  # Ollama
# Or: "https://api.together.xyz/v1"  # Together AI
# Or: "https://api.groq.com/openai/v1"  # Groq

agent_config = AgentConfig.from_yaml("dataset/crm/benign/1/config.yaml")
runtime_config = RuntimeConfig(
    model="meta-llama/Llama-3.1-70B-Instruct",  # Your model name
    temperature=0.1,
    max_turns=100,
    output_dir="./results"
)


async def main():
    agent = OpenAISDKAgent(agent_config, runtime_config)
    await agent.initialize()
    try:
        return await agent.run("List all contacts")
    finally:
        await agent.cleanup()  # Release resources even if the run fails


# In a notebook or other running event loop, use `await main()` instead
result = asyncio.run(main())
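
Switching providers only changes the base URL, so the choice can be kept in one place. A minimal sketch, assuming a hypothetical `BASE_URLS` mapping and `select_endpoint` helper (neither is part of the agent wrappers); the URLs are the ones listed in the example above.

```python
import os

# Base URLs taken from the example above; the mapping itself is hypothetical.
BASE_URLS = {
    "vllm": "http://localhost:8000/v1",
    "ollama": "http://localhost:11434/v1",
    "together": "https://api.together.xyz/v1",
    "groq": "https://api.groq.com/openai/v1",
}


def select_endpoint(provider: str) -> str:
    """Point OPENAI_BASE_URL at the named provider and return the URL."""
    try:
        url = BASE_URLS[provider.lower()]
    except KeyError:
        known = ", ".join(sorted(BASE_URLS))
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of: {known}"
        ) from None
    os.environ["OPENAI_BASE_URL"] = url
    return url
```

Call `select_endpoint("ollama")` before constructing the agent, mirroring the example above, which sets the variable before creating `OpenAISDKAgent`.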

Local Models with Ollama

# First, start Ollama with your model
# ollama run llama3.1:70b

import asyncio
import os

from agent.openaisdk import OpenAISDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

os.environ["OPENAI_BASE_URL"] = "http://localhost:11434/v1"
os.environ["OPENAI_API_KEY"] = "ollama"  # Ollama doesn't need a real key

runtime_config = RuntimeConfig(
    model="llama3.1:70b",
    temperature=0.1,
    max_turns=100,
    output_dir="./results"
)


async def main():
    agent = OpenAISDKAgent(
        AgentConfig.from_yaml("dataset/crm/malicious/1/config.yaml"),
        runtime_config
    )
    await agent.initialize()
    try:
        return await agent.run("Search for leads named John")
    finally:
        await agent.cleanup()  # Release resources even if the run fails


# In a notebook or other running event loop, use `await main()` instead
result = asyncio.run(main())

High-Performance with vLLM

# Start vLLM server
# python -m vllm.entrypoints.openai.api_server \
#     --model meta-llama/Llama-3.1-70B-Instruct \
#     --port 8000 \
#     --tensor-parallel-size 4

import asyncio
import os

from agent.openaisdk import OpenAISDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"
os.environ["OPENAI_API_KEY"] = "dummy"  # vLLM doesn't validate keys

runtime_config = RuntimeConfig(
    model="meta-llama/Llama-3.1-70B-Instruct",
    temperature=0.1,
    max_turns=200,
    output_dir="./results"
)


async def main():
    agent = OpenAISDKAgent(
        AgentConfig.from_yaml("dataset/workflow/benign/1/config.yaml"),
        runtime_config
    )
    await agent.initialize()
    try:
        return await agent.run("Draft an email to sales team")
    finally:
        await agent.cleanup()  # Release resources even if the run fails


# In a notebook or other running event loop, use `await main()` instead
result = asyncio.run(main())
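
Before starting a long evaluation against a local server, it can help to confirm that the endpoint is reachable. A minimal sketch using only the standard library, assuming the server exposes the OpenAI-style `GET /v1/models` route and does not enforce an API key (a server that does would need an `Authorization` header). `models_url` and `endpoint_is_up` are hypothetical helpers, not part of the agent wrappers.

```python
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def endpoint_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses;
        # ValueError covers malformed URLs.
        return False
```

For the vLLM setup above, the check would be `endpoint_is_up("http://localhost:8000/v1")`.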

Claude Model Variants

Use different Claude models with our Claude SDK agent:

import asyncio

from agent.claudesdk import ClaudeSDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

# Available Claude models
models = [
    "claude-opus-4-20250514",      # Most capable
    "claude-sonnet-4-20250514",    # Balanced
    "claude-3-5-haiku-20241022",   # Fast and efficient
]

runtime_config = RuntimeConfig(
    model="claude-opus-4-20250514",  # Choose your model
    temperature=0.1,
    max_turns=100,
    output_dir="./results"
)


async def main():
    agent = ClaudeSDKAgent(
        AgentConfig.from_yaml("dataset/crm/malicious/2/config.yaml"),
        runtime_config
    )
    await agent.initialize()
    try:
        return await agent.run("Update lead status to qualified")
    finally:
        await agent.cleanup()  # Release resources even if the run fails


# In a notebook or other running event loop, use `await main()` instead
result = asyncio.run(main())
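
To compare variants on the same task, the model list can drive one configuration per model. The sketch below builds only the keyword arguments, as plain dictionaries, so it stays independent of the library. `runtime_kwargs` and the per-model output directory layout are assumptions for illustration; the field names mirror the `RuntimeConfig` call above.

```python
from pathlib import PurePosixPath

MODELS = [
    "claude-opus-4-20250514",
    "claude-sonnet-4-20250514",
    "claude-3-5-haiku-20241022",
]


def runtime_kwargs(model: str, base_dir: str = "./results") -> dict:
    """Keyword arguments for one run, with a separate output directory per model."""
    return {
        "model": model,
        "temperature": 0.1,
        "max_turns": 100,
        "output_dir": str(PurePosixPath(base_dir) / model),
    }


# One dictionary per model, in the order listed above.
sweep = [runtime_kwargs(m) for m in MODELS]
```

Each dictionary can then be unpacked as `RuntimeConfig(**kwargs)` and passed to a fresh agent.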

Gemini Model Variants

Use different Gemini models with our Google ADK agent:

import asyncio

from agent.googleadk import GoogleADKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

# Available Gemini models
models = [
    "gemini-2.0-flash-exp",        # Latest experimental
    "gemini-1.5-pro",              # Production-ready
    "gemini-1.5-flash",            # Fast inference
]

runtime_config = RuntimeConfig(
    model="gemini-2.0-flash-exp",
    temperature=0.1,
    max_turns=150,
    output_dir="./results"
)


async def main():
    agent = GoogleADKAgent(
        AgentConfig.from_yaml("dataset/workflow/benign/2/config.yaml"),
        runtime_config
    )
    await agent.initialize()
    try:
        return await agent.run("Schedule a meeting for Friday")
    finally:
        await agent.cleanup()  # Release resources even if the run fails


# In a notebook or other running event loop, use `await main()` instead
result = asyncio.run(main())

Model Configuration Reference

Provider	Environment Variable	Example Models
OpenAI	`OPENAI_API_KEY`	gpt-4o, gpt-4o-mini, o1-preview
Anthropic	`ANTHROPIC_API_KEY`	claude-opus-4, claude-sonnet-4
Google	`GOOGLE_API_KEY`	gemini-2.0-flash, gemini-1.5-pro
Together AI	`OPENAI_BASE_URL` + `OPENAI_API_KEY`	meta-llama/Llama-3.1-70B
Groq	`OPENAI_BASE_URL` + `GROQ_API_KEY`	llama-3.1-70b-versatile
Local (Ollama)	`OPENAI_BASE_URL` (+ placeholder `OPENAI_API_KEY`)	llama3.1:70b, mixtral:8x7b
Local (vLLM)	`OPENAI_BASE_URL` (+ placeholder `OPENAI_API_KEY`)	Any Hugging Face model supported by vLLM
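
A missing variable is easier to catch before the agent starts than halfway through a run. The sketch below encodes the table as a lookup and reports what is unset. `REQUIRED_ENV` and `missing_env` are hypothetical names, not part of the agent wrappers; the local entries also list the placeholder key that the Ollama and vLLM examples set.

```python
import os

# Variable names follow the table above; the mapping itself is hypothetical.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_API_KEY"],
    "together": ["OPENAI_BASE_URL", "OPENAI_API_KEY"],
    "groq": ["OPENAI_BASE_URL", "GROQ_API_KEY"],
    "ollama": ["OPENAI_BASE_URL", "OPENAI_API_KEY"],
    "vllm": ["OPENAI_BASE_URL", "OPENAI_API_KEY"],
}


def missing_env(provider: str, env=None) -> list:
    """Return the required variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]
```

Calling `missing_env("groq")` with no second argument checks the current process environment; an empty list means the provider is ready to use.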

Tips for Custom Models

1. Ensure your model supports function/tool calling for MCP tool integration
2. For local models, allocate sufficient GPU memory for tool-heavy tasks
3. Test with benign scenarios before running adversarial evaluations
4. Monitor token usage; some evaluations can be token-intensive
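
The benign-first advice can be applied mechanically when queueing scenarios. A small sketch, assuming the directory layout seen in the examples above (`dataset/<domain>/benign/<n>/config.yaml` and `dataset/<domain>/malicious/<n>/config.yaml`); `benign_first` is a hypothetical helper.

```python
def benign_first(config_paths):
    """Order configs so every benign scenario precedes the malicious ones."""

    def rank(path: str):
        # Benign paths get rank 0, everything else rank 1;
        # ties are broken alphabetically by path.
        parts = path.split("/")
        return (0 if "benign" in parts else 1, path)

    return sorted(config_paths, key=rank)


queue = benign_first([
    "dataset/crm/malicious/1/config.yaml",
    "dataset/workflow/benign/1/config.yaml",
    "dataset/crm/benign/1/config.yaml",
])
```

If a benign scenario already fails with the custom model, the problem is likely in the model or endpoint setup rather than in the adversarial task.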
