Use Custom Models
Want to use a different LLM provider, local model, or custom endpoint? Configure our existing agent wrappers to use your custom model while keeping all the evaluation infrastructure.
Flexibility
Our agent wrappers accept model configuration through
RuntimeConfig. You can specify any model string supported by the underlying SDK.OpenAI-Compatible Endpoints
Use any OpenAI-compatible API (vLLM, Ollama, Together AI, Groq, etc.) with our OpenAI SDK agent:
import os
from agent.openaisdk import OpenAISDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig
# Set custom base URL for OpenAI-compatible endpoint
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1" # vLLM
# Or: "http://localhost:11434/v1" # Ollama
# Or: "https://api.together.xyz/v1" # Together AI
# Or: "https://api.groq.com/openai/v1" # Groq
agent_config = AgentConfig.from_yaml("dataset/crm/benign/1/config.yaml")
runtime_config = RuntimeConfig(
model="meta-llama/Llama-3.1-70B-Instruct", # Your model name
temperature=0.1,
max_turns=100,
output_dir="./results"
)
agent = OpenAISDKAgent(agent_config, runtime_config)
await agent.initialize()
result = await agent.run("List all contacts")
await agent.cleanup()Local Models with Ollama
# First, start Ollama with your model
# ollama run llama3.1:70b
import os
from agent.openaisdk import OpenAISDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig
os.environ["OPENAI_BASE_URL"] = "http://localhost:11434/v1"
os.environ["OPENAI_API_KEY"] = "ollama" # Ollama doesn't need a real key
runtime_config = RuntimeConfig(
model="llama3.1:70b",
temperature=0.1,
max_turns=100,
output_dir="./results"
)
agent = OpenAISDKAgent(
AgentConfig.from_yaml("dataset/crm/malicious/1/config.yaml"),
runtime_config
)
await agent.initialize()
result = await agent.run("Search for leads named John")
await agent.cleanup()High-Performance with vLLM
# Start vLLM server
# python -m vllm.entrypoints.openai.api_server \
# --model meta-llama/Llama-3.1-70B-Instruct \
# --port 8000 \
# --tensor-parallel-size 4
import os
from agent.openaisdk import OpenAISDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"
os.environ["OPENAI_API_KEY"] = "dummy" # vLLM doesn't validate keys
runtime_config = RuntimeConfig(
model="meta-llama/Llama-3.1-70B-Instruct",
temperature=0.1,
max_turns=200,
output_dir="./results"
)
agent = OpenAISDKAgent(
AgentConfig.from_yaml("dataset/workflow/benign/1/config.yaml"),
runtime_config
)
await agent.initialize()
result = await agent.run("Draft an email to sales team")
await agent.cleanup()Claude Model Variants
Use different Claude models with our Claude SDK agent:
from agent.claudesdk import ClaudeSDKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig
# Available Claude models
models = [
"claude-opus-4-20250514", # Most capable
"claude-sonnet-4-20250514", # Balanced
"claude-3-5-haiku-20241022", # Fast and efficient
]
runtime_config = RuntimeConfig(
model="claude-opus-4-20250514", # Choose your model
temperature=0.1,
max_turns=100,
output_dir="./results"
)
agent = ClaudeSDKAgent(
AgentConfig.from_yaml("dataset/crm/malicious/2/config.yaml"),
runtime_config
)
await agent.initialize()
result = await agent.run("Update lead status to qualified")
await agent.cleanup()Gemini Model Variants
from agent.googleadk import GoogleADKAgent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig
# Available Gemini models
models = [
"gemini-2.0-flash-exp", # Latest experimental
"gemini-1.5-pro", # Production-ready
"gemini-1.5-flash", # Fast inference
]
runtime_config = RuntimeConfig(
model="gemini-2.0-flash-exp",
temperature=0.1,
max_turns=150,
output_dir="./results"
)
agent = GoogleADKAgent(
AgentConfig.from_yaml("dataset/workflow/benign/2/config.yaml"),
runtime_config
)
await agent.initialize()
result = await agent.run("Schedule a meeting for Friday")
await agent.cleanup()Model Configuration Reference
| Provider | Environment Variable | Example Models |
|---|---|---|
| OpenAI | OPENAI_API_KEY | gpt-4o, gpt-4o-mini, o1-preview |
| Anthropic | ANTHROPIC_API_KEY | claude-opus-4, claude-sonnet-4 |
GOOGLE_API_KEY | gemini-2.0-flash, gemini-1.5-pro | |
| Together AI | OPENAI_BASE_URL + OPENAI_API_KEY | meta-llama/Llama-3.1-70B |
| Groq | OPENAI_BASE_URL + GROQ_API_KEY | llama-3.1-70b-versatile |
| Local (Ollama) | OPENAI_BASE_URL | llama3.1:70b, mixtral:8x7b |
| Local (vLLM) | OPENAI_BASE_URL | Any HuggingFace model |
Tips for Custom Models
- 1.Ensure your model supports function/tool calling for MCP tool integration
- 2.For local models, allocate sufficient GPU memory for tool-heavy tasks
- 3.Test with benign scenarios first before running adversarial evaluations
- 4.Monitor token usage - some evaluations can be token-intensive