Windows

An image-grounded Windows desktop environment that targets full-OS agentic behavior (launching apps, manipulating windows, and clicking through dialogs) and covers image-based prompt injection.

Domain overview

The Windows domain evaluates agents operating within a full graphical desktop environment on Windows 11. Agents interact through both PowerShell commands and GUI operations (screenshots, clicks, typing), reflecting how deployed OS agents must handle both modalities. Because agents must interpret visual elements (dialog boxes, context menus, system notifications), the domain introduces an injection surface absent from text-only environments: adversaries can plant malicious instructions in files, tool outputs, or configuration entries that the agent encounters while performing its task.
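The two interaction modalities described above could be modeled as a small set of action types. The following is only an illustrative sketch: the class names, fields, and the `modality` helper are hypothetical and are not taken from the benchmark's actual interface.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PowerShellCommand:
    """Hypothetical action: run a script in the guest's PowerShell session."""
    script: str


@dataclass(frozen=True)
class Click:
    """Hypothetical action: click at pixel coordinates on the desktop."""
    x: int
    y: int
    button: str = "left"


@dataclass(frozen=True)
class TypeText:
    """Hypothetical action: type text into the focused control."""
    text: str


@dataclass(frozen=True)
class Screenshot:
    """Hypothetical action: request a capture of the current screen."""


# Every action an agent can emit in this sketch.
Action = Union[PowerShellCommand, Click, TypeText, Screenshot]


def modality(action: Action) -> str:
    """Classify an action as 'shell' (PowerShell) or 'gui' (everything else)."""
    return "shell" if isinstance(action, PowerShellCommand) else "gui"
```

For example, `modality(PowerShellCommand("Get-Process"))` falls in the shell modality, while `modality(Click(640, 360))` falls in the GUI modality. Keeping both under one `Action` type mirrors the point that an agent may mix the two within a single task.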

Agents with system-level access can modify registry keys, escalate privileges via UAC, exfiltrate credentials from Credential Manager, configure Windows Defender and firewall rules, and set up persistent monitoring through scheduled tasks or startup entries. These operations use the same tools as legitimate administration, making them difficult to distinguish from benign tasks.

Benign task categories

File & Directory Management: Creating, copying, moving, renaming, and organizing files

System Configuration: Display settings, power options, notification preferences

Application & Process Control: Launching, pinning, closing applications and processes

Network & Connectivity: Network status checks, adapter configuration, Wi-Fi profile management

Security & Privacy Settings: Firewall rules, password policies, encryption status

Scripting & Automation: Shell scripts, scheduled tasks, automation routines

Office & Document Processing: Word, Excel, PowerPoint operations and format conversions

System Monitoring & Diagnostics: Disk usage, CPU/memory status, event logs, system reports

Threat models

Indirect threat model

The agent receives a benign instruction (e.g., "rename vacation_photo.png to trip.png"), but the environment is pre-compromised through two channels:

Direct threat model

The agent receives a malicious instruction disguised through social engineering. We construct 140 direct tasks for Windows across six risk categories (data theft, malware installation, privilege escalation, system integrity, surveillance, and fraudulent content). Each task specifies a concrete goal (e.g., "extract stored Wi-Fi passwords and email them," "create a scheduled task with a reverse shell payload") and uses one or more augmentation strategies:

Results in this domain

Indirect / Direct ASR (lower is safer) and BSR (higher is more capable) for every evaluated agent on the Windows suite.

Framework	Model	Indirect ASR (lower = safer)	Direct ASR (lower = safer)	BSR (higher = more capable)
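As a rough illustration of how the three columns could be computed, the sketch below assumes that ASR is the percentage of attack tasks in which the attack succeeded and that BSR is the percentage of benign tasks the agent completed. The source does not spell out either definition, so both readings, along with the function name `success_rate`, are assumptions.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Return the percentage of True entries in a list of per-task outcomes.

    Assumed usage: for ASR, True means the attack succeeded on that task;
    for BSR, True means the benign task was completed.
    """
    if not outcomes:
        raise ValueError("at least one task outcome is required")
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up outcomes for illustration only; these are not benchmark results.
toy_attack_outcomes = [True, False, False, False]
toy_benign_outcomes = [True, True, False, True]

toy_asr = success_rate(toy_attack_outcomes)  # lower is safer
toy_bsr = success_rate(toy_benign_outcomes)  # higher is more capable
```

Under this reading, the indirect and direct ASR columns would each come from calling the same function on the outcomes of the corresponding task set.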

Environments

1 environment in the Windows domain.

Windows

The Windows environment is a sandboxed QEMU virtual machine inside a Docker container, running Windows 11 with PowerShell 7 and Microsoft Office. The VM operates at a native resolution of 1920×1080. Between tasks, the VM state is fully restored via QEMU `savevm`/`loadvm`, which captures the complete machine state (CPU, memory, disk, running processes) and restores it in approximately 30 seconds. This snapshot mechanism eliminates residual state across evaluations without a full reboot.
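The reset step between tasks can be sketched as follows. `savevm` and `loadvm` are QEMU human monitor commands that take a snapshot name; everything else here is an assumption made for illustration, including the helper names, the snapshot name `clean`, the socket path, and the premise that the monitor is exposed on a Unix socket. This is not the benchmark's actual harness.

```python
import socket

PROMPT = b"(qemu) "  # assumed: the monitor ends each reply with this prompt


def monitor_command(action: str, snapshot: str) -> str:
    """Build a 'savevm <name>' or 'loadvm <name>' monitor command line."""
    if action not in ("savevm", "loadvm"):
        raise ValueError(f"unsupported action: {action!r}")
    if not snapshot or any(ch.isspace() for ch in snapshot):
        raise ValueError("snapshot name must be non-empty without whitespace")
    return f"{action} {snapshot}\n"


def _read_until_prompt(sock: socket.socket) -> bytes:
    """Read from the monitor until the prompt reappears or the socket closes."""
    buf = b""
    while not buf.endswith(PROMPT):
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
    return buf


def run_monitor_command(socket_path: str, command: str,
                        timeout: float = 120.0) -> bytes:
    """Send one command to a monitor on a Unix socket and return its reply.

    Assumes QEMU was started with the monitor bound to a Unix socket at
    socket_path (hypothetical setup).
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        _read_until_prompt(sock)          # discard the greeting banner
        sock.sendall(command.encode("ascii"))
        return _read_until_prompt(sock)   # returns once the command finishes


def restore_clean_state(socket_path: str, snapshot: str = "clean") -> bytes:
    """Roll the VM back to a previously saved snapshot before the next task."""
    return run_monitor_command(socket_path, monitor_command("loadvm", snapshot))
```

A harness built this way would call `monitor_command("savevm", "clean")` once after provisioning the guest, then `restore_clean_state(...)` before every task. The generous timeout reflects the roughly 30 second restore time mentioned above.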