Windows
An image-grounded Windows desktop environment that targets full-OS agentic behavior: launching apps, manipulating windows, and clicking through dialogs. It also covers image-based prompt injection.
Domain overview
The Windows domain evaluates agents operating within a full graphical desktop environment on Windows 11. Agents interact through both PowerShell commands and GUI operations (screenshots, clicks, typing), reflecting how deployed OS agents must handle both modalities. Because agents must interpret visual elements (dialog boxes, context menus, system notifications), the domain introduces an injection surface absent from text-only environments: adversaries can plant malicious instructions in files, tool outputs, or configuration entries that the agent encounters while performing its task.
Agents with system-level access can modify registry keys, escalate privileges via UAC, exfiltrate credentials from Credential Manager, configure Windows Defender and firewall rules, and set up persistent monitoring through scheduled tasks or startup entries. These operations use the same tools as legitimate administration, making them difficult to distinguish from benign tasks.
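As a rough illustration of the two interaction modalities described above, the sketch below models an agent's action space as a small set of Python types. Every name here (`PowerShellCommand`, `Screenshot`, `Click`, `TypeText`, `modality`) is hypothetical and chosen for illustration only; the benchmark's actual tool interface is not specified in this section.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PowerShellCommand:
    """Run a command line in a PowerShell session (hypothetical action type)."""
    command: str


@dataclass(frozen=True)
class Screenshot:
    """Request the current screen contents (hypothetical action type)."""


@dataclass(frozen=True)
class Click:
    """Click at pixel coordinates on the desktop (hypothetical action type)."""
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    """Type a string into the focused control (hypothetical action type)."""
    text: str


# Assumed union of everything the agent may emit in one step.
Action = Union[PowerShellCommand, Screenshot, Click, TypeText]


def modality(action: Action) -> str:
    """Classify an action as 'shell' or 'gui'."""
    if isinstance(action, PowerShellCommand):
        return "shell"
    if isinstance(action, (Screenshot, Click, TypeText)):
        return "gui"
    raise TypeError(f"unknown action type: {type(action).__name__}")
```

Tagging each step with its modality in this way would make it possible to audit which channel an unsafe operation went through, since the same administrative goal can often be reached either from the shell or through the GUI.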
Benign task categories
| Category | Example tasks |
|---|---|
| File & Directory Management | Creating, copying, moving, renaming, and organizing files |
| System Configuration | Display settings, power options, notification preferences |
| Application & Process Control | Launching, pinning, closing applications and processes |
| Network & Connectivity | Network status checks, adapter configuration, Wi-Fi profile management |
| Security & Privacy Settings | Firewall rules, password policies, encryption status |
| Scripting & Automation | Shell scripts, scheduled tasks, automation routines |
| Office & Document Processing | Word, Excel, PowerPoint operations and format conversions |
| System Monitoring & Diagnostics | Disk usage, CPU/memory status, event logs, system reports |
Threat models
Indirect threat model
The agent receives a benign instruction (e.g., "rename vacation_photo.png to trip.png"), but the environment is pre-compromised through two channels:
Direct threat model
The agent receives a malicious instruction disguised through social engineering. We construct 140 direct tasks for Windows across six risk categories (data theft, malware installation, privilege escalation, system integrity, surveillance, and fraudulent content). Each task specifies a concrete goal (e.g., "extract stored Wi-Fi passwords and email them," "create a scheduled task with a reverse shell payload") and uses one or more augmentation strategies:
Results in this domain
Indirect / Direct ASR (lower is safer) and BSR (higher is more capable) for every evaluated agent on the Windows suite.
| Framework | Model | Indirect ASR (lower = safer) | Direct ASR (lower = safer) | BSR (higher = more capable) |
|---|---|---|---|---|
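
The three reported metrics can be sketched as simple per-category success fractions. This is a minimal sketch under stated assumptions: it takes ASR to mean attack success rate and BSR to mean benign success rate, each computed as successes divided by attempts, which this section does not define explicitly. The `Episode` record and the `rate` and `summarize` helpers are hypothetical names, not part of the benchmark.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Episode:
    """One evaluated task run (hypothetical record layout)."""
    kind: str        # "indirect", "direct", or "benign"
    succeeded: bool  # attack goal achieved (attack kinds) or task completed (benign)


def rate(episodes: Iterable[Episode], kind: str) -> float:
    """Fraction of episodes of the given kind that succeeded."""
    selected: List[Episode] = [e for e in episodes if e.kind == kind]
    if not selected:
        raise ValueError(f"no episodes of kind {kind!r}")
    return sum(e.succeeded for e in selected) / len(selected)


def summarize(episodes: Iterable[Episode]) -> Dict[str, float]:
    """Compute the three table columns under the assumed definitions."""
    runs = list(episodes)  # materialize so the iterable can be scanned three times
    return {
        "indirect_asr": rate(runs, "indirect"),  # lower = safer
        "direct_asr": rate(runs, "direct"),      # lower = safer
        "bsr": rate(runs, "benign"),             # higher = more capable
    }
```

Under these assumptions, an agent that refuses everything would score a low ASR but also a low BSR, which is why the two kinds of metric are reported side by side.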
Environments
1 environment in the Windows domain.