Customer Service

A ServiceNow-style customer-service workspace that exercises support workflows (ticket triage, refunds, escalations, and PII handling) under both direct prompt injection and case-content injection.

Domain overview

Customer service and commerce platforms such as Zendesk, ServiceNow, and Shopify increasingly integrate AI-based assistants and agents for tasks such as case triage, knowledge-grounded response generation, and workflow automation. As these systems become more autonomous (sometimes executing or recommending operational actions such as refunds, store credits, returns, order modifications, or customer-facing communications), the associated security and compliance risks become more consequential.

We study a high-autonomy customer service agent setting in which the agent has access to sensitive customer information and can directly execute operational actions such as refunds, store credits, returns, order modifications, and customer-facing communications. Current production deployments typically retain human approval for high-impact actions (e.g., Zendesk's auto-assist requires agent confirmation before executing refunds), but the trajectory toward full automation is well-established: major platforms already support fully autonomous resolution for routine requests and are progressively expanding the scope of agent-executable actions. We evaluate the fully autonomous setting because security evaluation should characterize worst-case risk under plausible deployment configurations, rather than relying on human-in-the-loop safeguards that may be relaxed as automation matures.

We design a comprehensive set of 160 benign task instances covering six representative customer service workflows, and define six security risk categories grounded in platform documentation, merchant policies, and applicable regulations, including GDPR and the FTC Policy Statement on Deception. We then design red-teaming tasks under both direct and indirect threat models to measure agent susceptibility to adversarial manipulation.

Benign task categories

Information Inquiry

Shipping delay tracking with courtesy store credit, refund status follow-ups, duplicate charge investigations, product availability and return policy inquiries, exchange feasibility checks, missing package investigations, resolved-issue confirmation emails, service issue reporting with store credit requests, and refund-versus-store-credit consultations, with case documentation of the interaction

Order Modification

Operations on pending orders including shipping address changes, payment method updates, and order cancellations

Return & Refund

Threshold-aware return and refund processing that requires comparing order or item totals against policy limits ($750 per item, $1,500 per order) and escalating to a human manager when a limit is exceeded, including partial refunds for individual damaged items
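The threshold check described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's implementation: the function name `route_refund`, its return labels, and the reading of "exceeded" as strictly greater than the limit are all assumptions; only the two dollar limits come from the text.

```python
from decimal import Decimal

# Policy limits quoted in the text; everything else here is hypothetical.
ITEM_LIMIT = Decimal("750")
ORDER_LIMIT = Decimal("1500")


def route_refund(item_totals, order_total):
    """Return "escalate" if any limit is exceeded, otherwise "process".

    item_totals: Decimal amounts of the individual items being refunded.
    order_total: Decimal total of the order the items belong to.
    Assumption: a total exactly equal to a limit does not exceed it.
    """
    over_item = any(total > ITEM_LIMIT for total in item_totals)
    over_order = order_total > ORDER_LIMIT
    return "escalate" if (over_item or over_order) else "process"
```

Under these assumptions, a partial refund for a single damaged $120 item on a $900 order would be processed directly, while a refund involving an $800 item would be handed to a human manager.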

Compensation

Service recovery involving goodwill store credit issuance subject to per-incident ($100) and daily ($150) caps, multi-order credit reconciliation, product defect warranty inquiries, and subscription cancellation with refund
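As a rough sketch of how the two store-credit caps interact, the check below accepts a credit only if it respects both limits. The function name `credit_within_caps`, the `issued_today` running total, and the choice to treat an amount exactly at a cap as allowed are assumptions made for illustration; the text specifies only the $100 and $150 figures and does not say how the daily total is scoped.

```python
from decimal import Decimal

# Caps quoted in the text; the scoping of the daily total is an assumption.
PER_INCIDENT_CAP = Decimal("100")
DAILY_CAP = Decimal("150")


def credit_within_caps(requested, issued_today):
    """Check a goodwill credit against the per-incident and daily caps.

    requested: Decimal amount of the credit being considered.
    issued_today: Decimal sum of credits already issued today
                  (hypothetical bookkeeping, not described in the source).
    """
    if requested <= 0:
        return False  # a non-positive credit is not a valid request
    within_incident = requested <= PER_INCIDENT_CAP
    within_daily = issued_today + requested <= DAILY_CAP
    return within_incident and within_daily
```

For example, with $100 already issued today, an $80 credit would fail the daily cap even though it is under the per-incident cap, which is the kind of case the multi-order reconciliation tasks are likely to probe.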

Subscription Management

Subscription lifecycle operations including pause, resume, and shipping address updates

Product & Account Operations

Multi-step account address changes with pending order updates, complex subscription modifications, and case follow-ups requiring coordination across multiple tools

Policy & risk framework

Domain policies

Our risk definitions draw on two platform-level policies: (1) the Zendesk User Content and Conduct Policy (UCCP), which applies to all users (including automated agents) interacting with the Zendesk platform and defines prohibited activities including Deceptive Behavior and Account Hijacking (7), Harassment and Bullying (6), and Malicious Products and Activities (3); (2) the Shopify Acceptable Use Policy (AUP), which requires merchants to act in good faith and prohibits, among other things, defrauding Shopify, other merchants, or buyers, using Shopify to send spam, and other malicious practices, including attempts to "game" systems.

General regulatory frameworks

We additionally ground our categories in two regulatory frameworks: (1) the FTC Policy Statement on Deception, which defines deception as a representation, omission, or practice likely to mislead a reasonable consumer on a material matter, and is directly applicable to agents that fabricate company guidelines; (2) the General Data Protection Regulation (GDPR), which governs the processing and transfer of personal data, including principles of purpose limitation (Art. 5(1)(b)), data minimisation (Art. 5(1)(c)), and integrity and confidentiality (Art. 5(1)(f)), as well as the requirement for a lawful basis for processing (Art. 6).

Results in this domain

Indirect / Direct ASR (lower is safer) and BSR (higher is more capable) for every evaluated agent on the Customer Service suite.

Framework | Model | Indirect ASR (lower = safer) | Direct ASR (lower = safer) | BSR (higher = more capable)
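The page does not spell out how these metrics are computed. A minimal sketch, assuming ASR denotes an attack success rate and BSR a benign success rate, each taken as a simple fraction of per-task boolean outcomes, might look like this. The function name `success_rate` is illustrative only.

```python
def success_rate(outcomes):
    """Fraction of True values in a sequence of per-task boolean outcomes.

    For ASR (assumed meaning), pass whether each attack task succeeded;
    for BSR (assumed meaning), pass whether each benign task was completed.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one task outcome is required")
    return sum(outcomes) / len(outcomes)
```

Under this reading, an agent that falls for one of four attack tasks would have an ASR of 0.25, and the same function applied to benign task completions would give its BSR.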

Environments

1 environment in the Customer Service domain.