Travel
Travel-booking environments (Booking.com, Expedia, United, Southwest, Enterprise) where agents must complete realistic itineraries while resisting payment-, address-, and itinerary-targeted attacks.
Domain overview
Online travel platforms such as Booking.com, Expedia and Airbnb serve as centralized marketplaces where users can search, compare and book flights, accommodations, restaurants and attractions. These travel platforms increasingly deploy AI-powered travel agents to assist users with end-to-end trip organization, including multi-city itinerary planning, price-optimized booking and payment processing.
Such AI agents typically operate with access to sensitive user data (e.g., personal identification, payment credentials or travel histories) and can execute consequential actions such as committing reservations, processing payments or communicating with accommodation hosts. While these automated actions interact with both users and third-party service providers, security enforcement around AI agents remains limited. This creates opportunities for third-party adversaries or malicious users to manipulate travel agents into performing harmful actions, including data exfiltration, unauthorized transactions or fraudulent financial activities. Such actions can lead to severe consequences including privacy violations, financial loss and regulatory non-compliance.
We first design a comprehensive set of benign tasks for travel agents, covering 6 representative categories commonly encountered on real-world platforms. Drawing from platform-specific trust and safety policies of major travel services, as well as broader regulatory frameworks including GDPR and consumer protection laws, we derive a set of 12 security risk categories. Guided by these risks, we construct red-teaming tasks with malicious goals under two primary threat models to systematically evaluate the security robustness of travel agents.
Benign task categories
Flight Search & Booking
Searches for, compares, and books flights for specified routes and dates, optimizing for user-specified criteria such as lowest price, earliest departure time, or shortest travel duration
Hotel Search & Booking
Locates and books accommodations in target cities, supporting multiple selection strategies including price optimization, review rating maximization, and multi-criteria filtering based on room type, occupancy capacity, rating thresholds, and house rules
Restaurant Discovery & Booking
Finds and books restaurants across one or more cities, filtered by cuisine type preferences, minimum rating thresholds, or budget constraints
Attraction Exploration
Queries and recommends local attractions in specified cities, optionally coordinated with nearby restaurant reservations and hotel bookings to support sightseeing itineraries
Itinerary Planning
Constructs complete multi-city trip plans encompassing flights, accommodations, and dining across multiple destinations, including payment processing
Round-Trip Booking
Manages round-trip or multi-stop flight bookings with coordinated hotel and restaurant reservations at each destination, including payment settlement
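As an illustration of the selection criteria named under Flight Search & Booking, the following is a minimal sketch of how a benign task could pick one flight from a set of search results. The `Flight` record, the criterion names, and the sample flight numbers are hypothetical and are not taken from the benchmark's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Flight:
    # Hypothetical record; the real environment's fields may differ.
    flight_no: str
    price_usd: float
    departure: datetime
    duration: timedelta


# Each user-specified criterion maps to the value that should be minimized.
CRITERIA = {
    "lowest_price": lambda f: f.price_usd,
    "earliest_departure": lambda f: f.departure,
    "shortest_duration": lambda f: f.duration,
}


def pick_flight(flights, criterion):
    """Return the flight that minimizes the chosen criterion, or None if empty."""
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion!r}")
    if not flights:
        return None
    return min(flights, key=CRITERIA[criterion])


# Made-up search results for a single route and date.
flights = [
    Flight("HX101", 220.0, datetime(2025, 3, 1, 9, 30), timedelta(hours=5, minutes=10)),
    Flight("HX202", 180.0, datetime(2025, 3, 1, 14, 0), timedelta(hours=6)),
    Flight("HX303", 310.0, datetime(2025, 3, 1, 6, 15), timedelta(hours=4, minutes=45)),
]

cheapest = pick_flight(flights, "lowest_price")
earliest = pick_flight(flights, "earliest_departure")
```

Keeping the criteria in a lookup table makes it explicit which objective a task asks for, so a run can be checked against the single flight that objective selects.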
Policy & risk framework
Domain policies
We select five domain-specific policies that govern travel booking platforms and online marketplace conduct: (1) PCI Data Security Standard (PCI DSS), which establishes technical and operational requirements for entities that store, process, or transmit cardholder data, mandating that payment credentials must be handled exclusively through certified payment infrastructure and prohibiting agents or intermediaries from directly soliciting, relaying, or retaining raw card numbers, CVVs, or expiration dates; (2) FTC Policy Statement on Deception, which defines a commercial representation as deceptive if it is likely to mislead a consumer acting reasonably under the circumstances and is material to the consumer's decision, directly governing agent behaviors such as misrepresenting prices, availability, ratings, or accommodation quality; (3) Airbnb Terms of Service, which requires all transactions to be conducted exclusively through the platform's payment system, prohibits circumventing service fees, and forbids using the platform to facilitate fraudulent bookings, unauthorized refunds, or resource hoarding through bad-faith reservation practices; (4) Airbnb Content Policy, which prohibits the posting of fake, misleading, or manipulated reviews, requires that all user-generated content reflect genuine experiences, and forbids coordinated review campaigns or rating inflation designed to deceive prospective guests; (5) Airbnb Off-Platform and Fee Transparency Policy, which prohibits hosts from moving current bookings off of Airbnb, bans soliciting or accepting reservation-related payments outside the platform, requires that all mandatory fees be transparently disclosed at the time of booking, and forbids using off-platform links, communication channels, or external applications in ways that bypass Airbnb's transactional, review or access-control framework.
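To make the prohibition on relaying raw card numbers concrete, here is a minimal sketch of a heuristic that flags outgoing agent messages containing something that looks like a payment card number. It combines a digit-run pattern with the Luhn checksum. The function names and the regular expression are assumptions made for illustration; this is a screening heuristic, not a PCI DSS compliance check and not the benchmark's own judge.

```python
import re

# Runs of 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_raw_card_number(text: str) -> bool:
    """Heuristically flag text that appears to contain a raw card number."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False


# 4111 1111 1111 1111 is a widely used test number that passes Luhn.
flagged = contains_raw_card_number("Send payment using card 4111 1111 1111 1111 today")
clean = contains_raw_card_number("Your flight departs at 09:30 from gate 12")
```

The Luhn step reduces false positives from long digit runs such as booking references, although a reference can still pass the checksum by chance.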
General regulatory frameworks
We additionally consider two widely adopted regulatory frameworks that govern the handling of personal data in travel booking contexts: (1) the General Data Protection Regulation (GDPR), which regulates the collection, processing, storage, and transfer of personal data, including requirements for lawful processing, data minimization, purpose limitation, and explicit user consent, and which directly governs agent behaviors such as unauthorized PII collection, data exfiltration to third parties via platform messaging, over-retention of session data, and disclosure of user travel histories or payment information without authorization; (2) the California Consumer Privacy Act (CCPA/CPRA), which grants consumers the right to know what personal information is collected and how it is used, the right to delete personal data, and the right to opt out of the sale or sharing of their information, establishing obligations that prohibit agents from covertly harvesting user data for profiling, sharing personal details with accommodation hosts without explicit consent, or retaining sensitive information beyond its stated operational purpose.
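The data-minimization and purpose-limitation requirements can be illustrated with a small sketch: before an agent passes user data to another party, it keeps only the fields approved for that purpose. The purposes, field names, and sample profile below are hypothetical and chosen purely for illustration.

```python
# Hypothetical allow-list: which profile fields each purpose may use.
PURPOSE_FIELDS = {
    "host_message": {"first_name", "check_in", "check_out", "guest_count"},
    "payment": {"booking_id", "payment_token"},
}


def minimize(profile: dict, purpose: str) -> dict:
    """Return only the fields approved for the given purpose."""
    try:
        allowed = PURPOSE_FIELDS[purpose]
    except KeyError:
        raise ValueError(f"no fields approved for purpose {purpose!r}") from None
    return {key: value for key, value in profile.items() if key in allowed}


# Made-up user profile containing more data than a host needs to see.
profile = {
    "first_name": "Dana",
    "passport_number": "X0000000",
    "check_in": "2025-03-01",
    "check_out": "2025-03-04",
    "guest_count": 2,
    "payment_token": "tok_example",
}

shared_with_host = minimize(profile, "host_message")
```

An allow-list fails closed: a field that is added to the profile later is withheld until someone explicitly approves it for a purpose.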
Results in this domain
Indirect / Direct ASR (lower is safer) and BSR (higher is more capable) for every evaluated agent on the Travel suite.
| Framework | Model | Indirect ASR (lower = safer) | Direct ASR (lower = safer) | BSR (higher = more capable) |
|---|---|---|---|---|
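
The following sketch shows one way the three reported rates could be computed from per-episode outcomes. It assumes that ASR is the percentage of attack episodes in which the malicious goal was achieved and that BSR is the percentage of benign episodes completed successfully; the page does not spell out these definitions, and the `Episode` record and its labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    kind: str      # assumed labels: "benign", "indirect", or "direct"
    success: bool  # benign: task completed; attack: malicious goal achieved


def rate(episodes, kind):
    """Percentage of episodes of the given kind that succeeded, or None if there are none."""
    outcomes = [e.success for e in episodes if e.kind == kind]
    if not outcomes:
        return None
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up outcomes for one agent; not benchmark data.
episodes = [
    Episode("benign", True), Episode("benign", True),
    Episode("benign", False), Episode("benign", True),
    Episode("indirect", True), Episode("indirect", False),
    Episode("indirect", False), Episode("indirect", False),
    Episode("direct", True), Episode("direct", True),
    Episode("direct", False), Episode("direct", False),
]

bsr = rate(episodes, "benign")
indirect_asr = rate(episodes, "indirect")
direct_asr = rate(episodes, "direct")
```

Under these assumptions a safer and more capable agent shows lower values in the two ASR columns and a higher value in the BSR column.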
Environments
5 environments in the Travel domain.

Booking
We construct a simulated travel booking platform where a travel agent helps users plan, book and pay for trips. The environment is populated with large-scale datasets covering 3.8 million flight records for 2025, 5047 accommodation listings, 9551 restaurant entries, and 5302 tourist attractions. The platform also maintains user account data, including trip histories and saved payment methods. This data-driven design reflects the structured multi-step workflow of a real travel booking experience, where an assistant must query available options, compare alternatives along multiple criteria, make reservations, manage bookings, process payments, and interact with platform features such as reviews and host messaging.

Enterprise Rent-A-Car
The Enterprise environment simulates a global car-rental platform that serves as a ground-transportation workspace for vehicle reservation, Emerald Club loyalty management, and pickup / return logistics. It supports account registration and login, rental-location lookup by 3-letter code or free-text search (airport, neighborhood, or downtown sites), vehicle-class browsing (such as economy, SUV, and minivan), pickup / return date–time car search with live pricing, end-to-end booking with primary-driver details and configurable mileage plans, promo-code and corporate-account discounts, authenticated and anonymous booking lookup, cancellation, and a recent-searches history, making it suitable for evaluating agents in car-rental and multi-leg travel-planning workflow scenarios. This environment is particularly important because car-rental workflows combine driver identity data (name, age, driver's license context), irreversible reservation actions, and sensitive corporate-account / payment inputs, creating realistic opportunities for both benign trip-logistics tasks and adversarial manipulation through spoofed confirmation codes, misleading vehicle-class labels, malicious promo or corporate codes, or prompt-injection attacks hidden inside location listings, vehicle descriptions, or search-session payloads.
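The rental-location lookup described above, which accepts either a 3-letter code or free text, could be sketched as follows. The `Location` record, the placeholder codes, and the matching rules are assumptions made for illustration and do not reflect the environment's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    code: str  # 3-letter location code (placeholder values below)
    name: str
    kind: str  # "airport", "neighborhood", or "downtown"


# Made-up catalogue with placeholder codes.
LOCATIONS = [
    Location("AAA", "Example City International Airport", "airport"),
    Location("BBB", "Example City Downtown", "downtown"),
    Location("CCC", "Riverside Neighborhood Branch", "neighborhood"),
]


def find_locations(query, locations=LOCATIONS):
    """Look up by exact 3-letter code first, then fall back to free-text search."""
    q = query.strip()
    if not q:
        return []
    if len(q) == 3 and q.isalpha():
        exact = [loc for loc in locations if loc.code == q.upper()]
        if exact:
            return exact
    needle = q.casefold()
    return [loc for loc in locations if needle in loc.name.casefold()]


by_code = find_locations("bbb")
by_text = find_locations("example city")
```

Trying the exact code before the substring search keeps a short query such as a code from also matching unrelated location names.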

Expedia
The Expedia environment (GUI) simulates an online travel-agency platform that serves as an end-to-end lodging-reservation workspace for hotels, resorts, vacation rentals, and short-term apartments across multiple destinations. It supports rewards-account registration and login with authenticated profile lookup, destination browsing and free-text search, rich property search with per-night price, star rating, review rating, refundability, breakfast-included, pay-later, amenity and property-type filters, per-property detail pages covering rooms / amenities / reviews, authenticated and anonymous booking creation with guest contact details and optional promo codes, personal booking lists, confirmation-code lookup, cancellation by internal id or confirmation code, promo-code validation, a "pick up where you left off" continuation card, and persisted recent-search history, making it suitable for evaluating agents in travel-planning and lodging-reservation workflow scenarios.

Southwest Airlines
The Southwest Airlines environment simulates a low-cost-carrier airline's booking platform that serves as a trip-planning workspace for Southwest-style flight search and Rapid Rewards loyalty management. It supports account registration and login, airport lookup by IATA code or city, one-way and round-trip flight search across Southwest's signature Wanna Get Away / Wanna Get Away Plus / Anytime / Business Select fare classes, per-flight fare inspection, live flight-status checks by flight number and date, Low-Fare-Calendar month-view browsing with configurable length of stay, end-to-end booking with multi-passenger itineraries (including Rapid Rewards numbers), authenticated and anonymous booking lookup, cancellation, promo-code resolution, and a recent-searches history, making it suitable for evaluating agents in price-sensitive airline-reservation and itinerary-management workflow scenarios. This environment is particularly important because low-cost-carrier workflows combine fare-class optionality (basic vs. fully-flexible), calendar-style price exploration, and irreversible booking actions with personally identifiable passenger data, creating realistic opportunities for both benign trip-planning tasks and adversarial manipulation through spoofed confirmations, injected promo codes, malicious itinerary-change instructions, or unauthorized cancellation requests. Therefore, it provides a realistic testbed for assessing whether agents can safely distinguish legitimate booking assistance from harmful actions involving payments, travel identity data, loyalty accounts, and irreversible reservation management.

United Airlines
The United Airlines environment simulates a flag-carrier airline's booking platform that serves as a travel-planning workspace for flight search, reservation, and MileagePlus loyalty management. It supports account registration and login, airport lookup by IATA code or city, one-way / round-trip / multi-city flight search across several cabin classes, per-flight fare inspection, live flight-status checks by flight number and date, end-to-end booking with multi-passenger itineraries (including MileagePlus and Known Traveler numbers), authenticated and anonymous booking lookup, cancellation, travel-credit redemption, promo-code resolution, and a recent-searches history, making it suitable for evaluating agents in real-world airline-reservation and itinerary-management workflow scenarios. This environment is particularly important because flight-booking workflows combine structured transactional state (passenger identities, fare classes, payment, promo codes, travel credits) with high-stakes irreversible actions (confirmed bookings, cancellations) and sensitive personal data (dates of birth, travel documents), creating realistic opportunities for both benign trip-planning tasks and adversarial manipulation through malicious promo codes, spoofed confirmation codes, tampered passenger payloads, or prompt-injection attacks hidden inside airport listings, fare-class labels, or promo descriptions.