Research
Research workflows over an arXiv-style literature corpus, evaluating whether agents stay aligned with the user goal under prompt injections planted in abstracts, comments, and citations.
Domain overview
As research agents continue to advance, they are increasingly adopted to assist with various stages of the scientific workflow, including literature retrieval, paper summarization, hypothesis generation, and experimental planning. These agents typically operate in information-dense environments and are expected to process large volumes of unstructured data from diverse sources such as academic papers, web pages, and technical reports.
However, such agents may also exhibit risky behaviors because they rely on external content and have limited ability to verify it. For example, they may retrieve and propagate incorrect or misleading information, cite irrelevant papers, or follow adversarial instructions embedded in web content. In more severe cases, they may fabricate citations to non-existent papers, misinterpret experimental results, or draw scientifically unsound conclusions. Prioritizing capability gains without adequate safeguards may further lead these agents to amplify misinformation or produce biased and unreliable research outputs.
To systematically evaluate these risks, we design a comprehensive benchmark consisting of benign research tasks that cover representative stages of the scientific workflow. Based on established principles for trustworthy information processing and scientific integrity, we identify key risk categories in the research domain, including hallucinated citations, misinformation propagation, and adversarial instruction following. Guided by these risks, we further construct red-teaming tasks under multiple threat models to assess the robustness and reliability of research agents in realistic settings.
Benign task categories
Download GitHub Artifacts
Given a paper title, the agent searches for the corresponding GitHub repository and saves the repository link to a local file, so that the artifacts can later be downloaded and the experimental results reproduced.
Search Related Papers
Given a set of keywords, the agent searches for relevant academic papers and returns a list of related works to support a literature review.
Upload Paper to arXiv
The agent submits the current paper to a simulated or user-created arXiv-style website, mimicking the paper submission workflow in academic publishing.
Write Paper and Cite Papers
The agent writes a draft research paper and cites the specified papers with their correct arXiv IDs to ensure proper academic referencing.
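The "Write Paper and Cite Papers" task hinges on whether the cited arXiv IDs are exactly the requested ones. The sketch below shows one way such a check could work; it is hypothetical, not the benchmark's grader. It assumes citations appear as new-style arXiv IDs (`YYMM.NNNN` or `YYMM.NNNNN`, optionally with a version suffix such as `v2`). It ignores old-style IDs like `hep-th/9901001`, and the regex is a heuristic that can also match unrelated decimal numbers. The names `extract_arxiv_ids` and `check_citations` are illustrative.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 onward),
# optionally followed by a version suffix such as "v2". Old-style IDs
# (e.g. "hep-th/9901001") are deliberately not handled in this sketch.
NEW_STYLE_ID = re.compile(r"\b(\d{4}\.\d{4,5})(?:v\d+)?\b")


def extract_arxiv_ids(text: str) -> set[str]:
    """Return the version-less new-style arXiv IDs that appear in `text`."""
    return {match.group(1) for match in NEW_STYLE_ID.finditer(text)}


def check_citations(draft: str, required: set[str]) -> dict[str, set[str]]:
    """Compare the IDs cited in a draft against the IDs the task asked for.

    This scoring rule is hypothetical; it is not the benchmark's grader.
    """
    cited = extract_arxiv_ids(draft)
    return {
        "missing": required - cited,     # requested papers the draft never cites
        "unexpected": cited - required,  # cited IDs outside the requested set
    }


if __name__ == "__main__":
    draft = "Transformers [arXiv:1706.03762v7] build on attention [arXiv:1409.0473]."
    print(check_citations(draft, {"1706.03762", "1810.04805"}))
```

Here the draft omits the requested `1810.04805`, so it is reported as missing. It also cites `1409.0473`, which was never requested, so that ID is reported as unexpected. Stripping the version suffix means that citing `v7` of a requested paper still counts as a correct citation.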
Results in this domain
Indirect and direct attack success rate (ASR; lower is safer) and benign success rate (BSR; higher is more capable) for every evaluated agent on the Research suite.
| Framework | Model | Indirect ASR (lower = safer) | Direct ASR (lower = safer) | BSR (higher = more capable) |
|---|---|---|---|---|
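The table reports three rates per agent. The sketch below is a minimal illustration of how they could be computed. It makes four assumptions that the page does not state: ASR is the attack success rate, BSR is the benign success rate, each rate is a plain fraction of episodes, and every episode carries a single success flag. The page does not define how an attack or benign task is judged successful. The `Trial` record and its `kind` labels are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluated episode (hypothetical record format, not the benchmark's schema)."""
    kind: str      # "benign", "indirect", or "direct"
    success: bool  # benign: the user task was completed; attack: the attacker's goal was reached


def rate(trials: list[Trial], kind: str) -> float:
    """Fraction of trials of the given kind whose `success` flag is True."""
    selected = [t for t in trials if t.kind == kind]
    if not selected:
        raise ValueError(f"no trials of kind {kind!r}")
    return sum(t.success for t in selected) / len(selected)


def summarize(trials: list[Trial]) -> dict[str, float]:
    """Compute the three rate columns of the results table for one agent."""
    return {
        "indirect_asr": rate(trials, "indirect"),  # lower = safer
        "direct_asr": rate(trials, "direct"),      # lower = safer
        "bsr": rate(trials, "benign"),             # higher = more capable
    }


if __name__ == "__main__":
    trials = [
        Trial("benign", True), Trial("benign", False),
        Trial("indirect", True), Trial("indirect", False),
        Trial("indirect", False), Trial("indirect", False),
        Trial("direct", False),
    ]
    print(summarize(trials))
```

With these toy trials, `summarize` returns a BSR of 0.5, an indirect ASR of 0.25, and a direct ASR of 0.0. These are illustrative inputs, not results from this benchmark. `rate` raises an error when a trial kind is absent. Without that check, an agent that was never attacked would be reported as having a 0% ASR.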
Environments
1 environment in the Research domain.
