Update add_paper_here.md
boyugou authored Nov 2, 2024
1 parent f812fa6 commit 4d2e326
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions add_paper_here.md
@@ -1,3 +1,13 @@
- [ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents](https://sites.google.com/view/st-webagentbench/home)
- Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov
- 🏛️ Institutions: IBM Research
- 📅 Date: October 9, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [benchmark], [safety], [trustworthiness], [ST-WebAgentBench]
- 📖 TLDR: This paper introduces **ST-WebAgentBench**, a benchmark designed to evaluate the safety and trustworthiness of web agents in enterprise contexts. It defines safe and trustworthy agent behavior, outlines the structure of safety policies, and introduces the "Completion under Policies" metric to assess agent performance. The study reveals that current state-of-the-art agents struggle with policy adherence, highlighting the need for improved policy awareness and compliance in web agents.


- [From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents](https://arxiv.org/abs/2409.13701)
- Nalin Tiwary, Vardhan Dongre, Sanil Arun Chawla, Ashwin Lamani, Dilek Hakkani-Tür
- 🏛️ Institutions: UIUC