Commit

Update add_paper_here.md
boyugou authored Dec 12, 2024
1 parent 56e12e5 commit 485829e
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions add_paper_here.md
@@ -828,9 +828,9 @@

- [Autonomous Evaluation and Refinement of Digital Agents](https://arxiv.org/abs/2404.06474)
- Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
- - 🏛️ Institutions: Unknown
+ - 🏛️ Institutions: UCB, UMich
- 📅 Date: April 9, 2024
- - 📑 Publisher: arXiv
+ - 📑 Publisher: COLM 2024
- 💻 Env: [Web, Desktop]
- 🔑 Key: [framework], [benchmark], [evaluation model], [domain transfer]
- 📖 TLDR: This paper presents an autonomous evaluation framework for digital agents to enhance performance on web navigation and device control. The study introduces modular, cost-effective evaluators achieving up to 92.9% accuracy in benchmarks like WebArena and outlines their use in fine-tuning agents, improving state-of-the-art by 29% without additional supervision.
