From 485829e076bfda52dd874526910812c7b6e1885b Mon Sep 17 00:00:00 2001
From: Boyu Gou <103808989+boyugou@users.noreply.github.com>
Date: Thu, 12 Dec 2024 03:42:36 -0500
Subject: [PATCH] Update add_paper_here.md

---
 add_paper_here.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/add_paper_here.md b/add_paper_here.md
index 2f90f8f..de65486 100644
--- a/add_paper_here.md
+++ b/add_paper_here.md
@@ -828,9 +828,9 @@
 - [Autonomous Evaluation and Refinement of Digital Agents](https://arxiv.org/abs/2404.06474)
 - Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
- - 🏛️ Institutions: Unknown
+ - 🏛️ Institutions: UCB, UMich
 - 📅 Date: April 9, 2024
- - 📑 Publisher: arXiv
+ - 📑 Publisher: COLM 2024
 - 💻 Env: [Web, Desktop]
 - 🔑 Key: [framework], [benchmark], [evaluation model], [domain transfer]
 - 📖 TLDR: This paper presents an autonomous evaluation framework for digital agents to enhance performance on web navigation and device control. The study introduces modular, cost-effective evaluators achieving up to 92.9% accuracy in benchmarks like WebArena and outlines their use in fine-tuning agents, improving state-of-the-art by 29% without additional supervision.