From 1ab7663fcb81efe620a2cd7abfa4ec93eebb1acf Mon Sep 17 00:00:00 2001
From: Boyu Gou <103808989+boyugou@users.noreply.github.com>
Date: Sat, 4 Jan 2025 10:55:57 -0500
Subject: [PATCH] Update update_paper_list.md

---
 update_template_or_data/update_paper_list.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/update_template_or_data/update_paper_list.md b/update_template_or_data/update_paper_list.md
index e0157d0..4cc9795 100644
--- a/update_template_or_data/update_paper_list.md
+++ b/update_template_or_data/update_paper_list.md
@@ -8,6 +8,16 @@
     - 📖 TLDR: This paper conducts a comprehensive survey on OS Agents, which are (M)LLM-based agents that use computing devices (e.g., computers and mobile phones) by operating within the environments and interfaces (e.g., Graphical User Interface (GUI)) provided by operating systems (OS) to automate tasks. The survey begins by elucidating the fundamentals of OS Agents, exploring their key components including the environment, observation space, and action space, and outlining essential capabilities such as understanding, planning, and grounding. Methodologies for constructing OS Agents are examined, with a focus on domain-specific foundation models and agent frameworks. A detailed review of evaluation protocols and benchmarks highlights how OS Agents are assessed across diverse tasks. Finally, current challenges and promising future research directions, including safety and privacy, personalization and self-evolution, are discussed.
 
+- [Language Agents: Foundations, Prospects, and Risks](https://aclanthology.org/2024.emnlp-tutorials.3/)
+    - Yu Su, Diyi Yang, Shunyu Yao, Tao Yu
+    - 🏛️ Institutions: OSU, Stanford, Princeton, HKU
+    - 📅 Date: November 2024
+    - 📑 Publisher: EMNLP 2024
+    - 💻 Env: [Misc]
+    - 🔑 Key: [survey], [tutorial], [reasoning], [planning], [memory], [multi-agent systems], [safety]
+    - 📖 TLDR: This tutorial provides a comprehensive exploration of language agents, autonomous systems powered by large language models capable of executing complex tasks through language instructions. It delves into their theoretical foundations, potential applications, associated risks, and future directions, covering topics such as reasoning, memory, planning, tool augmentation, grounding, multi-agent systems, and safety considerations.
+
+
 - [Ponder & Press: Advancing Visual GUI Agent towards General Computer Control](https://arxiv.org/abs/2412.01268)
     - Yiqin Wang, Haoji Zhang, Jingqi Tian, Yansong Tang
     - 🏛️ Institutions: Tsinghua University