ver Apr30th
updated documents and version number
zdy023 committed Apr 30, 2024
1 parent b352978 commit 5621818
Showing 9 changed files with 64 additions and 28 deletions.
17 changes: 11 additions & 6 deletions Changelog
@@ -1,15 +1,17 @@
2024-03-25 Danyang Zhang <[email protected]>
2024-04-30 Danyang Zhang <[email protected]>

Fixed bugs in VhIoWrapper w.r.t. null view hierarchy from OS
v3.6

* android_env/wrappers/vh_io_wrapper.py
Updated documents.

2024-03-23 Danyang Zhang <[email protected]>
2024-03-25 Danyang Zhang <[email protected]>

Fixed bugs

* android_env/environment.py
* demos/openmoneybox.add_billings.textproto
* (Bugs caused by new remote_path arg): android_env/environment.py
* (Typos, bugs w.r.t. new episode end event, removed reset steps which
cause AVD halting): demos/openmoneybox.add_billings.textproto
* (Bugs w.r.t null VH from OS): android_env/wrappers/vh_io_wrapper.py

Updated new types of ADB operations for SetupStep

@@ -91,6 +93,9 @@

Fixed bugs w.r.t. ResponseEvent

Updated episode end event slots by making it triggered only when the
returned value is True

* android_env/proto/task.proto
* android_env/components/event_listeners.py
* android_env/components/task_manager.py
12 changes: 10 additions & 2 deletions README-zh.md
@@ -4,6 +4,13 @@

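## Recent Updates

* (2024-04-30 v3.6)
* Updated the function that loads a remote simulator so that the remote resources can be given a path different from the directory containing the local task definition file
* Updated the task template toolkit, adding new slot modifiers and new task config file syntax
* Fixed known issues

For details, see the [Changelog](Changelog) and the related documents.

* (2023-12-18 v3.5)
* Because checking the view hierarchy and the screenshot is time-consuming, updated the mechanism so that the timing of these checks can be managed more flexibly, balancing the need to check episode events thoroughly against the resulting increase in interaction latency
* Added multiple scoring methods to `ResponseEvent`: regex match, fuzzy match, and embedding match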
@@ -139,10 +146,11 @@ pip install .

```bibtex
@article{DanyangZhang2023_MobileEnv,
title = {{Mobile-Env}: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era},
title = {{Mobile-Env}: An Evaluation Platform and Benchmark for LLM-GUI Interaction},
author = {Danyang Zhang and
Lu Chen and
Hongshen Xu and
Zihan Zhao and
Lu Chen and
Ruisheng Cao and
Kai Yu},
journal = {CoRR},
11 changes: 10 additions & 1 deletion README.md
@@ -3,6 +3,14 @@

## NEWS!!

* (2024-04-30 v3.6)
* Updated the function that loads a remote simulator so that the remote
resources can be given a path different from that of the local task
definition file.
* Updated the task template toolkit; added new slot modifiers and syntaxes for
the task config file.
* Fixed known bugs.

* (2023-12-18 v3.5)
* Owing to the long time delay of VH check and screenshot check, we updated
the mechanism of managing the check time. By this way, the requirement of
@@ -217,8 +225,9 @@ the following BibTeX:
@article{DanyangZhang2023_MobileEnv,
title = {{Mobile-Env}: An Evaluation Platform and Benchmark for LLM-GUI Interaction},
author = {Danyang Zhang and
Lu Chen and
Hongshen Xu and
Zihan Zhao and
Lu Chen and
Ruisheng Cao and
Kai Yu},
journal = {CoRR},
17 changes: 13 additions & 4 deletions docs/other-tools-en.md
@@ -207,15 +207,24 @@ keywords: bake,lobster,tails
```

As for the config file of task token combination `<taskname>.task`, each line
in the file should specify a file name of the task token config like:
in the file should specify a file name of the task token config and an optional
combination option like:

```
search-lobster
access_author-Bob
access_author-Bob sr
```

The instances will be combined in order during instantiation and become the
small steps of the final large multi-step task.
Currently, there are two combination options: `s` and `r`. If `s` is specified,
the setup steps (`setup_steps`) of the current task token will be appended to
the final task definition. Similarly, if `r` is specified, the reset steps
(`reset_steps`) of the current task token will be appended. If no combination
option is specified, no setup or reset steps are appended, *i.e.*, only the
steps added for the preceding task tokens are preserved. By default, the setup
and reset steps of the first task token are always preserved in the final
combined task definition. The instances are combined in declaration order
during instantiation and become the small steps of the final large multi-step
task.
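
The combination rules described here can be sketched in Python. This is only an illustration under stated assumptions, not the toolkit's actual implementation: `TaskToken`, `parse_task_line`, and `combine` are hypothetical names, steps are reduced to plain strings, and every line of the `.task` file is assumed to be non-empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskToken:
    """Hypothetical stand-in for an instantiated task token."""
    name: str
    setup_steps: List[str] = field(default_factory=list)
    reset_steps: List[str] = field(default_factory=list)


def parse_task_line(line: str) -> Tuple[str, str]:
    """Splits a `.task` line into (token config name, combination options)."""
    parts = line.split()
    options = parts[1] if len(parts) > 1 else ""
    return parts[0], options


def combine(lines: List[str],
            tokens: Dict[str, TaskToken]) -> Tuple[List[str], List[str]]:
    """Collects the setup and reset steps of the combined task."""
    setup_steps: List[str] = []
    reset_steps: List[str] = []
    for index, line in enumerate(lines):
        name, options = parse_task_line(line)
        token = tokens[name]
        # The first task token always contributes both kinds of steps.
        if index == 0 or "s" in options:
            setup_steps.extend(token.setup_steps)
        if index == 0 or "r" in options:
            reset_steps.extend(token.reset_steps)
    return setup_steps, reset_steps
```

With the two-line example file, `access_author-Bob sr` contributes both its setup and reset steps, whereas a bare `access_author-Bob` contributes neither.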

##### The Syntax of the Modifiers

6 changes: 3 additions & 3 deletions docs/other-tools-zh.md
@@ -148,14 +148,14 @@ name: search-task
keywords: bake,lobster,tails
```

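In the config file for combining task tokens, `<taskname>.task`, each line specifies the file name of a task token config, e.g.:
In the config file for combining task tokens, `<taskname>.task`, each line specifies the file name of a task token config and an optional combination option, e.g.: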

```
search-lobster
access_author-Bob
access_author-Bob sr
```

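During instantiation, the instances are combined in order and become the small steps of the final large multi-step task.
Currently, there are two combination options: `s` and `r`. If `s` is specified, the setup steps of this task token are appended to the setup steps (`setup_steps`) of the final task definition when the token is combined; if `r` is specified, the reset steps (`reset_steps`) of this task token are appended likewise; if no combination option is specified, none of this task token's setup or reset steps are added, i.e., only the steps already added for the preceding task tokens are kept. By default, the setup and reset steps of the first task token of each combined task are always kept in the final combined task. The task tokens are combined in declaration order during instantiation and become the small steps of the final large multi-step task.

##### The Syntax of the Modifiers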

19 changes: 12 additions & 7 deletions docs/task-definition-en.md
@@ -766,7 +766,10 @@ is the same with the screen text event sources.

The response event source reacts to the agent's response to the human user. If
the response matches the defined pattern, the event source will
be triggered and return a tuple comprising all the regex-captured groups.
be triggered. Mobile-Env supports several matching methods: regex match, fuzzy
match, and embedding match. If regex match is adopted, the source returns a
tuple comprising all the regex-captured groups; otherwise, it returns the
numeric match score.
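
The two kinds of return value can be illustrated with a small standard-library sketch. It is not Mobile-Env's implementation: `match_response` and its `threshold` parameter are hypothetical, only a regex mode and a `difflib`-based fuzzy mode are shown, and the use of `re.search` rather than a full match is an assumption.

```python
import difflib
import re
from typing import Optional, Tuple, Union


def match_response(response: str,
                   pattern: str,
                   mode: str = "REGEX",
                   threshold: float = 0.8,
                   ) -> Optional[Union[Tuple[str, ...], float]]:
    """Returns the captured groups (regex) or a match score (fuzzy).

    Returns None when the response does not match.
    """
    if mode == "REGEX":
        match = re.search(pattern, response)
        # Regex mode yields the tuple of captured groups.
        return match.groups() if match else None
    if mode == "DIFFLIB":
        # Fuzzy mode yields a numeric similarity score in [0, 1].
        score = difflib.SequenceMatcher(None, pattern, response).ratio()
        return score if score >= threshold else None
    raise ValueError(f"Unsupported mode in this sketch: {mode}")
```

A caller therefore has to know which mode is configured before interpreting the value: a tuple of strings in regex mode, a float otherwise.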

##### The Event Slots

@@ -789,8 +792,9 @@ The episode end event slot (`episode_end_listener`) indicates whether the episode
has come to an end, in which case the platform restarts the task at the next
step. This usually means that the agent has just achieved the task target. But
it is also possible that severe errors have occurred and the system cannot resume
and has to restart. Only the triggering flag of this event slot makes sense and
it returns no further values to the agent.
and has to restart. ~~Only the triggering flag of this event slot makes sense
and it returns no further values to the agent.~~ The episode is restarted only
when the event slot is triggered and the returned value is `True`.

The instruction event slot (`instruction_listener`) gives the agent new
supplementary step instructions during the interaction. This slot accepts and
@@ -876,10 +880,11 @@ The options of `event` is the aforementioned event sources:
* `floating` - A floating reference
+ `log_event` - Matches the system log lines and requires two fields:
- `filters` - An array of strings for the log filters like `jd:D`. The system
logs are obtained by the command `adb logcat -v epoch FILTERS *:S`, where
`FILTERS` is all the filter names declared in the definition. All the
filters declared across the log event sources in the definition file will
be merged (with duplicates removed) before invoking the ADB command.
logs are obtained by the command [`adb logcat -v epoch FILTERS
*:S`](https://developer.android.com/tools/logcat), where `FILTERS` is all
the filter names declared in the definition. All the filters declared
across the log event sources in the definition file will be merged (with
duplicates removed) before invoking the ADB command.
- `pattern` - The regex for the expected log line.
+ `response_event` - Matches the response to human user.
- `mode` - Enum indicating the matching method. Valid options are: `REGEX`,
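
The filter merging described for `log_event` can be sketched as follows. This is an illustration only: `build_logcat_command` is a hypothetical helper, and keeping the filters in first-seen order is an assumption, since the document only states that duplicates are removed.

```python
from typing import Iterable, List


def build_logcat_command(filter_groups: Iterable[Iterable[str]]) -> List[str]:
    """Merges the filters of all log event sources into one ADB command."""
    merged: List[str] = []
    seen = set()
    for group in filter_groups:
        for log_filter in group:
            # Keep each filter once, in first-seen order.
            if log_filter not in seen:
                seen.add(log_filter)
                merged.append(log_filter)
    # `*:S` silences every tag that is not explicitly listed.
    return ["adb", "logcat", "-v", "epoch", *merged, "*:S"]
```

The returned list can be passed to a process launcher as an argument vector, which avoids shell expansion of the `*:S` argument.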
6 changes: 3 additions & 3 deletions docs/task-definition-zh.md
@@ -580,15 +580,15 @@ $$

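The log event source listens to the system runtime log and checks whether each line matches the defined regex. If a line matches and the triggering condition is satisfied, the event is triggered and returns a tuple comprising all the regex-captured groups, the same as the two types of screen text event sources.

The response event source reacts to the agent's response to the human user. If the response matches the defined pattern, the event is triggered and returns a tuple comprising all the regex-captured groups.
The response event source reacts to the agent's response to the human user. If the response matches the defined pattern, the event is triggered. The platform supports several matching methods: regex match, fuzzy match, and embedding match. With regex match, a tuple comprising all the regex-captured groups is returned; with the other methods (fuzzy match, embedding match), the numeric match score is returned.

##### The Event Slots

This section introduces the six event slots defined by the platform. Each event slot is associated with one type of episode signal that the agent can perceive.

The score event slot (`score_listener`) and the reward event slot (`reward_listener`) are both used to parse the reward fed back to the agent. They differ in how they interpret the received signal. The score event slot interprets the signal as a cumulative score: when a score event is triggered, the platform subtracts the score recorded at the previous trigger from the newly read score and uses the difference as the single-step reward. The reward event slot treats the signal delivered by the event tree as the single-step reward itself and returns it directly. The single-step rewards computed by the two slots are summed and fed back to the agent.

The episode end event slot (`episode_end_listener`) indicates that the interaction has reached its end, and the platform will restart the task at the next step. This is usually because the agent has achieved the task goal, but it may also be because the system has hit an unrecoverable error and needs to be restarted. For the episode end event slot, only whether it is triggered is meaningful; it feeds back no other value to the agent.
The episode end event slot (`episode_end_listener`) indicates that the interaction has reached its end, and the platform will restart the task at the next step. This is usually because the agent has achieved the task goal, but it may also be because the system has hit an unrecoverable error and needs to be restarted. ~~For the episode end event slot, only whether it is triggered is meaningful; it feeds back no other value to the agent.~~ The platform restarts the task if and only if the episode end event slot is triggered and the obtained value is `True`.

The instruction event slot (`instruction_listener`) defines the new step instructions that need to be supplied to the agent at certain key steps while the task is in progress. It both receives and returns a list of strings, in which each element represents one line or one sentence of instruction.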

@@ -626,7 +626,7 @@ $$
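* `integer` - Provides an integer reference value
* `floating` - Provides a floating-point reference value
+ `log_event` - Matches lines in the system log and provides two fields:
- `filters` - An array of strings giving the filters to use, such as `jd:D`; the system logs are obtained by the command `adb logcat -v epoch FILTERS *:S`, where `FILTERS` is all the filters specified in the task definition; the filters declared across all the log event sources in the definition file are merged, with duplicates removed, before invoking the ADB command
- `filters` - An array of strings giving the filters to use, such as `jd:D`; the system logs are obtained by the command [`adb logcat -v epoch FILTERS *:S`](https://developer.android.com/tools/logcat), where `FILTERS` is all the filters specified in the task definition; the filters declared across all the log event sources in the definition file are merged, with duplicates removed, before invoking the ADB command
- `pattern` - The regex for the log lines to match
+ `response_event` - Matches the agent's response to the human user
- `mode` - Enum specifying the matching method. Valid options are `REGEX`, `DIFFLIB`, `FUZZ`, and `SBERT`: `REGEX` uses regex matching, `DIFFLIB` uses `difflib` for fuzzy matching, `FUZZ` uses the `rapidfuzz` library for fuzzy matching, and `SBERT` uses the `sentence-transformers` library to compute embeddings for matching (the match score in `FUZZ` mode ranges from 0 to 100)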
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "mobile-env-rl"
version = "3.5"
version = "3.6"
authors = [{name = "Danyang Zhang @X-Lance", email = "[email protected]"}]
license = {file = "LICENSE"}
description = "A Universal Platform for Training and Evaluation of Mobile Interaction"
2 changes: 1 addition & 1 deletion setup.py
@@ -112,7 +112,7 @@ def run(self):

setup(
name='mobile-env-rl',
version='3.5',
version='3.6',
description='Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction',
long_description=description,
author='Danyang Zhang @X-Lance',