How to get the win rate of the trained RL model against the built-in AI #34

Zhouchenhaoustc · 2023-02-10T02:21:21Z

I want to get the model's win rate against the built-in AI, so I uncommented this block:

    finally:
        print("Parameter_Server end:")
        print("--------------------")
        print("mean_points_list:", mean_points_list)
        print("latest_mean_points:", latest_mean_points)
        print("max_mean_points:", max_mean_points)
        print("win_rate_list:", win_rate_list)
        print("latest_win_rate:", latest_win_rate)
        print("max_win_rate:", max_win_rate)
        print("--------------------")

and retrained the RL model, but the final output was 0. What do I need to do to get the win-rate results reported in your paper?

liuruoze · 2023-02-10T07:00:44Z

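You do not need to uncomment that code.

Just follow the steps in the instructions and run run.py.

When running RL, I recommend using the SL model we provide.

You can also train the SL model yourself, but that training is fairly likely to overfit, so using our SL model is the safer choice.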

Zhouchenhaoustc · 2023-02-10T07:56:48Z

When I ran run.py directly before, I did not seem to get any win rate. How can I see the RL model's win rate?

liuruoze · 2023-02-10T08:55:11Z

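Look at the win rate in the log. Open the training log with TensorBoard and the whole progression of the win rate (where it starts, how far it rises, where it ends up) is shown clearly as a single curve.

The win rate is not read from the printed output.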
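If the curve does not show up in the TensorBoard UI, it can help to list which scalar tags the log actually contains. The following is a minimal sketch, not part of the project: it assumes the tensorboard Python package is installed, and the tag names in a real log are unknown here, so matching on the substring "win" is only a guess. The function names are hypothetical.

```python
def list_scalar_tags(logdir):
    """Return the names of all scalar series found in a TensorBoard log directory."""
    # Imported lazily so the helper below stays usable without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    accumulator = EventAccumulator(logdir)
    accumulator.Reload()  # parse the event files on disk
    return accumulator.Tags()["scalars"]


def win_rate_tags(tags):
    """Keep only tags whose name suggests a win-rate series (assumed naming)."""
    return [tag for tag in tags if "win" in tag.lower()]


# Example call, with a hypothetical log directory:
#   print(win_rate_tags(list_scalar_tags("./path/to/log")))
```

An empty result would suggest that no win-rate point has been written to the log yet, rather than a display problem in TensorBoard.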

Zhouchenhaoustc · 2023-02-18T03:33:51Z

Hello, when I opened the log with TensorBoard, there was no win-rate curve, only loss curves. What could be the reason?

liuruoze · 2023-02-20T01:35:17Z

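How long did your program run? How many processes and threads did it use?

My guess is that the number of games (episodes) you ran never reached 100. The training program computes the win rate once every 100 episodes: the number of wins in those 100 episodes divided by 100. If 32 of the 100 episodes were wins, the win rate is 0.32. To get a curve you need at least several hundred episodes (so that there are several points to connect into a line).

In your case you probably did not even reach 100 episodes, so the code never reached the first point where the win rate is computed.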
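The per-100-episode statistic described above can be sketched as follows. This is an illustration rather than the project's actual code: the function name is hypothetical, and it assumes that an outcome of 1 marks a win and any other value does not.

```python
def windowed_win_rates(outcomes, window=100):
    """Return one win rate per completed block of `window` episodes."""
    rates = []
    # Only full blocks produce a point; a trailing partial block is ignored.
    for start in range(0, len(outcomes) - window + 1, window):
        block = outcomes[start:start + window]
        wins = sum(1 for outcome in block if outcome == 1)
        rates.append(wins / window)
    return rates


# 32 wins out of the first 100 episodes gives a single point of 0.32.
first_block = [1] * 32 + [0] * 68
print(windowed_win_rates(first_block))

# Fewer than 100 episodes gives no point at all, hence no curve.
print(windowed_win_rates([1] * 99))
```

With 250 episodes this yields two points, which is why several hundred episodes are needed before a line becomes visible.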

Zhouchenhaoustc · 2023-02-22T14:24:45Z

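I also think the episode count is too low. My run lasted 30 minutes, probably with 8 processes, and it ended with the message "Beyond the max_episodes, return!". So I tried changing MAX_EPISODES, but after changing it from 4 * 4 * 500 to 4 * 4 * 2000 the run time did not change. How can I run a normal number of episodes?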

agent_0 get final reward 1.0
agent_0 get outcome 1
Parameter_Server winloss_list [1]
Parameter_Server points_list [1.0]
Parameter_Server Exception cause return, Detials of the Exception: index -1 is out of bounds for axis 0 with size 0
Beyond the max_episodes, return!
RequestQuit command received.
Traceback (most recent call last):
File "/home/ustc-lc1/zhouch/AlphaStar/alphastarmini/core/rl/rl_vs_inner_bot_mp.py", line 786, in Parameter_Server
episode_outcome[row, col] = outcome
IndexError: index -1 is out of bounds for axis 0 with size 0
Closing Application...

Parameter_Server end:

mean_points_list: []
unable to parse websocket frame.
latest_mean_points: 0.0
max_mean_points: 0.0
win_rate_list: []
latest_win_rate: 0.0
max_win_rate: 0.0

liuruoze · 2023-02-23T00:30:37Z

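There is an on_server flag in param.py; set it to True. (False is meant for local testing, and only True starts the large-scale run. So param.py on the server differs from param.py on the local machine, precisely to distinguish the two environments, while the rest of the code is identical, which simplifies deployment.)

Once on_server = True, look at the first few lines of rl_vs_inner_bot_mp.py: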

    SIMPLE_TEST = not P.on_server
    if SIMPLE_TEST:
        MAX_EPISODES = 1
        ACTOR_NUMS = 1
        PARALLEL = 1
        GAME_STEPS_PER_EPISODE = 18000
        MAX_FRAMES = 18000 * 5
    else:
        MAX_EPISODES = 4 * 4 * 500
        ACTOR_NUMS = 2  # 2
        PARALLEL = 8 + 7 * 1
        GAME_STEPS_PER_EPISODE = 18000
        MAX_FRAMES = 18000 * MAX_EPISODES

Here, 'MAX_EPISODES = 4 * 4 * 500' is the episode count you should adjust.

In practice you usually will not get through 4 * 4 * 500 episodes, so adjust it to your needs.
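To make the effect of these settings concrete, here is a small worked calculation. It is only a sketch: the 100-episode window comes from the explanation earlier in this thread, and the helper name is hypothetical.

```python
WINDOW = 100  # assumed: episodes per win-rate point, as described in this thread


def win_rate_points(max_episodes, window=WINDOW):
    """Upper bound on the number of win-rate points a run can produce."""
    return max_episodes // window


# Local test settings (on_server = False): a single episode.
print(win_rate_points(1))               # 0 points, so win_rate_list stays empty

# Server settings (on_server = True).
MAX_EPISODES = 4 * 4 * 500              # 8000 episodes
PARALLEL = 8 + 7 * 1                    # 15
MAX_FRAMES = 18000 * MAX_EPISODES       # 144000000 frames
print(win_rate_points(MAX_EPISODES))    # at most 80 points on the curve
```

With MAX_EPISODES = 1 no win-rate point can ever be produced, which matches the empty win_rate_list in the output above.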

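Also, I recommend launching this kind of multi-process training with nohup, so that you can stop training at any time by killing the main nohup process (once the main process is killed, the other processes end with it).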

Zhouchenhaoustc · 2023-03-02T14:16:43Z

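Hello, I trained for 24 hours with 'MAX_EPISODES = 4 * 4 * 500', but I got only a single win-rate value, whereas there should normally be one win rate per hundred episodes. What could be causing this?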

liuruoze · 2023-03-08T06:28:00Z

A single value means the run got past 100 episodes but did not reach 200.

What are your ACTOR_NUMS and PARALLEL settings?

Zhouchenhaoustc · 2023-03-08T12:05:18Z

ACTOR_NUMS = 2

PARALLEL = 8 + 7 * 1

I have not changed either of these values.

liuruoze · 2023-03-13T10:03:57Z

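That is a bit odd. Could you tell me your machine's configuration?

If the machine does not have many GPUs, I suggest lowering PARALLEL.

PARALLEL can be set to the number of GPUs your machine has.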

liuruoze · 2023-03-16T04:25:49Z

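If the values come out too slowly, you can set STATIC_NUM to a smaller value, for example 30. Do not make it too small, or the results will fluctuate a lot.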
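The trade-off can be quantified with a rough estimate. The sketch below assumes that STATIC_NUM is the number of episodes averaged per win-rate point and that episodes behave like independent trials with a fixed win probability; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def win_rate_std_error(p, n):
    """Standard error of a win rate estimated from n independent episodes."""
    return math.sqrt(p * (1.0 - p) / n)


# With a true win rate of 0.32, compare a window of 100 with a window of 30.
for n in (100, 30):
    print(n, round(win_rate_std_error(0.32, n), 3))
```

Under these assumptions the point-to-point noise roughly doubles when the window shrinks from 100 to 30, so points appear sooner but the curve is noisier.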

Zhouchenhaoustc · 2023-03-16T13:06:05Z

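Following the number of GPUs on the server, I changed PARALLEL to 2, but both the run time and the number of episodes went down. The run time shown in TensorBoard dropped from 25h to 14h.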

liuruoze · 2023-03-17T01:20:01Z

Can you see a line in the winrate plot?

liuruoze · 2023-03-17T01:21:09Z

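Two GPUs is indeed on the low side, and your GPUs are probably not as fast as a V100.

Our results come from 8 V100s running in parallel at full load for about a day.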
