Training doesn't seem to be working (rewards don't converge at all) #9

zhiyiZeng · 2022-08-26T05:14:10Z

This repo is pretty awesome. I'm trying to run a basic demo, but training doesn't seem to be working at all (the rewards don't converge). However, the agent still outperforms buy-and-hold (B&H) by a wide margin, even when the reward is negative! I'm confused by this. Is there an explanation for it?

The graph shows the rewards with training epochs=50.

zhiyiZeng · 2022-08-26T05:35:27Z

I tried making the learning rate smaller, but the performance seems to get worse. So I don't think the learning rate is the issue here.

zhiyiZeng · 2022-08-26T05:36:38Z

I think the model learns something, but that something isn't reflected in the rewards. I wonder why that is.
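
A negative reward and outperforming B&H are not necessarily in conflict, since B&H is a relative benchmark. Here is a minimal sketch of that arithmetic. It assumes the per-step reward is the agent's own daily return, which may not match how this repo defines its reward; the prices and positions are made up for illustration.

```python
import numpy as np

# Hypothetical daily closing prices in a falling market (made-up numbers).
prices = np.array([100.0, 96.0, 97.0, 92.0, 90.0])

# Hypothetical agent positions held over each day: 1 = invested, 0 = in cash.
positions = np.array([1, 0, 1, 0])

# Simple daily returns of the asset.
daily_returns = prices[1:] / prices[:-1] - 1.0

# Assumption: the reward at each step is the agent's own daily return.
agent_rewards = positions * daily_returns

# Compounded return of the agent versus simply holding the asset.
agent_total = np.prod(1.0 + agent_rewards) - 1.0
bh_total = prices[-1] / prices[0] - 1.0

print(f"sum of rewards: {agent_rewards.sum():+.4f}")
print(f"agent return:   {agent_total:+.4f}")
print(f"buy-and-hold:   {bh_total:+.4f}")
```

In this toy case the summed reward is negative, yet the agent loses less than B&H because it sits out one of the down days. Whether the same thing happens in the demo depends on the actual reward definition and the data.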
