
Question about advantage function computation #6

Open
pruksmhc opened this issue May 4, 2023 · 2 comments

Comments

@pruksmhc

pruksmhc commented May 4, 2023

In this line:

for t in reversed(range(gen_len)):

What is the purpose of iterating over the 0th dimension, given that the tensor has shape '1 b'? The loop will only run once.
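To make the point concrete, here is a small pure-Python stand-in (not the repository's code). A nested list with one row plays the role of a tensor of shape (1, b). Calling `len()` on it, like calling `len()` on a PyTorch tensor, returns the size of dimension 0, so the backward loop visits only index 0:

```python
# Hypothetical stand-in: one row of b = 3 rewards, mimicking a tensor of
# shape (1, b) after rearrange(rewards, 'b -> 1 b').
rewards = [[0.5, 0.0, 1.0]]

gen_len = len(rewards)  # size of dim 0, which is 1 for a (1, b) layout
visited = [t for t in reversed(range(gen_len))]

print(gen_len)  # 1
print(visited)  # [0]
```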

@xrsrke
Owner

xrsrke commented May 7, 2023

Hi @pruksmhc. Sorry for the delayed response. The compute_advantage_and_return function treats all the texts in a batch as a single episode, so, technically, the batch size is not one.
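If each text in the batch is one timestep of a single episode, the backward loop has to run over the batch dimension. The following is a minimal sketch of that idea using standard Generalized Advantage Estimation (GAE). It assumes one scalar reward and one value estimate per text, a terminal state after the last text, and hypothetical defaults for `gamma` and `lam`. `gae_over_batch` is an illustrative name, not a function from this repository:

```python
from typing import List, Tuple


def gae_over_batch(
    rewards: List[float],
    values: List[float],
    gamma: float = 1.0,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """GAE where each batch item is treated as one timestep of one episode.

    Assumes the episode terminates after the last item, so V(s_T) = 0.
    """
    n = len(rewards)
    advantages = [0.0] * n
    last_gae = 0.0
    # Walk backwards over the items (timesteps), not over a size-1 axis.
    for t in reversed(range(n)):
        next_value = values[t + 1] if t + 1 < n else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        last_gae = delta + gamma * lam * last_gae
        advantages[t] = last_gae
    # Returns are advantages plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `gamma = lam = 1.0`, the returns reduce to the undiscounted reward-to-go, which makes a quick sanity check: `gae_over_batch([0.0, 0.0, 1.0], [0.2, 0.3, 0.5], 1.0, 1.0)` gives returns of approximately `[1.0, 1.0, 1.0]`.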


@pruksmhc
Author

pruksmhc commented May 7, 2023

Thanks for the response!
However, in this line:

rewards = rearrange(rewards, 'b -> 1 b')

you reshape the tensor so that dimension 0 has size 1, and gen_len is len(rewards), which is therefore 1.
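If the batch is meant to be one episode, as the owner describes, one possible adjustment is to take the loop bound from dimension 1 instead of from `len(rewards)`. With a tensor, that would be `rewards.shape[1]`. This is an assumption, not a confirmed fix. The sketch below uses a nested list as a hypothetical stand-in for the `(1, b)` tensor. For brevity, it computes a plain discounted return instead of full GAE. `discounted_returns_1b` and `gamma` are illustrative names:

```python
from typing import List


def discounted_returns_1b(rewards_1b: List[List[float]], gamma: float = 0.99) -> List[float]:
    """Backward pass over a (1, b) layout, iterating over dimension 1.

    rewards_1b is a hypothetical nested-list stand-in for a (1, b) tensor.
    """
    assert len(rewards_1b) == 1, "expects a single row, i.e. shape (1, b)"
    row = rewards_1b[0]
    gen_len = len(row)  # size of dim 1 (b), not len(rewards_1b), which is 1
    returns = [0.0] * gen_len
    running = 0.0
    for t in reversed(range(gen_len)):
        running = row[t] + gamma * running  # reward-to-go at step t
        returns[t] = running
    return returns
```

For example, `discounted_returns_1b([[1.0, 0.0, 2.0]], gamma=0.5)` returns `[1.5, 1.0, 2.0]`, so the loop now covers all three positions instead of one.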
