Question about advantage function computation #6

pruksmhc · 2023-05-04T18:29:41Z

In this line:

Line 49 in 5b9ac3f

for t in reversed(range(gen_len)):

What is the purpose of iterating over the 0th dimension, given that the tensor has shape '1 b' (so the loop will only run once)?

xrsrke · 2023-05-07T09:05:40Z

Hi @pruksmhc. Sorry for the delayed response. The compute_advantage_and_return function treats all the text in a batch as an episode. Therefore, technically, the batch size is not one.
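For readers following along, here is a minimal sketch of what a reversed advantage loop looks like when a sequence of rewards is treated as one episode. It is not the repository's code: the helper name `gae_over_episode`, the GAE-style recurrence, and the `gamma` and `lam` defaults are all assumptions made for illustration.

```python
def gae_over_episode(rewards, values, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE-style advantages over a single 1-D episode."""
    steps = len(rewards)
    advantages = [0.0] * steps
    last_adv = 0.0
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(steps)):
        # Value of the next step; assumed to be 0 after the final step.
        next_value = values[t + 1] if t < steps - 1 else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    # Returns are the advantages added back onto the value estimates.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# With gamma = lam = 1 and zero values, advantages reduce to reward-to-go sums.
advantages, returns = gae_over_episode([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
```

In this sketch the loop runs once per element of the episode, which is the behaviour the reply above describes; whether the actual function iterates over that axis depends on how `gen_len` is computed.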

pruksmhc · 2023-05-07T14:31:05Z

Thanks for the response!
However, in this line

instructGOOSE/instruct_goose/trainer.py

Line 39 in 5b9ac3f

rewards = rearrange(rewards, 'b -> 1 b')

you reshape the tensor to size 1 in dimension 0, so gen_len, which is len(rewards), is 1.
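The shape argument can be checked with a small sketch. This assumes NumPy's `reshape(1, -1)` as a stand-in for `rearrange(rewards, 'b -> 1 b')` (both produce shape `(1, b)`), and the reward values are made up; it does not reproduce the repository's trainer.

```python
import numpy as np

# Hypothetical batch of 4 scalar rewards, one per generated text.
rewards = np.array([0.1, 0.5, -0.2, 0.9])

# NumPy analogue of rearrange(rewards, 'b -> 1 b'): add a leading axis of size 1.
reshaped = rewards.reshape(1, -1)

# len() reports the size of dimension 0, not the batch size.
gen_len_dim0 = len(reshaped)
# The batch size lives in dimension 1 after the reshape.
gen_len_dim1 = reshaped.shape[1]

# Indices visited by a loop written as `for t in reversed(range(gen_len))`.
iterations = list(reversed(range(gen_len_dim0)))
```

Under these assumptions `iterations` holds a single index, so a loop driven by `len(rewards)` after the reshape would execute once, while `reshaped.shape[1]` would give one step per batch element.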
