[BugFix] More flexible episode_reward computation in logger #136
This PR fixes the way episode rewards are computed in BenchMARL.

Here is an overview:

BenchMARL looks at the global `done` (always assumed to be set), which can usually be computed using `any` or `all` over the single-agent dones. In all cases, the global done is what is used to compute the episode reward.

We log `episode_reward` min, mean, and max over episodes at three different levels.

Requirement: when agents are done and the global done is not set, those agents should receive a reward of 0 (if you are not using global rewards).
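The logic above can be sketched as follows. This is a hypothetical illustration, not BenchMARL's actual code: the function names (`global_done`, `masked_rewards`) and the plain-list representation are assumptions made for clarity.

```python
from typing import List


def global_done(agent_dones: List[bool], mode: str = "any") -> bool:
    # The global done is derived with `any` or `all` over single-agent dones.
    return any(agent_dones) if mode == "any" else all(agent_dones)


def masked_rewards(
    rewards: List[float], agent_dones: List[bool], mode: str = "all"
) -> List[float]:
    # If an agent is done but the global done is not set, that agent
    # should receive a reward of 0 (when not using a global reward).
    if global_done(agent_dones, mode):
        return rewards
    return [0.0 if done else r for r, done in zip(rewards, agent_dones)]


# Agent 0 is done but the global done (computed with `all`) is not set,
# so agent 0's reward is zeroed while agent 1 keeps accumulating reward.
print(masked_rewards([1.0, 2.0], [True, False]))  # → [0.0, 2.0]
```

With `mode="all"`, a single agent finishing early no longer contributes spurious reward to the episode total; once all agents are done, rewards pass through unchanged.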
Fixes #135