I am confused by this code in the `backpropagate` method in self_play.py, in the case where `len(self.config.players)` is 2.
In line 423: `min_max_stats.update(node.reward + self.config.discount * -node.value())`
Why do we use `-node.value()` rather than `node.value()` here? In my understanding, `node.value()` is calculated from the perspective of the player corresponding to the node.
In line 425: `value = (-node.reward if node.to_play == to_play else node.reward) + self.config.discount * value`
When `node.to_play == to_play` is True, why do we use `-node.reward + self.config.discount * value` rather than `node.reward + self.config.discount * value`?
Is it because `node.reward` is obtained from the perspective of the parent node of the current node?
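To make the question concrete, here is a minimal sketch of the two-player update as I currently read it. Only the two expressions quoted from lines 423 and 425 come from self_play.py; the `Node` class, the `value_sum` update, the `q_from_parent` helper, and the `discount` parameter (standing in for `self.config.discount`) are my own assumptions for illustration, and the whole thing rests on my guess that `node.reward` is stored from the perspective of the parent's player.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for the search-tree node, not the repository's class."""
    to_play: int
    reward: float = 0.0
    value_sum: float = 0.0
    visit_count: int = 0

    def value(self) -> float:
        # Mean backed-up value, assumed to be from node.to_play's perspective.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def q_from_parent(node: Node, discount: float) -> float:
    # Expression quoted from line 423: the parent's player sees the child's
    # value with the opposite sign, while the reward keeps its sign
    # (assumption: the reward already belongs to the parent's player).
    return node.reward + discount * -node.value()


def backpropagate_two_player(search_path, value, to_play, discount=1.0):
    """Walk from leaf to root; `value` is always from `to_play`'s perspective."""
    for node in reversed(search_path):
        # Assumed update: store the value from node.to_play's perspective.
        node.value_sum += value if node.to_play == to_play else -value
        node.visit_count += 1
        # Expression quoted from line 425: if node.to_play == to_play, the
        # reward was earned by the opponent (who moved into this node), so it
        # counts negatively for `to_play`.
        value = (-node.reward if node.to_play == to_play else node.reward) + discount * value


# Player 0 moves from root into child and earns reward 1; player 1 is to play at the leaf.
root = Node(to_play=0)
child = Node(to_play=1, reward=1.0)
backpropagate_two_player([root, child], value=0.0, to_play=1, discount=1.0)
print(root.value(), child.value(), q_from_parent(child, 1.0))
```

If this reading is right, both sign flips follow from the same convention: values are stored per node from the perspective of the player to move there, and rewards from the perspective of the player who just moved.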
Looking forward to your reply!
Thank you very much for open-sourcing your code.