Question about the perspective transformation of two players when calculating Q? #212

puyuan1996 · 2022-10-25T14:00:23Z

Thank you very much for open-sourcing this code.

I am confused by this code segment in the backpropagate method in self_play.py, in the branch where len(self.config.players) is 2.

In line 423:
min_max_stats.update(node.reward + self.config.discount * -node.value())
Why do we use -node.value() rather than node.value() here? In my understanding, node.value() is calculated from the perspective of the player corresponding to the node.
In line 425:
value = (-node.reward if node.to_play == to_play else node.reward) + self.config.discount * value
When node.to_play == to_play is True, why do we use -node.reward + self.config.discount * value rather than node.reward + self.config.discount * value?
Is it because node.reward is obtained from the perspective of the parent node of the current node?
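
To make the question concrete, here is a minimal sketch of how I currently read the sign convention. It is not the repository's implementation: the Node class and the function name backpropagate_two_player are hypothetical, and the value_sum update is my own assumption, since only the two lines quoted above come from self_play.py. The sketch assumes node.value() is stored from the perspective of node.to_play, while node.reward is stored from the perspective of the player who moved into the node (the parent's player).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a search-tree node, not the repository's class."""
    to_play: int
    reward: float = 0.0
    value_sum: float = 0.0
    visit_count: int = 0

    def value(self) -> float:
        # Mean backed-up value, assumed to be from the perspective of self.to_play.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def backpropagate_two_player(search_path, value, to_play, discount):
    """Walk from the leaf to the root; `value` stays in `to_play`'s perspective."""
    q_values = []
    for node in reversed(search_path):
        # Assumption: store the value from the perspective of node.to_play.
        node.value_sum += value if node.to_play == to_play else -value
        node.visit_count += 1
        # Same expression as the quoted line 423: the parent's player receives
        # node.reward, then faces the opponent's value, hence the minus sign.
        q_values.append(node.reward + discount * -node.value())
        # Same expression as the quoted line 425: if node.to_play == to_play,
        # the reward went to the opponent, so it counts negatively for to_play.
        value = (-node.reward if node.to_play == to_play else node.reward) + discount * value
    return q_values


# Player 0 moves from root into child and receives reward 1.0;
# the leaf value 0.5 is from the perspective of player 1 (child.to_play).
root = Node(to_play=0)
child = Node(to_play=1, reward=1.0)
q_values = backpropagate_two_player([root, child], value=0.5, to_play=1, discount=1.0)
```

Under these assumptions, the Q value recorded for the child is 1.0 - 0.5 = 0.5 from player 0's perspective, and the root's stored value is also 0.5 for player 0, so both sign flips look consistent if node.reward really belongs to the parent's player. Is this reading correct?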

Looking forward to your reply!
