[Release] v2.4.0 (#1254)

sotetsuk · Sep 26, 2024 · c5dd751
1 parent d8447fe
commit c5dd751
Showing 4 changed files with 20 additions and 34 deletions.
diff --git a/README.md b/README.md
@@ -119,7 +119,7 @@ Use `pgx.available_envs() -> Tuple[EnvId]` to see the list of currently availabl
 |<a href="https://en.wikipedia.org/wiki/Chess">Chess</a><br>`"chess"` |<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/chess_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/chess_light.gif" width="60px">| `v2` | *Checkmate opponent's king to win.* |
 |<a href="https://en.wikipedia.org/wiki/Connect_Four">Connect Four</a><br>`"connect_four"` |<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/connect_four_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/connect_four_light.gif" width="60px">| `v0` | *Connect discs, win with four.* |
 |<a href="https://en.wikipedia.org/wiki/Minichess">Gardner Chess</a><br>`"gardner_chess"`|<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/gardner_chess_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/gardner_chess_light.gif" width="60px">| `v0` | *5x5 chess variant, excluding castling.* |
-|<a href="https://en.wikipedia.org/wiki/Go_(game)">Go</a><br>`"go_9x9"` `"go_19x19"` |<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/go-19x19_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/go-19x19_light.gif" width="60px">| `v0` | *Strategically place stones, claim territory.* |
+|<a href="https://en.wikipedia.org/wiki/Go_(game)">Go</a><br>`"go_9x9"` `"go_19x19"` |<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/go-19x19_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/go-19x19_light.gif" width="60px">| `v1` | *Strategically place stones, claim territory.* |
 |<a href="https://en.wikipedia.org/wiki/Hex_(board_game)">Hex</a><br>`"hex"` |<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/hex_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/hex_light.gif" width="60px">| `v0` | *Connect opposite sides, block opponent.* |
 |<a href="https://en.wikipedia.org/wiki/Kuhn_poker">Kuhn Poker</a><br>`"kuhn_poker"` |<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/kuhn_poker_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/kuhn_poker_light.gif" width="60px">| `v1` | *Three-card betting and bluffing game.* |
 |<a href="https://arxiv.org/abs/1207.1411">Leduc hold'em</a><br>`"leduc_holdem"` |<img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/leduc_holdem_dark.gif" width="60px"><img src="https://raw.githubusercontent.com/sotetsuk/pgx/main/docs/assets/leduc_holdem_light.gif" width="60px">| `v0` | *Two-suit, limited deck poker.* |

diff --git a/docs/go.md b/docs/go.md
@@ -26,7 +26,7 @@ or you can directly load `Go` class
 ```py
 from pgx.go import Go
 
-env = Go(size=19, komi=6.5)
+env = Go(size=19, komi=7.5)
 ```
 
 ## Description
@@ -39,46 +39,31 @@ env = Go(size=19, komi=6.5)
 
 The rule implemented in Pgx follows [Tromp-Taylor Rules](https://tromp.github.io/go.html).
 
-!!! note "Komi"
+!!! note Komi
 
-    By default, we use `6.5`. Users can set different `komi` at `Go` class constructor.
+    By default, we use `7.5`. Users can set different `komi` at `Go` class constructor.
 
 
-!!! note "Ko"
+!!! note Superko rule
 
-    On PSK implementations.
+    The Tromp-Taylor rule enforces PSK (Positional Superko). However, strictly implementing PSK to determine legal moves is inefficient, as it requires computing the hash for all possible subsequent board states. Since PSK rarely occurs—based on our best knowledge—most implementations compromise. For example:
 
-    Tromp-Taylor rule employ PSK. However, implementing strict PSK is inefficient because
-
-    - Simulator has to store all previous board (or hash) history, and
-    - Agent also has to remember all previous board to avoid losing by PSK
-
-    As PSK rarely happens, as far as our best knowledge, it is usual to compromise in PSK implementations.
-    For example,
-
-    - **OpenSpiel** employs SSK (instead of PSK) for computing legal actions, and if PSK action happened, the game ends with tie.
-        - Pros: Detect all PSK actions
-        - Cons: Agent cannot know why the game ends with tie (if the same board is too old)
-    - **PettingZoo** employs SSK for legal actions, and ignores even if PSK action happened.
-        - Pros: Simple
-        - Cons: PSK is totally ignored
-
-    Note that the strict rule is "PSK for legal actions, and PSK action leads to immediate lose."
-    So, we also compromise at this point, our approach is
-
-    - **Pgx** employs SSK for legal actions, PSK is approximated by up to 8-steps before board, and approximate PSK action leads to immediate lose
-        - Pros: Agent may be able to avoid PSK (as it observes board history up to 8-steps in AlphaGo Zero feature)
-        - Cons: Ignoring the old same boards
-
-    Anyway, we believe it's effect is very small as PSK rarely happens, especially in 19x19 board.
+    OpenSpiel uses SSK (Situational Superko) instead of PSK to compute legal moves, and if a PSK move occurs, the game ends in a tie. PettingZoo also uses SSK for legal moves but ignores PSK moves altogether. The strict rule is: "PSK for legal moves, and any PSK move results in an immediate loss." Like others, we also compromise. Our approach is similar to OpenSpiel:
+
+    Pgx uses SSK for legal moves, but any PSK move results in an immediate loss. Overall, we believe the impact of this compromise is minimal, especially on a 19x19 board, since PSK scenarios are rare.
 
+    | | Tromp-Taylor | OpenSpiel | PettingZoo | Pgx | 
+    |:---|:---:|:---:|:---:|:---:| 
+    | legal action | **PSK** | SSK | SSK | SSK |
+    | PSK occurrence | **loss** | tie | ignore (SSK) | loss |
+
 ## Specs
 
 Let `N` be the board size (e.g., `19`).
 
 | Name | Value |
 |:---|:----:|
-| Version | `v0` |
+| Version | `v1` |
 | Number of players | `2` |
 | Number of actions | `N x N + 1` |
 | Observation shape | `(N, N, 17)` |
@@ -97,7 +82,7 @@ We follow the observation design of AlphaGo Zero `[Silver+17]`.
 | ... | ... |
 | `obs[:, :, -1]` | color of `player_id` |
 
-!!! note "Final observation dimension"
+!!! note Final observation dimension
 
     For the final dimension, there are two possible options:
 
@@ -140,6 +125,7 @@ Termination happens when
 
 ## Version History
 
+- `v1` : Superko rule change in [#1224](https://github.com/sotetsuk/pgx/pull/1224) (v2.4.0)
 - `v0` : Initial release (v1.0.0)
 
 ## Reference
@@ -153,4 +139,4 @@ See [our paper](https://arxiv.org/abs/2303.17503) for more details. See [this co
 
 | Model ID | Description |
 |:---:|:----|
-| `go_9x9_v0`| See [our paper](https://arxiv.org/abs/2303.17503) for the training details. |
+| `go_9x9_v0`| See [our paper](https://arxiv.org/abs/2303.17503) for the training details. |
diff --git a/pgx/__init__.py b/pgx/__init__.py
@@ -4,7 +4,7 @@
 from pgx._src.visualizer import save_svg, save_svg_animation, set_visualization_config
 from pgx.core import Env, EnvId, State, available_envs, make
 
-__version__ = "2.3.0"
+__version__ = "2.4.0"
 
 __all__ = [
     # types

diff --git a/pgx/go.py b/pgx/go.py
@@ -94,7 +94,7 @@ def id(self) -> core.EnvId:
 
     @property
     def version(self) -> str:
-        return "v0"
+        return "v1"
 
     @property
     def num_players(self) -> int:
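
Note on the superko change documented in `docs/go.md` above: the difference between PSK and SSK can be illustrated with a small, self-contained sketch. This is not Pgx's implementation. The names `violates_psk`, `violates_ssk`, and `classify_move`, the flattened-tuple board encoding, and the history format are all assumptions made for illustration. The sketch assumes the usual definitions: PSK forbids recreating any earlier board position, while SSK forbids it only when the same player is to move. `classify_move` mirrors the Pgx column of the table in the diff (SSK for legality, loss on a PSK repetition).

```py
from typing import List, Tuple

# Hypothetical encoding: a flattened board, 0 = empty, 1 = black, 2 = white.
Board = Tuple[int, ...]
BLACK, WHITE = 1, 2

# Each history entry is (board, player_to_move_in_that_position).
History = List[Tuple[Board, int]]


def violates_psk(history: History, new_board: Board) -> bool:
    """Positional superko: the new board matches any earlier board."""
    return any(board == new_board for board, _ in history)


def violates_ssk(history: History, new_board: Board, next_to_move: int) -> bool:
    """Situational superko: the new board matches an earlier board
    that had the same player to move."""
    return any(
        board == new_board and player == next_to_move
        for board, player in history
    )


def classify_move(history: History, new_board: Board, next_to_move: int) -> str:
    """Illustrative outcome following the Pgx column of the table:
    SSK repetitions are illegal, other PSK repetitions lose immediately."""
    if violates_ssk(history, new_board, next_to_move):
        return "illegal"
    if violates_psk(history, new_board):
        return "loss"
    return "ok"


# Toy positions on a 2x2 board, only to exercise the checks.
board_a: Board = (0, 1, 0, 0)
board_b: Board = (0, 0, 2, 0)
board_c: Board = (1, 0, 0, 2)
history: History = [(board_a, BLACK), (board_b, WHITE)]

# Repeating board_a with WHITE to move repeats the position but not the situation.
result_position_only = classify_move(history, board_a, WHITE)
# Repeating board_a with BLACK to move repeats the full situation.
result_situation = classify_move(history, board_a, BLACK)
# A new board repeats nothing.
result_fresh = classify_move(history, board_c, WHITE)
```

In this toy history, a repetition that changes the player to move passes the SSK legality check but is still caught by the PSK check, which is the case where the table's "PSK occurrence" row applies.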