diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/404.html b/404.html new file mode 100644 index 000000000..99765a54a --- /dev/null +++ b/404.html @@ -0,0 +1,658 @@ + + + + + + + + + + + + + + + + + + + + + + Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

404 - Not found

+ +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/animal_shogi/index.html b/animal_shogi/index.html new file mode 100644 index 000000000..09331333b --- /dev/null +++ b/animal_shogi/index.html @@ -0,0 +1,988 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Animal Shogi - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

AnimalShogi

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("animal_shogi")
+
+

Alternatively, load the AnimalShogi class directly:

+
from pgx.animal_shogi import AnimalShogi
+
+env = AnimalShogi()
+
+

Description

+

Animal Shogi (Dōbutsu shōgi) is a variant of shogi primarily developed for children. It consists of a 3x4 board and four types of pieces (five including promoted pieces). One of the rule differences from regular shogi is the Try Rule, where entering the opponent's territory with the king leads to victory.

+

See also Wikipedia

+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name                 Value
Version              v0
Number of players    2
Number of actions    132
Observation shape    (4, 3, 194)
Observation type     float
Rewards              {-1, 0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Index            Description
[:, :, 0:5]      my pieces on board
[:, :, 5:10]     opponent's pieces on board
[:, :, 10:16]    my hands
[:, :, 16:22]    opponent's hands
[:, :, 22:24]    repetitions
...              ...
[:, :, 193]      player_id's turn
[:, :, 194]      Elapsed timesteps (normalized to 1)
+

Action

+

Uses AlphaZero-like action labels:

+
    +
  • 132 labels
  • Move: 8 x 12 (direction) x (source square)
  • Drop: 3 x 12 (drop piece type) x (destination square)
+
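The label count above follows from 8 x 12 + 3 x 12 = 132. As a rough illustration of that arithmetic, the sketch below decodes a label into a move or a drop. The ordering (move planes first, then drop planes) and the `plane * 12 + square` layout are assumptions made for this example and are not taken from the Pgx source; `decode_label` is a hypothetical helper.

```python
NUM_SQUARES = 12      # 3 x 4 board
NUM_DIRECTIONS = 8    # move planes
NUM_DROP_TYPES = 3    # drop planes
NUM_LABELS = (NUM_DIRECTIONS + NUM_DROP_TYPES) * NUM_SQUARES  # 96 + 36


def decode_label(label: int):
    """Split a label into (kind, plane, square).

    ASSUMPTION: move planes come first and label = plane * 12 + square.
    The real Pgx layout may differ.
    """
    if not 0 <= label < NUM_LABELS:
        raise ValueError(f"label must be in [0, {NUM_LABELS}), got {label}")
    plane, square = divmod(label, NUM_SQUARES)
    if plane < NUM_DIRECTIONS:
        return ("move", plane, square)                # plane = direction index
    return ("drop", plane - NUM_DIRECTIONS, square)   # plane = piece type index
```

Under this assumed layout, labels 0 to 95 are moves and labels 96 to 131 are drops.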

Rewards

+

Non-zero rewards are given only at terminal states. The reward at a terminal state is described in this table:

+ + + + + + + + + + + + + + + + + + + + + +
Outcome    Reward
Win        +1
Lose       -1
Draw       0
+

Termination

+

Termination happens when

+
    +
  1. If either player's king is checkmated, or
  2. if either king enters the opponent's territory (farthest rank), or
  3. if the same position occurs three times, or
  4. if 250 moves have passed (a unique rule in Pgx).
+

In cases 3 and 4, the game is declared a draw.
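The four conditions can be summarized in a small plain-Python sketch. This is an illustration only, not the Pgx implementation: the function name, its arguments, and the order in which the conditions are checked are all assumptions.

```python
from typing import Optional

MAX_MOVES = 250  # Pgx-specific move limit (case 4)


def outcome(checkmated: Optional[int], tried: Optional[int],
            repetitions: int, moves: int) -> Optional[str]:
    """Return 'player N wins', 'draw', or None if the game continues.

    checkmated: id (0 or 1) of the player whose king is checkmated, if any.
    tried: id (0 or 1) of the player whose king reached the farthest rank, if any.
    ASSUMPTION: conditions are checked in the order they are listed.
    """
    if checkmated is not None:      # case 1: the other player wins
        return f"player {1 - checkmated} wins"
    if tried is not None:           # case 2: Try Rule
        return f"player {tried} wins"
    if repetitions >= 3:            # case 3: draw
        return "draw"
    if moves >= MAX_MOVES:          # case 4: draw
        return "draw"
    return None
```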

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/api/index.html b/api/index.html new file mode 100644 index 000000000..a580b3749 --- /dev/null +++ b/api/index.html @@ -0,0 +1,2997 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Reference - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Pgx API

+

This is the list of all public APIs of Pgx. The two most important components in Pgx are State and Env.

+ + +
+ + + +

+ pgx.State + + +

+ + +
+

+ Bases: abc.ABC

+ + +

Base state class of all Pgx game environments. It is essentially an immutable (frozen) dataclass. A state is typically created via Env.init:

+
state = env.init(jax.random.PRNGKey(0))
+
+

and Env.step receives and returns this state class:

+
state = env.step(state, action)
+
+
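Because the state is immutable, a step never modifies its input; it returns a new state. The sketch below shows the same pattern with the standard library's `dataclasses` module. `ToyState` and `toy_step` are hypothetical stand-ins for illustration and are not part of Pgx.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a Pgx state; not the real class."""
    current_player: int = 0
    step_count: int = 0


def toy_step(state: ToyState) -> ToyState:
    # Build a new state instead of mutating the old one.
    return replace(
        state,
        current_player=1 - state.current_player,
        step_count=state.step_count + 1,
    )


s0 = ToyState()
s1 = toy_step(s0)  # s0 is left untouched
```

Assigning to a field of `s0` directly would raise `dataclasses.FrozenInstanceError`, which is why updates go through `replace`.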

Serialization via flax.struct.serialization is supported. There are 6 common attributes over all games:

+ +

Attributes:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescription
current_player + jnp.ndarray +

ID of the agent to play. Note that this does NOT represent the turn (e.g., black/white in Go). This ID is consistent across the parallel vmapped states.

observation + jnp.ndarray +

Observation for the current state, computed by calling Env.observe.

rewards + jnp.ndarray +

The i-th element is the intermediate reward for the agent with player ID i. If Env.step is called on a terminal state, the resulting state.rewards is zero for all players.

terminated + jnp.ndarray +

Denotes that the state is a terminal state. Note that some environments (e.g., Go) have a max_termination_steps parameter internally and will terminate within a limited number of states (following AlphaGo).

truncated + jnp.ndarray +

Indicates that the episode ended for a reason other than termination. Note that current Pgx environments do not invoke truncation themselves, but users can apply the TimeLimit wrapper to truncate an environment. In Pgx, some MinAtar games may not terminate within a finite number of timesteps. The other environments, however, are expected to terminate within a finite number of timesteps with probability one.

legal_action_mask + jnp.ndarray +

Boolean array of legal actions. If an illegal action is taken, the game terminates immediately with a penalty to the player.
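An agent usually combines its policy output with legal_action_mask so that illegal actions get zero probability. The following is a minimal plain-Python sketch of that idea; `masked_softmax` is a hypothetical helper, and real code would operate on JAX arrays rather than lists.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax restricted to the entries where mask is True."""
    if not any(mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


probs = masked_softmax([0.0, 0.0, 5.0], [True, True, False])
```

Here the third action has the largest logit but is illegal, so all probability mass is split between the first two actions.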

+ + +
+ Source code in pgx/v1.py +
@dataclass
+class State(abc.ABC):
+    """Base state class of all Pgx game environments. Basically an immutable (frozen) dataclass.
+    A basic usage is generating via `Env.init`:
+
+        state = env.init(jax.random.PRNGKey(0))
+
+    and `Env.step` receives and returns this state class:
+
+        state = env.step(state, action)
+
+    Serialization via `flax.struct.serialization` is supported.
+    There are 6 common attributes over all games:
+
+    Attributes:
+        current_player (jnp.ndarray): id of agent to play.
+            Note that this does NOT represent the turn (e.g., black/white in Go).
+            This ID is consistent over the parallel vmapped states.
+        observation (jnp.ndarray): observation for the current state.
+            `Env.observe` is called to compute.
+        rewards (jnp.ndarray): the `i`-th element indicates the intermediate reward for
+            the agent with player-id `i`. If `Env.step` is called for a terminal state,
+            the following `state.rewards` is zero for all players.
+        terminated (jnp.ndarray): denotes that the state is terminal state. Note that
+            some environments (e.g., Go) have a `max_termination_steps` parameter inside
+            and will terminate within a limited number of states (following AlphaGo).
+        truncated (jnp.ndarray): indicates that the episode ends with the reason other than termination.
+            Note that current Pgx environments do not invoke truncation but users can use `TimeLimit` wrapper
+            to truncate the environment. In Pgx environments, some MinAtar games may not terminate within a finite timestep.
+            However, the other environments are supposed to terminate within a finite timestep with probability one.
+        legal_action_mask (jnp.ndarray): Boolean array of legal actions. If an illegal action is taken,
+            the game will terminate immediately with a penalty to the player.
+    """
+
+    current_player: jnp.ndarray
+    observation: jnp.ndarray
+    rewards: jnp.ndarray
+    terminated: jnp.ndarray
+    truncated: jnp.ndarray
+    legal_action_mask: jnp.ndarray
+    # NOTE: _rng_key is
+    #   - used for stochastic env and auto reset
+    #   - updated only when actually used
+    #   - supposed NOT to be used by agent
+    _rng_key: jax.random.KeyArray
+    _step_count: jnp.ndarray
+
+    @property
+    @abc.abstractmethod
+    def env_id(self) -> EnvId:
+        """Environment id (e.g. "go_19x19")"""
+        ...
+
+    def _repr_html_(self) -> str:
+        return self.to_svg()
+
+    def to_svg(
+        self,
+        *,
+        color_theme: Optional[Literal["light", "dark"]] = None,
+        scale: Optional[float] = None,
+    ) -> str:
+        """Return SVG string. Useful for visualization in notebook.
+
+        Args:
+            color_theme (Optional[Literal["light", "dark"]]): xxx see also global config.
+            scale (Optional[float]): change image size. Default(None) is 1.0
+
+        Returns:
+            str: SVG string
+        """
+        from pgx._src.visualizer import Visualizer
+
+        v = Visualizer(color_theme=color_theme, scale=scale)
+        return v.get_dwg(states=self).tostring()
+
+    def save_svg(
+        self,
+        filename,
+        *,
+        color_theme: Optional[Literal["light", "dark"]] = None,
+        scale: Optional[float] = None,
+    ) -> None:
+        """Save the entire state (not observation) to a file.
+        The filename must end with `.svg`
+
+        Args:
+            color_theme (Optional[Literal["light", "dark"]]): xxx see also global config.
+            scale (Optional[float]): change image size. Default(None) is 1.0
+
+        Returns:
+            None
+        """
+        from pgx._src.visualizer import save_svg
+
+        save_svg(self, filename, color_theme=color_theme, scale=scale)
+
+
+ + + +
+ + + + + + + +
+ + + +

+env_id: EnvId + + + property + abstractmethod + + +

+ + +
+ +

Environment id (e.g. "go_19x19")

+
+ +
+ + + +
+ + + +

+save_svg(filename, *, color_theme=None, scale=None) + +

+ + +
+ +

Save the entire state (not observation) to a file. The filename must end with .svg

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
color_theme + Optional[Literal['light', 'dark']] +

xxx see also global config.

+ None +
scale + Optional[float] +

change image size. Default(None) is 1.0

+ None +
+ +

Returns:

+ + + + + + + + + + + + + +
TypeDescription
+ None +

None

+ +
+ Source code in pgx/v1.py +
def save_svg(
+    self,
+    filename,
+    *,
+    color_theme: Optional[Literal["light", "dark"]] = None,
+    scale: Optional[float] = None,
+) -> None:
+    """Save the entire state (not observation) to a file.
+    The filename must end with `.svg`
+
+    Args:
+        color_theme (Optional[Literal["light", "dark"]]): xxx see also global config.
+        scale (Optional[float]): change image size. Default(None) is 1.0
+
+    Returns:
+        None
+    """
+    from pgx._src.visualizer import save_svg
+
+    save_svg(self, filename, color_theme=color_theme, scale=scale)
+
+
+
+ +
+ +
+ + + +

+to_svg(*, color_theme=None, scale=None) + +

+ + +
+ +

Return SVG string. Useful for visualization in notebook.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
color_theme + Optional[Literal['light', 'dark']] +

xxx see also global config.

+ None +
scale + Optional[float] +

change image size. Default(None) is 1.0

+ None +
+ +

Returns:

+ + + + + + + + + + + + + +
Name TypeDescription
str + str +

SVG string

+ +
+ Source code in pgx/v1.py +
def to_svg(
+    self,
+    *,
+    color_theme: Optional[Literal["light", "dark"]] = None,
+    scale: Optional[float] = None,
+) -> str:
+    """Return SVG string. Useful for visualization in notebook.
+
+    Args:
+        color_theme (Optional[Literal["light", "dark"]]): xxx see also global config.
+        scale (Optional[float]): change image size. Default(None) is 1.0
+
+    Returns:
+        str: SVG string
+    """
+    from pgx._src.visualizer import Visualizer
+
+    v = Visualizer(color_theme=color_theme, scale=scale)
+    return v.get_dwg(states=self).tostring()
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+ pgx.Env + + +

+ + +
+

+ Bases: abc.ABC

+ + +

Environment class API.

+
+

Example usage

+
import jax
+import jax.numpy as jnp
+import pgx
+
+env: pgx.Env = pgx.make("tic_tac_toe")
+state = env.init(jax.random.PRNGKey(0))
+action = jnp.int32(4)  # action index as an int32 scalar
+state = env.step(state, action)
+
+
+ + +
+ Source code in pgx/v1.py +
class Env(abc.ABC):
+    """Environment class API.
+
+    !!! example "Example usage"
+
+        ```py
+        env: Env = pgx.make("tic_tac_toe")
+        state = env.init(jax.random.PRNGKey(0))
+        action = jnp.int32(4)
+        state = env.step(state, action)
+        ```
+
+    """
+
+    def __init__(self):
+        ...
+
+    def init(self, key: jax.random.KeyArray) -> State:
+        """Return the initial state. Note that no internal state of
+        environment changes.
+
+        Args:
+            key: pseudo-random generator key in JAX
+
+        Returns:
+            State: initial state of environment
+
+        """
+        key, subkey = jax.random.split(key)
+        state = self._init(subkey)
+        state = state.replace(_rng_key=key)  # type: ignore
+        observation = self.observe(state, state.current_player)
+        return state.replace(observation=observation)  # type: ignore
+
+    def step(self, state: State, action: jnp.ndarray) -> State:
+        """Step function."""
+        is_illegal = ~state.legal_action_mask[action]
+        current_player = state.current_player
+
+        # If the state is already terminated or truncated, environment does not take usual step,
+        # but return the same state with zero-rewards for all players
+        state = jax.lax.cond(
+            (state.terminated | state.truncated),
+            lambda: state.replace(rewards=jnp.zeros_like(state.rewards)),  # type: ignore
+            lambda: self._step(state.replace(_step_count=state._step_count + 1), action),  # type: ignore
+        )
+
+        # Taking illegal action leads to immediate game terminal with negative reward
+        state = jax.lax.cond(
+            is_illegal,
+            lambda: self._step_with_illegal_action(state, current_player),
+            lambda: state,
+        )
+
+        # All legal_action_mask elements are **TRUE** at terminal state
+        # This is to avoid zero-division error when normalizing action probability
+        # Taking any action at terminal state does not give any effect to the state
+        state = jax.lax.cond(
+            state.terminated,
+            lambda: state.replace(  # type: ignore
+                legal_action_mask=jnp.ones_like(state.legal_action_mask)
+            ),
+            lambda: state,
+        )
+
+        observation = self.observe(state, state.current_player)
+        state = state.replace(observation=observation)  # type: ignore
+
+        return state
+
+    def observe(self, state: State, player_id: jnp.ndarray) -> jnp.ndarray:
+        """Observation function."""
+        obs = self._observe(state, player_id)
+        return jax.lax.stop_gradient(obs)
+
+    @abc.abstractmethod
+    def _init(self, key: jax.random.KeyArray) -> State:
+        """Implement game-specific init function here."""
+        ...
+
+    @abc.abstractmethod
+    def _step(self, state, action) -> State:
+        """Implement game-specific step function here."""
+        ...
+
+    @abc.abstractmethod
+    def _observe(self, state: State, player_id: jnp.ndarray) -> jnp.ndarray:
+        """Implement game-specific observe function here."""
+        ...
+
+    @property
+    @abc.abstractmethod
+    def id(self) -> EnvId:
+        """Environment id."""
+        ...
+
+    @property
+    @abc.abstractmethod
+    def version(self) -> str:
+        """Environment version. Updated when behavior, parameter, or API is changed.
+        Refactoring or speeding up without any expected behavior changes will NOT update the version number.
+        """
+        ...
+
+    @property
+    @abc.abstractmethod
+    def num_players(self) -> int:
+        """Number of players (e.g., 2 in Tic-tac-toe)"""
+        ...
+
+    @property
+    def num_actions(self) -> int:
+        """Return the size of action space (e.g., 9 in Tic-tac-toe)"""
+        state = self.init(jax.random.PRNGKey(0))
+        return int(state.legal_action_mask.shape[0])
+
+    @property
+    def observation_shape(self) -> Tuple[int, ...]:
+        """Return the matrix shape of observation"""
+        state = self.init(jax.random.PRNGKey(0))
+        obs = self._observe(state, state.current_player)
+        return obs.shape
+
+    @property
+    def _illegal_action_penalty(self) -> float:
+        """Negative reward given when illegal action is selected."""
+        return -1.0
+
+    def _step_with_illegal_action(
+        self, state: State, loser: jnp.ndarray
+    ) -> State:
+        penalty = self._illegal_action_penalty
+        reward = (
+            jnp.ones_like(state.rewards)
+            * (-1 * penalty)
+            * (self.num_players - 1)
+        )
+        reward = reward.at[loser].set(penalty)
+        return state.replace(rewards=reward, terminated=TRUE)  # type: ignore
+
+
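The reward arithmetic in `_step_with_illegal_action` is easier to follow with concrete numbers. The sketch below mirrors that expression in plain Python lists; `illegal_action_rewards` is a hypothetical helper written for this illustration, not a Pgx function.

```python
from typing import List


def illegal_action_rewards(num_players: int, loser: int,
                           penalty: float = -1.0) -> List[float]:
    """Plain-Python mirror of the reward expression in _step_with_illegal_action."""
    # Every entry starts as (-1 * penalty) * (num_players - 1) ...
    rewards = [(-1 * penalty) * (num_players - 1)] * num_players
    # ... and the player who chose the illegal action receives the penalty.
    rewards[loser] = penalty
    return rewards
```

With two players and the default penalty of -1.0, the offending player receives -1.0 and the opponent +1.0, so the rewards sum to zero.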
+ + + +
+ + + + + + + +
+ + + +

+id: EnvId + + + property + abstractmethod + + +

+ + +
+ +

Environment id.

+
+ +
+ +
+ + + +

+num_actions: int + + + property + + +

+ + +
+ +

Return the size of action space (e.g., 9 in Tic-tac-toe)

+
+ +
+ +
+ + + +

+num_players: int + + + property + abstractmethod + + +

+ + +
+ +

Number of players (e.g., 2 in Tic-tac-toe)

+
+ +
+ +
+ + + +

+observation_shape: Tuple[int, ...] + + + property + + +

+ + +
+ +

Return the matrix shape of observation

+
+ +
+ +
+ + + +

+version: str + + + property + abstractmethod + + +

+ + +
+ +

Environment version. Updated when behavior, parameter, or API is changed. Refactoring or speeding up without any expected behavior changes will NOT update the version number.

+
+ +
+ + + +
+ + + +

+init(key) + +

+ + +
+ +

Return the initial state. Note that no internal state of the environment changes.

+ +

Parameters:

+ + + + + + + + + + + + + + + + + +
NameTypeDescriptionDefault
key + jax.random.KeyArray +

pseudo-random generator key in JAX

+ required +
+ +

Returns:

+ + + + + + + + + + + + + +
Name TypeDescription
State + State +

initial state of environment

+ +
+ Source code in pgx/v1.py +
def init(self, key: jax.random.KeyArray) -> State:
+    """Return the initial state. Note that no internal state of
+    environment changes.
+
+    Args:
+        key: pseudo-random generator key in JAX
+
+    Returns:
+        State: initial state of environment
+
+    """
+    key, subkey = jax.random.split(key)
+    state = self._init(subkey)
+    state = state.replace(_rng_key=key)  # type: ignore
+    observation = self.observe(state, state.current_player)
+    return state.replace(observation=observation)  # type: ignore
+
+
+
+ +
+ +
+ + + +

+observe(state, player_id) + +

+ + +
+ +

Observation function.

+ +
+ Source code in pgx/v1.py +
def observe(self, state: State, player_id: jnp.ndarray) -> jnp.ndarray:
+    """Observation function."""
+    obs = self._observe(state, player_id)
+    return jax.lax.stop_gradient(obs)
+
+
+
+ +
+ +
+ + + +

+step(state, action) + +

+ + +
+ +

Step function.

+ +
+ Source code in pgx/v1.py +
def step(self, state: State, action: jnp.ndarray) -> State:
+    """Step function."""
+    is_illegal = ~state.legal_action_mask[action]
+    current_player = state.current_player
+
+    # If the state is already terminated or truncated, environment does not take usual step,
+    # but return the same state with zero-rewards for all players
+    state = jax.lax.cond(
+        (state.terminated | state.truncated),
+        lambda: state.replace(rewards=jnp.zeros_like(state.rewards)),  # type: ignore
+        lambda: self._step(state.replace(_step_count=state._step_count + 1), action),  # type: ignore
+    )
+
+    # Taking illegal action leads to immediate game terminal with negative reward
+    state = jax.lax.cond(
+        is_illegal,
+        lambda: self._step_with_illegal_action(state, current_player),
+        lambda: state,
+    )
+
+    # All legal_action_mask elements are **TRUE** at terminal state
+    # This is to avoid zero-division error when normalizing action probability
+    # Taking any action at terminal state does not give any effect to the state
+    state = jax.lax.cond(
+        state.terminated,
+        lambda: state.replace(  # type: ignore
+            legal_action_mask=jnp.ones_like(state.legal_action_mask)
+        ),
+        lambda: state,
+    )
+
+    observation = self.observe(state, state.current_player)
+    state = state.replace(observation=observation)  # type: ignore
+
+    return state
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+ + + +

+pgx.EnvId = Literal['2048', 'animal_shogi', 'backgammon', 'bridge_bidding', 'chess', 'connect_four', 'gardner_chess', 'go_9x9', 'go_19x19', 'hex', 'kuhn_poker', 'leduc_holdem', 'minatar-asterix', 'minatar-breakout', 'minatar-freeway', 'minatar-seaquest', 'minatar-space_invaders', 'othello', 'shogi', 'sparrow_mahjong', 'tic_tac_toe'] + + + module-attribute + + +

+ + +
+
+ +
+

Naming convention of EnvId

+

A hyphen (-) indicates that the game comes from a different original source (e.g., MinAtar); an underscore (_) is used in all other cases.
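Under this convention, the part before a hyphen names the original source and the rest names the game. A small sketch of how an id could be split accordingly; `split_env_id` is a hypothetical helper, not part of the Pgx API.

```python
from typing import Optional, Tuple


def split_env_id(env_id: str) -> Tuple[Optional[str], str]:
    """Return (origin, game); origin is None when the id has no hyphen."""
    origin, sep, game = env_id.partition("-")
    if not sep:
        return (None, env_id)
    return (origin, game)
```

For example, "minatar-space_invaders" splits into the origin "minatar" and the game "space_invaders", while "go_9x9" has no origin part.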

+
+ + +
+ + + +

+pgx.make(env_id) + +

+ + +
+ +

Load the specified environment.

+
+

Example usage

+
env = pgx.make("tic_tac_toe")
+
+
+
+

BridgeBidding environment

+

The BridgeBidding environment requires domain knowledge of the game of bridge, so loading it via make("bridge_bidding") is not allowed. Instead, use the BridgeBidding class directly: from pgx.bridge_bidding import BridgeBidding.

+
+ +
+ Source code in pgx/v1.py +
def make(env_id: EnvId):  # noqa: C901
+    """Load the specified environment.
+
+    !!! example "Example usage"
+
+        ```py
+        env = pgx.make("tic_tac_toe")
+        ```
+
+    !!! note "`BridgeBidding` environment"
+
+        `BridgeBidding` environment requires the domain knowledge of bridge game.
+        So we forbid users to load the bridge environment by `make("bridge_bidding")`.
+        Use `BridgeBidding` class directly by `from pgx.bridge_bidding import BridgeBidding`.
+
+    """
+    # NOTE: BridgeBidding environment requires the domain knowledge of bridge
+    # So we forbid users to load the bridge environment by `make("bridge_bidding")`.
+    if env_id == "2048":
+        from pgx.play2048 import Play2048
+
+        return Play2048()
+    elif env_id == "animal_shogi":
+        from pgx.animal_shogi import AnimalShogi
+
+        return AnimalShogi()
+    elif env_id == "backgammon":
+        from pgx.backgammon import Backgammon
+
+        return Backgammon()
+    elif env_id == "chess":
+        from pgx.chess import Chess
+
+        return Chess()
+    elif env_id == "connect_four":
+        from pgx.connect_four import ConnectFour
+
+        return ConnectFour()
+    elif env_id == "gardner_chess":
+        from pgx.gardner_chess import GardnerChess
+
+        return GardnerChess()
+    elif env_id == "go_9x9":
+        from pgx.go import Go
+
+        return Go(size=9, komi=7.5)
+    elif env_id == "go_19x19":
+        from pgx.go import Go
+
+        return Go(size=19, komi=7.5)
+    elif env_id == "hex":
+        from pgx.hex import Hex
+
+        return Hex()
+    elif env_id == "kuhn_poker":
+        from pgx.kuhn_poker import KuhnPoker
+
+        return KuhnPoker()
+    elif env_id == "leduc_holdem":
+        from pgx.leduc_holdem import LeducHoldem
+
+        return LeducHoldem()
+    elif env_id == "minatar-asterix":
+        try:
+            from pgx_minatar.asterix import MinAtarAsterix  # type: ignore
+
+            return MinAtarAsterix()
+        except ModuleNotFoundError:
+            print(
+                '"minatar-asterix" environment is provided as a separate plugin of Pgx.\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',
+                file=sys.stderr,
+            )
+            sys.exit(1)
+    elif env_id == "minatar-breakout":
+        try:
+            from pgx_minatar.breakout import MinAtarBreakout  # type: ignore
+
+            return MinAtarBreakout()
+        except ModuleNotFoundError:
+            print(
+                '"minatar-breakout" environment is provided as a separate plugin of Pgx.\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',
+                file=sys.stderr,
+            )
+            sys.exit(1)
+    elif env_id == "minatar-freeway":
+        try:
+            from pgx_minatar.freeway import MinAtarFreeway  # type: ignore
+
+            return MinAtarFreeway()
+        except ModuleNotFoundError:
+            print(
+                '"minatar-freeway" environment is provided as a separate plugin of Pgx.\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',
+                file=sys.stderr,
+            )
+            sys.exit(1)
+    elif env_id == "minatar-seaquest":
+        try:
+            from pgx_minatar.seaquest import MinAtarSeaquest  # type: ignore
+
+            return MinAtarSeaquest()
+        except ModuleNotFoundError:
+            print(
+                '"minatar-seaquest" environment is provided as a separate plugin of Pgx.\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',
+                file=sys.stderr,
+            )
+            sys.exit(1)
+    elif env_id == "minatar-space_invaders":
+        try:
+            from pgx_minatar.space_invaders import (  # type: ignore
+                MinAtarSpaceInvaders,
+            )
+
+            return MinAtarSpaceInvaders()
+        except ModuleNotFoundError:
+            print(
+                '"minatar-space_invaders" environment is provided as a separate plugin of Pgx.\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',
+                file=sys.stderr,
+            )
+            sys.exit(1)
+    elif env_id == "othello":
+        from pgx.othello import Othello
+
+        return Othello()
+    elif env_id == "shogi":
+        from pgx.shogi import Shogi
+
+        return Shogi()
+    elif env_id == "sparrow_mahjong":
+        from pgx.sparrow_mahjong import SparrowMahjong
+
+        return SparrowMahjong()
+    elif env_id == "tic_tac_toe":
+        from pgx.tic_tac_toe import TicTacToe
+
+        return TicTacToe()
+    else:
+        envs = "\n".join(available_envs())
+        raise ValueError(
+            f"Wrong env_id '{env_id}' is passed. Available ids are: \n{envs}"
+        )
+
+
+
+ +
+ +
+ + + +

+pgx.available_envs() + +

+ + +
+ +

List all environment ids available in the pgx.make function.

+
+

Example usage

+
pgx.available_envs()
+('2048', 'animal_shogi', 'backgammon', 'chess', 'connect_four', 'go_9x9', 'go_19x19', 'hex', 'kuhn_poker', 'leduc_holdem', 'minatar-asterix', 'minatar-breakout', 'minatar-freeway', 'minatar-seaquest', 'minatar-space_invaders', 'othello', 'shogi', 'sparrow_mahjong', 'tic_tac_toe')
+
+
+
+

BridgeBidding environment

+

The BridgeBidding environment requires domain knowledge of the game of bridge, so loading it via make("bridge_bidding") is not allowed. Instead, use the BridgeBidding class directly: from pgx.bridge_bidding import BridgeBidding.

+
+ +
+ Source code in pgx/v1.py +
def available_envs() -> Tuple[EnvId, ...]:
+    """List up all environment id available in `pgx.make` function.
+
+    !!! example "Example usage"
+
+        ```py
+        pgx.available_envs()
+        ('2048', 'animal_shogi', 'backgammon', 'chess', 'connect_four', 'go_9x9', 'go_19x19', 'hex', 'kuhn_poker', 'leduc_holdem', 'minatar-asterix', 'minatar-breakout', 'minatar-freeway', 'minatar-seaquest', 'minatar-space_invaders', 'othello', 'shogi', 'sparrow_mahjong', 'tic_tac_toe')
+        ```
+
+
+    !!! note "`BridgeBidding` environment"
+
+        `BridgeBidding` environment requires the domain knowledge of bridge game.
+        So we forbid users to load the bridge environment by `make("bridge_bidding")`.
+        Use `BridgeBidding` class directly by `from pgx.bridge_bidding import BridgeBidding`.
+
+    """
+    games = get_args(EnvId)
+    games = tuple(filter(lambda x: x != "bridge_bidding", games))
+    return games
+
+
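The function above relies on `typing.get_args` to read the values out of a `Literal` type. A self-contained sketch of the same technique with a shortened id list; `ToyEnvId` and `toy_available_envs` are hypothetical names used only for this example.

```python
from typing import Literal, Tuple, get_args

# Shortened, hypothetical id list for illustration only.
ToyEnvId = Literal["bridge_bidding", "chess", "tic_tac_toe"]


def toy_available_envs() -> Tuple[str, ...]:
    # get_args returns the literal values in declaration order;
    # "bridge_bidding" is filtered out, as in the function above.
    return tuple(x for x in get_args(ToyEnvId) if x != "bridge_bidding")
```

Keeping the ids in a single `Literal` means the type annotation and the runtime list cannot drift apart.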
+
+ +
+ +
+ + + +

+pgx.set_visualization_config(*, color_theme='light', scale=1.0, frame_duration_seconds=0.2) + +

+ + +
+ +
+ Source code in pgx/_src/visualizer.py +
def set_visualization_config(
+    *,
+    color_theme: ColorTheme = "light",
+    scale: float = 1.0,
+    frame_duration_seconds: float = 0.2,
+):
+    global_config.color_theme = color_theme
+    global_config.scale = scale
+    global_config.frame_duration_seconds = frame_duration_seconds
+
+
+
+ +
+ +
+ + + +

+pgx.save_svg(state, filename, *, color_theme=None, scale=None) + +

+ + +
+ +
+ Source code in pgx/_src/visualizer.py +
def save_svg(
+    state: State,
+    filename: Union[str, Path],
+    *,
+    color_theme: Optional[Literal["light", "dark"]] = None,
+    scale: Optional[float] = None,
+) -> None:
+    assert str(filename).endswith(".svg")
+    if state.env_id.startswith("minatar"):
+        state.save_svg(filename=filename)
+    else:
+        v = Visualizer(color_theme=color_theme, scale=scale)
+        v.get_dwg(states=state).saveas(filename)
+
+
+
+ +
+ +
+ + + +

+pgx.save_svg_animation(states, filename, *, color_theme=None, scale=None, frame_duration_seconds=None) + +

+ + +
+ +
+ Source code in pgx/_src/visualizer.py +
def save_svg_animation(
+    states: Sequence[State],
+    filename: Union[str, Path],
+    *,
+    color_theme: Optional[Literal["light", "dark"]] = None,
+    scale: Optional[float] = None,
+    frame_duration_seconds: Optional[float] = None,
+) -> None:
+    assert not states[0].env_id.startswith(
+        "minatar"
+    ), "MinAtar does not support svg animation."
+    assert str(filename).endswith(".svg")
+    v = Visualizer(color_theme=color_theme, scale=scale)
+
+    if frame_duration_seconds is None:
+        frame_duration_seconds = global_config.frame_duration_seconds
+
+    frame_groups = []
+    dwg = None
+    for i, state in enumerate(states):
+        dwg = v.get_dwg(states=state)
+        assert (
+            len(
+                [
+                    e
+                    for e in dwg.elements
+                    if type(e) == svgwrite.container.Group
+                ]
+            )
+            == 1
+        ), "Drawing must contain only one group"
+        group: svgwrite.container.Group = dwg.elements[-1]
+        group["id"] = f"_fr{i:x}"  # hex frame number
+        group["class"] = "frame"
+        frame_groups.append(group)
+
+    assert dwg is not None
+    del dwg.elements[-1]
+    total_seconds = frame_duration_seconds * len(frame_groups)
+
+    style = f".frame{{visibility:hidden; animation:{total_seconds}s linear _k infinite;}}"
+    style += f"@keyframes _k{{0%,{100/len(frame_groups)}%{{visibility:visible}}{100/len(frame_groups) * 1.000001}%,100%{{visibility:hidden}}}}"
+
+    for i, group in enumerate(frame_groups):
+        dwg.add(group)
+        style += (
+            f"#{group['id']}{{animation-delay:{i * frame_duration_seconds}s}}"
+        )
+    dwg.defs.add(svgwrite.container.Style(content=style))
+    dwg.saveas(filename)
+
+
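The animation works by hiding every frame group and letting a shared keyframe rule make each one visible for 1/N of the cycle, offset by a per-frame delay. The sketch below isolates only the CSS string construction so that it can be inspected without svgwrite; `frame_animation_css` is a hypothetical helper that mirrors the style strings built above.

```python
def frame_animation_css(num_frames: int, frame_seconds: float) -> str:
    """Build CSS that shows each frame group in turn."""
    total = frame_seconds * num_frames
    share = 100 / num_frames  # percentage of the cycle each frame is visible
    css = f".frame{{visibility:hidden; animation:{total}s linear _k infinite;}}"
    css += (
        f"@keyframes _k{{0%,{share}%{{visibility:visible}}"
        f"{share * 1.000001}%,100%{{visibility:hidden}}}}"
    )
    for i in range(num_frames):
        # Frame i starts its cycle after i frame durations (hex id, as above).
        css += f"#_fr{i:x}{{animation-delay:{i * frame_seconds}s}}"
    return css


css = frame_animation_css(4, 0.5)
```

With 4 frames of 0.5 seconds each, the cycle lasts 2.0 seconds and each frame is visible for the first 25% of its own delayed cycle.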
+
+ +
+ +
+ + + +

+pgx.BaselineModelId = Literal['animal_shogi_v0', 'gardner_chess_v0', 'go_9x9_v0', 'hex_v0', 'othello_v0'] + + + module-attribute + + +

+ + +
+
+ +
+ +
+ + + +

+pgx.make_baseline_model(model_id, download_dir='baselines') + +

+ + +
+ +
+ Source code in pgx/_src/baseline.py +
def make_baseline_model(
+    model_id: BaselineModelId, download_dir: str = "baselines"
+):
+    import haiku as hk
+
+    create_model_fn = _make_create_model_fn(model_id)
+    model_args, model_params, model_state = _load_baseline_model(
+        model_id, download_dir
+    )
+
+    def forward_fn(x, is_eval=False):
+        net = create_model_fn(**model_args)
+        policy_out, value_out = net(
+            x, is_training=not is_eval, test_local_stats=False
+        )
+        return policy_out, value_out
+
+    forward = hk.without_apply_rng(hk.transform_with_state(forward_fn))
+
+    def apply(obs):
+        (logits, value), _ = forward.apply(
+            model_params, model_state, obs, is_eval=True
+        )
+        return logits, value
+
+    return apply
+
+
+
+ +
+ +
+ + + +

+pgx.v1_api_test(env, num=100) + +

+ + +
+ +
+ Source code in pgx/_src/api_test.py +
def v1_api_test(env: Env, num: int = 100):
+    api_test_single(env, num)
+    api_test_batch(env, num)
+
+
+
+ +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/api_usage/index.html b/api_usage/index.html new file mode 100644 index 000000000..8a54b3480 --- /dev/null +++ b/api_usage/index.html @@ -0,0 +1,847 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Usage - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Pgx API Usage

+

Example.1: Random play

+
import jax
+import jax.numpy as jnp
+import pgx
+
+seed = 42
+batch_size = 10
+key = jax.random.PRNGKey(seed)
+
+
+def act_randomly(rng_key, obs, mask):
+    """Ignore observation and choose randomly from legal actions"""
+    del obs
+    probs = mask / mask.sum()
+    logits = jnp.maximum(jnp.log(probs), jnp.finfo(probs.dtype).min)
+    return jax.random.categorical(rng_key, logits=logits, axis=-1)
+
+
+# Load the environment
+env = pgx.make("go_9x9")
+init_fn = jax.jit(jax.vmap(env.init))
+step_fn = jax.jit(jax.vmap(env.step))
+
+# Initialize the states
+key, subkey = jax.random.split(key)
+keys = jax.random.split(subkey, batch_size)
+state = init_fn(keys)
+
+# Run random simulation
+while not (state.terminated | state.truncated).all():
+    key, subkey = jax.random.split(key)
+    action = act_randomly(subkey, state.observation, state.legal_action_mask)
+    state = step_fn(state, action)  # state.rewards has shape (batch_size, 2)
+
+
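To see why the masked sampling in Example.1 never picks an illegal action, here is a small standalone sketch. It is a variation rather than the code above: it takes only the mask, normalizes per row, and uses a hand-written mask instead of a Pgx state. Illegal entries get `log(0) = -inf`, which is clipped to the smallest finite float so the logits stay finite.

```python
import jax
import jax.numpy as jnp


def sample_legal_action(rng_key, mask):
    """Sample uniformly among legal actions, one action per row of `mask`."""
    mask = mask.astype(jnp.float32)
    # Uniform probability over the legal actions of each row.
    probs = mask / mask.sum(axis=-1, keepdims=True)
    # log(0) is -inf for illegal actions; clip it to the smallest finite value.
    logits = jnp.maximum(jnp.log(probs), jnp.finfo(probs.dtype).min)
    return jax.random.categorical(rng_key, logits=logits, axis=-1)


# Hand-written legal-action masks for a batch of two toy states.
mask = jnp.array(
    [
        [True, False, True, False],   # actions 0 and 2 are legal
        [False, False, False, True],  # only action 3 is legal
    ]
)
actions = sample_legal_action(jax.random.PRNGKey(0), mask)

# Look up the mask entry of each sampled action; all of them should be legal.
is_legal = jnp.take_along_axis(mask, actions[:, None], axis=-1)
print(actions, bool(is_legal.all()))
```

The same check can be applied to `state.legal_action_mask` in the simulation loop when debugging a custom policy.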

Example.2: Random agent vs Baseline model

+

This illustrative example helps you understand:

+
    +
  • How state.current_player is defined
  • +
  • How to access the reward of each player
  • +
  • How Env.step behaves against already terminated states
  • +
  • How to use baseline models provided by Pgx
  • +
+
import jax
+import jax.numpy as jnp
+import pgx
+from pgx.experimental.utils import act_randomly
+
+seed = 42
+batch_size = 10
+key = jax.random.PRNGKey(seed)
+
+# Prepare agent A and B
+#   Agent A: random player
+#   Agent B: baseline player provided by Pgx
+A = 0
+B = 1
+
+# Load the environment
+env = pgx.make("go_9x9")
+init_fn = jax.jit(jax.vmap(env.init))
+step_fn = jax.jit(jax.vmap(env.step))
+
+# Prepare baseline model
+# Note that it additionally requires the Haiku library ($ pip install dm-haiku)
+model_id = "go_9x9_v0"
+model = pgx.make_baseline_model(model_id)
+
+# Initialize the states
+key, subkey = jax.random.split(key)
+keys = jax.random.split(subkey, batch_size)
+state = init_fn(keys)
+print(f"Game index: {jnp.arange(batch_size)}")  #  [0 1 2 3 4 5 6 7 8 9]
+print(f"Black player: {state.current_player}")  #  [1 1 0 1 0 0 1 1 1 1]
+# In other words
+print(f"A is black: {state.current_player == A}")  # [False False  True False  True  True False False False False]
+print(f"B is black: {state.current_player == B}")  # [ True  True False  True False False  True  True  True  True]
+
+# Run simulation
+R = state.rewards
+while not (state.terminated | state.truncated).all():
+    # Action of random player A
+    key, subkey = jax.random.split(key)
+    action_A = jax.jit(act_randomly)(subkey, state)
+    # Greedy action of baseline model B
+    logits, value = model(state.observation)
+    action_B = logits.argmax(axis=-1)
+
+    action = jnp.where(state.current_player == A, action_A, action_B)
+    state = step_fn(state, action)
+    R += state.rewards
+
+print(f"Return of agent A = {R[:, A]}")  # [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
+print(f"Return of agent B = {R[:, B]}")  # [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
+
+

Note that we can avoid explicitly dealing with the first batch dimension (as in [:, A]) by using vmap later.
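As a rough sketch of that idea, the per-game logic can be written for a single game and then lifted over the batch with `jax.vmap`. The function `reward_of` and the toy reward array below are assumptions made for illustration; they are not part of Pgx.

```python
import jax
import jax.numpy as jnp


def reward_of(rewards, player):
    """Per-game view: `rewards` has shape (2,), `player` is a scalar index."""
    return rewards[player]


# Toy accumulated returns for three games, shape (batch_size, 2).
R = jnp.array([[-1.0, 1.0], [1.0, -1.0], [0.0, 0.0]])
A = 0

# Map over the batch axis of R while broadcasting the same player index.
return_of_A = jax.vmap(reward_of, in_axes=(0, None))(R, A)
print(return_of_A)  # same values as R[:, A]

# The player index can also differ per game, e.g. state.current_player.
players = jnp.array([0, 1, 0])
return_per_game = jax.vmap(reward_of)(R, players)
print(return_per_game)
```

Writing the function for one game keeps the indexing simple, and `in_axes` states explicitly which arguments carry a batch dimension.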

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/assets/2048_dark.gif b/assets/2048_dark.gif new file mode 100644 index 000000000..ac3a29ef5 Binary files /dev/null and b/assets/2048_dark.gif differ diff --git a/assets/2048_light.gif b/assets/2048_light.gif new file mode 100644 index 000000000..7176cade8 Binary files /dev/null and b/assets/2048_light.gif differ diff --git a/assets/_mkdocstrings.css b/assets/_mkdocstrings.css new file mode 100644 index 000000000..a65078d02 --- /dev/null +++ b/assets/_mkdocstrings.css @@ -0,0 +1,36 @@ + +/* Don't capitalize names. */ +h5.doc-heading { + text-transform: none !important; +} + +/* Avoid breaking parameters name, etc. in table cells. */ +.doc-contents td code { + word-break: normal !important; +} + +/* For pieces of Markdown rendered in table cells. */ +.doc-contents td p { + margin-top: 0 !important; + margin-bottom: 0 !important; +} + +/* Max width for docstring sections tables. */ +.doc .md-typeset__table, +.doc .md-typeset__table table { + display: table !important; + width: 100%; +} +.doc .md-typeset__table tr { + display: table-row; +} + +/* Avoid line breaks in rendered fields. */ +.field-body p { + display: inline; +} + +/* Defaults in Spacy table style. 
*/ +.doc-param-default { + float: right; +} diff --git a/assets/animal_shogi_dark.gif b/assets/animal_shogi_dark.gif new file mode 100644 index 000000000..3e05ea44f Binary files /dev/null and b/assets/animal_shogi_dark.gif differ diff --git a/assets/animal_shogi_light.gif b/assets/animal_shogi_light.gif new file mode 100644 index 000000000..b26790f41 Binary files /dev/null and b/assets/animal_shogi_light.gif differ diff --git a/assets/backgammon_dark.gif b/assets/backgammon_dark.gif new file mode 100644 index 000000000..af7086539 Binary files /dev/null and b/assets/backgammon_dark.gif differ diff --git a/assets/backgammon_light.gif b/assets/backgammon_light.gif new file mode 100644 index 000000000..e68d35a46 Binary files /dev/null and b/assets/backgammon_light.gif differ diff --git a/assets/bridge_bidding_dark.gif b/assets/bridge_bidding_dark.gif new file mode 100644 index 000000000..24a19bfb9 Binary files /dev/null and b/assets/bridge_bidding_dark.gif differ diff --git a/assets/bridge_bidding_light.gif b/assets/bridge_bidding_light.gif new file mode 100644 index 000000000..c188861d2 Binary files /dev/null and b/assets/bridge_bidding_light.gif differ diff --git a/assets/chess_dark.gif b/assets/chess_dark.gif new file mode 100644 index 000000000..d4a597fd3 Binary files /dev/null and b/assets/chess_dark.gif differ diff --git a/assets/chess_light.gif b/assets/chess_light.gif new file mode 100644 index 000000000..d5ff8d92e Binary files /dev/null and b/assets/chess_light.gif differ diff --git a/assets/connect_four_dark.gif b/assets/connect_four_dark.gif new file mode 100644 index 000000000..cc3c7bd66 Binary files /dev/null and b/assets/connect_four_dark.gif differ diff --git a/assets/connect_four_light.gif b/assets/connect_four_light.gif new file mode 100644 index 000000000..0280bf568 Binary files /dev/null and b/assets/connect_four_light.gif differ diff --git a/assets/favicon.svg b/assets/favicon.svg new file mode 100644 index 000000000..f48547c51 --- /dev/null +++ 
b/assets/favicon.svg @@ -0,0 +1,113 @@ + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/gardner_chess_dark.gif b/assets/gardner_chess_dark.gif new file mode 100644 index 000000000..6fc190ed2 Binary files /dev/null and b/assets/gardner_chess_dark.gif differ diff --git a/assets/gardner_chess_light.gif b/assets/gardner_chess_light.gif new file mode 100644 index 000000000..73c37de6b Binary files /dev/null and b/assets/gardner_chess_light.gif differ diff --git a/assets/generate_gif.py b/assets/generate_gif.py new file mode 100644 index 000000000..c8bbe7cc8 --- /dev/null +++ b/assets/generate_gif.py @@ -0,0 +1,25 @@ +import os +import sys +import jax +import pgx +from pgx.experimental.utils import act_randomly + +os.makedirs("tmp", exist_ok=True) + +env_id: pgx.EnvId = sys.argv[1] +color_theme = sys.argv[2] +env = pgx.make(env_id, auto_reset=True) +init = jax.jit(env.init) +step = jax.jit(env.step) + +rng = jax.random.PRNGKey(9999) + +states = [] +rng, subkey = jax.random.split(rng) +state = init(subkey) +# while not state.terminated.all(): +for i in range(50): + state.save_svg(f"tmp/{env_id}_{i:03d}.svg", color_theme=color_theme) + rng, subkey = jax.random.split(rng) + action = act_randomly(subkey, state) + state = step(state, action) diff --git a/assets/generate_gif.sh b/assets/generate_gif.sh new file mode 100644 index 000000000..d4c67e46f --- /dev/null +++ b/assets/generate_gif.sh @@ -0,0 +1,22 @@ +#! /bin/bash +set -e + +# light theme +rm -rf tmp +python3 generate_gif.py $1 light +cd tmp +inkscape --export-type=png *.svg +convert *.png "$1"_light.gif +mv "$1"_light.gif ../ +cd .. + +# dark theme +rm -rf tmp +python3 generate_gif.py $1 dark +cd tmp +inkscape --export-type=png *.svg +convert *.png "$1"_dark.gif +mv "$1"_dark.gif ../ +cd .. 
+ +rm -rf tmp diff --git a/assets/go-19x19_dark.gif b/assets/go-19x19_dark.gif new file mode 100644 index 000000000..296c04b6f Binary files /dev/null and b/assets/go-19x19_dark.gif differ diff --git a/assets/go-19x19_light.gif b/assets/go-19x19_light.gif new file mode 100644 index 000000000..125f9cc5f Binary files /dev/null and b/assets/go-19x19_light.gif differ diff --git a/assets/go_dark.gif b/assets/go_dark.gif new file mode 100644 index 000000000..98dff0d65 Binary files /dev/null and b/assets/go_dark.gif differ diff --git a/assets/go_light.gif b/assets/go_light.gif new file mode 100644 index 000000000..281a0d8ee Binary files /dev/null and b/assets/go_light.gif differ diff --git a/assets/hex_dark.gif b/assets/hex_dark.gif new file mode 100644 index 000000000..004a468ca Binary files /dev/null and b/assets/hex_dark.gif differ diff --git a/assets/hex_light.gif b/assets/hex_light.gif new file mode 100644 index 000000000..4571ad8ef Binary files /dev/null and b/assets/hex_light.gif differ diff --git a/assets/icon.svg b/assets/icon.svg new file mode 100644 index 000000000..0bc211f45 --- /dev/null +++ b/assets/icon.svg @@ -0,0 +1,113 @@ + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/images/favicon.png b/assets/images/favicon.png new file mode 100644 index 000000000..1cf13b9f9 Binary files /dev/null and b/assets/images/favicon.png differ diff --git a/assets/javascripts/bundle.220ee61c.min.js b/assets/javascripts/bundle.220ee61c.min.js new file mode 100644 index 000000000..116072a11 --- /dev/null +++ b/assets/javascripts/bundle.220ee61c.min.js @@ -0,0 +1,29 @@ +"use strict";(()=>{var Ci=Object.create;var gr=Object.defineProperty;var Ri=Object.getOwnPropertyDescriptor;var ki=Object.getOwnPropertyNames,Ht=Object.getOwnPropertySymbols,Hi=Object.getPrototypeOf,yr=Object.prototype.hasOwnProperty,nn=Object.prototype.propertyIsEnumerable;var rn=(e,t,r)=>t in e?gr(e,t,{enumerable:!0,configurable:!0,writable:!0,value:r}):e[t]=r,P=(e,t)=>{for(var r in 
t||(t={}))yr.call(t,r)&&rn(e,r,t[r]);if(Ht)for(var r of Ht(t))nn.call(t,r)&&rn(e,r,t[r]);return e};var on=(e,t)=>{var r={};for(var n in e)yr.call(e,n)&&t.indexOf(n)<0&&(r[n]=e[n]);if(e!=null&&Ht)for(var n of Ht(e))t.indexOf(n)<0&&nn.call(e,n)&&(r[n]=e[n]);return r};var Pt=(e,t)=>()=>(t||e((t={exports:{}}).exports,t),t.exports);var Pi=(e,t,r,n)=>{if(t&&typeof t=="object"||typeof t=="function")for(let o of ki(t))!yr.call(e,o)&&o!==r&&gr(e,o,{get:()=>t[o],enumerable:!(n=Ri(t,o))||n.enumerable});return e};var yt=(e,t,r)=>(r=e!=null?Ci(Hi(e)):{},Pi(t||!e||!e.__esModule?gr(r,"default",{value:e,enumerable:!0}):r,e));var sn=Pt((xr,an)=>{(function(e,t){typeof xr=="object"&&typeof an!="undefined"?t():typeof define=="function"&&define.amd?define(t):t()})(xr,function(){"use strict";function e(r){var n=!0,o=!1,i=null,s={text:!0,search:!0,url:!0,tel:!0,email:!0,password:!0,number:!0,date:!0,month:!0,week:!0,time:!0,datetime:!0,"datetime-local":!0};function a(O){return!!(O&&O!==document&&O.nodeName!=="HTML"&&O.nodeName!=="BODY"&&"classList"in O&&"contains"in O.classList)}function f(O){var Qe=O.type,De=O.tagName;return!!(De==="INPUT"&&s[Qe]&&!O.readOnly||De==="TEXTAREA"&&!O.readOnly||O.isContentEditable)}function c(O){O.classList.contains("focus-visible")||(O.classList.add("focus-visible"),O.setAttribute("data-focus-visible-added",""))}function u(O){O.hasAttribute("data-focus-visible-added")&&(O.classList.remove("focus-visible"),O.removeAttribute("data-focus-visible-added"))}function p(O){O.metaKey||O.altKey||O.ctrlKey||(a(r.activeElement)&&c(r.activeElement),n=!0)}function m(O){n=!1}function d(O){a(O.target)&&(n||f(O.target))&&c(O.target)}function h(O){a(O.target)&&(O.target.classList.contains("focus-visible")||O.target.hasAttribute("data-focus-visible-added"))&&(o=!0,window.clearTimeout(i),i=window.setTimeout(function(){o=!1},100),u(O.target))}function v(O){document.visibilityState==="hidden"&&(o&&(n=!0),Y())}function 
Y(){document.addEventListener("mousemove",N),document.addEventListener("mousedown",N),document.addEventListener("mouseup",N),document.addEventListener("pointermove",N),document.addEventListener("pointerdown",N),document.addEventListener("pointerup",N),document.addEventListener("touchmove",N),document.addEventListener("touchstart",N),document.addEventListener("touchend",N)}function B(){document.removeEventListener("mousemove",N),document.removeEventListener("mousedown",N),document.removeEventListener("mouseup",N),document.removeEventListener("pointermove",N),document.removeEventListener("pointerdown",N),document.removeEventListener("pointerup",N),document.removeEventListener("touchmove",N),document.removeEventListener("touchstart",N),document.removeEventListener("touchend",N)}function N(O){O.target.nodeName&&O.target.nodeName.toLowerCase()==="html"||(n=!1,B())}document.addEventListener("keydown",p,!0),document.addEventListener("mousedown",m,!0),document.addEventListener("pointerdown",m,!0),document.addEventListener("touchstart",m,!0),document.addEventListener("visibilitychange",v,!0),Y(),r.addEventListener("focus",d,!0),r.addEventListener("blur",h,!0),r.nodeType===Node.DOCUMENT_FRAGMENT_NODE&&r.host?r.host.setAttribute("data-js-focus-visible",""):r.nodeType===Node.DOCUMENT_NODE&&(document.documentElement.classList.add("js-focus-visible"),document.documentElement.setAttribute("data-js-focus-visible",""))}if(typeof window!="undefined"&&typeof document!="undefined"){window.applyFocusVisiblePolyfill=e;var t;try{t=new CustomEvent("focus-visible-polyfill-ready")}catch(r){t=document.createEvent("CustomEvent"),t.initCustomEvent("focus-visible-polyfill-ready",!1,!1,{})}window.dispatchEvent(t)}typeof document!="undefined"&&e(document)})});var cn=Pt(Er=>{(function(e){var t=function(){try{return!!Symbol.iterator}catch(c){return!1}},r=t(),n=function(c){var u={next:function(){var p=c.shift();return{done:p===void 0,value:p}}};return r&&(u[Symbol.iterator]=function(){return 
u}),u},o=function(c){return encodeURIComponent(c).replace(/%20/g,"+")},i=function(c){return decodeURIComponent(String(c).replace(/\+/g," "))},s=function(){var c=function(p){Object.defineProperty(this,"_entries",{writable:!0,value:{}});var m=typeof p;if(m!=="undefined")if(m==="string")p!==""&&this._fromString(p);else if(p instanceof c){var d=this;p.forEach(function(B,N){d.append(N,B)})}else if(p!==null&&m==="object")if(Object.prototype.toString.call(p)==="[object Array]")for(var h=0;hd[0]?1:0}),c._entries&&(c._entries={});for(var p=0;p1?i(d[1]):"")}})})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Er);(function(e){var t=function(){try{var o=new e.URL("b","http://a");return o.pathname="c d",o.href==="http://a/c%20d"&&o.searchParams}catch(i){return!1}},r=function(){var o=e.URL,i=function(f,c){typeof f!="string"&&(f=String(f)),c&&typeof c!="string"&&(c=String(c));var u=document,p;if(c&&(e.location===void 0||c!==e.location.href)){c=c.toLowerCase(),u=document.implementation.createHTMLDocument(""),p=u.createElement("base"),p.href=c,u.head.appendChild(p);try{if(p.href.indexOf(c)!==0)throw new Error(p.href)}catch(O){throw new Error("URL unable to set base "+c+" due to "+O)}}var m=u.createElement("a");m.href=f,p&&(u.body.appendChild(m),m.href=m.href);var d=u.createElement("input");if(d.type="url",d.value=f,m.protocol===":"||!/:/.test(m.href)||!d.checkValidity()&&!c)throw new TypeError("Invalid URL");Object.defineProperty(this,"_anchorElement",{value:m});var h=new e.URLSearchParams(this.search),v=!0,Y=!0,B=this;["append","delete","set"].forEach(function(O){var Qe=h[O];h[O]=function(){Qe.apply(h,arguments),v&&(Y=!1,B.search=h.toString(),Y=!0)}}),Object.defineProperty(this,"searchParams",{value:h,enumerable:!0});var N=void 
0;Object.defineProperty(this,"_updateSearchParams",{enumerable:!1,configurable:!1,writable:!1,value:function(){this.search!==N&&(N=this.search,Y&&(v=!1,this.searchParams._fromString(this.search),v=!0))}})},s=i.prototype,a=function(f){Object.defineProperty(s,f,{get:function(){return this._anchorElement[f]},set:function(c){this._anchorElement[f]=c},enumerable:!0})};["hash","host","hostname","port","protocol"].forEach(function(f){a(f)}),Object.defineProperty(s,"search",{get:function(){return this._anchorElement.search},set:function(f){this._anchorElement.search=f,this._updateSearchParams()},enumerable:!0}),Object.defineProperties(s,{toString:{get:function(){var f=this;return function(){return f.href}}},href:{get:function(){return this._anchorElement.href.replace(/\?$/,"")},set:function(f){this._anchorElement.href=f,this._updateSearchParams()},enumerable:!0},pathname:{get:function(){return this._anchorElement.pathname.replace(/(^\/?)/,"/")},set:function(f){this._anchorElement.pathname=f},enumerable:!0},origin:{get:function(){var f={"http:":80,"https:":443,"ftp:":21}[this._anchorElement.protocol],c=this._anchorElement.port!=f&&this._anchorElement.port!=="";return this._anchorElement.protocol+"//"+this._anchorElement.hostname+(c?":"+this._anchorElement.port:"")},enumerable:!0},password:{get:function(){return""},set:function(f){},enumerable:!0},username:{get:function(){return""},set:function(f){},enumerable:!0}}),i.createObjectURL=function(f){return o.createObjectURL.apply(o,arguments)},i.revokeObjectURL=function(f){return o.revokeObjectURL.apply(o,arguments)},e.URL=i};if(t()||r(),e.location!==void 0&&!("origin"in e.location)){var n=function(){return e.location.protocol+"//"+e.location.hostname+(e.location.port?":"+e.location.port:"")};try{Object.defineProperty(e.location,"origin",{get:n,enumerable:!0})}catch(o){setInterval(function(){e.location.origin=n()},100)}}})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Er)});var 
qr=Pt((Mt,Nr)=>{/*! + * clipboard.js v2.0.11 + * https://clipboardjs.com/ + * + * Licensed MIT © Zeno Rocha + */(function(t,r){typeof Mt=="object"&&typeof Nr=="object"?Nr.exports=r():typeof define=="function"&&define.amd?define([],r):typeof Mt=="object"?Mt.ClipboardJS=r():t.ClipboardJS=r()})(Mt,function(){return function(){var e={686:function(n,o,i){"use strict";i.d(o,{default:function(){return Ai}});var s=i(279),a=i.n(s),f=i(370),c=i.n(f),u=i(817),p=i.n(u);function m(j){try{return document.execCommand(j)}catch(T){return!1}}var d=function(T){var E=p()(T);return m("cut"),E},h=d;function v(j){var T=document.documentElement.getAttribute("dir")==="rtl",E=document.createElement("textarea");E.style.fontSize="12pt",E.style.border="0",E.style.padding="0",E.style.margin="0",E.style.position="absolute",E.style[T?"right":"left"]="-9999px";var H=window.pageYOffset||document.documentElement.scrollTop;return E.style.top="".concat(H,"px"),E.setAttribute("readonly",""),E.value=j,E}var Y=function(T,E){var H=v(T);E.container.appendChild(H);var I=p()(H);return m("copy"),H.remove(),I},B=function(T){var E=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body},H="";return typeof T=="string"?H=Y(T,E):T instanceof HTMLInputElement&&!["text","search","url","tel","password"].includes(T==null?void 0:T.type)?H=Y(T.value,E):(H=p()(T),m("copy")),H},N=B;function O(j){"@babel/helpers - typeof";return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?O=function(E){return typeof E}:O=function(E){return E&&typeof Symbol=="function"&&E.constructor===Symbol&&E!==Symbol.prototype?"symbol":typeof E},O(j)}var Qe=function(){var T=arguments.length>0&&arguments[0]!==void 0?arguments[0]:{},E=T.action,H=E===void 0?"copy":E,I=T.container,q=T.target,Me=T.text;if(H!=="copy"&&H!=="cut")throw new Error('Invalid "action" value, use either "copy" or "cut"');if(q!==void 0)if(q&&O(q)==="object"&&q.nodeType===1){if(H==="copy"&&q.hasAttribute("disabled"))throw new Error('Invalid 
"target" attribute. Please use "readonly" instead of "disabled" attribute');if(H==="cut"&&(q.hasAttribute("readonly")||q.hasAttribute("disabled")))throw new Error(`Invalid "target" attribute. You can't cut text from elements with "readonly" or "disabled" attributes`)}else throw new Error('Invalid "target" value, use a valid Element');if(Me)return N(Me,{container:I});if(q)return H==="cut"?h(q):N(q,{container:I})},De=Qe;function $e(j){"@babel/helpers - typeof";return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?$e=function(E){return typeof E}:$e=function(E){return E&&typeof Symbol=="function"&&E.constructor===Symbol&&E!==Symbol.prototype?"symbol":typeof E},$e(j)}function Ei(j,T){if(!(j instanceof T))throw new TypeError("Cannot call a class as a function")}function tn(j,T){for(var E=0;E0&&arguments[0]!==void 0?arguments[0]:{};this.action=typeof I.action=="function"?I.action:this.defaultAction,this.target=typeof I.target=="function"?I.target:this.defaultTarget,this.text=typeof I.text=="function"?I.text:this.defaultText,this.container=$e(I.container)==="object"?I.container:document.body}},{key:"listenClick",value:function(I){var q=this;this.listener=c()(I,"click",function(Me){return q.onClick(Me)})}},{key:"onClick",value:function(I){var q=I.delegateTarget||I.currentTarget,Me=this.action(q)||"copy",kt=De({action:Me,container:this.container,target:this.target(q),text:this.text(q)});this.emit(kt?"success":"error",{action:Me,text:kt,trigger:q,clearSelection:function(){q&&q.focus(),window.getSelection().removeAllRanges()}})}},{key:"defaultAction",value:function(I){return vr("action",I)}},{key:"defaultTarget",value:function(I){var q=vr("target",I);if(q)return document.querySelector(q)}},{key:"defaultText",value:function(I){return vr("text",I)}},{key:"destroy",value:function(){this.listener.destroy()}}],[{key:"copy",value:function(I){var q=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body};return 
N(I,q)}},{key:"cut",value:function(I){return h(I)}},{key:"isSupported",value:function(){var I=arguments.length>0&&arguments[0]!==void 0?arguments[0]:["copy","cut"],q=typeof I=="string"?[I]:I,Me=!!document.queryCommandSupported;return q.forEach(function(kt){Me=Me&&!!document.queryCommandSupported(kt)}),Me}}]),E}(a()),Ai=Li},828:function(n){var o=9;if(typeof Element!="undefined"&&!Element.prototype.matches){var i=Element.prototype;i.matches=i.matchesSelector||i.mozMatchesSelector||i.msMatchesSelector||i.oMatchesSelector||i.webkitMatchesSelector}function s(a,f){for(;a&&a.nodeType!==o;){if(typeof a.matches=="function"&&a.matches(f))return a;a=a.parentNode}}n.exports=s},438:function(n,o,i){var s=i(828);function a(u,p,m,d,h){var v=c.apply(this,arguments);return u.addEventListener(m,v,h),{destroy:function(){u.removeEventListener(m,v,h)}}}function f(u,p,m,d,h){return typeof u.addEventListener=="function"?a.apply(null,arguments):typeof m=="function"?a.bind(null,document).apply(null,arguments):(typeof u=="string"&&(u=document.querySelectorAll(u)),Array.prototype.map.call(u,function(v){return a(v,p,m,d,h)}))}function c(u,p,m,d){return function(h){h.delegateTarget=s(h.target,p),h.delegateTarget&&d.call(u,h)}}n.exports=f},879:function(n,o){o.node=function(i){return i!==void 0&&i instanceof HTMLElement&&i.nodeType===1},o.nodeList=function(i){var s=Object.prototype.toString.call(i);return i!==void 0&&(s==="[object NodeList]"||s==="[object HTMLCollection]")&&"length"in i&&(i.length===0||o.node(i[0]))},o.string=function(i){return typeof i=="string"||i instanceof String},o.fn=function(i){var s=Object.prototype.toString.call(i);return s==="[object Function]"}},370:function(n,o,i){var s=i(879),a=i(438);function f(m,d,h){if(!m&&!d&&!h)throw new Error("Missing required arguments");if(!s.string(d))throw new TypeError("Second argument must be a String");if(!s.fn(h))throw new TypeError("Third argument must be a Function");if(s.node(m))return c(m,d,h);if(s.nodeList(m))return 
u(m,d,h);if(s.string(m))return p(m,d,h);throw new TypeError("First argument must be a String, HTMLElement, HTMLCollection, or NodeList")}function c(m,d,h){return m.addEventListener(d,h),{destroy:function(){m.removeEventListener(d,h)}}}function u(m,d,h){return Array.prototype.forEach.call(m,function(v){v.addEventListener(d,h)}),{destroy:function(){Array.prototype.forEach.call(m,function(v){v.removeEventListener(d,h)})}}}function p(m,d,h){return a(document.body,m,d,h)}n.exports=f},817:function(n){function o(i){var s;if(i.nodeName==="SELECT")i.focus(),s=i.value;else if(i.nodeName==="INPUT"||i.nodeName==="TEXTAREA"){var a=i.hasAttribute("readonly");a||i.setAttribute("readonly",""),i.select(),i.setSelectionRange(0,i.value.length),a||i.removeAttribute("readonly"),s=i.value}else{i.hasAttribute("contenteditable")&&i.focus();var f=window.getSelection(),c=document.createRange();c.selectNodeContents(i),f.removeAllRanges(),f.addRange(c),s=f.toString()}return s}n.exports=o},279:function(n){function o(){}o.prototype={on:function(i,s,a){var f=this.e||(this.e={});return(f[i]||(f[i]=[])).push({fn:s,ctx:a}),this},once:function(i,s,a){var f=this;function c(){f.off(i,c),s.apply(a,arguments)}return c._=s,this.on(i,c,a)},emit:function(i){var s=[].slice.call(arguments,1),a=((this.e||(this.e={}))[i]||[]).slice(),f=0,c=a.length;for(f;f{"use strict";/*! 
+ * escape-html + * Copyright(c) 2012-2013 TJ Holowaychuk + * Copyright(c) 2015 Andreas Lubbe + * Copyright(c) 2015 Tiancheng "Timothy" Gu + * MIT Licensed + */var rs=/["'&<>]/;Yo.exports=ns;function ns(e){var t=""+e,r=rs.exec(t);if(!r)return t;var n,o="",i=0,s=0;for(i=r.index;i0&&i[i.length-1])&&(c[0]===6||c[0]===2)){r=0;continue}if(c[0]===3&&(!i||c[1]>i[0]&&c[1]=e.length&&(e=void 0),{value:e&&e[n++],done:!e}}};throw new TypeError(t?"Object is not iterable.":"Symbol.iterator is not defined.")}function W(e,t){var r=typeof Symbol=="function"&&e[Symbol.iterator];if(!r)return e;var n=r.call(e),o,i=[],s;try{for(;(t===void 0||t-- >0)&&!(o=n.next()).done;)i.push(o.value)}catch(a){s={error:a}}finally{try{o&&!o.done&&(r=n.return)&&r.call(n)}finally{if(s)throw s.error}}return i}function D(e,t,r){if(r||arguments.length===2)for(var n=0,o=t.length,i;n1||a(m,d)})})}function a(m,d){try{f(n[m](d))}catch(h){p(i[0][3],h)}}function f(m){m.value instanceof et?Promise.resolve(m.value.v).then(c,u):p(i[0][2],m)}function c(m){a("next",m)}function u(m){a("throw",m)}function p(m,d){m(d),i.shift(),i.length&&a(i[0][0],i[0][1])}}function pn(e){if(!Symbol.asyncIterator)throw new TypeError("Symbol.asyncIterator is not defined.");var t=e[Symbol.asyncIterator],r;return t?t.call(e):(e=typeof Ee=="function"?Ee(e):e[Symbol.iterator](),r={},n("next"),n("throw"),n("return"),r[Symbol.asyncIterator]=function(){return this},r);function n(i){r[i]=e[i]&&function(s){return new Promise(function(a,f){s=e[i](s),o(a,f,s.done,s.value)})}}function o(i,s,a,f){Promise.resolve(f).then(function(c){i({value:c,done:a})},s)}}function C(e){return typeof e=="function"}function at(e){var t=function(n){Error.call(n),n.stack=new Error().stack},r=e(t);return r.prototype=Object.create(Error.prototype),r.prototype.constructor=r,r}var It=at(function(e){return function(r){e(this),this.message=r?r.length+` errors occurred during unsubscription: +`+r.map(function(n,o){return o+1+") "+n.toString()}).join(` + 
`):"",this.name="UnsubscriptionError",this.errors=r}});function Ve(e,t){if(e){var r=e.indexOf(t);0<=r&&e.splice(r,1)}}var Ie=function(){function e(t){this.initialTeardown=t,this.closed=!1,this._parentage=null,this._finalizers=null}return e.prototype.unsubscribe=function(){var t,r,n,o,i;if(!this.closed){this.closed=!0;var s=this._parentage;if(s)if(this._parentage=null,Array.isArray(s))try{for(var a=Ee(s),f=a.next();!f.done;f=a.next()){var c=f.value;c.remove(this)}}catch(v){t={error:v}}finally{try{f&&!f.done&&(r=a.return)&&r.call(a)}finally{if(t)throw t.error}}else s.remove(this);var u=this.initialTeardown;if(C(u))try{u()}catch(v){i=v instanceof It?v.errors:[v]}var p=this._finalizers;if(p){this._finalizers=null;try{for(var m=Ee(p),d=m.next();!d.done;d=m.next()){var h=d.value;try{ln(h)}catch(v){i=i!=null?i:[],v instanceof It?i=D(D([],W(i)),W(v.errors)):i.push(v)}}}catch(v){n={error:v}}finally{try{d&&!d.done&&(o=m.return)&&o.call(m)}finally{if(n)throw n.error}}}if(i)throw new It(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)ln(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&Ve(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&Ve(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Sr=Ie.EMPTY;function jt(e){return e instanceof Ie||e&&"closed"in e&&C(e.remove)&&C(e.add)&&C(e.unsubscribe)}function ln(e){C(e)?e():e.unsubscribe()}var Le={onUnhandledError:null,onStoppedNotification:null,Promise:void 
0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var st={setTimeout:function(e,t){for(var r=[],n=2;n0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var n=this,o=this,i=o.hasError,s=o.isStopped,a=o.observers;return i||s?Sr:(this.currentObservers=null,a.push(r),new Ie(function(){n.currentObservers=null,Ve(a,r)}))},t.prototype._checkFinalizedStatuses=function(r){var n=this,o=n.hasError,i=n.thrownError,s=n.isStopped;o?r.error(i):s&&r.complete()},t.prototype.asObservable=function(){var r=new F;return r.source=this,r},t.create=function(r,n){return new xn(r,n)},t}(F);var xn=function(e){ie(t,e);function t(r,n){var o=e.call(this)||this;return o.destination=r,o.source=n,o}return t.prototype.next=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.next)===null||o===void 0||o.call(n,r)},t.prototype.error=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.error)===null||o===void 0||o.call(n,r)},t.prototype.complete=function(){var r,n;(n=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||n===void 0||n.call(r)},t.prototype._subscribe=function(r){var n,o;return(o=(n=this.source)===null||n===void 0?void 0:n.subscribe(r))!==null&&o!==void 0?o:Sr},t}(x);var Et={now:function(){return(Et.delegate||Date).now()},delegate:void 0};var wt=function(e){ie(t,e);function t(r,n,o){r===void 0&&(r=1/0),n===void 0&&(n=1/0),o===void 0&&(o=Et);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=n,i._timestampProvider=o,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=n===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,n),i}return t.prototype.next=function(r){var 
n=this,o=n.isStopped,i=n._buffer,s=n._infiniteTimeWindow,a=n._timestampProvider,f=n._windowTime;o||(i.push(r),!s&&i.push(a.now()+f)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var n=this._innerSubscribe(r),o=this,i=o._infiniteTimeWindow,s=o._buffer,a=s.slice(),f=0;f0?e.prototype.requestAsyncId.call(this,r,n,o):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,n,o){var i;if(o===void 0&&(o=0),o!=null?o>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,n,o);var s=r.actions;n!=null&&((i=s[s.length-1])===null||i===void 0?void 0:i.id)!==n&&(ut.cancelAnimationFrame(n),r._scheduled=void 0)},t}(Wt);var Sn=function(e){ie(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var n=this._scheduled;this._scheduled=void 0;var o=this.actions,i;r=r||o.shift();do if(i=r.execute(r.state,r.delay))break;while((r=o[0])&&r.id===n&&o.shift());if(this._active=!1,i){for(;(r=o[0])&&r.id===n&&o.shift();)r.unsubscribe();throw i}},t}(Dt);var Oe=new Sn(wn);var M=new F(function(e){return e.complete()});function Vt(e){return e&&C(e.schedule)}function Cr(e){return e[e.length-1]}function Ye(e){return C(Cr(e))?e.pop():void 0}function Te(e){return Vt(Cr(e))?e.pop():void 0}function zt(e,t){return typeof Cr(e)=="number"?e.pop():t}var pt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function Nt(e){return C(e==null?void 0:e.then)}function qt(e){return C(e[ft])}function Kt(e){return Symbol.asyncIterator&&C(e==null?void 0:e[Symbol.asyncIterator])}function Qt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function zi(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Yt=zi();function Gt(e){return C(e==null?void 0:e[Yt])}function Bt(e){return un(this,arguments,function(){var r,n,o,i;return $t(this,function(s){switch(s.label){case 0:r=e.getReader(),s.label=1;case 1:s.trys.push([1,,9,10]),s.label=2;case 2:return[4,et(r.read())];case 3:return n=s.sent(),o=n.value,i=n.done,i?[4,et(void 0)]:[3,5];case 4:return[2,s.sent()];case 5:return[4,et(o)];case 6:return[4,s.sent()];case 7:return s.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function Jt(e){return C(e==null?void 0:e.getReader)}function U(e){if(e instanceof F)return e;if(e!=null){if(qt(e))return Ni(e);if(pt(e))return qi(e);if(Nt(e))return Ki(e);if(Kt(e))return On(e);if(Gt(e))return Qi(e);if(Jt(e))return Yi(e)}throw Qt(e)}function Ni(e){return new F(function(t){var r=e[ft]();if(C(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function qi(e){return new F(function(t){for(var r=0;r=2;return function(n){return n.pipe(e?A(function(o,i){return e(o,i,n)}):de,ge(1),r?He(t):Dn(function(){return new Zt}))}}function Vn(){for(var e=[],t=0;t=2,!0))}function pe(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new x}:t,n=e.resetOnError,o=n===void 0?!0:n,i=e.resetOnComplete,s=i===void 0?!0:i,a=e.resetOnRefCountZero,f=a===void 0?!0:a;return function(c){var u,p,m,d=0,h=!1,v=!1,Y=function(){p==null||p.unsubscribe(),p=void 0},B=function(){Y(),u=m=void 0,h=v=!1},N=function(){var O=u;B(),O==null||O.unsubscribe()};return y(function(O,Qe){d++,!v&&!h&&Y();var De=m=m!=null?m:r();Qe.add(function(){d--,d===0&&!v&&!h&&(p=$r(N,f))}),De.subscribe(Qe),!u&&d>0&&(u=new rt({next:function($e){return 
De.next($e)},error:function($e){v=!0,Y(),p=$r(B,o,$e),De.error($e)},complete:function(){h=!0,Y(),p=$r(B,s),De.complete()}}),U(O).subscribe(u))})(c)}}function $r(e,t){for(var r=[],n=2;ne.next(document)),e}function K(e,t=document){return Array.from(t.querySelectorAll(e))}function z(e,t=document){let r=ce(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function ce(e,t=document){return t.querySelector(e)||void 0}function _e(){return document.activeElement instanceof HTMLElement&&document.activeElement||void 0}function tr(e){return L(b(document.body,"focusin"),b(document.body,"focusout")).pipe(ke(1),l(()=>{let t=_e();return typeof t!="undefined"?e.contains(t):!1}),V(e===_e()),J())}function Xe(e){return{x:e.offsetLeft,y:e.offsetTop}}function Kn(e){return L(b(window,"load"),b(window,"resize")).pipe(Ce(0,Oe),l(()=>Xe(e)),V(Xe(e)))}function rr(e){return{x:e.scrollLeft,y:e.scrollTop}}function dt(e){return L(b(e,"scroll"),b(window,"resize")).pipe(Ce(0,Oe),l(()=>rr(e)),V(rr(e)))}var Yn=function(){if(typeof Map!="undefined")return Map;function e(t,r){var n=-1;return t.some(function(o,i){return o[0]===r?(n=i,!0):!1}),n}return function(){function t(){this.__entries__=[]}return Object.defineProperty(t.prototype,"size",{get:function(){return this.__entries__.length},enumerable:!0,configurable:!0}),t.prototype.get=function(r){var n=e(this.__entries__,r),o=this.__entries__[n];return o&&o[1]},t.prototype.set=function(r,n){var o=e(this.__entries__,r);~o?this.__entries__[o][1]=n:this.__entries__.push([r,n])},t.prototype.delete=function(r){var n=this.__entries__,o=e(n,r);~o&&n.splice(o,1)},t.prototype.has=function(r){return!!~e(this.__entries__,r)},t.prototype.clear=function(){this.__entries__.splice(0)},t.prototype.forEach=function(r,n){n===void 0&&(n=null);for(var 
o=0,i=this.__entries__;o0},e.prototype.connect_=function(){!Wr||this.connected_||(document.addEventListener("transitionend",this.onTransitionEnd_),window.addEventListener("resize",this.refresh),va?(this.mutationsObserver_=new MutationObserver(this.refresh),this.mutationsObserver_.observe(document,{attributes:!0,childList:!0,characterData:!0,subtree:!0})):(document.addEventListener("DOMSubtreeModified",this.refresh),this.mutationEventsAdded_=!0),this.connected_=!0)},e.prototype.disconnect_=function(){!Wr||!this.connected_||(document.removeEventListener("transitionend",this.onTransitionEnd_),window.removeEventListener("resize",this.refresh),this.mutationsObserver_&&this.mutationsObserver_.disconnect(),this.mutationEventsAdded_&&document.removeEventListener("DOMSubtreeModified",this.refresh),this.mutationsObserver_=null,this.mutationEventsAdded_=!1,this.connected_=!1)},e.prototype.onTransitionEnd_=function(t){var r=t.propertyName,n=r===void 0?"":r,o=ba.some(function(i){return!!~n.indexOf(i)});o&&this.refresh()},e.getInstance=function(){return this.instance_||(this.instance_=new e),this.instance_},e.instance_=null,e}(),Gn=function(e,t){for(var r=0,n=Object.keys(t);r0},e}(),Jn=typeof WeakMap!="undefined"?new WeakMap:new Yn,Xn=function(){function e(t){if(!(this instanceof e))throw new TypeError("Cannot call a class as a function.");if(!arguments.length)throw new TypeError("1 argument required, but only 0 present.");var r=ga.getInstance(),n=new La(t,r,this);Jn.set(this,n)}return e}();["observe","unobserve","disconnect"].forEach(function(e){Xn.prototype[e]=function(){var t;return(t=Jn.get(this))[e].apply(t,arguments)}});var Aa=function(){return typeof nr.ResizeObserver!="undefined"?nr.ResizeObserver:Xn}(),Zn=Aa;var eo=new x,Ca=$(()=>k(new Zn(e=>{for(let t of e)eo.next(t)}))).pipe(g(e=>L(ze,k(e)).pipe(R(()=>e.disconnect()))),X(1));function he(e){return{width:e.offsetWidth,height:e.offsetHeight}}function ye(e){return 
Ca.pipe(S(t=>t.observe(e)),g(t=>eo.pipe(A(({target:r})=>r===e),R(()=>t.unobserve(e)),l(()=>he(e)))),V(he(e)))}function bt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function ar(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var to=new x,Ra=$(()=>k(new IntersectionObserver(e=>{for(let t of e)to.next(t)},{threshold:0}))).pipe(g(e=>L(ze,k(e)).pipe(R(()=>e.disconnect()))),X(1));function sr(e){return Ra.pipe(S(t=>t.observe(e)),g(t=>to.pipe(A(({target:r})=>r===e),R(()=>t.unobserve(e)),l(({isIntersecting:r})=>r))))}function ro(e,t=16){return dt(e).pipe(l(({y:r})=>{let n=he(e),o=bt(e);return r>=o.height-n.height-t}),J())}var cr={drawer:z("[data-md-toggle=drawer]"),search:z("[data-md-toggle=search]")};function no(e){return cr[e].checked}function Ke(e,t){cr[e].checked!==t&&cr[e].click()}function Ue(e){let t=cr[e];return b(t,"change").pipe(l(()=>t.checked),V(t.checked))}function ka(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function Ha(){return L(b(window,"compositionstart").pipe(l(()=>!0)),b(window,"compositionend").pipe(l(()=>!1))).pipe(V(!1))}function oo(){let e=b(window,"keydown").pipe(A(t=>!(t.metaKey||t.ctrlKey)),l(t=>({mode:no("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),A(({mode:t,type:r})=>{if(t==="global"){let n=_e();if(typeof n!="undefined")return!ka(n,r)}return!0}),pe());return Ha().pipe(g(t=>t?M:e))}function le(){return new URL(location.href)}function ot(e){location.href=e.href}function io(){return new x}function ao(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)ao(e,r)}function _(e,t,...r){let n=document.createElement(e);if(t)for(let o of Object.keys(t))typeof 
t[o]!="undefined"&&(typeof t[o]!="boolean"?n.setAttribute(o,t[o]):n.setAttribute(o,""));for(let o of r)ao(n,o);return n}function fr(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function so(){return location.hash.substring(1)}function Dr(e){let t=_("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function Pa(e){return L(b(window,"hashchange"),e).pipe(l(so),V(so()),A(t=>t.length>0),X(1))}function co(e){return Pa(e).pipe(l(t=>ce(`[id="${t}"]`)),A(t=>typeof t!="undefined"))}function Vr(e){let t=matchMedia(e);return er(r=>t.addListener(()=>r(t.matches))).pipe(V(t.matches))}function fo(){let e=matchMedia("print");return L(b(window,"beforeprint").pipe(l(()=>!0)),b(window,"afterprint").pipe(l(()=>!1))).pipe(V(e.matches))}function zr(e,t){return e.pipe(g(r=>r?t():M))}function ur(e,t={credentials:"same-origin"}){return ue(fetch(`${e}`,t)).pipe(fe(()=>M),g(r=>r.status!==200?Ot(()=>new Error(r.statusText)):k(r)))}function We(e,t){return ur(e,t).pipe(g(r=>r.json()),X(1))}function uo(e,t){let r=new DOMParser;return ur(e,t).pipe(g(n=>n.text()),l(n=>r.parseFromString(n,"text/xml")),X(1))}function pr(e){let t=_("script",{src:e});return $(()=>(document.head.appendChild(t),L(b(t,"load"),b(t,"error").pipe(g(()=>Ot(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(l(()=>{}),R(()=>document.head.removeChild(t)),ge(1))))}function po(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function lo(){return L(b(window,"scroll",{passive:!0}),b(window,"resize",{passive:!0})).pipe(l(po),V(po()))}function mo(){return{width:innerWidth,height:innerHeight}}function ho(){return b(window,"resize",{passive:!0}).pipe(l(mo),V(mo()))}function bo(){return G([lo(),ho()]).pipe(l(([e,t])=>({offset:e,size:t})),X(1))}function lr(e,{viewport$:t,header$:r}){let n=t.pipe(ee("size")),o=G([n,r]).pipe(l(()=>Xe(e)));return 
G([r,t,o]).pipe(l(([{height:i},{offset:s,size:a},{x:f,y:c}])=>({offset:{x:s.x-f,y:s.y-c+i},size:a})))}(()=>{function e(n,o){parent.postMessage(n,o||"*")}function t(...n){return n.reduce((o,i)=>o.then(()=>new Promise(s=>{let a=document.createElement("script");a.src=i,a.onload=s,document.body.appendChild(a)})),Promise.resolve())}var r=class extends EventTarget{constructor(n){super(),this.url=n,this.m=i=>{i.source===this.w&&(this.dispatchEvent(new MessageEvent("message",{data:i.data})),this.onmessage&&this.onmessage(i))},this.e=(i,s,a,f,c)=>{if(s===`${this.url}`){let u=new ErrorEvent("error",{message:i,filename:s,lineno:a,colno:f,error:c});this.dispatchEvent(u),this.onerror&&this.onerror(u)}};let o=document.createElement("iframe");o.hidden=!0,document.body.appendChild(this.iframe=o),this.w.document.open(),this.w.document.write(` + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Backgammon

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("backgammon")
+
+

Alternatively, you can load the Backgammon class directly:

+
from pgx.backgammon import Backgammon
+
+env = Backgammon()
+
+

Description

+
+

Backgammon ...

+

Wikipedia

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 162 (= 6 * 26 + 6)
Observation shape | (34,)
Observation type | int
Rewards | {-3, -2, -1, 0, 1, 2, 3}
+

Observation

+

The first 28 observation dimensions follow [Antonoglou+22]:

+
+

An action in our implementation consists of 4 micro-actions, the same as the maximum number of dice a player can play at each turn. Each micro-action encodes the source position of a chip along with the value of the die used. We consider 26 possible source positions, with the 0-th position corresponding to a no-op, the 1st to retrieving a chip from the hit pile, and the remaining to selecting a chip in one of the 24 possible points. Each micro-action is encoded as a single integer with micro-action = src · 6 + die.

+
+ + + + + + + + + + + + + + + + + + + + + +
Index | Description
[:24] | represents
[24:28] | represents
[28:34] | one-hot vector of playable dice
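The micro-action formula quoted above (micro-action = src · 6 + die) can be sketched as a pair of helpers. This is only an illustration, not the Pgx implementation: the function names are hypothetical, and the die is assumed to be a 0-based index (0..5), which the quoted text does not specify.

```python
from typing import Tuple

NUM_SOURCES = 26  # 0: no-op, 1: hit pile, 2..25: the 24 points (per the quoted text)
NUM_DICE = 6      # assumption: die is a 0-based index in 0..5


def encode_micro_action(src: int, die: int) -> int:
    """Pack a source position and a die index into a single integer."""
    if not 0 <= src < NUM_SOURCES:
        raise ValueError("src must be in 0..25")
    if not 0 <= die < NUM_DICE:
        raise ValueError("die must be in 0..5")
    return src * NUM_DICE + die


def decode_micro_action(micro_action: int) -> Tuple[int, int]:
    """Recover (src, die) from an encoded micro-action."""
    return divmod(micro_action, NUM_DICE)
```

Because the die index is always smaller than 6, integer division and remainder by 6 recover the pair unambiguously.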
+

Action

+

...

+

Rewards

+

...

+

Termination

+

...

+

Version History

+
  • v0 : Initial release (v1.0.0)
+

Reference

+
  1. [Antonoglou+22] "Planning in Stochastic Environments with a Learned Model", ICLR

\ No newline at end of file
diff --git a/bridge_bidding/index.html b/bridge_bidding/index.html
new file mode 100644
index 000000000..02414cf56
--- /dev/null
+++ b/bridge_bidding/index.html
@@ -0,0 +1,756 @@

Bridge bidding

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Description

+

TBA

\ No newline at end of file
diff --git a/chess/index.html b/chess/index.html
new file mode 100644
index 000000000..a2719e704
--- /dev/null
+++ b/chess/index.html
@@ -0,0 +1,990 @@

Chess

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("chess")
+
+

Alternatively, you can load the Chess class directly:

+
from pgx.chess import Chess
+
+env = Chess()
+
+

Description

+

TBA

+

Rules

+

TBA

+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 4672
Observation shape | (8, 8, 119)
Observation type | float
Rewards | {-1, 0, 1}
+

Observation

+

We follow the observation design of AlphaZero [Silver+18].

+ + + + + + + + + + + + + +
Index | Description
TBA | TBA
+

Action

+

TBA

+

Rewards

+

Non-zero rewards are given only at terminal states. The reward at a terminal state is described in this table:

+ + + + + + + + + + + + + + + + + + + + + +
Outcome | Reward
Win | +1
Lose | -1
Draw | 0
+

Termination

+

Termination occurs when one of the following conditions is satisfied:

+
  1. checkmate
  2. stalemate
  3. insufficient material to checkmate
  4. 50 halfmoves have elapsed without any capture or pawn move
  5. 512 steps have elapsed (from AlphaZero [Silver+18])
+

Version History

+
  • v1 : Bug fix when castling by @HongruiTang in #983 (v1.1.0)
  • v0 : Initial release (v1.0.0)
+

Reference

+
  • [Silver+18] "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play" Science

\ No newline at end of file
diff --git a/connect_four/index.html b/connect_four/index.html
new file mode 100644
index 000000000..5e9a2cab2
--- /dev/null
+++ b/connect_four/index.html
@@ -0,0 +1,958 @@

Connect four

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("connect_four")
+
+

Alternatively, you can load the ConnectFour class directly:

+
from pgx.connect_four import ConnectFour
+
+env = ConnectFour()
+
+

Description

+
+

Connect Four is a two-player connection rack game, in which the players choose a color and then take turns dropping colored tokens into a seven-column, six-row vertically suspended grid. The pieces fall straight down, occupying the lowest available space within the column. The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own tokens.

+

Wikipedia

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 7
Observation shape | (6, 7, 2)
Observation type | bool
Rewards | {-1, 0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + +
Index | Description
[:, :, 0] | represents (6, 7) squares filled by the current player
[:, :, 1] | represents (6, 7) squares filled by the opponent of the current player
+

Action

+

Each action is the index of the column into which the player drops a token.
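As a rough illustration of how a column action resolves to a cell, the sketch below drops a token into the lowest empty cell of the chosen column. It is a plain-Python sketch, not the Pgx implementation; the board layout (row 0 at the top, 0 meaning empty) and the name drop_token are assumptions made for this example.

```python
from typing import List, Optional

ROWS, COLS = 6, 7  # six rows, seven columns, as in the description above


def drop_token(board: List[List[int]], column: int, player: int) -> Optional[int]:
    """Place `player`'s token in the lowest empty cell of `column`.

    Assumes row 0 is the top row and 0 marks an empty cell.
    Returns the row where the token landed, or None if the column is full.
    """
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom row upwards
        if board[row][column] == 0:
            board[row][column] = player
            return row
    return None  # column is full, so the action is illegal
```

A full column returning None corresponds to an action that would be masked out as illegal.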

+

Rewards

+

Non-zero rewards are given only at terminal states. The reward at a terminal state is described in this table:

+ + + + + + + + + + + + + + + + + + + + + +
Outcome | Reward
Win | +1
Lose | -1
Draw | 0
+

Termination

+

Termination happens when

+
  1. either one player places four of their tokens in a row (horizontally, vertically, or diagonally), or
  2. all 42 (= 6 x 7) squares are filled.
+

Version History

+
  • v0 : Initial release (v1.0.0)

\ No newline at end of file
diff --git a/gardner_chess/index.html b/gardner_chess/index.html
new file mode 100644
index 000000000..f232c4d21
--- /dev/null
+++ b/gardner_chess/index.html
@@ -0,0 +1,989 @@

Gardner chess

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("gardner_chess")
+
+

Alternatively, you can load the GardnerChess class directly:

+
from pgx.gardner_chess import GardnerChess
+
+env = GardnerChess()
+
+

Description

+

TBA

+

Rules

+

TBA

+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 1225
Observation shape | (5, 5, 115)
Observation type | float
Rewards | {-1, 0, 1}
+

Observation

+

We follow the observation design of AlphaZero [Silver+18].

+ + + + + + + + + + + + + +
Index | Description
TBA | TBA
+

Action

+

TBA

+

Rewards

+

Non-zero rewards are given only at terminal states. The reward at a terminal state is described in this table:

+ + + + + + + + + + + + + + + + + + + + + +
Outcome | Reward
Win | +1
Lose | -1
Draw | 0
+

Termination

+

Termination occurs when one of the following conditions is satisfied:

+
  1. checkmate
  2. stalemate
  3. insufficient material to checkmate
  4. 50 halfmoves have elapsed without any capture or pawn move
  5. 256 steps have elapsed (512 in the full-size chess experiments in AlphaZero [Silver+18])
+

Version History

+
  • v0 : Initial release (v1.0.0)
+

Reference

+
  • [Silver+18] "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play" Science

\ No newline at end of file
diff --git a/go/index.html b/go/index.html
new file mode 100644
index 000000000..ebb27b75c
--- /dev/null
+++ b/go/index.html
@@ -0,0 +1,1064 @@

Go

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("go_19x19")  # or "go_9x9"
+
+

Alternatively, you can load the Go class directly:

+
from pgx.go import Go
+
+env = Go(size=19, komi=6.5)
+
+

Description

+
+

Go is an abstract strategy board game for two players in which the aim is to surround more territory than the opponent. The game was invented in China more than 2,500 years ago and is believed to be the oldest board game continuously played to the present day.

+

Wikipedia

+
+

Rules

+

The rules implemented in Pgx follow the Tromp-Taylor rules.

+
+

Komi

+

The default komi is 6.5. Users can set a different komi in the Go class constructor.

+
+
+

Ko

+

On PSK implementations.

+

The Tromp-Taylor rules employ positional superko (PSK). However, implementing strict PSK is inefficient because

+
  • The simulator has to store the entire history of previous boards (or their hashes), and
  • The agent also has to remember all previous boards to avoid losing by PSK
+

As PSK rarely happens, to the best of our knowledge it is usual to compromise in PSK implementations. For example,

+
  • OpenSpiel employs situational superko (SSK) instead of PSK for computing legal actions, and if a PSK action happens, the game ends in a tie.
    • Pros: Detects all PSK actions
    • Cons: The agent cannot know why the game ended in a tie (if the repeated board is too old)
  • PettingZoo employs SSK for legal actions, and ignores PSK actions even if they happen.
    • Pros: Simple
    • Cons: PSK is totally ignored
+

Note that the strict rule is "PSK for legal actions, and a PSK action leads to an immediate loss." So, we also compromise on this point; our approach is

+
  • Pgx employs SSK for legal actions, approximates PSK by comparing against the boards from up to 8 steps before, and treats an approximate PSK action as an immediate loss
    • Pros: The agent may be able to avoid PSK (as it observes the board history of up to 8 steps in the AlphaGo Zero features)
    • Cons: Repetitions of older boards are ignored
+

In any case, we believe its effect is very small, as PSK rarely happens, especially on a 19x19 board.
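The idea of approximating PSK with a bounded history can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pgx implementation: boards are represented as flat tuples, the history is a fixed-length deque, and the names HISTORY_LENGTH and repeats_recent_board are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple

Board = Tuple[int, ...]  # flattened board; an assumption made for this sketch
HISTORY_LENGTH = 8       # only the 8 most recent boards are remembered


def repeats_recent_board(new_board: Board, history: Deque[Board]) -> bool:
    """Return True if `new_board` equals one of the remembered boards."""
    return new_board in history


def make_history() -> Deque[Board]:
    """Create a history that silently forgets boards older than HISTORY_LENGTH."""
    return deque(maxlen=HISTORY_LENGTH)
```

With this bounded history, a repetition of a board that left the window goes undetected, which is exactly the limitation listed under "Cons" above.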

+
+

Specs

+

Let N be the board size (e.g., 19).

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | N x N + 1
Observation shape | (N, N, 17)
Observation type | bool
Rewards | {-1, 1}
+

Observation

+

We follow the observation design of AlphaGo Zero [Silver+17].

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Description
obs[:, :, 0] | stones of player_id (@ current board)
obs[:, :, 1] | stones of player_id's opponent (@ current board)
obs[:, :, 2] | stones of player_id (@ 1-step before)
obs[:, :, 3] | stones of player_id's opponent (@ 1-step before)
... | ...
obs[:, :, -1] | color of player_id
+
+

Final observation dimension

+

For the final dimension, there are two possible options:

+
  • Use the color of the current player to play
  • Use the color of player_id
+

This ambiguity arises because the observe function is available even if player_id is different from state.current_player. In the AlphaGo Zero paper [Silver+17], the final dimension C is explained as:

+
+

The final feature plane, C, represents the colour to play, and has a constant value of either 1 if black is to play or 0 if white is to play.

+
+

However, the paper also states:

+
+

the colour feature C is necessary because the komi is not observable.

+
+

So, we use player_id's color to let the agent know the komi information. As long as observe is called with player_id == state.current_player, this choice doesn't matter.

+
+

Action

+

Each action in {0, ..., N * N - 1} represents the point on which a stone is placed. The final action (N * N) represents the pass action.
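A small sketch of how such an action index could be decoded is shown below. The row-major ordering (action = row * N + column) is an assumption made for illustration; the text above only states that the first N * N actions are board points and the last one is pass. The function name decode_action is hypothetical.

```python
from typing import Optional, Tuple


def decode_action(action: int, size: int) -> Optional[Tuple[int, int]]:
    """Map an action index to (row, column), or None for the pass action.

    Assumes row-major ordering of the board points.
    """
    if not 0 <= action <= size * size:
        raise ValueError("action out of range")
    if action == size * size:  # the final action is pass
        return None
    return divmod(action, size)
```

For a 19x19 board this gives 361 point actions plus one pass action, matching the N x N + 1 entry in the Specs table.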

+

Rewards

+

Non-zero rewards are given only at terminal states. The reward at a terminal state is described in this table:

+ + + + + + + + + + + + + + + + + +
Outcome | Reward
Win | +1
Lose | -1
+

Termination

+

Termination happens when

+
  1. two consecutive passes are played, or
  2. N * N * 2 steps have elapsed [Silver+17].
+

Version History

+
  • v0 : Initial release (v1.0.0)
+

Reference

+
  1. [Silver+17] "Mastering the game of Go without human knowledge" Nature

\ No newline at end of file
diff --git a/hex/index.html b/hex/index.html
new file mode 100644
index 000000000..f8266185f
--- /dev/null
+++ b/hex/index.html
@@ -0,0 +1,977 @@

Hex

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("hex")
+
+

Alternatively, you can load the Hex class directly:

+
from pgx.hex import Hex
+
+env = Hex()
+
+

Description

+
+

Hex is a two player abstract strategy board game in which players attempt to connect opposite sides of a rhombus-shaped board made of hexagonal cells. Hex was invented by mathematician and poet Piet Hein in 1942 and later rediscovered and popularized by John Nash.

+

Wikipedia

+
+

Rules

+

As the first player to move has a distinct advantage, the swap rule is used to compensate for this. The swap rule used in Pgx is the "swap pieces" variant:

+
+

"Swap pieces": The players perform the swap by switching pieces. This means the initial red piece is replaced by a blue piece in the mirror image position, where the mirroring takes place with respect to the board's long diagonal. For example, a red piece at a3 becomes a blue piece at c1. The players do not switch colours: Red stays Red and Blue stays Blue. After the swap, it is Red's turn.

+

Hex Wiki - Swap rule
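The mirroring described in the quoted rule (a red piece at a3 becomes a blue piece at c1) amounts to exchanging the column and row of the cell. The helper below is a hypothetical sketch of that coordinate transformation using the letter-plus-number notation from the quote; it is not taken from the Pgx source.

```python
def mirror_cell(cell: str) -> str:
    """Mirror a cell such as "a3" across the board's long diagonal.

    The column letter and the row number trade places, so "a3" maps to "c1".
    """
    column = ord(cell[0]) - ord("a")  # 0-based column index from the letter
    row = int(cell[1:]) - 1           # 0-based row index from the number
    return f"{chr(ord('a') + row)}{column + 1}"
```

Applying the function twice returns the original cell, as expected for a reflection.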

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 122 (= 11 x 11 + 1)
Observation shape | (11, 11, 3)
Observation type | bool
Rewards | {-1, 1}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + +
Index | Description
[:, :, 0] | represents (11, 11) cells filled by player_id
[:, :, 1] | represents (11, 11) cells filled by the opponent of player_id
[:, :, 2] | represents whether player_id is black or white
+

Action

+

Each action in {0, ..., 120} represents the index of the cell to be filled. The final action, 121, is the swap action, which is available only on the second turn.

+

Rewards

+

Non-zero rewards are given only at terminal states. The reward at a terminal state is described in this table:

+ + + + + + + + + + + + + + + + + +
Outcome | Reward
Win | +1
Lose | -1
+

Note that there is no draw in Hex.

+

Termination

+

Termination happens when either player connects their two opposite sides of the board.

+

Version History

+
  • v0 : Initial release (v1.0.0)

\ No newline at end of file
diff --git a/index.html b/index.html
new file mode 100644
index 000000000..07b4b5663
--- /dev/null
+++ b/index.html
@@ -0,0 +1,700 @@

+ +

+ +

+ +

+ +

Pgx Documentation

+
import jax
+import pgx
+
+env = pgx.make("go_19x19")
+init = jax.jit(jax.vmap(env.init))
+step = jax.jit(jax.vmap(env.step))
+
+batch_size = 1024
+keys = jax.random.split(jax.random.PRNGKey(42), batch_size)
+state = init(keys)  # vectorized states
+while not (state.terminated | state.truncated).all():
+    action = model(state.current_player, state.observation, state.legal_action_mask)
+    state = step(state, action)  # state.rewards has shape (batch_size, 2)
+
\ No newline at end of file
diff --git a/kuhn_poker/index.html b/kuhn_poker/index.html
new file mode 100644
index 000000000..d01e1104e
--- /dev/null
+++ b/kuhn_poker/index.html
@@ -0,0 +1,975 @@

Kuhn poker

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Description

+

Kuhn poker is a simplified poker with three cards: J, Q, and K.

+

Rules

+

Each player is dealt one card and the remaining card is unused. There are four actions (check, call, bet, and fold) and five possible scenarios:

+
  1. bet (1st) - call (2nd) : Showdown and the winner takes +2
  2. bet (1st) - fold (2nd) : 1st player takes +1
  3. check (1st) - check (2nd) : Showdown and the winner takes +1
  4. check (1st) - bet (2nd) - call (1st) : Showdown and the winner takes +2
  5. check (1st) - bet (2nd) - fold (1st) : 2nd player takes +1
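The five scenarios above can be written down as a small payoff function from the first player's point of view. This is a sketch for illustration only: the card ranking J < Q < K is standard for Kuhn poker, while the function name and the string encoding of cards and actions are hypothetical and unrelated to the Pgx action indices.

```python
from typing import Sequence

RANK = {"J": 0, "Q": 1, "K": 2}  # J < Q < K


def first_player_payoff(first_card: str, second_card: str, actions: Sequence[str]) -> int:
    """Return the first player's payoff for one of the five betting sequences."""
    # +1 if the first player holds the higher card, -1 otherwise
    showdown = 1 if RANK[first_card] > RANK[second_card] else -1
    history = tuple(actions)
    if history == ("bet", "call"):
        return 2 * showdown
    if history == ("bet", "fold"):
        return 1
    if history == ("check", "check"):
        return showdown
    if history == ("check", "bet", "call"):
        return 2 * showdown
    if history == ("check", "bet", "fold"):
        return -1
    raise ValueError("not a complete Kuhn poker betting sequence")
```

Since the game is zero-sum, the second player's payoff is the negation of the returned value.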
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 4
Observation shape | (7,)
Observation type | bool
Rewards | {-2, -1, +1, +2}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Description
[0] | One if J in my hand
[1] | One if Q in my hand
[2] | One if K in my hand
[3] | One if I have bet 0 chips
[4] | One if I have bet 1 chip
[5] | One if the opponent has bet 0 chips
[6] | One if the opponent has bet 1 chip
+

Action

+

There are four distinct actions.

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Action | Index
Call | 0
Bet | 1
Fold | 2
Check | 3
+

Rewards

+

The winner takes +2 or +1 depending on the game payoff. As Kuhn poker is a zero-sum game, the loser takes -2 or -1, respectively.

+

Termination

+

Follows the rules above.

+

Version History

+
  • v0 : Initial release (v1.0.0)

\ No newline at end of file
diff --git a/leduc_holdem/index.html b/leduc_holdem/index.html
new file mode 100644
index 000000000..0853ab870
--- /dev/null
+++ b/leduc_holdem/index.html
@@ -0,0 +1,1000 @@

Leduc hold’em

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Description

+

Leduc hold’em is a simplified poker proposed in [Southey+05].

+

Rules

+

We quote the description in [Southey+05]:

+
+

Leduc Hold ’Em. We have also constructed a smaller version of hold ’em, which seeks to retain the strategic elements of the large game while keeping the size of the game tractable. In Leduc hold ’em, the deck consists of two suits with three cards in each suit. There are two rounds. In the first round a single private card is dealt to each player. In the second round a single board card is revealed. There is a two-bet maximum, with raise amounts of 2 and 4 in the first and second round, respectively. Both players start the first round with 1 already in the pot.

+

+

Figure 1: An example decision tree for a single betting round in poker with a two-bet maximum. Leaf nodes with open boxes continue to the next round, while closed boxes end the hand.

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 3
Observation shape | (34,)
Observation type | bool
Rewards | {-13, -12, ..., 0, ..., 12, 13}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Description
[0] | True if J in hand
[1] | True if Q in hand
[2] | True if K in hand
[3] | True if J is the public card
[4] | True if Q is the public card
[5] | True if K is the public card
[6:19] | represent my chip count (0, ..., 13)
[20:33] | represent opponent's chip count (0, ..., 13)
+
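As a rough illustration of the layout above, the sketch below decodes such a vector in plain Python. It assumes the vector has 34 entries, that the ranges in the table are inclusive (14 entries per chip count), and that each group is one-hot. The helper names are hypothetical and not part of Pgx.

```python
CARDS = ("J", "Q", "K")


def _one_hot_index(bits):
    """Return the index of the single set entry, or None if nothing is set."""
    hits = [i for i, b in enumerate(bits) if b]
    if len(hits) > 1:
        raise ValueError("expected at most one set entry")
    return hits[0] if hits else None


def decode_leduc_observation(obs):
    """Decode a 34-element boolean vector (assumed layout, see lead-in)."""
    if len(obs) != 34:
        raise ValueError("expected 34 entries")
    hand = _one_hot_index(obs[0:3])
    public = _one_hot_index(obs[3:6])
    return {
        "hand": None if hand is None else CARDS[hand],
        # No public card is revealed during the first round.
        "public_card": None if public is None else CARDS[public],
        "my_chips": _one_hot_index(obs[6:20]),         # entries 6..19
        "opponent_chips": _one_hot_index(obs[20:34]),  # entries 20..33
    }
```

A vector with entries 2, 3, 10 and 22 set would decode to a K in hand, a J on the board, 4 of my chips and 2 of the opponent's chips.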

Action

+

There are three distinct actions.

+ + + + + + + + + + + + + + + + + + + + + +
Index | Action
0 | Call
1 | Raise
2 | Fold
+

Rewards

+

The reward is the payoff of the game.

+

Termination

+

Follows the rules above.

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+

References

+ + + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/minatar_asterix/index.html b/minatar_asterix/index.html new file mode 100644 index 000000000..83abab399 --- /dev/null +++ b/minatar_asterix/index.html @@ -0,0 +1,935 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + MinAtar Asterix - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

MinAtar Asterix

+

+ +

+ +

Usage

+

The MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). To use the MinAtar suite in Pgx, additionally run the following command:

+
pip install pgx-minatar
+
+

Then, you can use the environment as follows:

+
import pgx
+
+env = pgx.make("minatar-asterix")
+
+

Description

+

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact copy of the original MinAtar implementation, written in JAX. The Asterix environment is described as follows:

+
+

The player can move freely along the 4 cardinal directions. Enemies and treasure spawn from the sides. A reward of +1 is given for picking up treasure. Termination occurs if the player makes contact with an enemy. Enemy and treasure direction are indicated by a trail channel. Difficulty is periodically increased by increasing the speed and spawn rate of enemies and treasure.

+

github.com/kenjyoung/MinAtar - asterix.py

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 1
Number of actions | 5
Observation shape | (10, 10, 4)
Observation type | bool
Rewards | {0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Channel
[:, :, 0] | Player
[:, :, 1] | Enemy
[:, :, 2] | Trail
[:, :, 3] | Gold
+

Action

+

TBA

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+

Reference

+
    +
  • [Young&Tian+19] "Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments" arXiv:1903.03176
  • +
+

License

+

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/minatar_breakout/index.html b/minatar_breakout/index.html new file mode 100644 index 000000000..54735f652 --- /dev/null +++ b/minatar_breakout/index.html @@ -0,0 +1,936 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + MinAtar Breakout - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

MinAtar Breakout

+

+ +

+ +

Usage

+

The MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). To use the MinAtar suite in Pgx, additionally run the following command:

+
pip install pgx-minatar
+
+

Then, you can use the environment as follows:

+
import pgx
+
+env = pgx.make("minatar-breakout")
+
+

Description

+

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact copy of the original MinAtar implementation, written in JAX. The Breakout environment is described as follows:

+
+

The player controls a paddle on the bottom of the screen and must bounce a ball to break 3 rows of bricks along the top of the screen. A reward of +1 is given for each brick broken by the ball. When all bricks are cleared another 3 rows are added. The ball travels only along diagonals, when it hits the paddle it is bounced either to the left or right depending on the side of the paddle hit, when it hits a wall or brick it is reflected. Termination occurs when the ball hits the bottom of the screen. The balls direction is indicated by a trail channel.

+

github.com/kenjyoung/MinAtar - breakout.py

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 1
Number of actions | 3
Observation shape | (10, 10, 4)
Observation type | bool
Rewards | {0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Channel
[:, :, 0] | Paddle
[:, :, 1] | Ball
[:, :, 2] | Trail
[:, :, 3] | Brick
+

Action

+

TBA

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+

Reference

+
    +
  • [Young&Tian+19] "Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments" arXiv:1903.03176
  • +
+

License

+

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/minatar_freeway/index.html b/minatar_freeway/index.html new file mode 100644 index 000000000..0d073cc5b --- /dev/null +++ b/minatar_freeway/index.html @@ -0,0 +1,947 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + MinAtar Freeway - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

MinAtar Freeway

+

+ +

+ +

Usage

+

The MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). To use the MinAtar suite in Pgx, additionally run the following command:

+
pip install pgx-minatar
+
+

Then, you can use the environment as follows:

+
import pgx
+
+env = pgx.make("minatar-freeway")
+
+

Description

+

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact copy of the original MinAtar implementation, written in JAX. The Freeway environment is described as follows:

+
+

The player begins at the bottom of the screen and motion is restricted to traveling up and down. Player speed is also restricted such that the player can only move every 3 frames. A reward of +1 is given when the player reaches the top of the screen, at which point the player is returned to the bottom. Cars travel horizontally on the screen and teleport to the other side when the edge is reached. When hit by a car, the player is returned to the bottom of the screen. Car direction and speed is indicated by 5 trail channels, the location of the trail gives direction while the specific channel indicates how frequently the car moves (from once every frame to once every 5 frames). Each time the player successfully reaches the top of the screen, the car speeds are randomized. Termination occurs after 2500 frames have elapsed.

+

github.com/kenjyoung/MinAtar - freeway.py

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 1
Number of actions | 3
Observation shape | (10, 10, 7)
Observation type | bool
Rewards | {0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Channel
[:, :, 0] | Chicken
[:, :, 1] | Car
[:, :, 2] | Speed 1
[:, :, 3] | Speed 2
[:, :, 4] | Speed 3
[:, :, 5] | Speed 4
[:, :, 6] | Speed 5
+
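To make the role of the speed channels concrete, here is a small sketch in plain Python that reads car speeds out of an observation given as nested lists indexed `[row][col][channel]`. It assumes that the five speed channels occupy indices 2 to 6 (as the observation shape and the quoted description suggest) and that "Speed k" means the car moves once every k frames. Both points, and the helper name `car_speeds`, are assumptions rather than documented Pgx behavior.

```python
CAR = 1
# Assumed mapping: channel index -> frames between two moves of the car.
SPEED_CHANNELS = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5}


def car_speeds(obs):
    """Return {row: frames_per_move} for rows containing a car and a trail."""
    speeds = {}
    for r, row in enumerate(obs):
        if not any(cell[CAR] for cell in row):
            continue  # no car in this lane
        for cell in row:
            for channel, frames in SPEED_CHANNELS.items():
                if cell[channel]:
                    speeds[r] = frames
    return speeds
```

The position of the trail relative to the car, which the description says encodes direction, is ignored here to keep the sketch short.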

Action

+

TBA

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+

Reference

+
    +
  • [Young&Tian+19] "Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments" arXiv:1903.03176
  • +
+

License

+

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/minatar_seaquest/index.html b/minatar_seaquest/index.html new file mode 100644 index 000000000..e6130890d --- /dev/null +++ b/minatar_seaquest/index.html @@ -0,0 +1,968 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + MinAtar Seaquest - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

MinAtar Seaquest

+

+ +

+ +

Usage

+

The MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). To use the MinAtar suite in Pgx, additionally run the following command:

+
pip install pgx-minatar
+
+

Then, you can use the environment as follows:

+
import pgx
+
+env = pgx.make("minatar-seaquest")
+
+

Description

+

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact copy of the original MinAtar implementation, written in JAX. The Seaquest environment is described as follows:

+
+

The player controls a submarine consisting of two cells, front and back, to allow direction to be determined. The player can also fire bullets from the front of the submarine. Enemies consist of submarines and fish, distinguished by the fact that submarines shoot bullets and fish do not. A reward of +1 is given each time an enemy is struck by one of the player's bullets, at which point the enemy is also removed. There are also divers which the player can move onto to pick up, doing so increments a bar indicated by another channel along the bottom of the screen. The player also has a limited supply of oxygen indicated by another bar in another channel. Oxygen degrades over time, and is replenished whenever the player moves to the top of the screen as long as the player has at least one rescued diver on board. The player can carry a maximum of 6 divers. When surfacing with less than 6, one diver is removed. When surfacing with 6, all divers are removed and a reward is given for each active cell in the oxygen bar. Each time the player surfaces the difficulty is increased by increasing the spawn rate and movement speed of enemies. Termination occurs when the player is hit by an enemy fish, sub or bullet; or when oxygen reached 0; or when the player attempts to surface with no rescued divers. Enemy and diver directions are indicated by a trail channel active in their previous location to reduce partial observability.

+

github.com/kenjyoung/MinAtar - seaquest.py

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 1
Number of actions | 6
Observation shape | (10, 10, 10)
Observation type | bool
Rewards | {0, 1, ..., 10}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Channel
[:, :, 0] | Player submarine (front)
[:, :, 1] | Player submarine (back)
[:, :, 2] | Friendly bullet
[:, :, 3] | Trail
[:, :, 4] | Enemy bullet
[:, :, 5] | Enemy fish
[:, :, 6] | Enemy submarine
[:, :, 7] | Oxygen gauge
[:, :, 8] | Diver gauge
[:, :, 9] | Diver
+
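One way to work with the channel table above is to give each channel a name and list the cells where it is active. The sketch below does this in plain Python for an observation given as nested lists indexed `[row][col][channel]`; the dictionary keys and the helper name `active_cells` are made up for illustration and are not Pgx identifiers.

```python
# Channel names are illustrative labels for the indices in the table.
SEAQUEST_CHANNELS = {
    "player_front": 0,
    "player_back": 1,
    "friendly_bullet": 2,
    "trail": 3,
    "enemy_bullet": 4,
    "enemy_fish": 5,
    "enemy_submarine": 6,
    "oxygen_gauge": 7,
    "diver_gauge": 8,
    "diver": 9,
}


def active_cells(obs, channel_name):
    """Return (row, col) pairs where the named channel is set."""
    channel = SEAQUEST_CHANNELS[channel_name]
    return [
        (r, c)
        for r, row in enumerate(obs)
        for c, cell in enumerate(row)
        if cell[channel]
    ]
```

With a JAX or NumPy array, the same lookup would usually be written as a slice such as `obs[:, :, channel]`.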

Action

+

TBA

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+

Reference

+
    +
  • [Young&Tian+19] "Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments" arXiv:1903.03176
  • +
+

License

+

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/minatar_space_invaders/index.html b/minatar_space_invaders/index.html new file mode 100644 index 000000000..eba4bc2ff --- /dev/null +++ b/minatar_space_invaders/index.html @@ -0,0 +1,947 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + MinAtar Space Invaders - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

MinAtar Space Invaders

+

+ +

+ +

Usage

+

The MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). To use the MinAtar suite in Pgx, additionally run the following command:

+
pip install pgx-minatar
+
+

Then, you can use the environment as follows:

+
import pgx
+
+env = pgx.make("minatar-space_invaders")
+
+

Description

+

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact copy of the original MinAtar implementation, written in JAX. The Space Invaders environment is described as follows:

+
+

The player controls a cannon at the bottom of the screen and can shoot bullets upward at a cluster of aliens above. The aliens move across the screen until one of them hits the edge, at which point they all move down and switch directions. The current alien direction is indicated by 2 channels (one for left and one for right) one of which is active at the location of each alien. A reward of +1 is given each time an alien is shot, and that alien is also removed. The aliens will also shoot bullets back at the player. When few aliens are left, alien speed will begin to increase. When only one alien is left, it will move at one cell per frame. When a wave of aliens is fully cleared a new one will spawn which moves at a slightly faster speed than the last. Termination occurs when an alien or bullet hits the player.

+

github.com/kenjyoung/MinAtar - space_invaders.py

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 1
Number of actions | 4
Observation shape | (10, 10, 6)
Observation type | bool
Rewards | {0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Index | Channel
[:, :, 0] | Cannon
[:, :, 1] | Alien
[:, :, 2] | Alien left
[:, :, 3] | Alien right
[:, :, 4] | Friendly bullet
[:, :, 5] | Enemy bullet
+

Action

+

TBA

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+

Reference

+
    +
  • [Young&Tian+19] "Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments" arXiv:1903.03176
  • +
+

License

+

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/objects.inv b/objects.inv new file mode 100644 index 000000000..804401381 Binary files /dev/null and b/objects.inv differ diff --git a/othello/index.html b/othello/index.html new file mode 100644 index 000000000..5e953ea85 --- /dev/null +++ b/othello/index.html @@ -0,0 +1,954 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Othello - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Othello

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("othello")
+
+

Alternatively, you can load the Othello class directly:

+
from pgx.othello import Othello
+
+env = Othello()
+
+

Description

+
+

Othello, or differing in not having a defined starting position, Reversi, is a two-player zero-sum and perfect information abstract strategy board game, usually played on a board with 8 rows and 8 columns and a set of light and a dark turnable pieces for each side. The player's goal is to have a majority of their colored pieces showing at the end of the game, turning over as many of their opponent's pieces as possible. The dark player makes the first move from the starting position, alternating with the light player. Each player has to place a piece on the board such that there exists at least one straight (horizontal, vertical, or diagonal) occupied line of opponent pieces between the new piece and another own piece. After placing the piece, the side turns over (flips, captures) all opponent pieces lying on any straight lines between the new piece and any anchoring own pieces.

+

Chess Programming Wiki

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 2
Number of actions | 65 (= 8 x 8 + 1)
Observation shape | (8, 8, 2)
Observation type | bool
Rewards | {-1, 0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + +
Index | Description
[:, :, 0] | represents the (8, 8) squares occupied by the current player's pieces
[:, :, 1] | represents the (8, 8) squares occupied by the opponent's pieces
+

Action

+

Each action in {0, ..., 63} is the index of the square to be filled. The last action (index 64) is the pass action.
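The mapping between action indices and board squares can be sketched as follows. The text does not state how squares are numbered, so the row-major ordering used here (index = row * 8 + column) is an assumption, and the function names are hypothetical.

```python
BOARD_SIZE = 8
PASS_ACTION = BOARD_SIZE * BOARD_SIZE  # 64


def action_to_square(action):
    """Return (row, col) for a placement, or None for the pass action.

    Assumes row-major square numbering, which the text does not confirm.
    """
    if not 0 <= action <= PASS_ACTION:
        raise ValueError(f"action out of range: {action}")
    if action == PASS_ACTION:
        return None
    return divmod(action, BOARD_SIZE)


def square_to_action(row, col):
    """Inverse of action_to_square for board squares."""
    if not (0 <= row < BOARD_SIZE and 0 <= col < BOARD_SIZE):
        raise ValueError("square is off the board")
    return row * BOARD_SIZE + col
```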

+

Rewards

+

Non-zero rewards are given only at terminal states. The reward at a terminal state is given in this table:

+ + + + + + + + + + + + + + + + + + + + + +
Outcome | Reward
Win | +1
Lose | -1
Draw | 0
+
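The reward table can be expressed as a small function of the final disc counts. This is only a sketch in plain Python: it assumes the winner is the player with more discs at the end of the game, as in the quoted description, and the function name is hypothetical.

```python
def terminal_rewards(my_discs, opponent_discs):
    """Return (my_reward, opponent_reward) at a terminal state.

    Assumes the player with more discs wins; equal counts are a draw.
    """
    if my_discs > opponent_discs:
        return (1, -1)
    if my_discs < opponent_discs:
        return (-1, 1)
    return (0, 0)
```

The two rewards always sum to zero, which matches the zero-sum nature of the game.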

Termination

+

Termination happens when all 64 (= 8 x 8) playable squares are filled.

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/play2048/index.html b/play2048/index.html new file mode 100644 index 000000000..eec25dabb --- /dev/null +++ b/play2048/index.html @@ -0,0 +1,952 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + 2048 - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

2048

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("2048")
+
+

Alternatively, you can load the Play2048 class directly:

+
from pgx.play2048 import Play2048
+
+env = Play2048()
+
+

Description

+
+

2048 ...

+

Wikipedia

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Value
Version | v0
Number of players | 1
Number of actions | 4
Observation shape | (4, 4, 31)
Observation type | bool
Rewards | {0, 2, 4, ...}
+

Observation

+

Our observation design basically follows [Antonoglou+22]:

+
+

In our 2048 experiments we used a binary representation of the observation as an input to our model. Specifically, the 4 × 4 board was flattened into a single vector of size 16, and a binary representation of 31 bits for each number was obtained, for a total size of 496 numbers.

+
+

However, instead of a 496-dimensional flat vector, we use a (4, 4, 31) array.

+ + + + + + + + + + + + + +
Index | Description
[i, j, b] | represents that square (i, j) has a tile of 2 ^ b if b > 0
+
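The observation layout above can be illustrated with a short encoder in plain Python. It takes a 4 x 4 board of tile values (0 for an empty square) and sets entry `[i][j][b]` when the square holds the tile 2^b. Marking empty squares at `b = 0` is an assumption, since the table only defines the case b > 0, and the function name is hypothetical.

```python
NUM_BITS = 31


def encode_board(board):
    """Encode a 4x4 board of tile values into a (4, 4, 31) nested bool list."""
    obs = [[[False] * NUM_BITS for _ in range(4)] for _ in range(4)]
    for i in range(4):
        for j in range(4):
            tile = board[i][j]
            if tile == 0:
                b = 0  # assumption: empty squares use index 0
            else:
                b = tile.bit_length() - 1
                if tile < 2 or (1 << b) != tile:
                    raise ValueError(f"not a valid 2048 tile: {tile}")
            if b >= NUM_BITS:
                raise ValueError(f"tile too large: {tile}")
            obs[i][j][b] = True
    return obs
```

Flattening the result row by row gives the 496 numbers mentioned in the quotation.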

Action

+

The actions are 0 (left), 1 (up), 2 (right), and 3 (down).

+

Rewards

+

Sum of merged tiles.
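As a worked example, the sketch below slides one row to the left and returns the reward collected on that row. It reads "sum of merged tiles" as the sum of the values of the newly created tiles, as in the usual 2048 scoring. That reading is an assumption and may differ from the Pgx implementation, and the function name is hypothetical.

```python
def slide_row_left(row):
    """Slide and merge one row to the left; return (new_row, reward)."""
    tiles = [t for t in row if t != 0]  # drop empty squares
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            new_tile = tiles[i] * 2
            merged.append(new_tile)
            reward += new_tile  # each new tile contributes its value
            i += 2              # a tile merges at most once per move
        else:
            merged.append(tiles[i])
            i += 1
    merged += [0] * (len(row) - len(merged))  # pad with empty squares
    return merged, reward
```

The reward for a full move would then be the sum over the four rows (or columns) that are slid in the chosen direction.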

+

Termination

+

If all squares are filled with tiles and no legal actions are available, the game terminates.

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+

Reference

+
    +
  1. [Antonoglou+22] "Planning in Stochastic Environments with a Learned Model", ICLR
  2. +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/requirements.in b/requirements.in new file mode 100644 index 000000000..bec300ca6 --- /dev/null +++ b/requirements.in @@ -0,0 +1,3 @@ +mkdocs +mkdocstrings[python] +markdown-include diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 000000000..a12290afc --- /dev/null +++ b/requirements.txt @@ -0,0 +1,63 @@ +# +# This file is autogenerated by pip-compile with Python 3.9 +# by the following command: +# +# pip-compile requirements.in +# +click==8.1.3 + # via mkdocs +colorama==0.4.6 + # via griffe +ghp-import==2.1.0 + # via mkdocs +griffe==0.27.1 + # via mkdocstrings-python +jinja2==3.1.2 + # via + # mkdocs + # mkdocstrings +markdown==3.3.7 + # via + # markdown-include + # mkdocs + # mkdocs-autorefs + # mkdocstrings + # pymdown-extensions +markdown-include==0.8.1 + # via -r requirements.in +markupsafe==2.1.2 + # via + # jinja2 + # mkdocstrings +mergedeep==1.3.4 + # via mkdocs +mkdocs==1.4.2 + # via + # -r requirements.in + # mkdocs-autorefs + # mkdocstrings +mkdocs-autorefs==0.4.1 + # via mkdocstrings +mkdocstrings[python]==0.21.2 + # via + # -r requirements.in + # mkdocstrings-python +mkdocstrings-python==0.9.0 + # via mkdocstrings +packaging==23.1 + # via mkdocs +pymdown-extensions==10.0 + # via mkdocstrings +python-dateutil==2.8.2 + # via ghp-import +pyyaml==6.0 + # via + # mkdocs + # pymdown-extensions + # pyyaml-env-tag +pyyaml-env-tag==0.1 + # via mkdocs +six==1.16.0 + # via python-dateutil +watchdog==3.0.0 + # via mkdocs diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 000000000..b31b4ac8e --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#pgx-documentation","title":"Pgx Documentation","text":"
import jax\nimport pgx\nenv = pgx.make(\"go_19x19\")\ninit = jax.jit(jax.vmap(env.init))\nstep = jax.jit(jax.vmap(env.step))\nbatch_size = 1024\nkeys = jax.random.split(jax.random.PRNGKey(42), batch_size)\nstate = init(keys)  # vectorized states\nwhile not (state.terminated | state.truncated).all():\n    # `model` is a placeholder for a user-defined policy\n    action = model(state.current_player, state.observation, state.legal_action_mask)\n    state = step(state, action)  # state.rewards has shape (1024, 2)\n
"},{"location":"animal_shogi/","title":"AnimalShogi","text":"darklight

"},{"location":"animal_shogi/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"animal_shogi\")\n

Alternatively, you can load the AnimalShogi class directly:

from pgx.animal_shogi import AnimalShogi\nenv = AnimalShogi()\n
"},{"location":"animal_shogi/#description","title":"Description","text":"

Animal Shogi (D\u014dbutsu sh\u014dgi) is a variant of shogi primarily developed for children. It consists of a 3x4 board and four types of pieces (five including promoted pieces). One of the rule differences from regular shogi is the Try Rule, where entering the opponent's territory with the king leads to victory.

See also Wikipedia

"},{"location":"animal_shogi/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 132 Observation shape (4, 3, 194) Observation type float Rewards {-1, 0, 1}"},{"location":"animal_shogi/#observation","title":"Observation","text":"Index Description [:, :, 0:5] my pieces on board [:, :, 5:10] opponent's pieces on board [:, :, 10:16] my hands [:, :, 16:22] opponent's hands [:, :, 22:24] repetitions ... ... [:, :, 193] player_id's turn' [:, :, 194] Elapsed timesteps (normalized to 1)"},{"location":"animal_shogi/#action","title":"Action","text":"

Uses AlphaZero-like action labels:

  • 132 labels
  • Move: 8 x 12 (direction) x (source square)
  • Drop: 3 x 12 (drop piece type) x (destination square)
"},{"location":"animal_shogi/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at terminal states. The reward at a terminal state is given in this table:

Reward Win +1 Lose -1 Draw 0"},{"location":"animal_shogi/#termination","title":"Termination","text":"

Termination happens when any of the following occurs:

  1. Either player's king is checkmated.
  2. Either king enters the opponent's territory (farthest rank).
  3. The same position occurs three times.
  4. 250 moves have passed (a rule unique to Pgx).

In cases 3 and 4, the game is declared a draw.

"},{"location":"animal_shogi/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"api/","title":"Pgx API","text":"

This is the list of all public APIs of Pgx. Two important components in Pgx are State and Env.

Naming convention of EnvId

A hyphen (-) indicates that the game comes from a different original source (e.g., MinAtar); an underscore (_) is used in all other cases.

"},{"location":"api/#pgx.State","title":"pgx.State","text":"

Bases: abc.ABC

Base state class of all Pgx game environments. It is essentially an immutable (frozen) dataclass. A state is typically created via Env.init:

state = env.init(jax.random.PRNGKey(0))\n

and Env.step receives and returns this state class:

state = env.step(state, action)\n

Serialization via flax.struct.serialization is supported. There are 6 attributes common to all games:

Attributes:

Name Type Description current_player jnp.ndarray

id of agent to play. Note that this does NOT represent the turn (e.g., black/white in Go). This ID is consistent over the parallel vmapped states.

observation jnp.ndarray

observation for the current state. Env.observe is called to compute it.

rewards jnp.ndarray

the i-th element indicates the intermediate reward for the agent with player-id i. If Env.step is called for a terminal state, the following state.rewards is zero for all players.

terminated jnp.ndarray

denotes that the state is a terminal state. Note that some environments (e.g., Go) have a max_termination_steps parameter and terminate within a limited number of steps (following AlphaGo).

truncated jnp.ndarray

indicates that the episode ended for a reason other than termination. Note that current Pgx environments do not invoke truncation, but users can apply the TimeLimit wrapper to truncate the environment. Some MinAtar games may not terminate within a finite number of timesteps. All other environments are expected to terminate within a finite number of timesteps with probability one.

legal_action_mask jnp.ndarray

Boolean array of legal actions. If an illegal action is taken, the game terminates immediately with a penalty to the player.

Source code in pgx/v1.py
@dataclass\nclass State(abc.ABC):\n\"\"\"Base state class of all Pgx game environments. Basically an immutable (frozen) dataclass.\n    A basic usage is generating via `Env.init`:\n        state = env.init(jax.random.PRNGKey(0))\n    and `Env.step` receives and returns this state class:\n        state = env.step(state, action)\n    Serialization via `flax.struct.serialization` is supported.\n    There are 6 common attributes over all games:\n    Attributes:\n        current_player (jnp.ndarray): id of agent to play.\n            Note that this does NOT represent the turn (e.g., black/white in Go).\n            This ID is consistent over the parallel vmapped states.\n        observation (jnp.ndarray): observation for the current state.\n            `Env.observe` is called to compute.\n        rewards (jnp.ndarray): the `i`-th element indicates the intermediate reward for\n            the agent with player-id `i`. If `Env.step` is called for a terminal state,\n            the following `state.rewards` is zero for all players.\n        terminated (jnp.ndarray): denotes that the state is terminal state. Note that\n            some environments (e.g., Go) have an `max_termination_steps` parameter inside\n            and will terminate within a limited number of states (following AlphaGo).\n        truncated (jnp.ndarray): indicates that the episode ends with the reason other than termination.\n            Note that current Pgx environments do not invoke truncation but users can use `TimeLimit` wrapper\n            to truncate the environment. In Pgx environments, some MinAtar games may not terminate within a finite timestep.\n            However, the other environments are supposed to terminate within a finite timestep with probability one.\n        legal_action_mask (jnp.ndarray): Boolean array of legal actions. 
If illegal action is taken,\n            the game will terminate immediately with the penalty to the palyer.\n    \"\"\"\ncurrent_player: jnp.ndarray\nobservation: jnp.ndarray\nrewards: jnp.ndarray\nterminated: jnp.ndarray\ntruncated: jnp.ndarray\nlegal_action_mask: jnp.ndarray\n# NOTE: _rng_key is\n#   - used for stochastic env and auto reset\n#   - updated only when actually used\n#   - supposed NOT to be used by agent\n_rng_key: jax.random.KeyArray\n_step_count: jnp.ndarray\n@property\n@abc.abstractmethod\ndef env_id(self) -> EnvId:\n\"\"\"Environment id (e.g. \"go_19x19\")\"\"\"\n...\ndef _repr_html_(self) -> str:\nreturn self.to_svg()\ndef to_svg(\nself,\n*,\ncolor_theme: Optional[Literal[\"light\", \"dark\"]] = None,\nscale: Optional[float] = None,\n) -> str:\n\"\"\"Return SVG string. Useful for visualization in notebook.\n        Args:\n            color_theme (Optional[Literal[\"light\", \"dark\"]]): xxx see also global config.\n            scale (Optional[float]): change image size. Default(None) is 1.0\n        Returns:\n            str: SVG string\n        \"\"\"\nfrom pgx._src.visualizer import Visualizer\nv = Visualizer(color_theme=color_theme, scale=scale)\nreturn v.get_dwg(states=self).tostring()\ndef save_svg(\nself,\nfilename,\n*,\ncolor_theme: Optional[Literal[\"light\", \"dark\"]] = None,\nscale: Optional[float] = None,\n) -> None:\n\"\"\"Save the entire state (not observation) to a file.\n        The filename must end with `.svg`\n        Args:\n            color_theme (Optional[Literal[\"light\", \"dark\"]]): xxx see also global config.\n            scale (Optional[float]): change image size. Default(None) is 1.0\n        Returns:\n            None\n        \"\"\"\nfrom pgx._src.visualizer import save_svg\nsave_svg(self, filename, color_theme=color_theme, scale=scale)\n
"},{"location":"api/#pgx.v1.State.env_id","title":"env_id: EnvId property abstractmethod","text":"

Environment id (e.g. \"go_19x19\")

"},{"location":"api/#pgx.v1.State.save_svg","title":"save_svg(filename, *, color_theme=None, scale=None)","text":"

Save the entire state (not observation) to a file. The filename must end with .svg

Parameters:

Name Type Description Default color_theme Optional[Literal['light', 'dark']]

xxx see also global config.

None scale Optional[float]

change image size. Default(None) is 1.0

None

Returns:

Type Description None

None

Source code in pgx/v1.py
def save_svg(\nself,\nfilename,\n*,\ncolor_theme: Optional[Literal[\"light\", \"dark\"]] = None,\nscale: Optional[float] = None,\n) -> None:\n\"\"\"Save the entire state (not observation) to a file.\n    The filename must end with `.svg`\n    Args:\n        color_theme (Optional[Literal[\"light\", \"dark\"]]): xxx see also global config.\n        scale (Optional[float]): change image size. Default(None) is 1.0\n    Returns:\n        None\n    \"\"\"\nfrom pgx._src.visualizer import save_svg\nsave_svg(self, filename, color_theme=color_theme, scale=scale)\n
"},{"location":"api/#pgx.v1.State.to_svg","title":"to_svg(*, color_theme=None, scale=None)","text":"

Return SVG string. Useful for visualization in notebook.

Parameters:

Name Type Description Default color_theme Optional[Literal['light', 'dark']]

xxx see also global config.

None scale Optional[float]

change image size. Default(None) is 1.0

None

Returns:

Name Type Description str str

SVG string

Source code in pgx/v1.py
def to_svg(\nself,\n*,\ncolor_theme: Optional[Literal[\"light\", \"dark\"]] = None,\nscale: Optional[float] = None,\n) -> str:\n\"\"\"Return SVG string. Useful for visualization in notebook.\n    Args:\n        color_theme (Optional[Literal[\"light\", \"dark\"]]): xxx see also global config.\n        scale (Optional[float]): change image size. Default(None) is 1.0\n    Returns:\n        str: SVG string\n    \"\"\"\nfrom pgx._src.visualizer import Visualizer\nv = Visualizer(color_theme=color_theme, scale=scale)\nreturn v.get_dwg(states=self).tostring()\n
"},{"location":"api/#pgx.Env","title":"pgx.Env","text":"

Bases: abc.ABC

Environment class API.

Example usage

env: Env = pgx.make(\"tic_tac_toe\")\nstate = env.init(jax.random.PRNGKey(0))\naction = jnp.int32(4)\nstate = env.step(state, action)\n
Source code in pgx/v1.py
class Env(abc.ABC):\n\"\"\"Environment class API.\n    !!! example \"Example usage\"\n        ```py\n        env: Env = pgx.make(\"tic_tac_toe\")\n        state = env.init(jax.random.PRNGKey(0))\n        action = jax.random.int32(4)\n        state = env.step(state, action)\n        ```\n    \"\"\"\ndef __init__(self):\n...\ndef init(self, key: jax.random.KeyArray) -> State:\n\"\"\"Return the initial state. Note that no internal state of\n        environment changes.\n        Args:\n            key: pseudo-random generator key in JAX\n        Returns:\n            State: initial state of environment\n        \"\"\"\nkey, subkey = jax.random.split(key)\nstate = self._init(subkey)\nstate = state.replace(_rng_key=key)  # type: ignore\nobservation = self.observe(state, state.current_player)\nreturn state.replace(observation=observation)  # type: ignore\ndef step(self, state: State, action: jnp.ndarray) -> State:\n\"\"\"Step function.\"\"\"\nis_illegal = ~state.legal_action_mask[action]\ncurrent_player = state.current_player\n# If the state is already terminated or truncated, environment does not take usual step,\n# but return the same state with zero-rewards for all players\nstate = jax.lax.cond(\n(state.terminated | state.truncated),\nlambda: state.replace(rewards=jnp.zeros_like(state.rewards)),  # type: ignore\nlambda: self._step(state.replace(_step_count=state._step_count + 1), action),  # type: ignore\n)\n# Taking illegal action leads to immediate game terminal with negative reward\nstate = jax.lax.cond(\nis_illegal,\nlambda: self._step_with_illegal_action(state, current_player),\nlambda: state,\n)\n# All legal_action_mask elements are **TRUE** at terminal state\n# This is to avoid zero-division error when normalizing action probability\n# Taking any action at terminal state does not give any effect to the state\nstate = jax.lax.cond(\nstate.terminated,\nlambda: state.replace(  # type: ignore\nlegal_action_mask=jnp.ones_like(state.legal_action_mask)\n),\nlambda: 
state,\n)\nobservation = self.observe(state, state.current_player)\nstate = state.replace(observation=observation)  # type: ignore\nreturn state\ndef observe(self, state: State, player_id: jnp.ndarray) -> jnp.ndarray:\n\"\"\"Observation function.\"\"\"\nobs = self._observe(state, player_id)\nreturn jax.lax.stop_gradient(obs)\n@abc.abstractmethod\ndef _init(self, key: jax.random.KeyArray) -> State:\n\"\"\"Implement game-specific init function here.\"\"\"\n...\n@abc.abstractmethod\ndef _step(self, state, action) -> State:\n\"\"\"Implement game-specific step function here.\"\"\"\n...\n@abc.abstractmethod\ndef _observe(self, state: State, player_id: jnp.ndarray) -> jnp.ndarray:\n\"\"\"Implement game-specific observe function here.\"\"\"\n...\n@property\n@abc.abstractmethod\ndef id(self) -> EnvId:\n\"\"\"Environment id.\"\"\"\n...\n@property\n@abc.abstractmethod\ndef version(self) -> str:\n\"\"\"Environment version. Updated when behavior, parameter, or API is changed.\n        Refactoring or speeding up without any expected behavior changes will NOT update the version number.\n        \"\"\"\n...\n@property\n@abc.abstractmethod\ndef num_players(self) -> int:\n\"\"\"Number of players (e.g., 2 in Tic-tac-toe)\"\"\"\n...\n@property\ndef num_actions(self) -> int:\n\"\"\"Return the size of action space (e.g., 9 in Tic-tac-toe)\"\"\"\nstate = self.init(jax.random.PRNGKey(0))\nreturn int(state.legal_action_mask.shape[0])\n@property\ndef observation_shape(self) -> Tuple[int, ...]:\n\"\"\"Return the matrix shape of observation\"\"\"\nstate = self.init(jax.random.PRNGKey(0))\nobs = self._observe(state, state.current_player)\nreturn obs.shape\n@property\ndef _illegal_action_penalty(self) -> float:\n\"\"\"Negative reward given when illegal action is selected.\"\"\"\nreturn -1.0\ndef _step_with_illegal_action(\nself, state: State, loser: jnp.ndarray\n) -> State:\npenalty = self._illegal_action_penalty\nreward = (\njnp.ones_like(state.rewards)\n* (-1 * penalty)\n* (self.num_players - 
1)\n)\nreward = reward.at[loser].set(penalty)\nreturn state.replace(rewards=reward, terminated=TRUE)  # type: ignore\n
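The reward assignment in the illegal-action branch of the source above can be traced with a plain-Python sketch. This is only an illustration that mirrors the arithmetic of `_step_with_illegal_action` using a list instead of a JAX array; the function name `illegal_action_rewards` is hypothetical and is not part of Pgx.

```python
def illegal_action_rewards(num_players, loser, penalty=-1.0):
    # Hypothetical helper, not part of Pgx: mirrors the arithmetic shown in
    # the source listing above with a plain list.
    # Every entry starts as (-1 * penalty) * (num_players - 1) ...
    rewards = [(-1 * penalty) * (num_players - 1)] * num_players
    # ... and the entry of the player who took the illegal action is
    # overwritten with the penalty itself.
    rewards[loser] = penalty
    return rewards


# Two-player game in which player 1 takes an illegal action.
print(illegal_action_rewards(num_players=2, loser=1))  # [1.0, -1.0]
```

With the default penalty of -1.0 in a two-player game, the offending player receives -1 and the opponent receives +1.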
"},{"location":"api/#pgx.v1.Env.id","title":"id: EnvId property abstractmethod","text":"

Environment id.

"},{"location":"api/#pgx.v1.Env.num_actions","title":"num_actions: int property","text":"

Return the size of action space (e.g., 9 in Tic-tac-toe)

"},{"location":"api/#pgx.v1.Env.num_players","title":"num_players: int property abstractmethod","text":"

Number of players (e.g., 2 in Tic-tac-toe)

"},{"location":"api/#pgx.v1.Env.observation_shape","title":"observation_shape: Tuple[int, ...] property","text":"

Return the matrix shape of observation

"},{"location":"api/#pgx.v1.Env.version","title":"version: str property abstractmethod","text":"

Environment version. Updated when behavior, parameter, or API is changed. Refactoring or speeding up without any expected behavior changes will NOT update the version number.

"},{"location":"api/#pgx.v1.Env.init","title":"init(key)","text":"

Return the initial state. Note that no internal state of environment changes.

Parameters:

Name Type Description Default key jax.random.KeyArray

pseudo-random generator key in JAX

required

Returns:

Name Type Description State State

initial state of environment

Source code in pgx/v1.py
def init(self, key: jax.random.KeyArray) -> State:\n\"\"\"Return the initial state. Note that no internal state of\n    environment changes.\n    Args:\n        key: pseudo-random generator key in JAX\n    Returns:\n        State: initial state of environment\n    \"\"\"\nkey, subkey = jax.random.split(key)\nstate = self._init(subkey)\nstate = state.replace(_rng_key=key)  # type: ignore\nobservation = self.observe(state, state.current_player)\nreturn state.replace(observation=observation)  # type: ignore\n
"},{"location":"api/#pgx.v1.Env.observe","title":"observe(state, player_id)","text":"

Observation function.

Source code in pgx/v1.py
def observe(self, state: State, player_id: jnp.ndarray) -> jnp.ndarray:\n\"\"\"Observation function.\"\"\"\nobs = self._observe(state, player_id)\nreturn jax.lax.stop_gradient(obs)\n
"},{"location":"api/#pgx.v1.Env.step","title":"step(state, action)","text":"

Step function.

Source code in pgx/v1.py
def step(self, state: State, action: jnp.ndarray) -> State:\n\"\"\"Step function.\"\"\"\nis_illegal = ~state.legal_action_mask[action]\ncurrent_player = state.current_player\n# If the state is already terminated or truncated, environment does not take usual step,\n# but return the same state with zero-rewards for all players\nstate = jax.lax.cond(\n(state.terminated | state.truncated),\nlambda: state.replace(rewards=jnp.zeros_like(state.rewards)),  # type: ignore\nlambda: self._step(state.replace(_step_count=state._step_count + 1), action),  # type: ignore\n)\n# Taking illegal action leads to immediate game terminal with negative reward\nstate = jax.lax.cond(\nis_illegal,\nlambda: self._step_with_illegal_action(state, current_player),\nlambda: state,\n)\n# All legal_action_mask elements are **TRUE** at terminal state\n# This is to avoid zero-division error when normalizing action probability\n# Taking any action at terminal state does not give any effect to the state\nstate = jax.lax.cond(\nstate.terminated,\nlambda: state.replace(  # type: ignore\nlegal_action_mask=jnp.ones_like(state.legal_action_mask)\n),\nlambda: state,\n)\nobservation = self.observe(state, state.current_player)\nstate = state.replace(observation=observation)  # type: ignore\nreturn state\n
"},{"location":"api/#pgx.EnvId","title":"pgx.EnvId = Literal['2048', 'animal_shogi', 'backgammon', 'bridge_bidding', 'chess', 'connect_four', 'gardner_chess', 'go_9x9', 'go_19x19', 'hex', 'kuhn_poker', 'leduc_holdem', 'minatar-asterix', 'minatar-breakout', 'minatar-freeway', 'minatar-seaquest', 'minatar-space_invaders', 'othello', 'shogi', 'sparrow_mahjong', 'tic_tac_toe'] module-attribute","text":""},{"location":"api/#pgx.make","title":"pgx.make(env_id)","text":"

Load the specified environment.

Example usage

env = pgx.make(\"tic_tac_toe\")\n

BridgeBidding environment

The BridgeBidding environment requires domain knowledge of the game of bridge, so it cannot be loaded with make(\"bridge_bidding\"). Instead, use the BridgeBidding class directly via from pgx.bridge_bidding import BridgeBidding.

Source code in pgx/v1.py
def make(env_id: EnvId):  # noqa: C901\n\"\"\"Load the specified environment.\n    !!! example \"Example usage\"\n        ```py\n        env = pgx.make(\"tic_tac_toe\")\n        ```\n    !!! note \"`BridgeBidding` environment\"\n        `BridgeBidding` environment requires the domain knowledge of bridge game.\n        So we forbid users to load the bridge environment by `make(\"bridge_bidding\")`.\n        Use `BridgeBidding` class directly by `from pgx.bridge_bidding import BridgeBidding`.\n    \"\"\"\n# NOTE: BridgeBidding environment requires the domain knowledge of bridge\n# So we forbid users to load the bridge environment by `make(\"bridge_bidding\")`.\nif env_id == \"2048\":\nfrom pgx.play2048 import Play2048\nreturn Play2048()\nelif env_id == \"animal_shogi\":\nfrom pgx.animal_shogi import AnimalShogi\nreturn AnimalShogi()\nelif env_id == \"backgammon\":\nfrom pgx.backgammon import Backgammon\nreturn Backgammon()\nelif env_id == \"chess\":\nfrom pgx.chess import Chess\nreturn Chess()\nelif env_id == \"connect_four\":\nfrom pgx.connect_four import ConnectFour\nreturn ConnectFour()\nelif env_id == \"gardner_chess\":\nfrom pgx.gardner_chess import GardnerChess\nreturn GardnerChess()\nelif env_id == \"go_9x9\":\nfrom pgx.go import Go\nreturn Go(size=9, komi=7.5)\nelif env_id == \"go_19x19\":\nfrom pgx.go import Go\nreturn Go(size=19, komi=7.5)\nelif env_id == \"hex\":\nfrom pgx.hex import Hex\nreturn Hex()\nelif env_id == \"kuhn_poker\":\nfrom pgx.kuhn_poker import KuhnPoker\nreturn KuhnPoker()\nelif env_id == \"leduc_holdem\":\nfrom pgx.leduc_holdem import LeducHoldem\nreturn LeducHoldem()\nelif env_id == \"minatar-asterix\":\ntry:\nfrom pgx_minatar.asterix import MinAtarAsterix  # type: ignore\nreturn MinAtarAsterix()\nexcept ModuleNotFoundError:\nprint(\n'\"minatar-asterix\" environment is provided as a separate plugin of Pgx.\\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',\nfile=sys.stderr,\n)\nsys.exit(1)\nelif env_id == 
\"minatar-breakout\":\ntry:\nfrom pgx_minatar.breakout import MinAtarBreakout  # type: ignore\nreturn MinAtarBreakout()\nexcept ModuleNotFoundError:\nprint(\n'\"minatar-breakout\" environment is provided as a separate plugin of Pgx.\\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',\nfile=sys.stderr,\n)\nsys.exit(1)\nelif env_id == \"minatar-freeway\":\ntry:\nfrom pgx_minatar.freeway import MinAtarFreeway  # type: ignore\nreturn MinAtarFreeway()\nexcept ModuleNotFoundError:\nprint(\n'\"minatar-freeway\" environment is provided as a separate plugin of Pgx.\\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',\nfile=sys.stderr,\n)\nsys.exit(1)\nelif env_id == \"minatar-seaquest\":\ntry:\nfrom pgx_minatar.seaquest import MinAtarSeaquest  # type: ignore\nreturn MinAtarSeaquest()\nexcept ModuleNotFoundError:\nprint(\n'\"minatar-seaquest\" environment is provided as a separate plugin of Pgx.\\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',\nfile=sys.stderr,\n)\nsys.exit(1)\nelif env_id == \"minatar-space_invaders\":\ntry:\nfrom pgx_minatar.space_invaders import (  # type: ignore\nMinAtarSpaceInvaders,\n)\nreturn MinAtarSpaceInvaders()\nexcept ModuleNotFoundError:\nprint(\n'\"minatar-space_invaders\" environment is provided as a separate plugin of Pgx.\\nPlease run `$ pip install pgx-minatar` to use this environment in Pgx.',\nfile=sys.stderr,\n)\nsys.exit(1)\nelif env_id == \"othello\":\nfrom pgx.othello import Othello\nreturn Othello()\nelif env_id == \"shogi\":\nfrom pgx.shogi import Shogi\nreturn Shogi()\nelif env_id == \"sparrow_mahjong\":\nfrom pgx.sparrow_mahjong import SparrowMahjong\nreturn SparrowMahjong()\nelif env_id == \"tic_tac_toe\":\nfrom pgx.tic_tac_toe import TicTacToe\nreturn TicTacToe()\nelse:\nenvs = \"\\n\".join(available_envs())\nraise ValueError(\nf\"Wrong env_id '{env_id}' is passed. Available ids are: \\n{envs}\"\n)\n
"},{"location":"api/#pgx.available_envs","title":"pgx.available_envs()","text":"

List all environment ids available in the pgx.make function.

Example usage

pgx.available_envs()\n('2048', 'animal_shogi', 'backgammon', 'chess', 'connect_four', 'go_9x9', 'go_19x19', 'hex', 'kuhn_poker', 'leduc_holdem', 'minatar-asterix', 'minatar-breakout', 'minatar-freeway', 'minatar-seaquest', 'minatar-space_invaders', 'othello', 'shogi', 'sparrow_mahjong', 'tic_tac_toe')\n

BridgeBidding environment

The BridgeBidding environment requires domain knowledge of the game of bridge, so it cannot be loaded with make(\"bridge_bidding\"). Instead, use the BridgeBidding class directly via from pgx.bridge_bidding import BridgeBidding.

Source code in pgx/v1.py
def available_envs() -> Tuple[EnvId, ...]:\n\"\"\"List up all environment id available in `pgx.make` function.\n    !!! example \"Example usage\"\n        ```py\n        pgx.available_envs()\n        ('2048', 'animal_shogi', 'backgammon', 'chess', 'connect_four', 'go_9x9', 'go_19x19', 'hex', 'kuhn_poker', 'leduc_holdem', 'minatar-asterix', 'minatar-breakout', 'minatar-freeway', 'minatar-seaquest', 'minatar-space_invaders', 'othello', 'shogi', 'sparrow_mahjong', 'tic_tac_toe')\n        ```\n    !!! note \"`BridgeBidding` environment\"\n        `BridgeBidding` environment requires the domain knowledge of bridge game.\n        So we forbid users to load the bridge environment by `make(\"bridge_bidding\")`.\n        Use `BridgeBidding` class directly by `from pgx.bridge_bidding import BridgeBidding`.\n    \"\"\"\ngames = get_args(EnvId)\ngames = tuple(filter(lambda x: x != \"bridge_bidding\", games))\nreturn games\n
"},{"location":"api/#pgx.set_visualization_config","title":"pgx.set_visualization_config(*, color_theme='light', scale=1.0, frame_duration_seconds=0.2)","text":"Source code in pgx/_src/visualizer.py
def set_visualization_config(\n*,\ncolor_theme: ColorTheme = \"light\",\nscale: float = 1.0,\nframe_duration_seconds: float = 0.2,\n):\nglobal_config.color_theme = color_theme\nglobal_config.scale = scale\nglobal_config.frame_duration_seconds = frame_duration_seconds\n
"},{"location":"api/#pgx.save_svg","title":"pgx.save_svg(state, filename, *, color_theme=None, scale=None)","text":"Source code in pgx/_src/visualizer.py
def save_svg(\nstate: State,\nfilename: Union[str, Path],\n*,\ncolor_theme: Optional[Literal[\"light\", \"dark\"]] = None,\nscale: Optional[float] = None,\n) -> None:\nassert str(filename).endswith(\".svg\")\nif state.env_id.startswith(\"minatar\"):\nstate.save_svg(filename=filename)\nelse:\nv = Visualizer(color_theme=color_theme, scale=scale)\nv.get_dwg(states=state).saveas(filename)\n
"},{"location":"api/#pgx.save_svg_animation","title":"pgx.save_svg_animation(states, filename, *, color_theme=None, scale=None, frame_duration_seconds=None)","text":"Source code in pgx/_src/visualizer.py
def save_svg_animation(\nstates: Sequence[State],\nfilename: Union[str, Path],\n*,\ncolor_theme: Optional[Literal[\"light\", \"dark\"]] = None,\nscale: Optional[float] = None,\nframe_duration_seconds: Optional[float] = None,\n) -> None:\nassert not states[0].env_id.startswith(\n\"minatar\"\n), \"MinAtar does not support svg animation.\"\nassert str(filename).endswith(\".svg\")\nv = Visualizer(color_theme=color_theme, scale=scale)\nif frame_duration_seconds is None:\nframe_duration_seconds = global_config.frame_duration_seconds\nframe_groups = []\ndwg = None\nfor i, state in enumerate(states):\ndwg = v.get_dwg(states=state)\nassert (\nlen(\n[\ne\nfor e in dwg.elements\nif type(e) == svgwrite.container.Group\n]\n)\n== 1\n), \"Drawing must contain only one group\"\ngroup: svgwrite.container.Group = dwg.elements[-1]\ngroup[\"id\"] = f\"_fr{i:x}\"  # hex frame number\ngroup[\"class\"] = \"frame\"\nframe_groups.append(group)\nassert dwg is not None\ndel dwg.elements[-1]\ntotal_seconds = frame_duration_seconds * len(frame_groups)\nstyle = f\".frame{{visibility:hidden; animation:{total_seconds}s linear _k infinite;}}\"\nstyle += f\"@keyframes _k{{0%,{100/len(frame_groups)}%{{visibility:visible}}{100/len(frame_groups) * 1.000001}%,100%{{visibility:hidden}}}}\"\nfor i, group in enumerate(frame_groups):\ndwg.add(group)\nstyle += (\nf\"#{group['id']}{{animation-delay:{i * frame_duration_seconds}s}}\"\n)\ndwg.defs.add(svgwrite.container.Style(content=style))\ndwg.saveas(filename)\n
"},{"location":"api/#pgx.BaselineModelId","title":"pgx.BaselineModelId = Literal['animal_shogi_v0', 'gardner_chess_v0', 'go_9x9_v0', 'hex_v0', 'othello_v0'] module-attribute","text":""},{"location":"api/#pgx.make_baseline_model","title":"pgx.make_baseline_model(model_id, download_dir='baselines')","text":"Source code in pgx/_src/baseline.py
def make_baseline_model(\nmodel_id: BaselineModelId, download_dir: str = \"baselines\"\n):\nimport haiku as hk\ncreate_model_fn = _make_create_model_fn(model_id)\nmodel_args, model_params, model_state = _load_baseline_model(\nmodel_id, download_dir\n)\ndef forward_fn(x, is_eval=False):\nnet = create_model_fn(**model_args)\npolicy_out, value_out = net(\nx, is_training=not is_eval, test_local_stats=False\n)\nreturn policy_out, value_out\nforward = hk.without_apply_rng(hk.transform_with_state(forward_fn))\ndef apply(obs):\n(logits, value), _ = forward.apply(\nmodel_params, model_state, obs, is_eval=True\n)\nreturn logits, value\nreturn apply\n
"},{"location":"api/#pgx.v1_api_test","title":"pgx.v1_api_test(env, num=100)","text":"Source code in pgx/_src/api_test.py
def v1_api_test(env: Env, num: int = 100):\napi_test_single(env, num)\napi_test_batch(env, num)\n
"},{"location":"api_usage/","title":"Pgx API Usage","text":""},{"location":"api_usage/#example1-random-play","title":"Example.1: Random play","text":"
import jax\nimport jax.numpy as jnp\nimport pgx\nseed = 42\nbatch_size = 10\nkey = jax.random.PRNGKey(seed)\ndef act_randomly(rng_key, obs, mask):\n\"\"\"Ignore observation and choose randomly from legal actions\"\"\"\ndel obs\nprobs = mask / mask.sum()\nlogits = jnp.maximum(jnp.log(probs), jnp.finfo(probs.dtype).min)\nreturn jax.random.categorical(rng_key, logits=logits, axis=-1)\n# Load the environment\nenv = pgx.make(\"go_9x9\")\ninit_fn = jax.jit(jax.vmap(env.init))\nstep_fn = jax.jit(jax.vmap(env.step))\n# Initialize the states\nkey, subkey = jax.random.split(key)\nkeys = jax.random.split(subkey, batch_size)\nstate = init_fn(keys)\n# Run random simulation\nwhile not (state.terminated | state.truncated).all():\nkey, subkey = jax.random.split(key)\naction = act_randomly(subkey, state.observation, state.legal_action_mask)\nstate = step_fn(state, action)  # state.rewards has shape (batch_size, 2)\n
"},{"location":"api_usage/#example2-random-agent-vs-baseline-model","title":"Example.2: Random agent vs Baseline model","text":"

This illustrative example helps to understand

  • How state.current_player is defined
  • How to access the reward of each player
  • How Env.step behaves against already terminated states
  • How to use baseline models provided by Pgx
import jax\nimport jax.numpy as jnp\nimport pgx\nfrom pgx.experimental.utils import act_randomly\nseed = 42\nbatch_size = 10\nkey = jax.random.PRNGKey(seed)\n# Prepare agent A and B\n#   Agent A: random player\n#   Agent B: baseline player provided by Pgx\nA = 0\nB = 1\n# Load the environment\nenv = pgx.make(\"go_9x9\")\ninit_fn = jax.jit(jax.vmap(env.init))\nstep_fn = jax.jit(jax.vmap(env.step))\n# Prepare baseline model\n# Note that it additionally requires the Haiku library ($ pip install dm-haiku)\nmodel_id = \"go_9x9_v0\"\nmodel = pgx.make_baseline_model(model_id)\n# Initialize the states\nkey, subkey = jax.random.split(key)\nkeys = jax.random.split(subkey, batch_size)\nstate = init_fn(keys)\nprint(f\"Game index: {jnp.arange(batch_size)}\")  #  [0 1 2 3 4 5 6 7 8 9]\nprint(f\"Black player: {state.current_player}\")  #  [1 1 0 1 0 0 1 1 1 1]\n# In other words\nprint(f\"A is black: {state.current_player == A}\")  # [False False  True False  True  True False False False False]\nprint(f\"B is black: {state.current_player == B}\")  # [ True  True False  True False False  True  True  True  True]\n# Run simulation\nR = state.rewards\nwhile not (state.terminated | state.truncated).all():\n# Action of random player A\nkey, subkey = jax.random.split(key)\naction_A = jax.jit(act_randomly)(subkey, state)\n# Greedy action of baseline model B\nlogits, value = model(state.observation)\naction_B = logits.argmax(axis=-1)\naction = jnp.where(state.current_player == A, action_A, action_B)\nstate = step_fn(state, action)\nR += state.rewards\nprint(f\"Return of agent A = {R[:, A]}\")  # [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]\nprint(f\"Return of agent B = {R[:, B]}\")  # [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n

Note that we can avoid explicitly handling the leading batch dimension (as in [:, A]) by applying vmap later.

"},{"location":"backgammon/","title":"Backgammon","text":"darklight

"},{"location":"backgammon/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"backgammon\")\n

or you can directly load Backgammon class

from pgx.backgammon import Backgammon\nenv = Backgammon()\n
"},{"location":"backgammon/#description","title":"Description","text":"

Backgammon ...

Wikipedia

"},{"location":"backgammon/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 162 (= 6 * 26 + 6) Observation shape (34,) Observation type int Rewards {-3, -2, -1, 0, 1, 2, 3}"},{"location":"backgammon/#observation","title":"Observation","text":"

The first 28 observation dimensions follow [Antonoglou+22]:

An action in our implementation consists of 4 micro-actions, the same as the maximum number of dice a player can play at each turn. Each micro-action encodes the source position of a chip along with the value of the die used. We consider 26 possible source positions, with the 0-th position corresponding to a no-op, the 1st to retrieving a chip from the hit pile, and the remaining to selecting a chip in one of the 24 possible points. Each micro-action is encoded as a single integer with micro-action = src \u00b7 6 + die.
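The micro-action encoding quoted above can be sketched in a few lines of Python. This is a minimal illustration of the formula src · 6 + die only, assuming that src ranges over 0 to 25 and that die is a zero-based index from 0 to 5; the function names are hypothetical and are not part of Pgx.

```python
NUM_DIE_VALUES = 6  # assumption: die is a zero-based index in 0..5


def encode_micro_action(src, die):
    # Hypothetical helper: micro-action = src * 6 + die, as in the quote above.
    return src * NUM_DIE_VALUES + die


def decode_micro_action(micro_action):
    # Inverse of the encoding: returns the pair (src, die).
    return divmod(micro_action, NUM_DIE_VALUES)


print(encode_micro_action(3, 4))  # 22
print(decode_micro_action(22))    # (3, 4)
```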

Index Description [:24] represents [24:28] represents [28:34] is one-hot vector of playable dice"},{"location":"backgammon/#action","title":"Action","text":"

...

"},{"location":"backgammon/#rewards","title":"Rewards","text":"

...

"},{"location":"backgammon/#termination","title":"Termination","text":"

...

"},{"location":"backgammon/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"backgammon/#reference","title":"Reference","text":"
  1. [Antonoglou+22] \"Planning in Stochastic Environments with a Learned Model\", ICLR
"},{"location":"bridge_bidding/","title":"Bridge bidding","text":"darklight

"},{"location":"bridge_bidding/#description","title":"Description","text":"

TBA

"},{"location":"chess/","title":"Chess","text":"darklight

"},{"location":"chess/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"chess\")\n

or you can directly load Chess class

from pgx.chess import Chess\nenv = Chess()\n
"},{"location":"chess/#description","title":"Description","text":"

TBA

"},{"location":"chess/#rules","title":"Rules","text":"

TBA

"},{"location":"chess/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 4672 Observation shape (8, 8, 119) Observation type float Rewards {-1, 0, 1}"},{"location":"chess/#observation","title":"Observation","text":"

We follow the observation design of AlphaZero [Silver+18].

Index Description TBA TBA"},{"location":"chess/#action","title":"Action","text":"

TBA

"},{"location":"chess/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1 Draw 0"},{"location":"chess/#termination","title":"Termination","text":"

Termination occurs when one of the following conditions is satisfied:

  1. checkmate
  2. stalemate
  3. insufficient pieces to checkmate
  4. 50 halfmoves have elapsed without any capture or pawn move
  5. 512 steps have elapsed (from AlphaZero [Silver+18])
"},{"location":"chess/#version-history","title":"Version History","text":"
  • v1 : Bug fix when castling by @HongruiTang in #983 (v1.1.0)
  • v0 : Initial release (v1.0.0)
"},{"location":"chess/#reference","title":"Reference","text":"
  • [Silver+18] \"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play\" Science
"},{"location":"connect_four/","title":"Connect four","text":"darklight

"},{"location":"connect_four/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"connect_four\")\n

or you can directly load ConnectFour class

from pgx.connect_four import ConnectFour\nenv = ConnectFour()\n
"},{"location":"connect_four/#description","title":"Description","text":"

Connect Four is a two-player connection rack game, in which the players choose a color and then take turns dropping colored tokens into a seven-column, six-row vertically suspended grid. The pieces fall straight down, occupying the lowest available space within the column. The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own tokens.

Wikipedia

"},{"location":"connect_four/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 7 Observation shape (6, 7, 2) Observation type bool Rewards {-1, 0, 1}"},{"location":"connect_four/#observation","title":"Observation","text":"Index Description [:, :, 0] represents (6, 7) squares filled by the current player [:, :, 1] represents (6, 7) squares filled by the opponent player of current player"},{"location":"connect_four/#action","title":"Action","text":"

Each action is the index of the column into which the player drops a token.
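As a rough sketch of how a column-index action resolves to a square, the snippet below drops a token into the lowest empty row of the chosen column. It is a plain-Python illustration, not the Pgx implementation; the board layout (a list of rows with row 0 at the top) and the name `drop_token` are assumptions made for this example.

```python
NUM_ROWS, NUM_COLS = 6, 7


def drop_token(board, column, player):
    # Hypothetical helper, not part of Pgx.
    # Assumption: board[row][col] with row 0 at the top, None meaning empty.
    # Scan from the bottom row upward and fill the first empty square.
    for row in range(NUM_ROWS - 1, -1, -1):
        if board[row][column] is None:
            board[row][column] = player
            return row
    raise ValueError('column is full')


board = [[None] * NUM_COLS for _ in range(NUM_ROWS)]
print(drop_token(board, column=3, player=0))  # 5 (bottom row)
print(drop_token(board, column=3, player=1))  # 4 (stacked on top)
```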

"},{"location":"connect_four/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1 Draw 0"},{"location":"connect_four/#termination","title":"Termination","text":"

Termination happens when

  1. either one player places four of their tokens in a row (horizontally, vertically, or diagonally), or
  2. all 42 (= 6 x 7) squares are filled.
"},{"location":"connect_four/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"gardner_chess/","title":"Gardner chess","text":"darklight

"},{"location":"gardner_chess/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"gardner_chess\")\n

or you can directly load GardnerChess class

from pgx.gardner_chess import GardnerChess\nenv = GardnerChess()\n
"},{"location":"gardner_chess/#description","title":"Description","text":"

TBA

"},{"location":"gardner_chess/#rules","title":"Rules","text":"

TBA

"},{"location":"gardner_chess/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 1225 Observation shape (5, 5, 115) Observation type float Rewards {-1, 0, 1}"},{"location":"gardner_chess/#observation","title":"Observation","text":"

We follow the observation design of AlphaZero [Silver+18].

Index Description TBA TBA"},{"location":"gardner_chess/#action","title":"Action","text":"

TBA

"},{"location":"gardner_chess/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1 Draw 0"},{"location":"gardner_chess/#termination","title":"Termination","text":"

Termination occurs when one of the following conditions is satisfied:

  1. checkmate
  2. stalemate
  3. insufficient pieces to checkmate
  4. 50 halfmoves have elapsed without any capture or pawn move
  5. 256 steps have elapsed (512 in the full-size chess experiments of AlphaZero [Silver+18])
"},{"location":"gardner_chess/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"gardner_chess/#reference","title":"Reference","text":"
  • [Silver+18] \"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play\" Science
"},{"location":"go/","title":"Go","text":"darklight

"},{"location":"go/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"go_19x19\")  # or \"go_9x9\"\n

or you can directly load Go class

from pgx.go import Go\nenv = Go(size=19, komi=6.5)\n
"},{"location":"go/#description","title":"Description","text":"

Go is an abstract strategy board game for two players in which the aim is to surround more territory than the opponent. The game was invented in China more than 2,500 years ago and is believed to be the oldest board game continuously played to the present day.

Wikipedia

"},{"location":"go/#rules","title":"Rules","text":"

The rules implemented in Pgx follow the Tromp-Taylor rules.

Komi

By default, we use 6.5. Users can set a different komi in the Go class constructor.

Ko

On PSK implementations.

The Tromp-Taylor rules employ positional superko (PSK). However, implementing strict PSK is inefficient because

  • The simulator has to store the entire history of previous boards (or their hashes), and
  • The agent also has to remember all previous boards to avoid losing by PSK

Because PSK violations are rare, implementations commonly compromise, to the best of our knowledge. For example,

  • OpenSpiel employs situational superko (SSK) instead of PSK for computing legal actions, and if a PSK action occurs, the game ends in a tie.
    • Pros: Detects all PSK actions
    • Cons: The agent cannot know why the game ended in a tie (if the repeated board is too old)
  • PettingZoo employs SSK for legal actions and ignores PSK actions even when they occur.
    • Pros: Simple
    • Cons: PSK is totally ignored

Note that the strict rule is \"PSK for legal actions, and a PSK action leads to an immediate loss.\" So, we also compromise on this point. Our approach is

  • Pgx employs SSK for legal actions, approximates PSK by checking the boards from up to 8 steps before, and treats an approximate PSK action as an immediate loss
    • Pros: The agent may be able to avoid PSK (as it observes the board history up to 8 steps back in the AlphaGo Zero features)
    • Cons: Older repetitions of the same board are ignored

In any case, we believe its effect is very small as PSK rarely happens, especially on a 19x19 board.
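The idea of approximating PSK with a bounded board history can be illustrated with a small plain-Python sketch. This is not the Pgx implementation (which operates on JAX arrays); it only shows the bounded-history technique, assuming boards are hashable values such as tuples. The class name `RecentBoardHistory` is hypothetical.

```python
from collections import deque


class RecentBoardHistory:
    # Hypothetical helper, not part of Pgx: remembers only the most recent
    # `depth` boards, so repetitions older than that go undetected.

    def __init__(self, depth=8):
        self._boards = deque(maxlen=depth)

    def repeats(self, board):
        # True if `board` equals one of the remembered boards.
        return board in self._boards

    def push(self, board):
        # Record a board; the oldest one is dropped once `depth` is exceeded.
        self._boards.append(board)


history = RecentBoardHistory(depth=2)
history.push(('board_a',))
history.push(('board_b',))
print(history.repeats(('board_a',)))  # True: still within the window
history.push(('board_c',))
print(history.repeats(('board_a',)))  # False: dropped from the window
```

The second check shows the trade-off listed under Cons: a repetition that falls outside the window is not detected.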

"},{"location":"go/#specs","title":"Specs","text":"

Let N be the board size (e.g., 19).

Name Value Version v0 Number of players 2 Number of actions N x N + 1 Observation shape (N, N, 17) Observation type bool Rewards {-1, 1}"},{"location":"go/#observation","title":"Observation","text":"

We follow the observation design of AlphaGo Zero [Silver+17].

Index Description obs[:, :, 0] stones of player_id (@ current board) obs[:, :, 1] stones of player_id's opponent (@ current board) obs[:, :, 2] stones of player_id (@ 1-step before) obs[:, :, 3] stones of player_id's opponent (@ 1-step before) ... ... obs[:, :, -1] color of player_id

Final observation dimension

For the final dimension, there are two possible options:

  • Use the color of current player to play
  • Use the color of player_id

This ambiguity arises because the observe function can be called even if player_id is different from state.current_player. In the AlphaGo Zero paper [Silver+17], the final dimension C is explained as:

The final feature plane, C, represents the colour to play, and has a constant value of either 1 if black is to play or 0 if white is to play.

However, the paper also states:

the colour feature C is necessary because the komi is not observable.

So, we use player_id's color so that the agent can infer the komi information. As long as observe is called with player_id == state.current_player, the two options coincide.

"},{"location":"go/#action","title":"Action","text":"

Each action in {0, ..., N * N - 1} represents the point to be colored. The final action (N * N) represents a pass.
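As a rough illustration, an action index can be decoded as follows. The helper `decode_go_action` is hypothetical, and the row-major ordering of points is an assumption of this sketch; the text above only fixes that the last index is the pass.

```python
def decode_go_action(action, board_size):
    """Map an action index to ("pass", None) or ("play", (row, col))."""
    num_points = board_size * board_size
    if not 0 <= action <= num_points:
        raise ValueError(f"action must be in [0, {num_points}]")
    if action == num_points:
        # The final action is the pass action.
        return ("pass", None)
    # Assumption: points are numbered row by row (row-major order).
    return ("play", divmod(action, board_size))


first = decode_go_action(0, 19)         # top-left point under this assumption
last_point = decode_go_action(360, 19)  # last board point
pass_move = decode_go_action(361, 19)   # N * N is the pass
```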

"},{"location":"go/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1"},{"location":"go/#termination","title":"Termination","text":"

Termination happens when

  1. two consecutive passes are played, or
  2. N * N * 2 steps have elapsed [Silver+17].
"},{"location":"go/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"go/#reference","title":"Reference","text":"
  1. [Silver+17] \"Mastering the game of go without human knowledge\" Nature
"},{"location":"hex/","title":"Hex","text":"darklight

"},{"location":"hex/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"hex\")\n

or you can directly load Hex class

from pgx.hex import Hex\nenv = Hex()\n
"},{"location":"hex/#description","title":"Description","text":"

Hex is a two player abstract strategy board game in which players attempt to connect opposite sides of a rhombus-shaped board made of hexagonal cells. Hex was invented by mathematician and poet Piet Hein in 1942 and later rediscovered and popularized by John Nash.

Wikipedia

"},{"location":"hex/#rules","title":"Rules","text":"

As the first player to move has a distinct advantage, the swap rule is used to compensate for this. The detailed swap rule used in Pgx follows swap pieces:

\"Swap pieces\": The players perform the swap by switching pieces. This means the initial red piece is replaced by a blue piece in the mirror image position, where the mirroring takes place with respect to the board's long diagonal. For example, a red piece at a3 becomes a blue piece at c1. The players do not switch colours: Red stays Red and Blue stays Blue. After the swap, it is Red's turn.

Hex Wiki - Swap rule
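The mirroring in the quoted rule amounts to exchanging the column and row coordinates of the cell. A minimal sketch, assuming cells are written as a column letter followed by a 1-based row number (the helper name `mirror_for_swap` is hypothetical):

```python
def mirror_for_swap(cell):
    """Mirror a cell such as "a3" across the board's long diagonal."""
    col = ord(cell[0]) - ord("a")  # column letter -> 0-based index
    row = int(cell[1:]) - 1        # row number -> 0-based index
    # Reflection across the long diagonal exchanges the two coordinates.
    new_col, new_row = row, col
    return f"{chr(ord('a') + new_col)}{new_row + 1}"


mirrored = mirror_for_swap("a3")  # the example from the quoted rule
```

With the quoted example, the red piece at a3 maps to c1, and cells on the long diagonal map to themselves.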

"},{"location":"hex/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 122 (= 11 x 11 + 1) Observation shape (11, 11, 3) Observation type bool Rewards {-1, 1}"},{"location":"hex/#observation","title":"Observation","text":"Index Description [:, :, 0] represents (11, 11) cells filled by player_id [:, :, 1] represents (11, 11) cells filled by the opponent player of player_id [:, :, 2] represents whether player_id is black or white"},{"location":"hex/#action","title":"Action","text":"

Each action in {0, ..., 120} represents the index of the cell to be filled. The final action, 121, is the swap action, which is available only on the second turn.

"},{"location":"hex/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1

Note that there is no draw in Hex.

"},{"location":"hex/#termination","title":"Termination","text":"

Termination happens when either player connects their opposite sides of the board.

"},{"location":"hex/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"kuhn_poker/","title":"Kuhn poker","text":"darklight

"},{"location":"kuhn_poker/#description","title":"Description","text":"

Kuhn poker is a simplified poker with three cards: J, Q, and K.

"},{"location":"kuhn_poker/#rules","title":"Rules","text":"

Each player is dealt one card, and the remaining card is unused. There are four actions (check, call, bet, and fold) and five possible scenarios.

  1. bet (1st) - call (2nd) : Showdown and the winner takes +2
  2. bet (1st) - fold (2nd) : 1st player takes +1
  3. check (1st) - check (2nd) : Showdown and the winner takes +1
  4. check (1st) - bet (2nd) - call (1st) : Showdown and the winner takes +2
  5. check (1st) - bet (2nd) - fold (1st) : 2nd takes +1
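The five scenarios can be written as a small payoff table. This is a sketch for illustration only; `kuhn_payoffs` and `RANK` are hypothetical names, and the showdown is assumed to be won by the higher card with J < Q < K.

```python
RANK = {"J": 0, "Q": 1, "K": 2}  # assumed card ordering: J < Q < K


def kuhn_payoffs(card_1st, card_2nd, actions):
    """Return (payoff_1st, payoff_2nd) for one of the five action sequences."""
    first_wins = RANK[card_1st] > RANK[card_2nd]

    def showdown(stake):
        # The showdown winner takes +stake and the loser takes -stake.
        return (stake, -stake) if first_wins else (-stake, stake)

    table = {
        ("bet", "call"): showdown(2),
        ("bet", "fold"): (1, -1),
        ("check", "check"): showdown(1),
        ("check", "bet", "call"): showdown(2),
        ("check", "bet", "fold"): (-1, 1),
    }
    return table[tuple(actions)]


bet_call = kuhn_payoffs("K", "Q", ["bet", "call"])
check_bet_fold = kuhn_payoffs("K", "J", ["check", "bet", "fold"])
```

Every entry sums to zero, and a fold ignores the cards entirely: in the second call the first player holds the better card but still loses 1.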
"},{"location":"kuhn_poker/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 4 Observation shape (7,) Observation type bool Rewards {-2, -1, +1, +2}"},{"location":"kuhn_poker/#observation","title":"Observation","text":"Index Description [0] One if J in my hand [1] One if Q in my hand [2] One if K in my hand [3] One if 0 chip is bet by me [4] One if 1 chip is bet by me [5] One if 0 chip of the opponent [6] One if 1 chip of the opponent"},{"location":"kuhn_poker/#action","title":"Action","text":"

There are four distinct actions.

Action Index Call 0 Bet 1 Fold 2 Check 3"},{"location":"kuhn_poker/#rewards","title":"Rewards","text":"

The winner takes +2 or +1 depending on the game payoff. As Kuhn poker is a zero-sum game, the loser takes -2 or -1 respectively.

"},{"location":"kuhn_poker/#termination","title":"Termination","text":"

Follows the rules above.

"},{"location":"kuhn_poker/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"leduc_holdem/","title":"Leduc hold\u2019em","text":"darklight

"},{"location":"leduc_holdem/#description","title":"Description","text":"

Leduc hold\u2019em is a simplified poker proposed in [Souhty+05].

"},{"location":"leduc_holdem/#rules","title":"Rules","text":"

We quote the description in [Souhty+05]:

Leduc Hold \u2019Em. We have also constructed a smaller version of hold \u2019em, which seeks to retain the strategic elements of the large game while keeping the size of the game tractable. In Leduc hold \u2019em, the deck consists of two suits with three cards in each suit. There are two rounds. In the first round a single private card is dealt to each player. In the second round a single board card is revealed. There is a two-bet maximum, with raise amounts of 2 and 4 in the first and second round, respectively. Both players start the first round with 1 already in the pot.

Figure 1: An example decision tree for a single betting round in poker with a two-bet maximum. Leaf nodes with open boxes continue to the next round, while closed boxes end the hand.

"},{"location":"leduc_holdem/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 3 Observation shape (34,) Observation type bool Rewards {-13, -12, ... 0, ..., 12, 13}"},{"location":"leduc_holdem/#observation","title":"Observation","text":"Index Description [0] True if J in hand [1] True if Q in hand [2] True if K in hand [3] True if J is the public card [4] True if Q is the public card [5] True if K is the public card [6:19] represent my chip count (0, ..., 13) [20:33] represent opponent's chip count (0, ..., 13)"},{"location":"leduc_holdem/#action","title":"Action","text":"

There are three distinct actions.

Index Action 0 Call 1 Raise 2 Fold"},{"location":"leduc_holdem/#rewards","title":"Rewards","text":"

The reward is the payoff of the game.

"},{"location":"leduc_holdem/#termination","title":"Termination","text":"

Follows the rules above.

"},{"location":"leduc_holdem/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"leduc_holdem/#references","title":"References","text":"
  • [Souhty+05] Southey et al. \"Bayes' Bluff: Opponent Modelling in Poker\" UAI 2005
"},{"location":"minatar_asterix/","title":"MinAtar Asterix","text":""},{"location":"minatar_asterix/#usage","title":"Usage","text":"

Note that the MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). Therefore, please also run the following command to use the MinAtar suite in Pgx:

pip install pgx-minatar\n

Then, you can use the environment as follows:

import pgx\nenv = pgx.make(\"minatar-asterix\")\n
"},{"location":"minatar_asterix/#description","title":"Description","text":"

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact JAX port of the original MinAtar implementation. The Asterix environment is described as follows:

The player can move freely along the 4 cardinal directions. Enemies and treasure spawn from the sides. A reward of +1 is given for picking up treasure. Termination occurs if the player makes contact with an enemy. Enemy and treasure direction are indicated by a trail channel. Difficulty is periodically increased by increasing the speed and spawn rate of enemies and treasure.

github.com/kenjyoung/MinAtar - asterix.py

"},{"location":"minatar_asterix/#specs","title":"Specs","text":"Name Value Version v0 Number of players 1 Number of actions 5 Observation shape (10, 10, 4) Observation type bool Rewards {0, 1}"},{"location":"minatar_asterix/#observation","title":"Observation","text":"Index Channel [:, :, 0] Player [:, :, 1] Enemy [:, :, 2] Trail [:, :, 3] Gold"},{"location":"minatar_asterix/#action","title":"Action","text":"

TBA

"},{"location":"minatar_asterix/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"minatar_asterix/#reference","title":"Reference","text":"
  • [Young&Tian+19] \"Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments\" arXiv:1903.03176
"},{"location":"minatar_asterix/#license","title":"License","text":"

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

"},{"location":"minatar_breakout/","title":"MinAtar Breakout","text":""},{"location":"minatar_breakout/#usage","title":"Usage","text":"

Note that the MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). Therefore, please also run the following command to use the MinAtar suite in Pgx:

pip install pgx-minatar\n

Then, you can use the environment as follows:

import pgx\nenv = pgx.make(\"minatar-breakout\")\n
"},{"location":"minatar_breakout/#description","title":"Description","text":"

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact JAX port of the original MinAtar implementation. The Breakout environment is described as follows:

The player controls a paddle on the bottom of the screen and must bounce a ball to break 3 rows of bricks along the top of the screen. A reward of +1 is given for each brick broken by the ball. When all bricks are cleared another 3 rows are added. The ball travels only along diagonals, when it hits the paddle it is bounced either to the left or right depending on the side of the paddle hit, when it hits a wall or brick it is reflected. Termination occurs when the ball hits the bottom of the screen. The ball's direction is indicated by a trail channel.

github.com/kenjyoung/MinAtar - breakout.py

"},{"location":"minatar_breakout/#specs","title":"Specs","text":"Name Value Version v0 Number of players 1 Number of actions 3 Observation shape (10, 10, 4) Observation type bool Rewards {0, 1}"},{"location":"minatar_breakout/#observation","title":"Observation","text":"Index Channel [:, :, 0] Paddle [:, :, 1] Ball [:, :, 2] Trail [:, :, 3] Brick"},{"location":"minatar_breakout/#action","title":"Action","text":"

TBA

"},{"location":"minatar_breakout/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"minatar_breakout/#reference","title":"Reference","text":"
  • [Young&Tian+19] \"Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments\" arXiv:1903.03176
"},{"location":"minatar_breakout/#license","title":"License","text":"

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

"},{"location":"minatar_freeway/","title":"MinAtar Freeway","text":""},{"location":"minatar_freeway/#usage","title":"Usage","text":"

Note that the MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). Therefore, please also run the following command to use the MinAtar suite in Pgx:

pip install pgx-minatar\n

Then, you can use the environment as follows:

import pgx\nenv = pgx.make(\"minatar-freeway\")\n
"},{"location":"minatar_freeway/#description","title":"Description","text":"

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact JAX port of the original MinAtar implementation. The Freeway environment is described as follows:

The player begins at the bottom of the screen and motion is restricted to traveling up and down. Player speed is also restricted such that the player can only move every 3 frames. A reward of +1 is given when the player reaches the top of the screen, at which point the player is returned to the bottom. Cars travel horizontally on the screen and teleport to the other side when the edge is reached. When hit by a car, the player is returned to the bottom of the screen. Car direction and speed is indicated by 5 trail channels, the location of the trail gives direction while the specific channel indicates how frequently the car moves (from once every frame to once every 5 frames). Each time the player successfully reaches the top of the screen, the car speeds are randomized. Termination occurs after 2500 frames have elapsed.

github.com/kenjyoung/MinAtar - freeway.py

"},{"location":"minatar_freeway/#specs","title":"Specs","text":"Name Value Version v0 Number of players 1 Number of actions 3 Observation shape (10, 10, 7) Observation type bool Rewards {0, 1}"},{"location":"minatar_freeway/#observation","title":"Observation","text":"Index Channel [:, :, 0] Chicken [:, :, 1] Car [:, :, 2] Speed 1 [:, :, 3] Speed 2 [:, :, 4] Speed 3 [:, :, 5] Speed 4 [:, :, 6] Speed 5"},{"location":"minatar_freeway/#action","title":"Action","text":"

TBA

"},{"location":"minatar_freeway/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"minatar_freeway/#reference","title":"Reference","text":"
  • [Young&Tian+19] \"Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments\" arXiv:1903.03176
"},{"location":"minatar_freeway/#license","title":"LICENSE","text":"

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

"},{"location":"minatar_seaquest/","title":"MinAtar Seaquest","text":""},{"location":"minatar_seaquest/#usage","title":"Usage","text":"

Note that the MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). Therefore, please also run the following command to use the MinAtar suite in Pgx:

pip install pgx-minatar\n

Then, you can use the environment as follows:

import pgx\nenv = pgx.make(\"minatar-seaquest\")\n
"},{"location":"minatar_seaquest/#description","title":"Description","text":"

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact JAX port of the original MinAtar implementation. The Seaquest environment is described as follows:

The player controls a submarine consisting of two cells, front and back, to allow direction to be determined. The player can also fire bullets from the front of the submarine. Enemies consist of submarines and fish, distinguished by the fact that submarines shoot bullets and fish do not. A reward of +1 is given each time an enemy is struck by one of the player's bullets, at which point the enemy is also removed. There are also divers which the player can move onto to pick up, doing so increments a bar indicated by another channel along the bottom of the screen. The player also has a limited supply of oxygen indicated by another bar in another channel. Oxygen degrades over time, and is replenished whenever the player moves to the top of the screen as long as the player has at least one rescued diver on board. The player can carry a maximum of 6 divers. When surfacing with less than 6, one diver is removed. When surfacing with 6, all divers are removed and a reward is given for each active cell in the oxygen bar. Each time the player surfaces the difficulty is increased by increasing the spawn rate and movement speed of enemies. Termination occurs when the player is hit by an enemy fish, sub or bullet; or when oxygen reached 0; or when the player attempts to surface with no rescued divers. Enemy and diver directions are indicated by a trail channel active in their previous location to reduce partial observability.

github.com/kenjyoung/MinAtar - seaquest.py

"},{"location":"minatar_seaquest/#specs","title":"Specs","text":"Name Value Version v0 Number of players 1 Number of actions 6 Observation shape (10, 10, 10) Observation type bool Rewards {0, 1, ..., 10}"},{"location":"minatar_seaquest/#observation","title":"Observation","text":"Index Channel [:, :, 0] Player submarine (front) [:, :, 1] Player submarine (back) [:, :, 2] Friendly bullet [:, :, 3] Trail [:, :, 4] Enemy bullet [:, :, 5] Enemy fish [:, :, 6] Enemy submarine [:, :, 7] Oxygen gauge [:, :, 8] Diver gauge [:, :, 9] Diver"},{"location":"minatar_seaquest/#action","title":"Action","text":"

TBA

"},{"location":"minatar_seaquest/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"minatar_seaquest/#reference","title":"Reference","text":"
  • [Young&Tian+19] \"Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments\" arXiv:1903.03176
"},{"location":"minatar_seaquest/#license","title":"LICENSE","text":"

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

"},{"location":"minatar_space_invaders/","title":"MinAtar Space Invaders","text":""},{"location":"minatar_space_invaders/#usage","title":"Usage","text":"

Note that the MinAtar suite is provided as a separate extension for Pgx (pgx-minatar). Therefore, please also run the following command to use the MinAtar suite in Pgx:

pip install pgx-minatar\n

Then, you can use the environment as follows:

import pgx\nenv = pgx.make(\"minatar-space_invaders\")\n
"},{"location":"minatar_space_invaders/#description","title":"Description","text":"

MinAtar was originally proposed by [Young&Tian+19]. The Pgx implementation is intended to be an exact JAX port of the original MinAtar implementation. The Space Invaders environment is described as follows:

The player controls a cannon at the bottom of the screen and can shoot bullets upward at a cluster of aliens above. The aliens move across the screen until one of them hits the edge, at which point they all move down and switch directions. The current alien direction is indicated by 2 channels (one for left and one for right) one of which is active at the location of each alien. A reward of +1 is given each time an alien is shot, and that alien is also removed. The aliens will also shoot bullets back at the player. When few aliens are left, alien speed will begin to increase. When only one alien is left, it will move at one cell per frame. When a wave of aliens is fully cleared a new one will spawn which moves at a slightly faster speed than the last. Termination occurs when an alien or bullet hits the player.

github.com/kenjyoung/MinAtar - space_invaders.py

"},{"location":"minatar_space_invaders/#specs","title":"Specs","text":"Name Value Version v0 Number of players 1 Number of actions 4 Observation shape (10, 10, 6) Observation type bool Rewards {0, 1}"},{"location":"minatar_space_invaders/#observation","title":"Observation","text":"Index Channel [:, :, 0] Cannon [:, :, 1] Alien [:, :, 2] Alien left [:, :, 3] Alien right [:, :, 4] Friendly bullet [:, :, 5] Enemy bullet"},{"location":"minatar_space_invaders/#action","title":"Action","text":"

TBA

"},{"location":"minatar_space_invaders/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"minatar_space_invaders/#reference","title":"Reference","text":"
  • [Young&Tian+19] \"Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments\" arXiv:1903.03176
"},{"location":"minatar_space_invaders/#license","title":"LICENSE","text":"

Pgx is provided under the Apache 2.0 License, but the original MinAtar suite follows the GPL 3.0 License. Therefore, please note that the separate MinAtar extension for Pgx also adheres to the GPL 3.0 License.

"},{"location":"othello/","title":"Othello","text":"darklight

"},{"location":"othello/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"othello\")\n

or you can directly load Othello class

from pgx.othello import Othello\nenv = Othello()\n
"},{"location":"othello/#description","title":"Description","text":"

Othello, or differing in not having a defined starting position, Reversi, is a two-player zero-sum and perfect information abstract strategy board game, usually played on a board with 8 rows and 8 columns and a set of light and a dark turnable pieces for each side. The player's goal is to have a majority of their colored pieces showing at the end of the game, turning over as many of their opponent's pieces as possible. The dark player makes the first move from the starting position, alternating with the light player. Each player has to place a piece on the board such that there exists at least one straight (horizontal, vertical, or diagonal) occupied line of opponent pieces between the new piece and another own piece. After placing the piece, the side turns over (flips, captures) all opponent pieces lying on any straight lines between the new piece and any anchoring own pieces.

Chess Programming Wiki

"},{"location":"othello/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 65 (= 8 x 8 + 1) Observation shape (8, 8, 2) Observation type bool Rewards {-1, 0, 1}"},{"location":"othello/#observation","title":"Observation","text":"Index Description [:, :, 0] represents (8, 8) squares colored by the current player [:, :, 1] represents (8, 8) squares colored by the opponent player of current player"},{"location":"othello/#action","title":"Action","text":"

Each action in {0, ..., 63} represents the index of the square to be filled. The last action (64) represents a pass.

"},{"location":"othello/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1 Draw 0"},{"location":"othello/#termination","title":"Termination","text":"

Termination happens when all 64 (= 8 x 8) playable squares are filled.

"},{"location":"othello/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"play2048/","title":"2048","text":"darklight

"},{"location":"play2048/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"2048\")\n

or you can directly load Play2048 class

from pgx.play2048 import Play2048\nenv = Play2048()\n
"},{"location":"play2048/#description","title":"Description","text":"

2048 ...

Wikipedia

"},{"location":"play2048/#specs","title":"Specs","text":"Name Value Version v0 Number of players 1 Number of actions 4 Observation shape (4, 4, 31) Observation type bool Rewards {0, 2, 4, ...}"},{"location":"play2048/#observation","title":"Observation","text":"

Our observation design basically follows [Antonoglou+22]:

In our 2048 experiments we used a binary representation of the observation as an input to our model. Specifically, the 4 \u00d7 4 board was flattened into a single vector of size 16, and a binary representation of 31 bits for each number was obtained, for a total size of 496 numbers.

However, instead of a 496-d flat vector, we use a (4, 4, 31) array.
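To make the layout concrete, here is a plain-Python sketch that builds such a (4, 4, 31) boolean observation from a board of tile values. The function `encode_2048_board` is hypothetical, and leaving every plane False for an empty square is an assumption of this sketch rather than a statement about the Pgx code.

```python
def encode_2048_board(board):
    """Encode a 4x4 board of tile values into a (4, 4, 31) nested list of bools."""
    obs = [[[False] * 31 for _ in range(4)] for _ in range(4)]
    for i in range(4):
        for j in range(4):
            tile = board[i][j]
            if tile > 0:
                b = tile.bit_length() - 1  # tile == 2 ** b for a valid tile
                obs[i][j][b] = True
            # Assumption: an empty square (0) keeps all 31 planes False.
    return obs


board = [
    [2, 0, 0, 0],
    [0, 4, 0, 0],
    [0, 0, 2048, 0],
    [0, 0, 0, 0],
]
obs = encode_2048_board(board)
```

Here the tile 2 sets plane 1, the tile 4 sets plane 2, and the tile 2048 sets plane 11, so each occupied square has exactly one active plane.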

Index Description [i, j, b] represents that square (i, j) has a tile of 2 ^ b if b > 0"},{"location":"play2048/#action","title":"Action","text":"

Each action corresponds to 0 (left), 1 (up), 2 (right), or 3 (down).

"},{"location":"play2048/#rewards","title":"Rewards","text":"

Sum of merged tiles.

"},{"location":"play2048/#termination","title":"Termination","text":"

If all squares are filled with tiles and no legal actions are available, the game terminates.

"},{"location":"play2048/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"play2048/#reference","title":"Reference","text":"
  1. [Antonoglou+22] \"Planning in Stochastic Environments with a Learned Model\", ICLR
"},{"location":"shogi/","title":"Shogi","text":"darklight

"},{"location":"shogi/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"shogi\")\n

or you can directly load Shogi class

from pgx.shogi import Shogi\nenv = Shogi()\n
"},{"location":"shogi/#description","title":"Description","text":"

TBA

"},{"location":"shogi/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 2187 Observation shape (9, 9, 119) Observation type bool Rewards {-1, 0, 1}"},{"location":"shogi/#observation","title":"Observation","text":"

We follow the observation design of dlshogi, an open-source shogi AI. The original dlshogi implementations are here. The Pgx implementation has shape [9, 9, 119], and [:, :, x] denotes:

x Description 0:14 Where my piece x exists 14:28 Where my pieces x are attacking 28:31 Where the number of my attacking pieces are >= 1,2,3 respectively 31:45 Where opponent's piece x exists 45:59 Where opponent's pieces x are attacking 59:62 Where the number of opponent's attacking pieces are >= 1,2,3 respectively

Each of the following planes is filled entirely with ones or entirely with zeros:

x Description 62:70 My hand has >= 1, ..., 8 Pawn 70:74 My hand has >= 1, 2, 3, 4 Lance 74:78 My hand has >= 1, 2, 3, 4 Knight 78:82 My hand has >= 1, 2, 3, 4 Silver 82:86 My hand has >= 1, 2, 3, 4 Gold 86:88 My hand has >= 1, 2 Bishop 88:90 My hand has >= 1, 2 Rook 90:98 Opponent's hand has >= 1, ..., 8 Pawn 98:102 Opponent's hand has >= 1, 2, 3, 4 Lance 102:106 Opponent's hand has >= 1, 2, 3, 4 Knight 106:110 Opponent's hand has >= 1, 2, 3, 4 Silver 110:114 Opponent's hand has >= 1, 2, 3, 4 Gold 114:116 Opponent's hand has >= 1, 2 Bishop 116:118 Opponent's hand has >= 1, 2 Rook 118 Ones if checked

Note that piece ids are

Piece Id \u6b69\u3000 PAWN 0 \u9999\u3000 LANCE 1 \u6842\u3000 KNIGHT 2 \u9280\u3000 SILVER 3 \u89d2\u3000 BISHOP 4 \u98db\u3000 ROOK 5 \u91d1\u3000 GOLD 6 \u7389\u3000 KING 7 \u3068\u3000 PRO_PAWN 8 \u6210\u9999 PRO_LANCE 9 \u6210\u6842 PRO_KNIGHT 10 \u6210\u9280 PRO_SILVER 11 \u99ac\u3000 HORSE 12 \u9f8d\u3000 DRAGON 13"},{"location":"shogi/#action","title":"Action","text":"

The design of action also follows that of dlshogi. There are 2187 = 81 x 27 distinct actions. The action can be decomposed into

  • direction from which the piece moves and
  • destination to which the piece moves

by direction, destination = action // 81, action % 81. The direction is encoded by

id direction 0 Up 1 Up left 2 Up right 3 Left 4 Right 5 Down 6 Down left 7 Down right 8 Up2 left 9 Up2 right 10 Promote + Up 11 Promote + Up left 12 Promote + Up right 13 Promote + Left 14 Promote + Right 15 Promote + Down 16 Promote + Down left 17 Promote + Down right 18 Promote + Up2 left 19 Promote + Up2 right 20 Drop Pawn 21 Drop Lance 22 Drop Knight 23 Drop Silver 24 Drop Bishop 25 Drop Rook 26 Drop Gold"},{"location":"shogi/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1 Draw 0"},{"location":"shogi/#termination","title":"Termination","text":"

TBA

"},{"location":"shogi/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"},{"location":"sparrow_mahjong/","title":"Sparrow mahjong","text":"darklight

"},{"location":"sparrow_mahjong/#description","title":"Description","text":"

TBA

"},{"location":"tic_tac_toe/","title":"Tic-tac-toe","text":"darklight

"},{"location":"tic_tac_toe/#usage","title":"Usage","text":"
import pgx\nenv = pgx.make(\"tic_tac_toe\")\n

or you can directly load TicTacToe class

from pgx.tic_tac_toe import TicTacToe\nenv = TicTacToe()\n
"},{"location":"tic_tac_toe/#description","title":"Description","text":"

Tic-tac-toe is a paper-and-pencil game for two players who take turns marking the spaces in a three-by-three grid with X or O. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row is the winner.

Wikipedia

"},{"location":"tic_tac_toe/#specs","title":"Specs","text":"Name Value Version v0 Number of players 2 Number of actions 9 Observation shape (3, 3, 2) Observation type bool Rewards {-1, 0, 1}"},{"location":"tic_tac_toe/#observation","title":"Observation","text":"Index Description [:, :, 0] represents (3, 3) squares filled by the current player [:, :, 1] represents (3, 3) squares filled by the opponent player of current player"},{"location":"tic_tac_toe/#action","title":"Action","text":"

Each action represents the square index to be filled.

"},{"location":"tic_tac_toe/#rewards","title":"Rewards","text":"

Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:

Reward Win +1 Lose -1 Draw 0"},{"location":"tic_tac_toe/#termination","title":"Termination","text":"

Termination happens when

  1. either one player places three of their symbols in a row (horizontally, vertically, or diagonally), or
  2. all nine squares are filled.
"},{"location":"tic_tac_toe/#version-history","title":"Version History","text":"
  • v0 : Initial release (v1.0.0)
"}]} \ No newline at end of file diff --git a/shogi/index.html b/shogi/index.html new file mode 100644 index 000000000..33683c991 --- /dev/null +++ b/shogi/index.html @@ -0,0 +1,1234 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Shogi - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Shogi

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("shogi")
+
+

or you can directly load Shogi class

+
from pgx.shogi import Shogi
+
+env = Shogi()
+
+

Description

+

TBA

+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameValue
Versionv0
Number of players2
Number of actions2187
Observation shape(9, 9, 119)
Observation typebool
Rewards{-1, 0, 1}
+

Observation

+

We follow the observation design of dlshogi, an open-source shogi AI. +The original dlshogi implementations are here. +The Pgx implementation has shape [9, 9, 119], and [:, :, x] denotes:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
x Description
0:14 Where my piece x exists
14:28 Where my pieces x are attacking
28:31 Where the number of my attacking pieces is >= 1, 2, 3 respectively
31:45 Where opponent's piece x exists
45:59 Where opponent's pieces x are attacking
59:62 Where the number of opponent's attacking pieces is >= 1, 2, 3 respectively
+

The following planes are all ones or zeros

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
x Description
62:70 My hand has >= 1, ..., 8 Pawn
70:74 My hand has >= 1, 2, 3, 4 Lance
74:78 My hand has >= 1, 2, 3, 4 Knight
78:82 My hand has >= 1, 2, 3, 4 Silver
82:86 My hand has >= 1, 2, 3, 4 Gold
86:88 My hand has >= 1, 2 Bishop
88:90 My hand has >= 1, 2 Rook
90:98 Opponent's hand has >= 1, ..., 8 Pawn
98:102 Opponent's hand has >= 1, 2, 3, 4 Lance
102:106 Opponent's hand has >= 1, 2, 3, 4 Knight
106:110 Opponent's hand has >= 1, 2, 3, 4 Silver
110:114 Opponent's hand has >= 1, 2, 3, 4 Gold
114:116 Opponent's hand has >= 1, 2 Bishop
116:118 Opponent's hand has >= 1, 2 Rook
118 Ones if checked
+
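The plane ranges in the two tables above can be written down as data and checked for gaps. This is a minimal sketch, not part of Pgx: the names `PLANE_LAYOUT`, `plane_slices`, and `check_layout`, and the plane labels, are hypothetical and chosen for illustration only.

```python
# Hypothetical helper (not a Pgx API): the shogi observation planes
# from the tables above as (label, start, stop) half-open ranges.
PLANE_LAYOUT = [
    ("my_pieces", 0, 14),
    ("my_attacks", 14, 28),
    ("my_attack_counts", 28, 31),
    ("opp_pieces", 31, 45),
    ("opp_attacks", 45, 59),
    ("opp_attack_counts", 59, 62),
    ("my_hand_pawn", 62, 70),
    ("my_hand_lance", 70, 74),
    ("my_hand_knight", 74, 78),
    ("my_hand_silver", 78, 82),
    ("my_hand_gold", 82, 86),
    ("my_hand_bishop", 86, 88),
    ("my_hand_rook", 88, 90),
    ("opp_hand_pawn", 90, 98),
    ("opp_hand_lance", 98, 102),
    ("opp_hand_knight", 102, 106),
    ("opp_hand_silver", 106, 110),
    ("opp_hand_gold", 110, 114),
    ("opp_hand_bishop", 114, 116),
    ("opp_hand_rook", 116, 118),
    ("in_check", 118, 119),
]


def plane_slices():
    """Map each label to a slice usable on the last observation axis."""
    return {name: slice(start, stop) for name, start, stop in PLANE_LAYOUT}


def check_layout():
    """Verify the ranges are contiguous; return the total plane count."""
    expected_start = 0
    for name, start, stop in PLANE_LAYOUT:
        if start != expected_start or stop <= start:
            raise ValueError(f"gap or overlap at plane group {name!r}")
        expected_start = stop
    return expected_start


print(check_layout())  # total number of planes covered by the tables
```

Given an observation array `obs` of shape (9, 9, 119), an expression such as `obs[:, :, plane_slices()["opp_pieces"]]` would select the 14 opponent piece planes.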

Note that the piece ids are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Piece Id
歩  PAWN 0
香  LANCE 1
桂  KNIGHT 2
銀  SILVER 3
角  BISHOP 4
飛  ROOK 5
金  GOLD 6
玉  KING 7
と  PRO_PAWN 8
成香 PRO_LANCE 9
成桂 PRO_KNIGHT 10
成銀 PRO_SILVER 11
馬  HORSE 12
龍  DRAGON 13
+
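For code that works with these ids, an enumeration keeps the numbers readable. The sketch below is an illustration only: the `Piece` class, the `PROMOTED` mapping, and `promote` are hypothetical names, not Pgx identifiers. The ids come from the table above; the promotion pairs follow the standard shogi rules rather than anything stated on this page.

```python
from enum import IntEnum


class Piece(IntEnum):
    """Piece ids as listed in the table above (hypothetical class name)."""
    PAWN = 0
    LANCE = 1
    KNIGHT = 2
    SILVER = 3
    BISHOP = 4
    ROOK = 5
    GOLD = 6
    KING = 7
    PRO_PAWN = 8
    PRO_LANCE = 9
    PRO_KNIGHT = 10
    PRO_SILVER = 11
    HORSE = 12
    DRAGON = 13


# Assumption: standard shogi promotions. Gold and King do not promote.
PROMOTED = {
    Piece.PAWN: Piece.PRO_PAWN,
    Piece.LANCE: Piece.PRO_LANCE,
    Piece.KNIGHT: Piece.PRO_KNIGHT,
    Piece.SILVER: Piece.PRO_SILVER,
    Piece.BISHOP: Piece.HORSE,
    Piece.ROOK: Piece.DRAGON,
}


def promote(piece):
    """Return the promoted piece, or None if the piece cannot promote."""
    return PROMOTED.get(piece)


print(Piece(12).name)             # name for id 12
print(promote(Piece.ROOK).name)   # promoted form of the rook
```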

Action

+

The action design also follows that of dlshogi. +There are 2187 = 81 x 27 distinct actions. +Each action can be decomposed into

+
    +
  • direction from which the piece moves and
  • +
  • destination to which the piece moves
  • +
+

by direction, destination = action // 81, action % 81. +The direction is encoded as follows:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
id direction
0 Up
1 Up left
2 Up right
3 Left
4 Right
5 Down
6 Down left
7 Down right
8 Up2 left
9 Up2 right
10 Promote + Up
11 Promote + Up left
12 Promote + Up right
13 Promote + Left
14 Promote + Right
15 Promote + Down
16 Promote + Down left
17 Promote + Down right
18 Promote + Up2 left
19 Promote + Up2 right
20 Drop Pawn
21 Drop Lance
22 Drop Knight
23 Drop Silver
24 Drop Bishop
25 Drop Rook
26 Drop Gold
+
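The decomposition and the direction table above can be sketched in plain Python. The function names `decode_action`, `encode_action`, and `describe_direction` are hypothetical helpers for illustration and are not part of Pgx; only the arithmetic (`action // 81`, `action % 81`) and the direction ids come from this page.

```python
NUM_SQUARES = 81      # 9 x 9 board
NUM_DIRECTIONS = 27   # 10 moves + 10 promoting moves + 7 drops

MOVE_NAMES = [
    "Up", "Up left", "Up right", "Left", "Right",
    "Down", "Down left", "Down right", "Up2 left", "Up2 right",
]
DROP_NAMES = ["Pawn", "Lance", "Knight", "Silver", "Bishop", "Rook", "Gold"]


def decode_action(action):
    """Split an action id into (direction, destination)."""
    if not 0 <= action < NUM_DIRECTIONS * NUM_SQUARES:
        raise ValueError(f"action out of range: {action}")
    return divmod(action, NUM_SQUARES)  # (action // 81, action % 81)


def encode_action(direction, destination):
    """Inverse of decode_action."""
    return direction * NUM_SQUARES + destination


def describe_direction(direction):
    """Human-readable label for a direction id, following the table."""
    if 0 <= direction < 10:
        return MOVE_NAMES[direction]
    if 10 <= direction < 20:
        return "Promote + " + MOVE_NAMES[direction - 10]
    if 20 <= direction < NUM_DIRECTIONS:
        return "Drop " + DROP_NAMES[direction - 20]
    raise ValueError(f"direction out of range: {direction}")


direction, destination = decode_action(1700)
print(direction, destination, describe_direction(direction))
```

How a destination index in 0..80 maps to board coordinates is not stated on this page, so the sketch leaves it as an opaque integer.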

Rewards

+

Non-zero rewards are given only at terminal states. +The reward at a terminal state is given in this table:

+ + + + + + + + + + + + + + + + + + + + + +
Reward
Win +1
Lose -1
Draw 0
+

Termination

+

TBA

+

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 000000000..bc7f9dc6a --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,118 @@ + + + + http://pgx.readthedocs.io/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/animal_shogi/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/api/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/api_usage/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/backgammon/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/bridge_bidding/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/chess/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/connect_four/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/gardner_chess/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/go/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/hex/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/kuhn_poker/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/leduc_holdem/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/minatar_asterix/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/minatar_breakout/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/minatar_freeway/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/minatar_seaquest/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/minatar_space_invaders/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/othello/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/play2048/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/shogi/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/sparrow_mahjong/ + 2023-07-21 + daily + + + http://pgx.readthedocs.io/tic_tac_toe/ + 2023-07-21 + daily + + \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz new file mode 100644 index 000000000..6c6c1cdb5 Binary files /dev/null and b/sitemap.xml.gz differ diff --git a/sparrow_mahjong/index.html b/sparrow_mahjong/index.html new file mode 100644 index 000000000..4ca099254 --- /dev/null +++ 
b/sparrow_mahjong/index.html @@ -0,0 +1,756 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Sparrow mahjong - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Sparrow mahjong

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Description

+

TBA

+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/tic_tac_toe/index.html b/tic_tac_toe/index.html new file mode 100644 index 000000000..ba4de0b03 --- /dev/null +++ b/tic_tac_toe/index.html @@ -0,0 +1,956 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + Tic-tac-toe - Pgx Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + +
+ + +
+ +
+ + + + + + +
+
+ + + +
+
+
+ + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Tic-tac-toe

+
+
+
+

+ +

+
+
+

+ +

+
+
+
+

Usage

+
import pgx
+
+env = pgx.make("tic_tac_toe")
+
+

or you can directly load the TicTacToe class

+
from pgx.tic_tac_toe import TicTacToe
+
+env = TicTacToe()
+
+

Description

+
+

Tic-tac-toe is a paper-and-pencil game for two players who take turns marking the spaces in a three-by-three grid with X or O. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row is the winner.

+

Wikipedia

+
+

Specs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name Value
Version v0
Number of players 2
Number of actions 9
Observation shape (3, 3, 2)
Observation type bool
Rewards {-1, 0, 1}
+

Observation

+ + + + + + + + + + + + + + + + + +
Index Description
[:, :, 0] represents (3, 3) squares filled by the current player
[:, :, 1] represents (3, 3) squares filled by the opponent of the current player
+

Action

+

Each action represents the square index to be filled.
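As an illustration, a square index can be turned into board coordinates. The sketch below assumes row-major numbering (index 0 is the top-left square, index 8 the bottom-right); this page does not state the ordering, so treat that as an assumption. `action_to_square` is a hypothetical helper, not a Pgx function.

```python
BOARD_SIZE = 3


def action_to_square(action):
    """Convert a square index (0..8) to (row, col).

    Assumes row-major numbering, which this page does not confirm.
    """
    if not 0 <= action < BOARD_SIZE * BOARD_SIZE:
        raise ValueError(f"action out of range: {action}")
    return divmod(action, BOARD_SIZE)


print(action_to_square(4))  # centre square under the row-major assumption
```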

+

Rewards

+

Non-zero rewards are given only at terminal states. +The reward at a terminal state is given in this table:

+ + + + + + + + + + + + + + + + + + + + + +
Reward
Win +1
Lose -1
Draw 0
+

Termination

+

Termination happens when

+
    +
  1. either one player places three of their symbols in a row (horizontally, vertically, or diagonally), or
  2. +
  3. all nine squares are filled.
  4. +
+
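The two termination conditions above can be sketched as a small check on a flat board. This is an illustration, not the Pgx implementation: the board encoding (a list of nine entries, `-1` for empty and `0` or `1` for each player's marks) and the names `winner` and `is_terminated` are assumptions made for this example.

```python
EMPTY = -1  # assumed marker for an unfilled square

# All eight lines of three squares: rows, columns, diagonals.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return the player (0 or 1) with three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminated(board):
    """True if a player has three in a row or all nine squares are filled."""
    return winner(board) is not None or all(cell != EMPTY for cell in board)


board = [0, 0, 0,
         1, 1, EMPTY,
         EMPTY, EMPTY, EMPTY]
print(winner(board), is_terminated(board))
```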

Version History

+
    +
  • v0 : Initial release (v1.0.0)
  • +
+ + + + + + +
+
+ + +
+ +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file