This project solves the coin game utilizing the Counterfactual Regret Minimization (CFR) algorithm.
The coin game is an imperfect information game where players must evaluate the choices of others to make better-informed decisions. The game can be played by
In this game, the first player in the turn order has the advantage of freely choosing any possible sum, but with minimal information about other players’ choices. Conversely, the last player benefits from observing the earlier guesses, which may hint at the number of coins in play, but has fewer guessing options available.
The original game is played with three coins and more than two players. However, our implementation allows both parameters to be input variables. For testing, we used a simpler configuration with three players and one coin per player.
Counterfactual Regret Minimization (CFR) is an iterative algorithm used to compute a Nash equilibrium for imperfect information games. CFR minimizes "regret" for each decision point (or information set) in the game by simulating gameplay and updating strategy weights based on observed regret. Over multiple iterations, the average strategy converges to a Nash equilibrium.
An information set represents all game states a player cannot distinguish between due to imperfect information.
In the coin game implementation, an information set is represented as a string where the first value indicates the player’s coin choice, and the rest represent the previous players’ guesses of the coin sum. This string is referred to as the "history."
The iterative script performs the following steps:
-
Belief Updates: In CFR, beliefs represent probabilities over the game states associated with each information set. In our implementation, beliefs are maintained for each possible sum given the "history." For every information set, we calculate
$B$ belief states, where$B = P \times C + 1$ -
Utility Updates: This step computes the expected utility for each information set based on the current strategy
$U(s) = \mathbb{E}_{a \sim \pi}[U(a, s_j)]$ . Utilities guide the adjustments made to strategies during subsequent updates. -
Likelihood Calculation: This step computes the likelihood of encountering each information set during simulated games. This involves summing the probabilities of reaching prior states, multiplied by the probability of taking the action that leads to the current state. The information set tree is updated from top to bottom, ensuring that when a prior state's probability is used, it has already been updated.
-
Gain Calculation: While the original algorithm focuses on regret, we calculate gain, which is the symmetric counterpart of regret. Gain GG is defined as the difference between the utility of an action in a given state and the expected utility of that state:
$G(a_i, s_j) = U(a_i, s_j) - U(s)$ . Here,$U(a_i, s_j)$ is the expected utility of taking action$a_i$ in state$s_j$ . Gains are set to zero if negative. These calculations align with the original CFR algorithm paper. -
Strategy Update: The strategy for each information set is updated based on cumulative gains. Actions with higher cumulative gain are assigned higher probabilities, reducing the exploitability of the strategy and limiting the potential gains other players can achieve. Since all players follow the same strategy, this adjustment drives the potential gains down, and the strategy towards equilibrium. The algorithm converges to a Nash equilibrium when the total gain reaches zero. However, this equilibrium may only be reached asymptotically, so we might choose to stop the algorithm after a predefined number of iterations or when the total gain falls below a specified threshold (e.g.
$0.1$ ).
In the installation process, we strongly recommend to use uv as a python package manager.
-
Clone the repository:
git clone https://github.com/yourusername/yourproject.git
-
Navigate to the project directory:
cd yourproject
-
Initiate uv project:
uv init
-
Install dependencies:
uv sync
You can find an optimal strategy for every version of the coin game and play against it following these steps:
-
Find strategy for
$C$ coins and$P$ players:uv run main.py -nc [C] -p [P]
-
Play with against the same strategy:
uv run play_coin_game.py -nc [C] -p [P]
Important
This implementation is single threaded for the moment and the search space grows exponentially given
During development, we implemented CFR in the Khun Poker game as a sanity check. You can also find the optimal strategy and play against it using the following steps:
- Find strategy:
uv run khun_poker.py
- Find strategy:
uv play_run khun_poker.py
- https://www.youtube.com/watch?v=NE7V8e77vg4&t=2375s&ab_channel=JonathanGardner
- https://github.com/jonathangardnermd/counterfactualRegretMin
This project is licensed under the MIT License - see the LICENSE file for details.