djava/Tetris-Evaluation-ML

Try this project out on Tetrisfish!

Objective

The goal of this project is to build an NES Tetris AI that chooses the best move at each placement in order to maximize the game score. We do this with a regression model that evaluates the state of a board: for every legal move, we exhaustively evaluate the resulting board state, then choose the move whose board scores highest.
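As a minimal sketch of this selection loop (the helpers `enumerate_placements` and `featurize` are hypothetical stand-ins for illustration, not functions from this repository), it might look like:

```python
import numpy as np

def choose_best_move(board, piece, model, enumerate_placements, featurize):
    """Pick the placement whose resulting board the regression model scores highest.

    `enumerate_placements` yields (placement, resulting_board) pairs for every
    legal move; `featurize` turns a board into the feature vector the model was
    trained on. Both are hypothetical helpers used only for illustration.
    """
    best_placement, best_score = None, -np.inf
    for placement, resulting_board in enumerate_placements(board, piece):
        features = np.asarray(featurize(resulting_board), dtype=float)
        score = model.predict(features.reshape(1, -1))[0]
        if score > best_score:
            best_placement, best_score = placement, score
    return best_placement, best_score
```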

Our board-state evaluation is based on a pre-existing "gold standard" Tetris board evaluation program called StackRabbit. StackRabbit uses a depth-3 tree search to play out the possible game boards, with hand-tuned heuristics to determine the value of each board state. Our goal is to emulate StackRabbit's results using much simpler and computationally cheaper ML-based regression models.

Data Collection, Cleaning and Pre-Processing

Placement Dataset

The dataset comes from the Tetrisfish website (https://www.tetrisfish.com), created by our team member Ansel. The website uses video capture of people playing NES Tetris for real-time evaluation and analysis, and each placement from these recorded games is stored in a database for replay and analytical purposes; it currently holds about 750,000 placements and counting, from ~8,500 games. We used these board states from human games as our dataset, extracted via https://www.tetrisfish.com/ml-dataset, filtered to contain only placements made on level 18 in games where the player completed at least level 18. After this filtering, around 141k entries remain.

Instead of passing the boolean value of each cell as an input to the regression, which would result in an excessively large 20×10 = 200-dimensional input, we condense the board into a 10-element surface array containing the height of each of the 10 columns. Although this is a lossy operation that discards information such as the location of holes, it retains how accommodating the surface is, which is the key feature we are trying to evaluate. Our hope is to build an AI that relies purely on comparing board surface arrays to determine the best move for any board.
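A rough sketch of this condensation, assuming the board is stored as a 20×10 boolean NumPy array with row 0 at the top (this representation is an assumption, not necessarily how Tetrisfish stores boards):

```python
import numpy as np

def surface_array(board: np.ndarray) -> np.ndarray:
    """Collapse a 20x10 boolean board into a 10-element array of column heights.

    A column's height is the number of rows from its highest filled cell down
    to the floor; an empty column has height 0. Assumes row 0 is the top row.
    """
    rows, _ = board.shape
    first_filled = board.argmax(axis=0)                    # index of topmost filled cell per column
    heights = np.where(board.any(axis=0), rows - first_filled, 0)
    return heights
```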

Additionally, each entry is labeled with its board's StackRabbit score as the ground-truth y-value, a single continuous value normalized between 0 and 1. Since we simplify the problem to consider only the surface array, we must exclude boards that contain holes, as they cannot be well represented by our predictors and would add excessive noise to the data. After removing boards with holes, the final cleaned dataset has approximately 105k entries.
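Detecting the excluded boards only requires finding an empty cell with a filled cell somewhere above it in the same column; a hedged sketch under the same board representation as above:

```python
import numpy as np

def has_holes(board: np.ndarray) -> bool:
    """Return True if any column has an empty cell underneath a filled cell."""
    filled_above = np.maximum.accumulate(board, axis=0)  # True from the first filled cell downward
    return bool(np.any(filled_above & ~board))

# Entries whose boards contain holes are dropped, since the surface array
# cannot represent them faithfully.
```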

Peak-To-Peak Predictors

In pre-processing the dataset, we added "N-local peak-to-peak" predictors to each entry. The value of each predictor is the difference between the highest and lowest columns in an N-column-wide neighborhood. Peak-to-peak predictors are added for every adjacent pair of columns, every adjacent trio, and so on, all the way up to the 10-wide neighborhood (the full board). This brings the total number of predictors to 55, up from the original 10.
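A minimal sketch of generating these predictors from the 10-element surface array (the function and variable names are ours, for illustration):

```python
import numpy as np

def add_peak_to_peak_predictors(heights: np.ndarray) -> np.ndarray:
    """Append every N-local peak-to-peak value (N = 2..10) to the column heights.

    For each window of N adjacent columns, the predictor is the difference
    between the tallest and shortest column in that window. This yields
    9 + 8 + ... + 1 = 45 extra predictors, for 55 in total.
    """
    n_cols = len(heights)
    features = list(heights)
    for width in range(2, n_cols + 1):
        for start in range(n_cols - width + 1):
            window = heights[start:start + width]
            features.append(window.max() - window.min())
    return np.array(features)
```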

Adding all the peak-to-peak predictors makes the model much more complex, but it also allows the regression techniques to be far more effective, as they can reason about the board using predictors that more directly capture what matters in the evaluation. For example, they allow the model to recognize when there is a deep hole that requires specific pieces to escape from (a "dependency"), or to see the height variance of the board without counting the well in the right-most column.

Game-Level-Discriminated Evaluation

An additional complication in this problem domain is that optimal Tetris strategy changes over the course of a game. As the game progresses, the level increases, which increases the speed at which pieces fall. At higher levels, the speed becomes fast enough that not every placement is reachable. For that reason, optimal strategies play more "safely" or "defensively" at higher game levels, keeping the stack lower to ensure that all moves remain reachable.

The three "relevant" levels to high-level NES Tetris play are level 18, 19 and 29, as the transitions between these levels are the points at which the game speed changes in a way that affects the reachability of moves. The "golden standard" StackRabbit board evaluator is able to evaluate board states differently depending on what game level they are played on. Level 18 is the slowest level - thus, StackRabbit plays more aggressively by stacking higher and avoiding any line clears besides tetrises, maintaining efficiency. However, on level 29, the much quicker game speed limits piece movement and StackRabbit is forced to play much lower and safer for survivability. Thus, it will take a lot more line clears for safety, which may result in a lower tetris rate.

We created three datasets, in which the labeled StackRabbit score for each entry is generated using a different one of the three relevant levels. These are used to create three models for each regression method, which adds another dimension to model selection. We predicted that our models would mirror StackRabbit's behavior, with models trained on level 18 stacking higher and playing more aggressively, while models trained on level 29 would play much lower, trading efficiency for potentially better survivability.
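A sketch of what the level-discriminated training could look like, with placeholder data and scikit-learn's Ridge standing in for whichever regression method is under evaluation (the project's actual training code may differ):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder data for illustration: in the project, X_by_level[level] would be
# the 55-predictor feature matrix and y_by_level[level] the StackRabbit scores
# generated for that game level.
rng = np.random.default_rng(0)
X_by_level = {level: rng.random((1000, 55)) for level in (18, 19, 29)}
y_by_level = {level: rng.random(1000) for level in (18, 19, 29)}

# One model per relevant level; swap Ridge for any other regression technique.
models = {level: Ridge().fit(X_by_level[level], y_by_level[level])
          for level in (18, 19, 29)}

# At play time, the evaluator matching the current game level would be used,
# e.g. models[29].predict(features.reshape(1, -1)).
```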

Board Score Normalization

The board evaluations generated by StackRabbit are not bounded by definition; in our dataset, however, the maximum score is 46.5 and the minimum is -22,063. The "goodness" of a board is not represented linearly across the space of StackRabbit scores: the score drops far into the negatives much more readily than it rises, and the difference between board scores of 30 and 35 is far more significant than the difference between -500 and -5,000.

To combat this, we run the StackRabbit board scores through a normalization function that attempts to linearize the prediction space and map it into approximately [0, 1], improving the predictive power of the linear models. The normalization function is graphed in figure 1. We also apply the inverse of this function when comparing test scores and outputs between the linear and non-linear models.
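The exact normalization function is only shown in figure 1; as an illustrative placeholder (not the project's actual formula), a logistic-style squashing has the same qualitative effect of mapping heavily negative scores toward 0 and the best scores toward 1, and it inverts cleanly:

```python
import numpy as np

# Illustrative placeholder only: the project's real normalization function is
# the one graphed in figure 1 and is not reproduced in the text.
def normalize_score(raw, scale=10.0):
    """Squash an unbounded StackRabbit score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(raw, dtype=float) / scale))

def denormalize_score(norm, scale=10.0):
    """Invert the squashing, for comparing linear- and non-linear-model outputs."""
    norm = np.asarray(norm, dtype=float)
    return -scale * np.log(1.0 / norm - 1.0)
```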
