A specialized RWKV model for solving Sudoku puzzles.
- rwkv
- tkinter
- Run `demo.py` or `minimum_inference.py` to solve Sudoku puzzles
- Run `generate_sudoku_data.py` to generate training data
The current `sudoku_rwkv_20241120.pth` model is an RWKV-v6 model trained specifically to solve Sudoku puzzles, using 2M Sudoku samples (~39.2B tokens).
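For a sense of scale, the two training figures above imply an average sample length. The snippet below is only back-of-the-envelope arithmetic on those rounded numbers; the variable names are illustrative and not taken from the repository.

```python
# Rounded figures quoted above; both are approximate in the source.
TOTAL_TOKENS = 39_200_000_000  # ~39.2B tokens
NUM_SAMPLES = 2_000_000        # 2M Sudoku samples

# Average number of tokens per training sample implied by those figures.
tokens_per_sample = TOTAL_TOKENS // NUM_SAMPLES

print(f"~{tokens_per_sample:,} tokens per sample")
```

Each sample is therefore far longer than the 81 cells of a single grid.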
Model specifications:
- Parameters: ~12.7M
- Vocabulary size: 133
- Architecture: 8 layers, 320 dimensions
The model includes a simple improvement for better performance (see `model.py` line 372). Corresponding modifications were made in the inference code (`rwkv_model.py` lines 852, 893-896).
The model was trained using the RWKV-LM repository.
Hyperparameters:
- `M_BSZ`: 48
- `CTX_LEN`: 8192
- `LR`: 12e-4 to 3e-5
- `ADAM_EPS`: 1e-18
- `ADAM_BETA1`: 0.9
- `ADAM_BETA2`: 0.95
- `WEIGHT_DECAY`: 0.1
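The `LR` entry gives only the start and end of the learning-rate range, not the shape of the decay between them. As a rough illustration, here are two common ways to move from 12e-4 to 3e-5 over training. Both functions are assumptions for illustration, not the schedule actually used for this model, and `progress` is a hypothetical fraction of training completed.

```python
import math

LR_INIT = 12e-4   # starting learning rate, from the list above
LR_FINAL = 3e-5   # final learning rate, from the list above


def lr_exponential(progress: float) -> float:
    """Geometric interpolation from LR_INIT to LR_FINAL (assumed shape).

    progress is the fraction of training completed, in [0, 1].
    """
    return LR_INIT * math.exp(math.log(LR_FINAL / LR_INIT) * progress)


def lr_cosine(progress: float) -> float:
    """Cosine decay from LR_INIT to LR_FINAL (assumed shape)."""
    return LR_FINAL + 0.5 * (LR_INIT - LR_FINAL) * (1 + math.cos(math.pi * progress))


for p in (0.0, 0.5, 1.0):
    print(f"progress={p:.1f}  exp={lr_exponential(p):.2e}  cos={lr_cosine(p):.2e}")
```

Both curves start at 12e-4 and end at 3e-5; they differ only in how quickly the rate drops in between.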
- The results below are from an older model. The current model appears to solve any solvable Sudoku. If you find a failing case, please let me know.
I tested the model on samples of varying difficulty levels, with results shown below:
Note: Difficulty is measured by the number of empty cells in the Sudoku puzzle
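The difficulty measure is easy to compute yourself, and a failing case is easiest to report when the output has been checked mechanically. The sketch below assumes a puzzle is a 9x9 list of lists with `0` for an empty cell; that representation and all function names are illustrative and are not the format used by the scripts in this repository.

```python
def count_empty_cells(grid, empty=0):
    """Difficulty as defined above: the number of empty cells."""
    return sum(cell == empty for row in grid for cell in row)


def is_valid_solution(grid):
    """True if every row, column and 3x3 box contains the digits 1-9 once."""
    target = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(unit == target for unit in rows + cols + boxes)


def respects_givens(puzzle, solution, empty=0):
    """True if the solution keeps every pre-filled cell of the puzzle."""
    return all(
        p == empty or p == s
        for p_row, s_row in zip(puzzle, solution)
        for p, s in zip(p_row, s_row)
    )


# A valid grid built from a standard shifting pattern, used only as test data.
solution = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Blank out a checkerboard of cells to obtain an example puzzle.
puzzle = [
    [0 if (r + c) % 2 == 0 else solution[r][c] for c in range(9)]
    for r in range(9)
]

print(count_empty_cells(puzzle))
print(is_valid_solution(solution), respects_givens(puzzle, solution))
```

A model output counts as a failure if either `is_valid_solution` or `respects_givens` returns `False` for it.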