Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there's a growing need to translate informal mathematical text into formal languages. However, most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three datasets: RoMath-Synthetic, RoMath-Baccalaureate, and RoMath-Competitions. These datasets cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several language models, highlighting the importance of creating resources for underrepresented languages.
Loading the data from 🤗 Hugging Face Datasets:
import datasets
subset = 'bac'  # one of 'bac', 'comps' or 'synthetic'
train_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='train')
test_dataset = datasets.load_dataset('cosmadrian/romath', subset, split='test')
# Do your thing ...
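Once loaded, each split behaves like any 🤗 Datasets object. As a quick sanity check, the sketch below prints the schema and one example; the column names are not documented here, so check train_dataset.column_names rather than assuming specific field names.
# Inspect the schema and a single example from the split loaded above.
print(train_dataset.column_names)
example = train_dataset[0]
for key, value in example.items():
    # Truncate long fields (e.g. full solutions) for readability.
    print(f"{key}: {str(value)[:200]}")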
While a pre-generated split for RoMath-Synthetic is provided for convenience on 🤗 Hugging Face, you can generate your own problems using the original DeepMind code with key phrases translated into Romanian.
See the romath-synthetic/ directory for instructions.
Experiments for the paper are organized in the experiments/ directory, with a separate script for each experiment. We used SLURM on a private cluster to train models, make predictions, and evaluate them. Run a particular bash script with ./do_sbatch.sh <script.sh> <n_gpus>, and modify ./do_sbatch.sh to suit your needs.
To run a particular model on a dataset, use the following commands:
# Optional LoRA fine-tuning
python fine_tune.py --model <hf_model_name> --dataset [bac|comps|synthetic] --output checkpoints/
# Use a (trained) model to make predictions on a test set.
python predict.py --model <hf_model_name> --dataset [bac|comps|synthetic] --temperature 0.5 --k 3 --shots 5 --output predictions/
# Evaluate the predictions of a model using a judge model.
python evaluate.py --pred_file predictions/Qwen-Qwen2-1.5B-Instruct_bac_2_0.5.csv --judge_model <hf_model_name> --output results/
# Compute the relevant metrics for all evaluated prediction files in a folder.
python evaluate/compute_metrics.py --input_dir results/ --output_dir metrics/
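predict.py samples k completions per problem using a few-shot prompt (--shots). As a purely illustrative sketch of the idea, a few-shot prompt could be assembled as below; the function, field names, and Romanian instruction are hypothetical and not taken from the repository's actual predict.py.
def build_few_shot_prompt(shots, query_problem):
    # `shots` is assumed to be a list of dicts with 'problem' and 'solution'
    # keys; both the structure and the instruction text are illustrative.
    parts = ['Rezolvă următoarea problemă de matematică.']
    for shot in shots:
        parts.append(f"Problemă: {shot['problem']}\nSoluție: {shot['solution']}")
    parts.append(f"Problemă: {query_problem}\nSoluție:")
    return '\n\n'.join(parts)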
For the translation experiments, use the translate.py script together with the predict_translated.py script.
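As a rough illustration of Romanian-to-English translation with an off-the-shelf model (the Helsinki-NLP/opus-mt-ro-en checkpoint is an assumption here, not necessarily what translate.py uses):
from transformers import pipeline

# Assumed checkpoint for illustration; translate.py may use a different system.
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-ro-en')
problem_ro = 'Să se calculeze suma primelor 10 numere naturale nenule.'
problem_en = translator(problem_ro, max_length=512)[0]['translation_text']
print(problem_en)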
To construct the Judge Dataset (i.e., Table 3 in the paper), run evaluate/make_judge_dataset.py with the appropriate arguments, then run the evaluate_judge.py script.
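Both evaluate.py and evaluate_judge.py rely on a judge model to decide whether a prediction matches the reference solution. The sketch below shows one hypothetical judge prompt; the wording and helper function are illustrative assumptions, not the repository's actual prompt.
def build_judge_prompt(problem, reference_solution, predicted_solution):
    # Hypothetical prompt asking the judge model for a yes/no verdict;
    # the actual prompt used by evaluate.py may differ.
    return (
        'You are grading a solution to a math problem.\n'
        f'Problem: {problem}\n'
        f'Reference solution: {reference_solution}\n'
        f'Candidate solution: {predicted_solution}\n'
        "Does the candidate reach the same final answer as the reference? Answer 'yes' or 'no'."
    )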
If you found our work useful, please cite our paper:
RoMath: A Mathematical Reasoning Benchmark in 🇷🇴 Romanian 🇷🇴
@misc{cosma2024romath,
  title={RoMath: A Mathematical Reasoning Benchmark in Romanian},
  author={Adrian Cosma and Ana-Maria Bucur and Emilian Radoi},
  year={2024},
  eprint={2409.11074},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2409.11074},
}
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.