The pre-training script will perform evaluations if you set the `do_eval` argument to `True` and `evaluation_strategy` to `"steps"`. However, you can also re-run evaluations, or run them separately, by using the `evaluation.py` script.
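For reference, here is a minimal sketch of what those settings look like, assuming the pre-training script forwards them to Hugging Face's `TrainingArguments` (the output directory and evaluation interval below are hypothetical placeholders):

```python
from transformers import TrainingArguments

# A minimal sketch, assuming the settings map onto Hugging Face's
# TrainingArguments; the actual pre-training script may name or
# default these differently.
training_args = TrainingArguments(
    output_dir="checkpoints",     # hypothetical output directory
    do_eval=True,                 # run evaluations during pre-training
    evaluation_strategy="steps",  # evaluate every `eval_steps` steps
    eval_steps=100_000,           # hypothetical evaluation interval
)
```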
To speed up experiments, the `evaluation.py` script expects you to set a folder path where the dataset is stored locally. The dataset folder must contain a list of parquet files, which you can obtain by simply cloning the dataset from the Hub to a local directory:
```bash
git lfs install
git clone https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-tokenized
```
Then, you should separate the dataset into `train` and `test` folders. Alternatively, you can modify the script and load the dataset like this (if the dataset is saved in your cache folder, you will only need to download it once):
```python
from datasets import load_dataset

eval_dataset = load_dataset("nicholasKluge/Pt-Corpus-tokenized", split="test")
```
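If you instead point the script at the cloned folder, loading the local parquet files directly could look like this sketch (the `train`/`test` folder layout is an assumption about how you split your clone; adjust the paths to match):

```python
from datasets import load_dataset

# A hypothetical sketch: load the locally cloned parquet files directly.
# The folder layout is an assumption about how you organized the clone.
dataset = load_dataset(
    "parquet",
    data_files={
        "train": "Pt-Corpus-tokenized/train/*.parquet",
        "test": "Pt-Corpus-tokenized/test/*.parquet",
    },
)
eval_dataset = dataset["test"]
```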
Note: Our scripts do not support streaming, since much of the arithmetic that configures training uses the length of the dataloaders as a factor. If you want to allow streaming (recommended for larger datasets, although it results in slower training compared to having the dataset loaded in memory), you will need to modify how these calculations are made, for example, by hard-coding the number of steps, the number of examples in each training split, etc.
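If you do enable streaming, a minimal sketch of such a modification could look like this (the split size below is a hypothetical placeholder; the real value would come from the dataset card):

```python
from datasets import load_dataset

# With streaming=True the dataset has no __len__, so quantities normally
# derived from len(dataloader) must be hard-coded instead.
eval_dataset = load_dataset(
    "nicholasKluge/Pt-Corpus-tokenized",
    split="test",
    streaming=True,
)

NUM_EVAL_EXAMPLES = 1_000_000  # hypothetical: the known size of the test split
PER_DEVICE_EVAL_BATCH_SIZE = 16
max_eval_steps = NUM_EVAL_EXAMPLES // PER_DEVICE_EVAL_BATCH_SIZE
```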
You can run this script like this:

```bash
python evaluation.py \
--logger_name "TeenyTinyLlama" \
--model_checkpoint_path "nicholasKluge/TeenyTinyLlama-460m" \
--revision "step100000" \
--attn_implementation "flash_attention_2" \
--per_device_eval_batch_size 16 \
--completed_steps 100000 \
--total_energy_consumption 3.34
```
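Internally, loading a specific checkpoint with a given attention implementation could look roughly like the sketch below. This is an assumption about what `evaluation.py` does with the `revision` and `attn_implementation` arguments; the actual script may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A hypothetical sketch: load an intermediate checkpoint from the Hub.
# `revision` selects the checkpoint branch; flash_attention_2 requires a
# compatible GPU, the flash-attn package, and fp16/bf16 weights.
model = AutoModelForCausalLM.from_pretrained(
    "nicholasKluge/TeenyTinyLlama-460m",
    revision="step100000",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    "nicholasKluge/TeenyTinyLlama-460m",
    revision="step100000",
)
```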
These are the arguments you pass to this script:

| Argument | Description |
|---|---|
| `logger_name` | The logger name. |
| `model_checkpoint_path` | Path to the model checkpoint to be used for evaluation. |
| `revision` | The model revision to evaluate (e.g., "step100000"). |
| `attn_implementation` | The attention implementation to use for evaluation. |
| `per_device_eval_batch_size` | The batch size per device for evaluation. |
| `completed_steps` | The number of completed training steps (e.g., 100000). |
| `total_energy_consumption` | The total energy consumed up to this point in training. |
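For reference, a hypothetical sketch of the argument parser behind this interface (flag names mirror the invocation above, while types are inferred from the example values; the actual script may define defaults and help strings differently):

```python
import argparse

# A hypothetical sketch of evaluation.py's command-line interface.
parser = argparse.ArgumentParser(description="Evaluate a model checkpoint.")
parser.add_argument("--logger_name", type=str)
parser.add_argument("--model_checkpoint_path", type=str)
parser.add_argument("--revision", type=str)
parser.add_argument("--attn_implementation", type=str)
parser.add_argument("--per_device_eval_batch_size", type=int)
parser.add_argument("--completed_steps", type=int)
parser.add_argument("--total_energy_consumption", type=float)
args = parser.parse_args()
```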
The `lm-evaluation-harness-pt.ipynb` notebook showcases how to evaluate a model on the Laiviet version of the LM-Evaluation-Harness. To run it, execute the cells in the notebook in an environment with access to a GPU (e.g., Colab). Evaluations on Portuguese benchmarks are available in the `New-EVAL` folder.