This repository is the official implementation of Reward Imputation with Sketching for Contextual Batched Bandits.
The paper was accepted at NeurIPS 2023.
To install requirements:
pip install -r requirements.txt
To train any model from the paper except DFM-S, run this command, replacing Algorithm_name with the name of the algorithm:
python algo_main.py Algorithm_name
To train DFM-S, run this command:
python algo_main2.py
We recommend tuning hyper-parameters with the nni module, which is what we used in our experiments.
Because the code calculates the average reward in each episode, you can export the reward data with nni after a run finishes. To export reward data, run:
nnictl experiment export [experiment_id] --filename [file_path] --type json --intermediate
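Once the file is exported, the per-episode rewards can be read back with a few lines of Python. The snippet below is a minimal sketch, not part of this repository: the function names are hypothetical, and the key names `trialJobId` and `intermediate` are assumptions about the layout of the exported JSON, so check them against your own file and adjust if they differ.

```python
import json
from statistics import mean


def to_float(value):
    """Convert one exported metric to float.

    Assumption: a metric is a number, a numeric string, or a dict that
    stores the main metric under the key "default".
    """
    if isinstance(value, dict):
        value = value["default"]
    return float(value)


def load_episode_rewards(path, id_key="trialJobId", intermediate_key="intermediate"):
    """Return {trial_id: [average reward per episode]} from an exported file.

    The default key names are assumptions about the export layout.
    """
    with open(path) as f:
        records = json.load(f)
    rewards = {}
    for i, record in enumerate(records):
        # Fall back to a positional name if the id key is missing.
        trial_id = record.get(id_key, f"trial_{i}")
        raw = record.get(intermediate_key) or []
        rewards[trial_id] = [to_float(v) for v in raw]
    return rewards


def mean_reward_per_trial(rewards):
    """Average the per-episode rewards of each trial, skipping empty trials."""
    return {tid: mean(vals) for tid, vals in rewards.items() if vals}
```

For example, if the file was exported to a path such as `rewards.json` (a placeholder for whatever you passed as `--filename`), `mean_reward_per_trial(load_episode_rewards("rewards.json"))` gives one averaged reward per trial.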
If you have any questions, please contact us at the email address [email protected], or submit an issue here.