This repo contains the official implementation for the paper [SOS: Score-based Oversampling for Tabular Data]
by Jayoung Kim, Chaejeong Lee, Yehjin Shin, Sewon Park, Minjung Kim, Noseong Park, and Jihoon Cho.
Run the following to install a subset of the necessary python packages for our code:

```sh
pip install -r requirements.txt
```
Train and evaluate our models through `main.py`.

```sh
main.py:
  --config: Training configuration.
    (default: 'None')
  --mode: <train|fine_tune>: Running mode: train or fine_tune
  --workdir: Working directory
```
- `config` is the path to the config file. Our prescribed config files are provided in `configs/`. They are formatted according to `ml_collections` and should be quite self-explanatory.

  Naming conventions of config files: the path of a config file is a combination of the following dimensions:
  - dataset: One of `Default`, `Shoppers`, `WeatherAUS`, `Satimage`.
  - continuous: train the model with continuously sampled time steps.
- `workdir` is the path that stores all artifacts of one experiment, like checkpoints, samples, and evaluation results.
- `mode` is either "train" or "fine_tune". When set to "train", it starts the training of a new model, or resumes the training of an old model if its meta-checkpoints exist in `workdir/checkpoints-meta`. When set to "fine_tune", it fine-tunes the model (see the example invocations below).
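For concreteness, a typical pair of invocations might look like the sketch below. The config filename and the workdir path are illustrative assumptions; substitute a real config file from `configs/`.

```sh
# Train a new model, or resume training if meta-checkpoints already exist in the workdir.
# The config filename and workdir here are placeholders; use a real file from configs/.
python main.py --config configs/WeatherAUS.py --mode train --workdir ./results/WeatherAUS

# Fine-tune the trained model, reusing the same config and workdir.
python main.py --config configs/WeatherAUS.py --mode fine_tune --workdir ./results/WeatherAUS
```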
A checkpoint for `WeatherAUS` is provided in this Google drive.
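If you want `--mode train` to resume from the provided checkpoint rather than train from scratch, the checkpoint has to be visible as a meta-checkpoint under `workdir/checkpoints-meta`, as described above. The directory layout and filename in the sketch below are assumptions for illustration, not something prescribed by this README; keep whatever names the downloaded file and the code actually expect.

```sh
# Sketch only: make the downloaded WeatherAUS checkpoint visible to "train" mode,
# which resumes from meta-checkpoints in workdir/checkpoints-meta (see above).
# The workdir path and checkpoint filename are assumptions.
mkdir -p ./results/WeatherAUS/checkpoints-meta
cp /path/to/downloaded_checkpoint.pth ./results/WeatherAUS/checkpoints-meta/
```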
This work is built upon some previous papers which might also interest you:
- Song, Yang, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. "Score-Based Generative Modeling through Stochastic Differential Equations." International Conference on Learning Representations. 2021.
- Song, Yang, and Stefano Ermon. "Generative Modeling by Estimating Gradients of the Data Distribution." Proceedings of the 33rd Annual Conference on Neural Information Processing Systems. 2019.
- Song, Yang, and Stefano Ermon. "Improved Techniques for Training Score-Based Generative Models." Proceedings of the 34th Annual Conference on Neural Information Processing Systems. 2020.
- Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising Diffusion Probabilistic Models." Proceedings of the 34th Annual Conference on Neural Information Processing Systems. 2020.
Copyright (C) 2023 Samsung SDS Co., Ltd. All rights reserved.
Released under the Samsung SDS Public License V1.0.
For details on the scope of licenses, please refer to the License.md file (https://github.com/JayoungKim408/SOS/blob/master/License.md).
This project was developed based on the open-source code of https://github.com/yang-song/score_sde_pytorch.