
# aesthetics-scorer

Predicts aesthetic scores for images. Trained on AI Horde community ratings of Stable Diffusion-generated images.

Hugging Face demo

## Visualized results

### Validation split of the diffusiondb dataset

- OpenClip models
- Convnext models

### Subset of laion5b

- OpenClip models
- Convnext models

## Benchmarks

### Validation loss

### ImageReward accuracy

Accuracy on the test set from https://github.com/THUDM/ImageReward#reproduce-experiments-in-table-2
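
For reference, this kind of benchmark boils down to pairwise preference accuracy: for each human-ranked image pair, check whether the scorer orders the two images the same way as the human raters. A minimal sketch (the data layout is an assumption, not the actual ImageReward evaluation harness):

```python
# Sketch of pairwise preference accuracy (assumed data layout, not the
# actual ImageReward evaluation harness).
def pairwise_accuracy(preferred_scores, rejected_scores):
    """Fraction of pairs where the human-preferred image scores higher."""
    assert len(preferred_scores) == len(rejected_scores)
    correct = sum(p > r for p, r in zip(preferred_scores, rejected_scores))
    return correct / len(preferred_scores)

# Example: 3 of 4 pairs ordered correctly -> 0.75
print(pairwise_accuracy([6.1, 5.4, 7.0, 4.2], [5.0, 5.9, 6.3, 3.8]))
```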

## Usage

Model files are in the aesthetics_scorer/models folder.
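
For orientation, a hedged sketch of how a checkpoint from that folder might be used: embed an image with the matching OpenCLIP model, then run the small scorer head on the embedding. The checkpoint filename, CLIP variant, and loading call below are assumptions; check the repo code (e.g. demo.py) for the actual API.

```python
# Hypothetical usage sketch -- the checkpoint filename, CLIP variant and
# checkpoint format are assumptions; see demo.py for the real loading code.
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Embed the image with the same OpenCLIP variant the scorer was trained on.
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai"
)
clip_model = clip_model.to(device).eval()

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)
with torch.no_grad():
    embedding = clip_model.encode_image(image)

# The scorer is a small regression head over the CLIP embedding.
scorer = torch.load("aesthetics_scorer/models/scorer.pt", map_location=device)
scorer.eval()
with torch.no_grad():
    score = scorer(embedding.float())
print(f"aesthetic score: {score.item():.2f}")
```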

Simple gradio demo:

```bash
python aesthetics_scorer/demo.py
```

## Train

### Prepare dataset

1. dataset-process/dataset_downloader.py downloads the zipped diffusiondb dataset images; change the path to where you want it stored (~200 GB).
2. dataset-process/dataset_parquet_files.py downloads the dataset parquet files and sets up the train and validation splits.
3. dataset-process/dataset_image_extract.py extracts the rated images from the zipped dataset files.
4. dataset-process/clip_encode_dataset.py precomputes CLIP embeddings for all rated images (change the config if you don't need all the different CLIP variants); a sketch of this step follows below.

If step 1 has already been downloaded, step 2 can be rerun to update the dataset parquet files, and steps 3 and 4 will only process new images that haven't been handled yet.
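
For orientation, a hedged sketch of what step 4 amounts to: batch-encoding the rated images with OpenCLIP and saving the embeddings for training. The paths, batch size, and output format here are assumptions, not the script's actual layout.

```python
# Hypothetical sketch of the precompute step -- paths, batch size and the
# saved file layout are assumptions; clip_encode_dataset.py is the real script.
import glob
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai"
)
model = model.to(device).eval()

paths = sorted(glob.glob("dataset/rated_images/*.png"))
embeddings = []
for start in range(0, len(paths), 64):
    batch = torch.stack(
        [preprocess(Image.open(p).convert("RGB")) for p in paths[start:start + 64]]
    ).to(device)
    with torch.no_grad():
        embeddings.append(model.encode_image(batch).cpu())

torch.save(
    {"paths": paths, "embeddings": torch.cat(embeddings)},
    "embeddings_vit_l_14.pt",
)
```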

### Training

In aesthetics_scorer/train.py, change whatever configs you want. Most importantly, set EMBEDDING_FILE to the embedding file you preprocessed. There are a number of other hyperparameters that can be changed.

```bash
python aesthetics_scorer/train.py
```
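
For orientation, the kind of training loop this boils down to: an MLP regression head fit with MSE on the precomputed CLIP embeddings. The architecture, hyperparameters, and embedding-file keys below are assumptions, not necessarily what train.py uses.

```python
# Hypothetical training sketch -- architecture, hyperparameters and the
# embedding file keys are assumptions; train.py is the real implementation.
import torch
import torch.nn as nn

EMBEDDING_FILE = "embeddings_vit_l_14.pt"  # from clip_encode_dataset.py

data = torch.load(EMBEDDING_FILE)
x = data["embeddings"].float()
y = data["ratings"].float().unsqueeze(1)  # assumed: one rating per image

# Small MLP regression head over the frozen CLIP embeddings.
model = nn.Sequential(
    nn.Linear(x.shape[1], 1024), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(1024, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    perm = torch.randperm(len(x))
    for i in range(0, len(x), 256):
        idx = perm[i:i + 256]
        optimizer.zero_grad()
        loss = loss_fn(model(x[idx]), y[idx])
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch MSE {loss.item():.4f}")
```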

## Credits

```bibtex
@article{wangDiffusionDBLargescalePrompt2022,
  title   = {DiffusionDB: A Large-Scale Prompt Gallery Dataset for Text-to-Image Generative Models},
  author  = {Wang, Zijie J. and Montoya, Evan and Munechika, David and Yang, Haoyang and Hoover, Benjamin and Chau, Duen Horng},
  year    = {2022},
  journal = {arXiv:2210.14896 [cs]},
  url     = {https://arxiv.org/abs/2210.14896}
}

@software{ilharco_gabriel_2021_5143773,
  title     = {OpenCLIP},
  author    = {Ilharco, Gabriel and Wortsman, Mitchell and Wightman, Ross and Gordon, Cade and Carlini, Nicholas and Taori, Rohan and Dave, Achal and Shankar, Vaishaal and Namkoong, Hongseok and Miller, John and Hajishirzi, Hannaneh and Farhadi, Ali and Schmidt, Ludwig},
  month     = jul,
  year      = 2021,
  note      = {If you use this software, please cite it as below.},
  publisher = {Zenodo},
  version   = {0.1},
  doi       = {10.5281/zenodo.5143773},
  url       = {https://doi.org/10.5281/zenodo.5143773}
}

@misc{xu2023imagereward,
  title         = {ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation},
  author        = {Jiazheng Xu and Xiao Liu and Yuchen Wu and Yuxuan Tong and Qinkai Li and Ming Ding and Jie Tang and Yuxiao Dong},
  year          = {2023},
  eprint        = {2304.05977},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
```