Multimodal Final Project
Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
In this project, we reproduce the results of the paper and propose several improvements that aim to boost the performance and speed up the generation of InitNO, an initial noise optimization method for text-to-image diffusion models. Details can be found in our course report.
Python libraries: You can use the following commands to create and activate your InitNO Python environment:
# Create conda environment
conda env create -f environment.yaml
# Activate conda environment
conda activate initno_env
Generating images: Run the following command to generate images.
python run_sd_initno.py
You can specify the following arguments in run_sd_initno.py:
- SEEDS: a list of random seeds
- PROMPT: the text prompt for image generation
- token_indices: a list of target token indices
- result_root: the path where generated results are saved
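Of these arguments, token_indices is the least obvious to fill in: it gives the positions of the target words in the tokenized prompt. With Stable Diffusion's CLIP tokenizer, position 0 holds the start-of-text token, so the first word of the prompt sits at index 1. The sketch below approximates the indices under the assumption that every word maps to exactly one token, which fails for rare words, compound words, and attached punctuation, so check the result against the real tokenizer output. The helper approx_token_indices is hypothetical and not part of this repo, and the SEEDS and result_root values are placeholders.

```python
def approx_token_indices(prompt: str, targets: list[str]) -> list[int]:
    """Approximate token indices by word position (hypothetical helper).

    Assumes index 0 is the tokenizer's start-of-text token and that each
    whitespace-separated word becomes exactly one token. Returns the first
    occurrence of each target word; raises ValueError if a word is missing.
    """
    words = prompt.lower().split()
    indices = []
    for target in targets:
        # +1 skips the assumed start-of-text token at position 0
        indices.append(words.index(target.lower()) + 1)
    return indices


# Placeholder values for the arguments in run_sd_initno.py
SEEDS = [0, 1, 2]
PROMPT = "a cat and a dog"
token_indices = approx_token_indices(PROMPT, ["cat", "dog"])
result_root = "results/"
print(token_indices)  # [2, 5]
```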
For our improvements, we provide the following arguments:
- USE_CROSS_ATTN_CONFLICT_LOSS: whether to use the cross-attention conflict loss
- OPT: the optimizer used for initial noise optimization; options are adam, adamw, rmsprop, and sgd
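The OPT switch presumably maps each option name to the corresponding optimizer class in torch.optim. The following is a minimal sketch assuming PyTorch, not the repo's code: the helper name make_noise_optimizer, the learning rate, and the latent shape (1, 4, 64, 64), which corresponds to 512x512 Stable Diffusion v1 outputs, are all assumptions, and the squared-mean loss is only a stand-in for the actual attention-based objectives, including the cross-attention conflict loss.

```python
import torch


def make_noise_optimizer(params, opt: str = "adam", lr: float = 1e-2):
    """Return the torch.optim optimizer named by OPT (hypothetical helper).

    The learning rate default is a placeholder, not a value from InitNO.
    """
    optimizers = {
        "adam": torch.optim.Adam,
        "adamw": torch.optim.AdamW,
        "rmsprop": torch.optim.RMSprop,
        "sgd": torch.optim.SGD,
    }
    key = opt.lower()
    if key not in optimizers:
        raise ValueError(f"Unsupported OPT {opt!r}; choose from {sorted(optimizers)}")
    return optimizers[key](params, lr=lr)


# Usage: optimize an initial latent noise tensor for a few steps
latents = torch.randn(1, 4, 64, 64, requires_grad=True)
optimizer = make_noise_optimizer([latents], opt="adamw")
for _ in range(3):
    optimizer.zero_grad()
    loss = latents.pow(2).mean()  # placeholder loss, not the real objective
    loss.backward()
    optimizer.step()
```

Keeping the choice behind a single string makes it easy to compare the four optimizers on the same seeds and prompts.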
The code is built upon InitNO, and we adopt the official evaluation prompts from Attend-and-Excite. We thank the authors for open-sourcing their code.