Stable_diffusion_acceleration_and_lightweight

Fewer parameters and inference steps, more aesthetically pleasing results

Introduction

Text to Image (T2I):

Text-to-image uses AI to understand your words and convert them to a unique image each time.

The development of T2I models:

Stable Diffusion: a milestone work on latent-space diffusion. It is a framework that trains diffusion models in latent space, and it is a scaled-up version of the Latent Diffusion Model (LDM).

Related work: On distillation of guided T2I diffusion models:

W-condition (CVPR 2023 Award Candidate)

Two-stage distillation model:

Stage 1: Guidance-weight conditioning. The guidance weight w is fed to the UNet as an extra input, so that the student no longer needs the separate positive-prompt and negative-prompt evaluations of Classifier-Free Guidance (CFG).

Stage 2: Progressive distillation. A pipeline similar to the progressive distillation work is employed to decrease the number of inference steps.
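For reference, the CFG combination that stage 1 aims to fold into a single forward pass can be sketched as below. This is a minimal illustration on toy lists, not the W-condition implementation; it assumes the common convention in which w = 1 means no guidance, and the function name `cfg_combine` is hypothetical.

```python
from typing import List


def cfg_combine(eps_cond: List[float], eps_uncond: List[float], w: float) -> List[float]:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    Assumed convention: w = 1 returns the conditional prediction unchanged,
    w = 0 returns the unconditional (negative-prompt) prediction.
    """
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy noise predictions standing in for the two UNet forward passes.
eps_cond = [1.0, 2.0, -1.0]    # positive-prompt pass
eps_uncond = [0.5, 1.0, 0.0]   # negative-prompt pass
guided = cfg_combine(eps_cond, eps_uncond, w=7.5)
```

A w-conditioned student would be trained to output a value like `guided` directly from one pass that also receives w, which is where the saving over two passes per step comes from.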

SnapFusion

Stage 1: Direct distillation. The teacher is distilled only once (16 → 8 steps), balancing the CFG-aware loss and the vanilla step-distillation loss for step reduction.

Challenge

An extra multi-stage distillation pipeline leads to accumulated error: 512 → 256 → 128 → 64 → 32 → 16 → 8.
Poor scalability: W-condition and SnapFusion both assume a UNet whose output is in v-prediction form (as in Stable Diffusion v2.0), so they are not suitable for Stable Diffusion v1.x.
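To make the first challenge concrete, the sketch below counts how many student models a halving pipeline must train in sequence. It only enumerates the schedule; the helper name `halving_schedule` is hypothetical and no training is performed.

```python
from typing import List


def halving_schedule(start_steps: int, target_steps: int) -> List[int]:
    """Return the step counts visited when every stage halves the sampling steps."""
    if target_steps < 1 or start_steps < target_steps:
        raise ValueError("need start_steps >= target_steps >= 1")
    schedule = [start_steps]
    while schedule[-1] > target_steps:
        if schedule[-1] % 2 != 0:
            raise ValueError("step count is odd and cannot be halved exactly")
        schedule.append(schedule[-1] // 2)
    if schedule[-1] != target_steps:
        raise ValueError("target_steps is not reachable by repeated halving")
    return schedule


progressive = halving_schedule(512, 8)   # multi-stage pipeline
direct = halving_schedule(16, 8)         # single 16 -> 8 distillation
num_progressive_stages = len(progressive) - 1
num_direct_stages = len(direct) - 1
```

Each stage uses the previous student as its teacher, so the six stages of the 512 → 8 schedule give approximation errors six opportunities to compound, compared with one for a direct 16 → 8 distillation.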

The prediction types of Stable Diffusion are 'epsilon', 'sample', and 'v_prediction'. More information can be found in the Progressive distillation work.
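The three prediction types are interchangeable once the noise level is known. The sketch below converts any of them into a clean-sample estimate and a noise estimate, assuming the variance-preserving parameterization x_t = alpha_t * x0 + sigma_t * eps with alpha_t^2 + sigma_t^2 = 1 and v = alpha_t * eps - sigma_t * x0, as defined in the progressive distillation work. Scalars are used for brevity, and the function name `to_x0_and_eps` is hypothetical.

```python
import math
from typing import Tuple


def to_x0_and_eps(model_output: float, x_t: float, alpha_t: float,
                  sigma_t: float, prediction_type: str) -> Tuple[float, float]:
    """Convert a model output of the given prediction type into (x0, eps)."""
    if prediction_type == "epsilon":
        eps = model_output
        x0 = (x_t - sigma_t * eps) / alpha_t
    elif prediction_type == "sample":
        x0 = model_output
        eps = (x_t - alpha_t * x0) / sigma_t
    elif prediction_type == "v_prediction":
        # Valid only when alpha_t**2 + sigma_t**2 == 1.
        x0 = alpha_t * x_t - sigma_t * model_output
        eps = sigma_t * x_t + alpha_t * model_output
    else:
        raise ValueError(f"unknown prediction type: {prediction_type}")
    return x0, eps


# Toy check: build x_t and v from a known (x0, eps) pair, then recover the pair.
alpha_t, sigma_t = 0.8, 0.6
true_x0, true_eps = 0.5, -1.0
x_t = alpha_t * true_x0 + sigma_t * true_eps
v = alpha_t * true_eps - sigma_t * true_x0
from_v = to_x0_and_eps(v, x_t, alpha_t, sigma_t, "v_prediction")
from_eps = to_x0_and_eps(true_eps, x_t, alpha_t, sigma_t, "epsilon")
from_x0 = to_x0_and_eps(true_x0, x_t, alpha_t, sigma_t, "sample")
```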

Our Framework

A high-quality, large-scale image-text dataset is constructed from LAION-art, MSCOCO, DiffusionDB, and HPSv2. For image quality, the HPSv2 score is used to evaluate human preference; more detailed information can be found in HPSv2. For prompt quality, MiniGPT and BLIP are used to clean the prompts, following the process of recastai/LAION-art-EN-improved-captions.
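The image-quality filter described above might look like the following sketch. It assumes the preference scores have already been computed and stored with each record; the field name `hps_score`, the threshold value, and the toy records are all hypothetical and are not taken from this repository.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def filter_by_preference(records: List[Record], min_score: float) -> List[Record]:
    """Keep image-text pairs whose precomputed preference score is at least min_score."""
    return [r for r in records if float(r["hps_score"]) >= min_score]


# Toy records; real scores would come from running HPSv2 on each image-prompt pair.
records = [
    {"image": "a.jpg", "prompt": "a castle at dusk", "hps_score": 0.29},
    {"image": "b.jpg", "prompt": "blurry photo", "hps_score": 0.21},
    {"image": "c.jpg", "prompt": "a watercolor fox", "hps_score": 0.27},
]
kept = filter_by_preference(records, min_score=0.27)  # hypothetical threshold
kept_images = [r["image"] for r in kept]
```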

Step distillation for guided T2I diffusion models:

Algorithm Description:

Result

Generation Details

Images of size 512 × 512 are generated from the publicly released model with the DPM Solver++ sampler, using 8, 16, and 20 sampling steps and a guidance scale of 7.5. For the MSCOCO 2017 caption dataset, the first prompt of each image is selected for zero-shot inference.
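Selecting the first prompt per image can be sketched as below, assuming the standard COCO captions layout in which each annotation carries an `image_id` and a `caption`. "First" is interpreted here as first in annotation order, which is an assumption; the helper name is hypothetical.

```python
from typing import Dict, List


def first_caption_per_image(annotations: List[dict]) -> Dict[int, str]:
    """Map each image_id to the first caption that appears for it."""
    first: Dict[int, str] = {}
    for ann in annotations:
        image_id = ann["image_id"]
        if image_id not in first:  # later captions for the same image are ignored
            first[image_id] = ann["caption"].strip()
    return first


# Toy annotations mimicking the "annotations" list of a COCO captions JSON file.
annotations = [
    {"image_id": 1, "id": 10, "caption": "A dog on a beach. "},
    {"image_id": 2, "id": 11, "caption": "Two cups on a table."},
    {"image_id": 1, "id": 12, "caption": "A puppy running in sand."},
]
prompts = first_caption_per_image(annotations)
```

With a real file, the list would be loaded via `json.load(f)["annotations"]` from the captions JSON for the chosen split.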

For the HPSv2 dataset, we run inference on the test benchmark, which includes four types of prompts: "Animation", "Concept-art", "Painting", and "Photo".

Zero-shot quantitative results on MSCOCO 2017 caption dataset and HPSv2 dataset

Visualization results

Reference

https://snap-research.github.io/SnapFusion/

https://github.com/Nota-NetsPresso/BK-SDM

https://github.com/tgxs002/HPSv2

YangPanHZAU/Stable_diffusion_acceleration_and_lightweight
