DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Jing He¹✱, Haodong Li¹✱, Yongzhe Hu, Guibao Shen¹, Yingjie Cai³, Weichao Qiu³, Ying-Cong Chen¹,²✉

¹HKUST(GZ) ²HKUST ³Noah's Ark Lab
✱ Both authors contributed equally. ✉ Corresponding author.

We present DisEnvisioner, which generates a variety of customized images from a single reference image, without cumbersome tuning. By emphasizing the interpretation of subject-essential attributes, DisEnvisioner discerns and enhances the subject-essential features while filtering out irrelevant attributes, achieving superior personalization quality in both editability and ID consistency.

📢 News

2024-10-25: The inference code is now available.
2024-10-04: Paper released.

🛠️ Setup

We have tested the inference code on: Ubuntu 20.04 LTS, Python 3.9, CUDA 12.3, NVIDIA A800-SXM4-80GB.

Clone the repository (requires git):

git clone https://github.com/EnVision-Research/DisEnvisioner.git
cd DisEnvisioner

Install dependencies (requires conda):

conda create -n disenvisioner python=3.9 -y
conda activate disenvisioner
pip install -r requirements.txt

🕹️ Usage

Download pre-trained models (~3.5GB) (requires git-lfs)

# it may take a few minutes :)
git lfs install
git clone https://huggingface.co/jingheya/disenvisioner_models
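If git-lfs is not active when cloning, the large files can end up as small text pointers rather than real weights. The following is a minimal sketch, not part of the repository, that checks the clone for the paths this README refers to elsewhere (the two default .pt files and the image_encoder path used by the optional IP-Adapter script). The function names are hypothetical, and the sketch assumes the clone sits in the current working directory.

```python
from pathlib import Path

# Relative paths taken from the defaults and flags quoted in this README.
EXPECTED_PATHS = (
    "disenvisioner/disvisioner.pt",
    "disenvisioner/envisioner.pt",
    "image_encoder",  # only used by the optional IP-Adapter script
)

# Every git-lfs pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a git-lfs pointer stub, not real content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_models(models_root="disenvisioner_models"):
    """Return a list of (relative_path, problem) pairs; empty means OK."""
    root = Path(models_root)
    problems = []
    for rel in EXPECTED_PATHS:
        path = root / rel
        if not path.exists():
            problems.append((rel, "missing"))
        elif path.is_file() and is_lfs_pointer(path):
            problems.append((rel, "git-lfs pointer, not the real file"))
    return problems


if __name__ == "__main__":
    for rel, problem in check_models():
        print(f"{rel}: {problem}")
```

If a file is reported as a pointer, running git lfs pull inside the cloned directory should fetch the actual weights.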

Test on your own images

Use the following command to generate customized images (or run: bash run_disenvisioner.sh).

CUDA_VISIBLE_DEVICES=0 python run_disenvisioner.py \
    --pretrained_model_name_or_path "SG161222/Realistic_Vision_V4.0_noVAE" \
    --pretrained_CLIP "openai/clip-vit-large-patch14" \
    --half_precision \
    --resolution 512 \
    --seed 42 \
    --num_samples 5 \
    --scale_object $SOBJ \
    --scale_others 0.0 \
    --disvisioner_path "$DV_PATH" \
    --envisioner_path "$EV_PATH" \
    --infer_image "$IMAGE_PATH" \
    --class_name "$CLASS_NAME" \
    --infer_prompt "$PROMPT" \
    --output_dir "$YOUR_OUTDIR"

$SOBJ: The scale for the customized object. Default: 0.7.
$DV_PATH: The path to the pre-trained disvisioner model. Default: disenvisioner_models/disenvisioner/disvisioner.pt.
$EV_PATH: The path to the pre-trained envisioner model. Default: disenvisioner_models/disenvisioner/envisioner.pt.
$IMAGE_PATH: The path of the input image which contains your customized object.
$CLASS_NAME: The class name of your customized object.
$PROMPT: Editing prompt.
$YOUR_OUTDIR: The output directory.
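For scripted runs, the same invocation can be assembled in Python so that a prompt containing spaces or commas is passed as a single argument. This is a minimal sketch under stated assumptions: build_command is a hypothetical helper that is not part of the repository, the flag names and defaults are copied from the command and list above, and the "outputs" directory in the usage lines is only a placeholder.

```python
import shlex


def build_command(image_path, class_name, prompt, output_dir,
                  scale_object=0.7,
                  models_root="disenvisioner_models/disenvisioner",
                  num_samples=5, seed=42, gpu="0"):
    """Return (env_overrides, argv) mirroring the README command."""
    argv = [
        "python", "run_disenvisioner.py",
        "--pretrained_model_name_or_path", "SG161222/Realistic_Vision_V4.0_noVAE",
        "--pretrained_CLIP", "openai/clip-vit-large-patch14",
        "--half_precision",
        "--resolution", "512",
        "--seed", str(seed),
        "--num_samples", str(num_samples),
        "--scale_object", str(scale_object),
        "--scale_others", "0.0",
        "--disvisioner_path", f"{models_root}/disvisioner.pt",
        "--envisioner_path", f"{models_root}/envisioner.pt",
        "--infer_image", image_path,
        "--class_name", class_name,
        "--infer_prompt", prompt,  # one list item, so no shell word-splitting
        "--output_dir", output_dir,
    ]
    return {"CUDA_VISIBLE_DEVICES": gpu}, argv


if __name__ == "__main__":
    env, argv = build_command(
        "assets/example_inputs/dog.jpg", "dog",
        "best quality, high quality, a dog is running",
        "outputs",  # placeholder output directory
    )
    # Print a copy-pasteable, correctly quoted shell line.
    print(shlex.join(argv))
```

To launch the run rather than print it, the result could be passed to subprocess.run(argv, env={**os.environ, **env}, check=True) from the repository root.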

(Optional) For non-living objects, we recommend running the following script, which incorporates the IP-Adapter weights to enhance object details and improve ID consistency (or run: bash run_disenvisioner_w_ip.sh).

CUDA_VISIBLE_DEVICES=0 python run_disenvisioner_w_ip.py \
    --pretrained_model_name_or_path "SG161222/Realistic_Vision_V4.0_noVAE" \
    --pretrained_CLIP "openai/clip-vit-large-patch14" \
    --ip_image_encoder_path "disenvisioner_models/image_encoder" \
    --half_precision \
    --resolution 512 \
    --seed 42 \
    --num_samples 1 \
    --scale_object $SOBJ \
    --scale_others 0.0 \
    --scale_ip $SIP \
    --disvisioner_path "$DV_PATH" \
    --envisioner_path "$EV_PATH" \
    --infer_image "$IMAGE_PATH" \
    --class_name "$CLASS_NAME" \
    --infer_prompt "$PROMPT" \
    --output_dir "$YOUR_OUTDIR"

$SIP: The scale of the image embedding from IP-Adapter.

🖼️ Generation Examples

Here, we provide some example results. The first column is the input image.

For the input image assets/example_inputs/dog.jpg:

dog/"best quality, high quality, a dog is running"/scale_object=0.7/seed=42

dog/"best quality, high quality, a dog standing in front of a fountain"/scale_object=0.7/seed=42

dog/"best quality, high quality, a dog with Zebra-like pattern"/scale_object=0.7/seed=42

dog/"best quality, high quality, a dog in a purple wizard outfit"/scale_object=0.7/seed=42
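Each label above appears to follow the pattern class_name/"prompt"/scale_object=VALUE/seed=VALUE. Assuming that reading is right, the sketch below converts a label into the matching command-line flags, which may help when reproducing an example. label_to_flags is a hypothetical helper, not part of the repository, and the mapping from label fields to flags is an inference from the usage section rather than something the authors state.

```python
import re

# Pattern inferred from the example labels in this README.
LABEL_RE = re.compile(
    r'^(?P<class_name>[^/]+)/"(?P<prompt>.*)"'
    r'/scale_object=(?P<scale_object>[0-9.]+)/seed=(?P<seed>\d+)$'
)


def label_to_flags(label, image_path):
    """Translate one example label into run_disenvisioner.py flags."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label format: {label!r}")
    return [
        "--infer_image", image_path,
        "--class_name", match["class_name"],
        "--infer_prompt", match["prompt"],
        "--scale_object", match["scale_object"],
        "--seed", match["seed"],
    ]


if __name__ == "__main__":
    label = ('dog/"best quality, high quality, a dog is running"'
             '/scale_object=0.7/seed=42')
    print(label_to_flags(label, "assets/example_inputs/dog.jpg"))
```

The remaining flags (model paths, resolution, and so on) would still need to be supplied as in the usage section.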

🎓 Citation

If you find our work useful in your research, please consider citing our paper:

@article{he2024disenvisioner,
    title={DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation},
    author={Jing He and Haodong Li and Yongzhe Hu and Guibao Shen and Yingjie Cai and Weichao Qiu and Ying-Cong Chen},
    journal={arXiv preprint arXiv:2410.02067},
    year={2024}
}
