Jing He1✱, Haodong Li1✱, Yongzhe Hu, Guibao Shen1, Yingjie Cai3, Weichao Qiu3, Ying-Cong Chen1,2✉
1HKUST(GZ)
2HKUST
3Noah's Ark Lab
✱Both authors contributed equally.
✉Corresponding author.
We present DisEnvisioner, which generates a wide variety of high-quality customized images without cumbersome tuning and without relying on multiple reference images. By focusing on the interpretation of subject-essential attributes, DisEnvisioner discerns and enhances the subject-essential features while filtering out irrelevant attributes, achieving superior personalization quality in both editability and ID consistency.
- 2024-10-25: The inference code is now available.
- 2024-10-04: Paper released.
We have tested the inference code on: Ubuntu 20.04 LTS, Python 3.9, CUDA 12.3, NVIDIA A800-SXM4-80GB.
- Clone the repository (requires git):
git clone https://github.com/EnVision-Research/DisEnvisioner.git
cd DisEnvisioner
- Install dependencies (requires conda):
conda create -n disenvisioner python=3.9 -y
conda activate disenvisioner
pip install -r requirements.txt
# it may take a few minutes :)
git lfs install
git clone https://huggingface.co/jingheya/disenvisioner_models
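After the clone finishes, it can help to confirm that the checkpoints landed where the inference scripts look for them by default. The following is a minimal sketch and not part of the repository: the helper name `missing_checkpoints` is hypothetical, and the two paths are the default model locations given in this README. It also flags files that are still Git LFS pointer stubs, which is what you get when the repository is cloned without `git lfs` set up.

```python
from pathlib import Path

# Default checkpoint locations, as listed in this README.
DEFAULT_CHECKPOINTS = (
    "disenvisioner_models/disenvisioner/disvisioner.pt",
    "disenvisioner_models/disenvisioner/envisioner.pt",
)

# A Git LFS pointer file is a small text stub that starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def missing_checkpoints(root="."):
    """Return the default checkpoint paths that are absent or still LFS pointers."""
    problems = []
    for rel in DEFAULT_CHECKPOINTS:
        path = Path(root) / rel
        if not path.is_file():
            problems.append(rel)
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                problems.append(rel)
    return problems


if __name__ == "__main__":
    # Run from the DisEnvisioner repository root (hypothetical usage).
    for rel in missing_checkpoints():
        print(f"missing or not downloaded via git lfs: {rel}")
```

An empty result means both default checkpoints are present as real files.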
You can use the following script to generate customized images.
(OR RUN: bash run_disenvisioner.sh)
CUDA_VISIBLE_DEVICES=0 python run_disenvisioner.py \
--pretrained_model_name_or_path "SG161222/Realistic_Vision_V4.0_noVAE" \
--pretrained_CLIP "openai/clip-vit-large-patch14" \
--half_precision \
--resolution 512 \
--seed 42 \
--num_samples 5 \
--scale_object $SOBJ \
--scale_others 0.0 \
--disvisioner_path "$DV_PATH" \
--envisioner_path "$EV_PATH" \
--infer_image "$IMAGE_PATH" \
--class_name "$CLASS_NAME" \
--infer_prompt "$PROMPT" \
--output_dir "$YOUR_OUTDIR"
- $SOBJ: The scale for the customized object. Default: 0.7.
- $DV_PATH: The path of the pre-trained disvisioner model. Default: disenvisioner_models/disenvisioner/disvisioner.pt.
- $EV_PATH: The path of the pre-trained envisioner model. Default: disenvisioner_models/disenvisioner/envisioner.pt.
- $IMAGE_PATH: The path of the input image that contains your customized object.
- $CLASS_NAME: The class name of your customized object.
- $PROMPT: The editing prompt.
- $YOUR_OUTDIR: The output directory.
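As an illustration of how the placeholders map onto a concrete call, the sketch below assembles the command from Python and prints it instead of running it. The helper `build_command`, the example prompt, and the output directory are hypothetical, and the class name `dog` is an assumption for the example input; the flags, model names, and default values are the ones listed above.

```python
import shlex


def build_command(image_path, class_name, prompt, output_dir,
                  scale_object=0.7,
                  disvisioner_path="disenvisioner_models/disenvisioner/disvisioner.pt",
                  envisioner_path="disenvisioner_models/disenvisioner/envisioner.pt"):
    """Assemble the argument list for run_disenvisioner.py (hypothetical helper)."""
    return [
        "python", "run_disenvisioner.py",
        "--pretrained_model_name_or_path", "SG161222/Realistic_Vision_V4.0_noVAE",
        "--pretrained_CLIP", "openai/clip-vit-large-patch14",
        "--half_precision",
        "--resolution", "512",
        "--seed", "42",
        "--num_samples", "5",
        "--scale_object", str(scale_object),
        "--scale_others", "0.0",
        "--disvisioner_path", disvisioner_path,
        "--envisioner_path", envisioner_path,
        "--infer_image", image_path,
        "--class_name", class_name,
        "--infer_prompt", prompt,
        "--output_dir", output_dir,
    ]


cmd = build_command(
    image_path="assets/example_inputs/dog.jpg",  # example input referenced in this README
    class_name="dog",                            # assumed class name for this image
    prompt="a dog on the beach",                 # hypothetical editing prompt
    output_dir="outputs/dog",                    # hypothetical output directory
)
# shlex.join quotes arguments that contain spaces, such as the prompt.
print(shlex.join(cmd))
```

To actually launch the run, the same list can be passed to `subprocess.run(cmd, check=True)` from the repository root, with `CUDA_VISIBLE_DEVICES` set in the environment if a specific GPU is wanted.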
(Optional) For non-living objects, we recommend running the following script, which incorporates the weights of IP-Adapter, to enhance object details and improve ID consistency.
(OR RUN: bash run_disenvisioner_w_ip.sh)
CUDA_VISIBLE_DEVICES=0 python run_disenvisioner_w_ip.py \
--pretrained_model_name_or_path "SG161222/Realistic_Vision_V4.0_noVAE" \
--pretrained_CLIP "openai/clip-vit-large-patch14" \
--ip_image_encoder_path "disenvisioner_models/image_encoder" \
--half_precision \
--resolution 512 \
--seed 42 \
--num_samples 1 \
--scale_object $SOBJ \
--scale_others 0.0 \
--scale_ip $SIP \
--disvisioner_path "$DV_PATH" \
--envisioner_path "$EV_PATH" \
--infer_image "$IMAGE_PATH" \
--class_name "$CLASS_NAME" \
--infer_prompt "$PROMPT" \
--output_dir "$YOUR_OUTDIR"
- $SIP: The scale of the image embedding from IP-Adapter.
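Since no default is given for the IP-Adapter scale, one way to pick a value is to try a few and compare the outputs. The sketch below is only an illustration under stated assumptions: the helper `ip_sweep_commands`, the input image, class name, prompt, output layout, and the three scale values are all hypothetical and are not recommendations from the authors. It also assumes the script accepts its flags in any order. The commands are printed rather than executed.

```python
import shlex

# Flags shared by every run, taken from the command above
# (assumes flag order does not matter to the script).
BASE_ARGS = [
    "python", "run_disenvisioner_w_ip.py",
    "--pretrained_model_name_or_path", "SG161222/Realistic_Vision_V4.0_noVAE",
    "--pretrained_CLIP", "openai/clip-vit-large-patch14",
    "--ip_image_encoder_path", "disenvisioner_models/image_encoder",
    "--half_precision",
    "--resolution", "512",
    "--seed", "42",
    "--num_samples", "1",
    "--scale_object", "0.7",
    "--scale_others", "0.0",
    "--disvisioner_path", "disenvisioner_models/disenvisioner/disvisioner.pt",
    "--envisioner_path", "disenvisioner_models/disenvisioner/envisioner.pt",
]


def ip_sweep_commands(image_path, class_name, prompt, output_root, ip_scales):
    """Build one command per IP-Adapter scale, each writing to its own directory."""
    commands = []
    for scale in ip_scales:
        commands.append(BASE_ARGS + [
            "--scale_ip", str(scale),
            "--infer_image", image_path,
            "--class_name", class_name,
            "--infer_prompt", prompt,
            "--output_dir", f"{output_root}/scale_ip_{scale}",
        ])
    return commands


commands = ip_sweep_commands(
    image_path="path/to/object.jpg",   # hypothetical input image
    class_name="toy",                  # hypothetical class name
    prompt="a toy on a wooden table",  # hypothetical editing prompt
    output_root="outputs/toy",         # hypothetical output root
    ip_scales=[0.3, 0.5, 0.7],         # hypothetical values, not recommendations
)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the seed fixed across the sweep means the differences between output directories come from the IP-Adapter scale alone.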
Here, we provide some example results for the input image assets/example_inputs/dog.jpg. The first column is the input image.
If you find our work useful in your research, please consider citing our paper:
@article{he2024disenvisioner,
title={DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation},
author={Jing He and Haodong Li and Yongzhe Hu and Guibao Shen and Yingjie Cai and Weichao Qiu and Ying-Cong Chen},
journal={arXiv preprint arXiv:2410.02067},
year={2024}
}