This repository provides a fork of the 🤗 Diffusers library with an example script for LoRA training on the new FLUX.1-Fill models. The script is not optimized and has only been tested on an NVIDIA A100 GPU. If anyone has a similar script for frameworks like SimpleTuner or SD-scripts that runs on consumer hardware, I would be more than happy to hear about it!
The provided script implements a specific masking strategy; in my case, a mask is applied to the right half of the image. If your use case requires a different masking approach, you'll need to adapt the `random_mask` function accordingly (a rough illustration follows below).
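For illustration, here is a minimal sketch of a fixed right-half mask, roughly analogous to what I use. The function name and PIL/NumPy implementation are assumptions for this example; the actual `random_mask` in the script may look different:

```python
# Hypothetical sketch of a "mask the right half" strategy (not the script's exact code).
import numpy as np
from PIL import Image

def right_half_mask(size):
    """Return a binary mask (white = region to inpaint) covering the right half of the image."""
    width, height = size
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[:, width // 2:] = 255  # white out the right half
    return Image.fromarray(mask, mode="L")

# Example: a 1024x1024 mask whose right half will be inpainted during training
mask = right_half_mask((1024, 1024))
```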
Note:
Validation images and masks are currently hardcoded in the script. You will need to modify these to suit your dataset. See the lines:
```python
val_image = load_image("https://huggingface.co/datasets/sebastianzok/validationImageAndMask/resolve/main/image.png")
val_mask = load_image("https://huggingface.co/datasets/sebastianzok/validationImageAndMask/resolve/main/mask.png")
```
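To use your own validation pair, you can simply point `load_image` at local files instead (the paths below are placeholders):

```python
# Replace the hardcoded URLs with your own validation image/mask (placeholder paths)
val_image = load_image("path/to/your/validation_image.png")
val_mask = load_image("path/to/your/validation_mask.png")
```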
Known issue: Validation only works at the start and end of training. During intermediate validation steps, only black images are produced (see this open issue). Luckily, the LoRA was able to capture my concept with just 300 steps, so I did not really depend on the validation images.
To get started, clone the repository and install 🤗 Diffusers from source:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then cd into the `examples/research_projects/dreambooth_inpaint` folder and run:

```bash
pip install -r requirements_flux.txt
```
And initialize an 🤗 Accelerate environment with:

```bash
accelerate config
```
Or, for a default Accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```
Or, if your environment doesn't support an interactive shell (e.g., a notebook):

```python
from accelerate.utils import write_basic_config
write_basic_config()
```
When running `accelerate config`, setting torch compile mode to True can give dramatic speedups.

Note also that we use the PEFT library as the backend for LoRA training, so make sure to have `peft>=0.6.0` installed in your environment.
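A quick way to confirm the installed PEFT version (this check is just a convenience, not part of the training script):

```python
# Optional sanity check: the script relies on PEFT as the LoRA backend
from importlib.metadata import version

print(version("peft"))  # should print 0.6.0 or higher
```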
In my case, the dataset consisted of plain images without captions. Since I trained the LoRA on a specific task, I used the `instance_prompt` parameter for all generations. This is much more convenient than the in-context LoRA approach that I used to learn concepts with the regular FLUX.1-dev model. There are also no mask images, since the mask is hardcoded for my use case (see `random_mask`).
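For reference, the instance data directory is just a flat folder of images, with no caption files and no masks. A hypothetical check of its contents, assuming the `dog` folder used in the launch command below:

```python
# Hypothetical check: the instance data directory should contain only image files,
# with no caption .txt files and no mask images.
from pathlib import Path

instance_dir = Path("dog")  # matches INSTANCE_DIR in the launch command below
print(sorted(p.name for p in instance_dir.iterdir()))
```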
Now, we can launch training using:
```bash
export MODEL_NAME="black-forest-labs/FLUX.1-Fill-dev"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="trained-flux"

accelerate launch train_dreambooth_inpaint_lora_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="A character turnaround 45-degreed to the left" \
  --resolution=1024 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A character turnaround 45-degreed to the left" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```
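Once training is done, the LoRA weights saved to OUTPUT_DIR (or pushed to the Hub) can be loaded on top of the Fill pipeline for inference. The sketch below is not part of this repository: it assumes a recent diffusers release that ships FluxFillPipeline, and the image/mask paths, prompt, and sampler settings are placeholders:

```python
# Rough inference sketch (not part of the training script): load the trained LoRA
# on top of FLUX.1-Fill and inpaint the masked region of an image.
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("trained-flux")  # local OUTPUT_DIR or your Hub repo id
pipe.to("cuda")

image = load_image("path/to/your/image.png")  # placeholder
mask = load_image("path/to/your/mask.png")    # placeholder (white = region to inpaint)

result = pipe(
    prompt="A character turnaround 45-degreed to the left",
    image=image,
    mask_image=mask,
    height=1024,
    width=1024,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
result.save("output.png")
```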
As you might have noticed, there is a lot of room for improvement 🙃. Feel free to open issues or submit pull requests to improve this project. If you have insights on adapting this script to other frameworks like SimpleTuner, please share your experiences!