Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

TL;DR: Official repo of SemanticSDS. By leveraging program-aided layout planning, augmenting 3D Gaussians with semantic embeddings, and guiding SDS with rendered semantic maps, SemanticSDS unlocks the compositional capabilities of pre-trained diffusion models, generating complex 3D scenes comprising multiple objects with various attributes.

Click for full abstract

Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research. Due to the scarcity of 3D data, state-of-the-art approaches utilize pre-trained 2D diffusion priors, optimized through Score Distillation Sampling (SDS). Despite progress, crafting complex 3D scenes featuring multiple objects or intricate interactions is still difficult. To tackle this, recent methods have incorporated box or layout guidance. However, these layout-guided compositional methods often struggle to provide fine-grained control, as they are generally coarse and lack expressiveness. To overcome these challenges, we introduce a novel SDS approach, Semantic Score Distillation Sampling (SemanticSDS), designed to effectively improve the expressiveness and accuracy of compositional text-to-3D generation. Our approach integrates new semantic embeddings that maintain consistency across different rendering views and clearly differentiate between various objects and parts. These embeddings are transformed into a semantic map, which directs a region-specific SDS process, enabling precise optimization and compositional generation. By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models, thereby achieving superior quality in 3D content generation, particularly for complex objects and scenes. Experimental results demonstrate that our SemanticSDS framework is highly effective for generating state-of-the-art complex 3D content.

Video results

rabbit.mp4

A rabbit sits atop a large, expensive watch with many shiny gears, made half of iron and half of gold, eating a birthday cake that is in front of the rabbit.

corgi_car_house.mp4

A corgi is positioned to the left of a LEGO house, while a car with its front half made of cheese and its rear half made of sushi is situated to the right of the house made of LEGO.

hamburger.mp4

A hamburger, a loaf of bread, an order of fries, and a cup of Coke.

cookie.mp4

A cozy scene with a plush triceratops toy surrounded by a plate of chocolate chip cookies, a glistening cinnamon roll, and a flaky croissant.

More results

mannequin.mp4

A mannequin adorned with a dress made of feathers and moss stands at the center, flanked by a vase with a single blue tulip and another with blue roses.

pyramid.mp4

A pyramid-shaped burrito artistically blended with the Great Pyramid.

train.mp4

A train with a front made of cake and a back of a steam engine.

Quick Start

Install the requirements:

It is recommended to use CUDA 11.8, but other versions should also work fine.
```
conda create -n semanticSDS python=3.9
conda activate semanticSDS
pip install -r requirements.txt
```
To install PyTorch3D, follow the instructions from PyTorch3D's official installation guide. Here is a quick rundown to install it from a stable branch with CUDA support:
```
git clone --branch stable https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
FORCE_CUDA=1 pip install .
```
(Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
```
pip install ninja
```
Build the extension for Gaussian Splatting:
```
cd gs
./build.sh
```
Start! Run the main program using the following command. Make sure to specify the appropriate config name:
```
python main.py --config-name=pyramid
```

Program-aided layout planning

Add your OpenAI API key in the generate_layouts_PAL.py file at the location marked with a # openai token comment.

Single Prompt Usage

To generate a layout for a single user prompt, use the following command:

python generate_layouts_PAL.py --template_version "v0.14_PAL_updated_incontext" --llm_name "gpt-4-32k" --user_prompt "A corgi is situated to the left of a house, while a car is positioned to the right of the house. The car above is split into two layers along the depth axis. The front layer of the car is constructed from wood. The left half of the rear layer is made of sushi, and the right half is made of cheese."

You can view the results in the layouts_cache folder.

Batch Prompt Usage

For processing multiple prompts, you can use a batch file containing the prompts. Use the following command, replacing ./prompts_to_ask.txt with the path to your text file:

python generate_layouts_PAL.py --template_version "v0.14_PAL_updated_incontext" --llm_name "gpt-4-32k" --batch_prompt_file "./prompts_to_ask.txt"

If you are not satisfied with the responses obtained before, add --disable_response_cache to your command.

Running the Main Program

Configuration: Edit the .yaml files in the conf folder:
- Set llm_init.enabled to true.
- Update json_path to point to the normalized.json file for the corresponding layouts:
```
llm_init:
  enabled: true
  json_path: layouts_cache/v0.14_PAL_updated_incontext/gpt-4-32k/mannequin/normalized.json
```
Execute: Run the main program with the appropriate config name:
```
python main.py --config-name=mannequin
```

Tips

To resolve ImportError: libXrender.so.1: cannot open shared object file: No such file or directory, try:
```
apt-get update && apt-get install libxrender1
```
To resume from a checkpoint:
- Add +ckpt=<path_to_your_ckpt> to the run command, where <path_to_your_ckpt> is the actual path to your .pt checkpoint file.
- Ensure that the .yaml configuration file you're using is the same as the one used to save the checkpoint.
Example command to resume from a checkpoint:
```
python main.py --config-name=car +ckpt="checkpoints/a_dslr_photo_of_a_car_made_out_of_lego/2024-10-13/080848/ckpts/step_2000.pt"
```
Enable wandb: To monitor and log your runs using Weights & Biases (wandb.ai), add wandb=true to the run command.

Acknowledgement

Thanks to the awesome open-source projects of GSGEN for their outstanding work, which have significantly contributed to this codebase.

Citation

@article{yang2024semanticsds,
  title={Semantic Score Distillation Sampling for Compositional Text-to-3D Generation},
  author={Yang, Ling and Zhang, Zixiang and Han, Junlin and Zeng, Bohan and Li, Runjia and Torr, Philip and Zhang, Wentao},
  journal={arXiv preprint arXiv:2410.09009},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
assets		assets
conf		conf
data		data
gs		gs
guidance		guidance
layouts_cache/v0.14_PAL_updated_incontext/gpt-4-32k/mannequin		layouts_cache/v0.14_PAL_updated_incontext/gpt-4-32k/mannequin
point_e		point_e
prompt		prompt
shap_e		shap_e
templates		templates
utils		utils
README.md		README.md
generate_layouts_PAL.py		generate_layouts_PAL.py
main.py		main.py
requirements.txt		requirements.txt
sweep.py		sweep.py
trainer.py		trainer.py
vis.py		vis.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

Video results

Quick Start

Program-aided layout planning

Single Prompt Usage

Batch Prompt Usage

Running the Main Program

Tips

Acknowledgement

Citation

About

Releases

Packages

Contributors 2

Languages

YangLing0818/SemanticSDS-3D

Folders and files

Latest commit

History

Repository files navigation

Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

Video results

Quick Start

Program-aided layout planning

Single Prompt Usage

Batch Prompt Usage

Running the Main Program

Tips

Acknowledgement

Citation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages