- Use the SYNTHIA dataset or similar.
- Preprocess the depth maps into point clouds for alignment during training (see the unprojection sketch after this list).
- Develop a system to encode a depth map into 3D scene representations:
- Prototypes: Differentiable 3D meshes representing objects.
- Object Parameters: Each object is parameterized by scale, position, orientation, and prototype weights.
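A minimal unprojection sketch, assuming a pinhole camera model with known intrinsics; the parameter names are placeholders, not a fixed API:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a depth map (H, W) into an (N, 3) point cloud using
    pinhole intrinsics (focal lengths fx, fy; principal point cx, cy)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid / zero-depth pixels
```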
Input and Network Design
- Input: Depth map.
- Encoder Architecture:
- Use a ResNet or similar to extract a latent vector for each object slot.
- Decompose as follows:
- Scale:
  scale = vector[:1]
- Transform (Position & Orientation):
  transform = vector[1:7]
  (x, y, z, yaw, pitch, roll)
- Prototype Weights:
  logits = vector[7:]
  - Apply softmax to the logits to assign a weighted combination of prototypes, allowing smooth interpolation between prototypes (see the decomposition sketch after this list).
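A sketch of the slot decomposition, assuming PyTorch; `NUM_PROTOTYPES` and the softplus on scale are assumptions, not fixed choices:

```python
import torch
import torch.nn.functional as F

NUM_PROTOTYPES = 16                 # assumed size of the prototype bank
SLOT_DIM = 1 + 6 + NUM_PROTOTYPES   # scale + transform + prototype logits

def decompose_slot(vector):
    """Split one object-slot latent vector into its parameter groups."""
    scale = F.softplus(vector[:1])           # keep scale positive (assumption)
    transform = vector[1:7]                  # (x, y, z, yaw, pitch, roll)
    logits = vector[7:]                      # prototype logits
    weights = torch.softmax(logits, dim=-1)  # smooth, differentiable selection
    return scale, transform, weights
```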
Prototype Handling
- Define a bank of differentiable 3D meshes (prototypes).
- Blurring Prototypes: interpolate between prototypes based on the softmax weights so that gradients flow to every prototype in the bank (see the sketch below).
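One way to realize the bank, assuming all prototypes share a fixed vertex topology so they can be blended vertex-wise (a simplifying assumption; a differentiable renderer such as PyTorch3D would supply faces and rendering):

```python
import torch
import torch.nn as nn

class PrototypeBank(nn.Module):
    """Bank of K learnable meshes with a shared vertex topology,
    blended by softmax weights so every prototype receives gradient."""
    def __init__(self, num_prototypes, num_vertices):
        super().__init__()
        # (K, V, 3) learnable vertex positions; faces are shared and fixed
        self.vertices = nn.Parameter(0.1 * torch.randn(num_prototypes, num_vertices, 3))

    def blend(self, weights):
        # weights: (K,) softmax output -> (V, 3) blended vertex positions
        return torch.einsum("k,kvc->vc", weights, self.vertices)
```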
Scene Reconstruction
- For each object:
  - Select and blend prototypes using the softmax-weighted combination of the logits.
  - Transform the blended prototype using scale and transform (see the placement sketch after this list).
- Render for fun.
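A placement sketch; the Z-Y-X Euler convention is an assumption, so pick whatever matches the dataset:

```python
import torch

def euler_to_matrix(yaw, pitch, roll):
    """3x3 rotation built as Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cz, sz = torch.cos(yaw), torch.sin(yaw)
    cy, sy = torch.cos(pitch), torch.sin(pitch)
    cx, sx = torch.cos(roll), torch.sin(roll)
    zero, one = torch.zeros_like(yaw), torch.ones_like(yaw)
    Rz = torch.stack([cz, -sz, zero, sz, cz, zero, zero, zero, one]).reshape(3, 3)
    Ry = torch.stack([cy, zero, sy, zero, one, zero, -sy, zero, cy]).reshape(3, 3)
    Rx = torch.stack([one, zero, zero, zero, cx, -sx, zero, sx, cx]).reshape(3, 3)
    return Rz @ Ry @ Rx

def place_object(vertices, scale, transform):
    """Scale, rotate, and translate blended prototype vertices (V, 3)."""
    position, angles = transform[:3], transform[3:]
    R = euler_to_matrix(angles[0], angles[1], angles[2])
    return scale * vertices @ R.T + position
```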
Loss Function
- Use a one-directional point cloud loss (e.g., a one-sided Chamfer Distance):
  - Compare the transformed meshes with the point cloud derived from the input depth map.
  - For each point in the depth map's point cloud, find the closest point on any mesh and pull it closer (see the sketch after this list).
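A minimal one-sided Chamfer sketch, treating points sampled from the transformed meshes as the mesh surface (an approximation; an exact point-to-triangle distance would be tighter):

```python
import torch

def one_sided_chamfer(scene_points, mesh_points):
    """scene_points: (N, 3) from the depth map; mesh_points: (M, 3)
    sampled from all transformed meshes. One-directional: mesh geometry
    the depth map cannot see is not penalized."""
    d = torch.cdist(scene_points, mesh_points)  # (N, M) pairwise distances
    return d.min(dim=1).values.mean()
```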
Output
- Train the network to minimize the loss, jointly learning the prototypes and how to place them to reconstruct 3D scenes from depth maps (a training-loop sketch follows).
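Tying the pieces together, a minimal sketch of the single-frame training loop; `make_encoder` and `loader` are hypothetical, and the helpers are the ones sketched above:

```python
import torch

encoder = make_encoder()  # hypothetical ResNet-based encoder, one vector per slot
bank = PrototypeBank(NUM_PROTOTYPES, num_vertices=512)  # placeholder mesh resolution
optimizer = torch.optim.Adam([*encoder.parameters(), *bank.parameters()], lr=1e-4)

for depth, scene_points in loader:          # hypothetical data loader
    slots = encoder(depth)                  # (num_slots, SLOT_DIM)
    meshes = []
    for vector in slots:
        scale, transform, weights = decompose_slot(vector)
        meshes.append(place_object(bank.blend(weights), scale, transform))
    mesh_points = torch.cat(meshes, dim=0)  # vertices stand in for surface samples
    loss = one_sided_chamfer(scene_points, mesh_points)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```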
Video Extension
- Use SYNTHIA's video sequences for temporal data with camera motion and object dynamics.
- Extend the system to process video input, learning object trajectories and enforcing temporal consistency.
Slot-Based Representation
- Represent each object slot with a vector for each frame.
- Penalize changes in prototype weights across frames so a slot keeps representing the same object (see the sketch below).
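A sketch of that penalty, assuming the per-frame softmax weights for one slot are stacked into a (T, K) tensor:

```python
import torch

def prototype_consistency_loss(weights):
    """weights: (T, K) per-frame prototype weights for one object slot.
    Penalizing frame-to-frame changes keeps the slot's identity stable."""
    return ((weights[1:] - weights[:-1]) ** 2).mean()
```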
Temporal Regularization
- Motion Loss: apply regularization for natural motion patterns, such as:
  - Parabolic trajectories for object movement.
  - Smooth transitions in object transformations (scale, position, orientation).
- Loss Application: apply the motion loss to all object transformations (see the sketch after this list).
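A smoothness sketch on finite differences; penalizing second differences favors near-constant velocity, while penalizing third differences instead leaves parabolic (constant-acceleration) trajectories unpenalized:

```python
import torch

def motion_smoothness_loss(params, order=2):
    """params: (T, D) per-frame transformation parameters for one slot.
    order=2 penalizes acceleration (smooth, near-linear motion);
    order=3 penalizes jerk, leaving parabolic trajectories free."""
    diff = params
    for _ in range(order):
        diff = diff[1:] - diff[:-1]
    return (diff ** 2).mean()
```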
Camera Integration (optional, since motion is relative, but it might help)
- Add a camera vector (position and rotation) to model relative camera motion.
- Use the camera vector to align object transformations with the global frame (see the sketch below).
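A minimal sketch, assuming the camera vector has been decoded into a rotation matrix and a position:

```python
import torch

def to_global_frame(obj_position, cam_rotation, cam_position):
    """Map an object position from camera coordinates into the global
    frame, given the camera rotation (3, 3) and position (3,)."""
    return obj_position @ cam_rotation.T + cam_position
```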
Loss Function
- Temporal Consistency Loss: Penalize abrupt changes in object parameters across frames.
- Depth Alignment: continue using the depth map / point cloud loss for each frame (a combined-loss sketch follows this list).
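How the pieces might combine per clip, reusing the helpers sketched above; the weights `w_temp` and `w_proto` are placeholder hyperparameters:

```python
def total_loss(chamfer_per_frame, params_per_slot, weights_per_slot,
               w_temp=0.1, w_proto=0.1):
    """chamfer_per_frame: list of per-frame depth-alignment losses;
    params_per_slot / weights_per_slot: (T, D) and (T, K) tensors per slot."""
    loss = sum(chamfer_per_frame) / len(chamfer_per_frame)
    for params, weights in zip(params_per_slot, weights_per_slot):
        loss = loss + w_temp * motion_smoothness_loss(params)
        loss = loss + w_proto * prototype_consistency_loss(weights)
    return loss
```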
Output
- Train the network to reconstruct consistent 3D scenes across video frames.
Future Directions
- Add specific motion cycles (e.g., wheel turning, walking) for manipulating prototypes.
- Incorporate motion cycle parameters into the object vectors.
- Map prototypes to nouns and motions to verbs for semantic understanding.
- Use language models to integrate text annotations with visual data.
- Learn relationships between objects, such as:
- A person entering a car and moving with it.
- Model coupled dynamics in the loss function.
- Extend the system to learn textures (e.g., colors, fonts) for prototypes.
- Render scenes with both 3D structure and textural realism.
- Aim for something more like this: https://arxiv.org/pdf/1905.05622
- Wheels appear on bikes, cars, and so on; find a way to have these objects share prototypes.