Here I list papers on learning different tasks in latent space.
- Neural Discrete Representation Learning (VQ-VAE). [code]
- Latent Diffusion Models. [paper] [code]
- Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models. [paper] [code]
- Scalable Diffusion Models with Transformers (DiT). [code]
- Language Quantized AutoEncoders. [code]
  - Uses a fixed RoBERTa codebook for image-encoder pretraining
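The frozen-codebook idea above boils down to nearest-neighbor quantization of image features against a fixed embedding table. A minimal NumPy sketch (the function name and toy arrays are illustrative, not the paper's code; the real codebook would be RoBERTa's token embeddings):

```python
import numpy as np

def quantize_to_codebook(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    features: (N, D) image-patch features
    codebook: (V, D) frozen embedding table (kept fixed during training)
    """
    # pairwise squared distances between features and codebook entries
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d2.argmin(axis=1)       # discrete "token" assignment per feature
    return ids, codebook[ids]     # indices and their quantized vectors

# toy example: two codebook entries, two features
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.1, -0.1], [0.9, 0.8]])
ids, quantized = quantize_to_codebook(feats, codebook)
```

In training, gradients flow to the image encoder (e.g. via a straight-through estimator) while the codebook stays frozen, which is what ties the image representation to the pretrained text vocabulary.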
- Learning Generalizable Feature Fields for Mobile Manipulation. [project]
- Improving Vision-and-Language Navigation by Generating Future-View Image Semantics. [project]
- CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory. [code]
- Latent Plans for Task-Agnostic Offline Reinforcement Learning. [project]
- Fourier latent dynamics. [code]
- Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings. [code]
- PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs. [paper]
  - Insight: VLMs struggle to produce precise spatial outputs directly, but they can readily select among a discrete set of coarse choices; the selection can then be used to refine the candidate set, yielding more precise choices at the next iteration.
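The coarse-to-fine selection loop can be sketched in a few lines. This is a simplified stand-in, not PIVOT itself: `score_fn` replaces the VLM's discrete choice among annotated candidates, and candidates are laid out on a deterministic grid rather than sampled (the paper samples around the current estimate):

```python
def pivot_refine(score_fn, center=(0.5, 0.5), radius=0.5,
                 n_iters=4, shrink=0.5):
    """Iteratively narrow a 2D search region by letting a selector
    pick among coarse candidate points, then shrinking the region
    around the chosen point (PIVOT-style coarse-to-fine loop)."""
    for _ in range(n_iters):
        # coarse discrete choices: a 3x3 grid around the current estimate
        candidates = [(center[0] + dx, center[1] + dy)
                      for dx in (-radius, 0.0, radius)
                      for dy in (-radius, 0.0, radius)]
        # the selector (a VLM in the real method) picks one candidate
        center = max(candidates, key=score_fn)
        radius *= shrink  # refine: more precise choices next iteration
    return center

# toy selector preferring points near a hidden target (stands in for the VLM)
target = (0.2, 0.8)
best = pivot_refine(lambda p: -((p[0] - target[0]) ** 2
                                + (p[1] - target[1]) ** 2))
```

Each round halves the search radius, so a handful of coarse 9-way choices localizes the answer far more precisely than any single query could.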