A list of papers on world models and generative video models for embodied agents. Papers with real-robot experiments are marked with 🤖; papers with open-source code are marked with 🌟.
- [arXiv 2024.11] Understanding World or Predicting Future? A Comprehensive Survey of World Models [paper]
- [arXiv 2024.07] Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [paper] [repo]
- [arXiv 2024.05] Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [paper] [repo]
- 🌟[paper 2025.01] GameFactory: Creating New Games with Generative Interactive Videos [paper] [website] [code]
- 🌟[paper 2025.01] Cosmos World Foundation Model Platform for Physical AI [paper] [website] [code]
- [arXiv 2025.01] EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation [paper] [website]
- [arXiv 2024.12] GenEx: Generating an Explorable World [paper] [website]
- [blog 2024.12] Genie 2: A large-scale foundation world model [blog]
- 🌟[arXiv 2024.12] PlayGen: Playable Game Generation [paper] [website] [code]
- 🌟[arXiv 2024.12] Motion Dreamer: Realizing Physically Coherent Video Generation through Scene-Aware Motion Reasoning [paper] [website] [code]
- [arXiv 2024.10] EVA: An Embodied World Model for Future Video Anticipation [paper]
- 🌟[arXiv 2024.10] AVID: Adapting Video Diffusion Models to World Models [paper] [website] [code]
- 🌟[blog 2024.10] Oasis: A Universe in a Transformer [blog] [website] [code]
- [arXiv 2024.08] GameNGen: Diffusion Models Are Real-Time Game Engines [paper] [website]
- 🌟[arXiv 2024.06] IRASim: Learning Interactive Real-Robot Action Simulators [paper] [website] [code]
- 🌟[arXiv 2024.06] Pandora: Towards General World Model with Natural Language Actions and Video States [paper] [website] [code]
- 🌟[arXiv 2024.05] iVideoGPT: Interactive VideoGPTs are Scalable World Models [paper] [website] [code]
  NeurIPS 2024
- 🌟[arXiv 2024.05] DIAMOND: Diffusion for World Modeling: Visual Details Matter in Atari [paper] [website] [code]
  NeurIPS 2024 Spotlight
- 🌟[arXiv 2024.04] RoboDreamer: Learning Compositional World Models for Robot Imagination [paper] [website] [code]
  ICML 2024
- 🌟[arXiv 2024.03] 3D-VLA: A 3D Vision-Language-Action Generative World Model [paper] [website] [code]
  ICML 2024
- [arXiv 2024.02] Genie: Generative Interactive Environments [paper] [website]
  ICML 2024 Best Paper
- 🤖[arXiv 2023.10] UniSim: Learning Interactive Real-World Simulators [paper] [website]
  ICLR 2024
- [arXiv 2023.02] UniPi: Learning Universal Policies via Text-Guided Video Generation [paper] [website]
  NeurIPS 2023