From Video Generation to World Model
CVPR 2025 Tutorial
June 11, 2025, 9:00-17:00 (GMT-5)
Room 204
Introduction
In recent years, the research community has made significant strides in generative models, particularly in video generation. Despite the challenges of generating temporally coherent and physically realistic videos, recent breakthroughs such as Sora, Genie, and MovieGen show promising progress toward controllable, high-fidelity visual world models. This tutorial offers a deep dive into recent advances in text-to-video generation, diffusion-based video models, and the bridge from generative video models to physical and interactive world modeling. We aim to give attendees a comprehensive understanding of these cutting-edge methods and how they contribute to building embodied world models.
Schedule
| Time (GMT-5) | Programme |
|---|---|
| 09:20 - 09:30 | Opening Remarks |
| 09:40 - 10:20 | Invited Talk (title TBA)<br>Jack Parker-Holder, Research Scientist, Google DeepMind |
| 10:20 - 10:40 | Coffee Break |
| 10:40 - 11:20 | Invited Talk (title TBA)<br>Hong-Xing "Koven" Yu, Ph.D. Candidate, Stanford University |
| 11:20 - 13:30 | Lunch Break |
| 13:30 - 14:10 | Invited Talk: Breaking the Algorithmic Ceiling in Pre-Training with an Inference-first Perspective<br>Jiaming Song, Chief Scientist, Luma AI |
| 14:10 - 14:20 | Coffee Break |
| 14:20 - 15:10 | Invited Talk (title TBA)<br>Pengfei Wan, Head of KLing AI, Kuaishou |
| 15:20 - 15:30 | Coffee Break |
| 15:30 - 16:00 | Invited Talk (title TBA)<br>Angjoo Kanazawa, Assistant Professor, UC Berkeley |
| 16:00 - 16:10 | Coffee Break |
| 16:10 - 16:50 | Generative World Modeling for Embodied Learning<br>Sherry Yang, Assistant Professor, New York University |
| 16:50 - 17:00 | Closing Remarks (Lucky Draw) |