Method Overview

We propose MotionCraft, a novel zero-shot video generation method in which an image (real or generated), serving as the starting frame $I^0$, is animated according to a physical simulation by means of a (possibly time-varying) optical-flow generator $\mathcal{W}$ applied in the noise latent space. The output is a video of $N$ frames $I^0,\dots,I^{N-1}$ that follows the motion prescribed by the physical simulation and coherently evolves the content of the first frame. Motivated by the previous observation, we obtain the animation by warping the noisy latent representation of the image in the latent space of the diffusion model. To generate the optical flow, we use different libraries to simulate different kinds of physics, such as fluid dynamics, rigid motion, and multi-agent systems, as detailed in the experimental section. It is also possible, although not shown in this paper, to generate the required optical flows with animation software.
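
To make the warping step concrete, the following is a minimal sketch of how a latent tensor could be displaced by an optical flow field. It is not the implementation used in this paper. The function names `warp_latent` and `animate`, the `(C, H, W)` latent layout, the flow convention (displacements in latent pixels, x then y), nearest-neighbour sampling with border clamping, and the choice to warp each frame from the previous one are all assumptions made for illustration. The diffusion model that noises the image and denoises the warped latents is omitted.

```python
import numpy as np


def warp_latent(latent, flow):
    """Backward-warp a (C, H, W) latent with a (2, H, W) flow field.

    Assumed convention: flow[0] is the horizontal (x) displacement and
    flow[1] the vertical (y) displacement, both in latent pixels.
    Output location (y, x) copies the value at (y - flow_y, x - flow_x),
    rounded to the nearest grid point and clamped to the border.
    """
    _, h, w = latent.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs - flow[0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys - flow[1]).astype(int), 0, h - 1)
    # Indexing with two (H, W) index arrays keeps the channel axis first.
    return latent[:, src_y, src_x]


def animate(latent0, flow_fn, n_frames):
    """Return n_frames latents, each warped from the previous one.

    flow_fn(k) is a hypothetical stand-in for the flow generator: it
    returns the (2, H, W) flow that moves frame k-1 to frame k.
    """
    latents = [latent0]
    for k in range(1, n_frames):
        latents.append(warp_latent(latents[-1], flow_fn(k)))
    return latents


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z0 = rng.standard_normal((4, 8, 8))  # stand-in for a noisy latent
    shift_right = np.zeros((2, 8, 8))
    shift_right[0] = 1.0                 # one latent pixel to the right
    frames = animate(z0, lambda k: shift_right, n_frames=3)
    print(len(frames), frames[-1].shape)
```

Nearest-neighbour sampling is chosen here only because it copies latent values without blending them; other interpolation schemes are possible, and this sketch does not claim to match the choice made in the paper.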

MotionCraft Overview