Implicit Motion Function

IMF represents motion in latent space and models the correlation implicitly.
Compared to explicit methods such as PECHead and TPSMM, our IMF faithfully reconstructs the current frame.

Abstract

Recent advances in video modeling rely extensively on optical flow to represent relationships across frames, but this approach often lacks efficiency and fails to capture the probabilistic nature of objects' intrinsic motion. In addition, conventional encoder-decoder frameworks in video processing model correlation in the encoder, leading to limited generative capability and redundant intermediate representations. To address these challenges, this paper proposes a novel Implicit Motion Function (IMF). Our approach uses a low-dimensional latent token as the implicit representation, together with cross-attention, to implicitly model the correlation between frames. This enables implicit modeling of temporal correlation and an understanding of object motion. Our method not only improves the sparsity and efficiency of the representation but also exploits the generative capability of the decoder by integrating correlation modeling within it. The IMF framework facilitates video editing and other generative tasks by allowing direct manipulation of the latent tokens. We validate the effectiveness of IMF through extensive experiments on multiple video tasks, demonstrating superior performance in reconstructed video quality, compression efficiency, and generation ability.
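
As a concrete illustration of the idea described above, the sketch below shows how a decoder block can model the correlation between frames with cross-attention over a small set of latent motion tokens, rather than warping features with an explicit optical-flow field. This is a minimal PyTorch sketch under assumed names and shapes (ImplicitMotionDecoderBlock, a feature width of 256, and 32 tokens are all illustrative choices), not the paper's released implementation.

import torch
import torch.nn as nn

class ImplicitMotionDecoderBlock(nn.Module):
    """Hypothetical decoder block: dense reference-frame features attend to
    compact latent motion tokens, so correlation is modeled inside the decoder."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, ref_feats, motion_tokens):
        # ref_feats:     (B, H*W, C) appearance features of the reference frame
        # motion_tokens: (B, T, C)   low-dimensional latent tokens for the current frame
        # Every spatial location queries the motion tokens, implicitly capturing
        # how the reference content should move, with no explicit flow field.
        attn_out, _ = self.cross_attn(query=ref_feats, key=motion_tokens, value=motion_tokens)
        x = self.norm1(ref_feats + attn_out)
        return self.norm2(x + self.ffn(x))

# Example: decode a 64x64 reference feature map guided by 32 motion tokens.
block = ImplicitMotionDecoderBlock(dim=256)
ref_feats = torch.randn(1, 64 * 64, 256)
motion_tokens = torch.randn(1, 32, 256)
out = block(ref_feats, motion_tokens)   # (1, 4096, 256)

Because the per-frame motion lives entirely in the T tokens, editing tasks such as the head pose editing shown below reduce to manipulating those tokens before decoding.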

Talking Head Video Reconstruction

General Video Reconstruction

Comparisons with Explicit Methods

Free Head Pose Editing

BibTeX

@inproceedings{gao2024implicit,
  title={Implicit Motion Function},
  author={Gao, Yue and Li, Jiahao and Chu, Lei and Lu, Yan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19278--19289},
  year={2024}
}