High-Fidelity and Freely Controllable Talking Head Video Generation

Microsoft Research

PECHead generates high-fidelity head pose and expression editing results.

Abstract

Talking head generation aims to synthesize a video from a given source identity and a target motion. However, current methods face several challenges that limit the quality and controllability of the generated videos. First, generated faces often exhibit unexpected deformations and severe distortions. Second, movement-relevant information in the driving image, such as pose and expression, is not explicitly disentangled, which restricts independent manipulation of these attributes during generation. Third, the generated videos tend to flicker because the extracted landmarks are inconsistent between adjacent frames.

In this paper, we propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression. Our method leverages both self-supervised learned landmarks and 3D face model-based landmarks to model the motion. We also introduce a novel motion-aware multi-scale feature alignment module to effectively transfer the motion without face distortion. Furthermore, we enhance the smoothness of the synthesized talking head videos with a feature context adaptation and propagation module. We evaluate our model on challenging datasets and demonstrate its state-of-the-art performance.
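The flickering problem named in the abstract (landmarks that disagree between adjacent frames) can be made concrete with a small sketch. The code below is an illustration only, written under stated assumptions: `landmark_jitter` and `ema_smooth` are hypothetical helper names, the jitter score is a generic mean inter-frame displacement, and the exponential moving average is a simple stand-in baseline. It is not the feature context adaptation and propagation module proposed in the paper, which operates on features rather than on raw landmark coordinates.

```python
from math import hypot


def landmark_jitter(frames):
    """Mean displacement of each landmark between adjacent frames.

    `frames` is a list of frames; each frame is a list of (x, y) tuples.
    A hypothetical, generic score: higher means less temporal consistency.
    """
    if len(frames) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            total += hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0


def ema_smooth(frames, alpha=0.5):
    """Exponential moving average over time.

    A generic smoothing baseline used here for illustration only;
    it is NOT the method described in the paper.
    """
    if not frames:
        return []
    smoothed = [list(frames[0])]
    for curr in frames[1:]:
        prev = smoothed[-1]
        smoothed.append([
            (alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py)
            for (x, y), (px, py) in zip(curr, prev)
        ])
    return smoothed


# Toy data: a single landmark that should be static but is detected noisily.
noisy = [[(10.0, 10.0)], [(12.0, 10.0)], [(10.0, 10.0)], [(12.0, 10.0)]]

jitter_before = landmark_jitter(noisy)
jitter_after = landmark_jitter(ema_smooth(noisy))
print(jitter_before, jitter_after)
```

On this toy input the smoothed sequence has a lower jitter score than the raw one. Coordinate-level smoothing of this kind trades responsiveness for stability, which is one reason a learned, feature-level approach may be preferable in practice.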

Free Head Pose and Expression Editing

PECHead results for free editing of head pose and expression.

Portrait Frontalization

PECHead results for portrait frontalization.

Same-Identity Video Reconstruction

PECHead results for same-identity video reconstruction.

Cross-Identity Video Face Reenactment

PECHead results for cross-identity video face reenactment.

Facial Expression Transfer

PECHead results for facial expression transfer.

Video Face Reenactment on Wild Identities

PECHead results for video face reenactment on wild identities.

BibTeX

@inproceedings{gao2023high,
  title={High-fidelity and freely controllable talking head video generation},
  author={Gao, Yue and Zhou, Yuan and Wang, Jinglu and Li, Xiao and Ming, Xiang and Lu, Yan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5609--5619},
  year={2023}
}