Microsoft VASA-1 (Unreleased)

Lifelike Audio-Driven Talking Faces Generated in Real Time

VASA-1: single portrait photo + speech audio = hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, generated in real time.

Realism and liveliness

Our method not only produces precise lip-audio synchronization but also generates a wide spectrum of expressive facial nuances and natural head motions. It can handle arbitrary-length audio and stably output seamless talking face videos.



