Microsoft's new AI tool creates realistic, expressive talking avatars synchronized with audio

Microsoft has presented a new artificial intelligence (AI) model that generates videos of realistic, expressive talking avatars from a single static image and a voice clip.

VASA is Microsoft's framework for generating, in real time, virtual faces that speak and gesture with a high degree of expressiveness and realism, with lip movements that are “exquisitely synchronized with the audio.”

The example faces look like real people, but they were generated with the AI tools StyleGAN2 and DALL·E 3. As the company clarifies, none of them corresponds to a real identity.

As the company explains on its official blog, this realism is reinforced by the lip synchronization and by “the great spectrum of emotions and facial nuances,” combined with natural head movement.

VASA needs only a static image and a voice audio clip to create 512 × 512-pixel videos at 45 frames per second in offline batch-processing mode. In online streaming mode, it reaches up to 40 fps with a latency of 170 ms. The company ran these tests on a desktop computer equipped with an NVIDIA RTX 4090 GPU.

Microsoft has stated that it does not plan to release a demo of the tool, given the risk that it could be misused to impersonate real people.
