Microsoft’s AI that creates hyper-realistic avatars

Microsoft has presented an artificial intelligence capable of generating hyper-realistic avatars from a single image and a voice file. VASA-1 can bring your photos to life by adding expressions while synchronizing the avatar's lip movement with the audio clip. The end result is striking and could change the way we interact in the digital world.

According to the researchers, VASA-1 captures the full range of human expressions, including natural head movements, to generate truly credible talking avatars. This is achieved by disentangling elements such as facial features, head position, and expressions, allowing detailed control of each attribute and the ability to edit them separately.

VASA-1 goes beyond other AI models that simply add audio to an image and synchronize lip movement. The researchers generate realistic expressions together with head movements in a defined space, producing a more authentic, less rigid result.

"We consider all possible facial dynamics, including lip movement, expression (without lips), gaze and blinking, among others, as a single latent variable and model their probabilistic distribution in a unified way," the Microsoft Research authors write. "Our holistic modeling of facial dynamics, together with jointly learned head movement patterns, leads to the generation of a wide range of emotive and realistic conversational behaviors."

How VASA-1, Microsoft’s new artificial intelligence, works

Microsoft trained its model on a giant collection of videos of people talking. The idea was to create a system that could understand faces and separate their different aspects, such as identity, expression, and head movement, assigning a code to each. These codes can then be recombined to create new faces, making it possible to change someone's expression in a video without affecting their identity, or to make their head nod without altering their smile.
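The disentanglement idea described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's actual code: the class name, fields, and function are assumptions chosen to show how separate codes let you edit one attribute while leaving the others untouched.

```python
from dataclasses import dataclass, replace

# Hypothetical sketch of disentangled face latents: each aspect of the
# face gets its own code, so one attribute can be edited independently.
@dataclass(frozen=True)
class FaceLatents:
    identity: str    # appearance code: who the person is
    expression: str  # facial-expression code (e.g. "neutral", "smile")
    head_pose: str   # head position/rotation code

def edit_expression(latents: FaceLatents, new_expression: str) -> FaceLatents:
    """Swap only the expression code; identity and head pose are untouched."""
    return replace(latents, expression=new_expression)

frame = FaceLatents(identity="person_A", expression="neutral", head_pose="nod")
edited = edit_expression(frame, "smile")
assert edited.identity == frame.identity    # identity preserved
assert edited.head_pose == frame.head_pose  # head movement preserved
assert edited.expression == "smile"         # only the expression changed
```

Because the codes are independent, a decoder could render the edited latents into a new frame without the "identity drift" that entangled representations suffer from.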

To achieve this, the researchers used a 3D approach to capture more detail about the face and how it moves in three-dimensional space. The diffusion model accepts additional conditioning signals, such as primary gaze direction, head distance, and emotion. With the same audio track, VASA-1 can generate happy, angry, or nervous avatars whose exaggerated expressions still aim for realism.
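The role of these optional conditioning signals can be illustrated with a small sketch. Again, this is an assumption-laden toy, not the real API: the function name, parameters, and defaults are invented for illustration, showing only that one audio track plus different control signals yields different animations.

```python
# Illustrative sketch (not Microsoft's code): the generator takes a fixed
# audio track plus optional conditioning signals. Varying the signals
# changes the rendered avatar; the speech stays the same.
def generate_avatar(audio: str, *, gaze: str = "front",
                    distance: float = 1.0, emotion: str = "neutral") -> dict:
    """Describe the animation the model would produce for these controls."""
    return {"audio": audio, "gaze": gaze,
            "distance": distance, "emotion": emotion}

clip = "speech.wav"
happy = generate_avatar(clip, emotion="happy")
angry = generate_avatar(clip, emotion="angry", gaze="left")
assert happy["audio"] == angry["audio"]      # same audio track drives both
assert happy["emotion"] != angry["emotion"]  # different emotional rendering
```

In the real system these controls condition a diffusion model over the facial-dynamics latent; the sketch only captures the interface idea of decoupling speech content from delivery.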

Microsoft VASA-1

VASA-1 can produce high-quality video at a resolution of 512 x 512 pixels and 45 frames per second. The researchers highlighted its efficiency: the tool can run on a computer with a single NVIDIA RTX 4090 GPU.

Microsoft's artificial intelligence is not limited to real photographs; it can also be applied to illustrations or paintings, like the Mona Lisa singing "Paparazzi." Notably, all the examples presented were built from photographs generated with DALL-E 3 and StyleGAN2. "We are exploring affective visual abilities for virtual and interactive characters, NOT impersonating any person in the real world," the company stated.

Hyperrealistic avatars could lead to misinformation

One of the latent dangers of these models is that they could be used to deceive users. Given this, Microsoft declared that it opposes any harmful application and noted that it will not release the tool until it is sure the technology will be used responsibly.

"We oppose any behavior that creates misleading or harmful content from real people and are interested in applying our technique to advance counterfeit detection. We are dedicated to developing AI responsibly, with the goal of promoting human well-being," the researchers added.
