VASA-1: A GenAI model for “Visual Affective Skills” that enhances the expressiveness of Digital Humans (and DeepFakes & FakeNews)


To understand this work, you must first understand what these “Talking Heads” are, which I already told you about in 2019: starting from a single frame, or a series of them, Transfer Learning can be applied to make the head in a photograph move. This process is also known as Face Reenactment, and I explained the 2019 video about it back then.
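To make the idea concrete, here is a minimal Python sketch of that few-shot “Talking Heads” scheme. The `embedder`, `generator` and `extract_landmarks` callables are hypothetical placeholders, not the 2019 paper’s actual code; the point is only the shape of the pipeline: identity from a few frames, motion from a driving video.

```python
# Hedged sketch of few-shot face reenactment (the 2019 "Talking Heads" idea):
# identity is distilled once from 1..K source frames, then each driving
# frame contributes only pose/expression landmarks. All callables here are
# illustrative placeholders; the real components are large neural networks.
def reenact(embedder, generator, extract_landmarks, source_frames, driving_video):
    """Return frames of the source identity mimicking the driving motion."""
    identity = embedder(source_frames)                  # K frames -> 1 vector
    output_frames = []
    for driving_frame in driving_video:
        landmarks = extract_landmarks(driving_frame)    # pose + expression only
        output_frames.append(generator(identity, landmarks))
    return output_frames
```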

Now, with the VASA-1 model, the process goes further: given a photograph plus an audio file that you want to turn into a “Talking Head”, you get an animation with “Visual Affective Skills”, or in other words, with very humanized gestures.
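As a rough Python sketch of that input/output contract (VASA-1’s code and API are not public, so every method name on the `model` object below is an assumption):

```python
# Hedged sketch of a VASA-1-style inference step: one photo + one audio file
# in, one lip-synced expressive video out. The `model` interface is invented
# for illustration; the paper describes appearance being separated from the
# generated facial dynamics and head motion, which is what the steps mirror.
def animate_photo(model, photo_path: str, audio_path: str, out_path: str,
                  fps: int = 25) -> None:
    face = model.load_image(photo_path)             # the single input frame
    audio = model.load_audio(audio_path)            # the speech to be "said"
    appearance = model.encode_appearance(face)      # identity/texture, done once
    motion = model.generate_motion(audio, fps=fps)  # lips, expression, head, gaze
    frames = model.render(appearance, motion)       # one image per motion step
    model.write_video(frames, out_path, fps=fps)
```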


The final result is an animated video of the input person saying the input audio, gesturing very humanly and with lip synchronization. To carry out their tests, the researchers used Synthetic Humans generated with StyleGAN-2 – which I also told you about in 2019 in the article “Style GAN: An AI to create profiles of people who DO NOT exist”, where the website “This Person Does Not Exist” is discussed – and which allows creating people who do not exist from two photographs of humans (real or synthetic).

In their work, the researchers have pushed the model further, managing to run this process at high quality in real time, generating the frames with the gesticulation and lip synchronization as the audio file is processed. This could have an impact on the world of cybersecurity, in the form of almost-perfect DeepFakes or more credible FakeNews, which is why there are still no plans to release an API for the implemented model, nor a product. Here you can see how it works in real time.
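The real-time claim essentially means the model can work in a streaming fashion. Sketched below with the same invented `model` interface as before, frames are emitted chunk by chunk instead of only after the whole audio file has been processed; the chunk size is an assumption.

```python
# Hedged sketch of online ("real-time") generation: audio arrives in short
# chunks and frames are yielded as soon as each chunk's motion is rendered,
# instead of waiting for the full file. Interface and chunking are assumed.
def stream_talking_head(model, face_image, audio_chunks, fps: int = 25):
    """Yield video frames while audio is still arriving (generator function)."""
    appearance = model.encode_appearance(face_image)    # computed once, reused
    for chunk in audio_chunks:                          # e.g. ~200 ms of samples
        motion = model.generate_motion(chunk, fps=fps)  # latents for this chunk
        for frame in model.render(appearance, motion):  # a handful of frames
            yield frame                                 # hand off to player/encoder
```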

This is the kind of work carried out at the company BeHumans, and of course advances achieved by research such as VASA-1 are oriented towards the positive side of GenAI: to further humanize people’s interactions with technology, increasing its adoption and reducing the digital divide with older people, who will have an easier time using new digital services.

Figure 9: An AutoVerifAI Prototype video made by TID

On the other hand, for the cat-and-mouse game of detecting content generated by GenAI, which is what we do at AutoVerifAI, these new algorithms force us to review our detection algorithms, look for new ways to detect them, and see which ones best pick up signs of it.
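A minimal sketch of how such a detector suite could be re-evaluated against a new generator: run every registered algorithm over the same video and compare which ones still surface signs. The three detector names come from this article; the Python interface and the stub scores are assumptions, not AutoVerifAI’s real code.

```python
# Hedged sketch of a detector-suite harness: each detector maps a video path
# to a score in [0, 1] (1 = likely fake). The interface is invented here.
from typing import Callable, Dict

Detector = Callable[[str], float]

def evaluate_detectors(video_path: str, detectors: Dict[str, Detector]) -> Dict[str, float]:
    """Run all detectors on one video and return their individual scores."""
    return {name: fn(video_path) for name, fn in detectors.items()}

if __name__ == "__main__":
    suite = {
        "Headpose": lambda v: 0.12,   # stub: real version checks 3D pose consistency
        "Blink":    lambda v: 0.08,   # stub: real version checks blink rate/shape
        "LRCN/ViT": lambda v: 0.15,   # stub: real version is a learned classifier
    }
    print(evaluate_detectors("vasa1_demo.mp4", suite))  # hypothetical file name
```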


Figure 10: DeepFakes video detection algorithms in AutoVerifAI

Few signs are detected in the videos made with VASA-1, given their extreme realism in “Visual Affective Skills”, which puts a great deal of detail into human micro-gestures.

For example, with the Headpose, Blink and LRCN/ViT algorithms that we currently have in the free version, this article’s video shows very few signs of being a DeepFake, whereas if we take a frame with only the person’s face, the indications increase, since it is based on a StyleGAN.
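To give an idea of what a “Blink”-style check can look at, here is one classic signal: the Eye Aspect Ratio (EAR) of Soukupová & Čech (2016). Real faces blink at a characteristic rate with a characteristic EAR dip, while generated faces often do not. This is a self-contained sketch of the EAR math and a simple blink counter; landmark extraction itself, and whether AutoVerifAI uses exactly this signal, are assumptions.

```python
# Hedged sketch of blink analysis via the Eye Aspect Ratio (EAR).
# `eye` holds six 2D landmarks p1..p6 around one eye, ordered as in the
# Soukupová & Čech paper: p1/p4 are the horizontal corners, p2/p6 and
# p3/p5 are the upper/lower lid pairs.
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """eye: (6, 2) array of landmarks p1..p6; low EAR means a closed eye."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distance p2-p6
    v2 = np.linalg.norm(eye[2] - eye[4])   # vertical distance p3-p5
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance p1-p4
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_series, closed_thresh=0.21, min_frames=2):
    """Count dips of EAR below the threshold lasting at least min_frames."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < closed_thresh:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    return blinks + (1 if run >= min_frames else 0)
```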


Figure 11: With its GenAI-generated image detection, AutoVerifAI raises the indications that it is generated by AI by up to 28%.
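The frame-level check described above can be reproduced mechanically: grab one frame from the video, crop the face, and hand the crop to an image-level GAN detector. The sketch below uses OpenCV’s bundled Haar cascade for the face crop; `stylegan_image_detector` is a placeholder for an image-analysis module, not a real function name.

```python
# Hedged sketch: extract a face crop from one video frame for image-level
# GAN-detection, using only standard OpenCV calls.
import cv2

def face_crop_from_video(video_path: str, frame_idx: int = 0):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)   # seek to the wanted frame
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise IOError(f"could not read frame {frame_idx} from {video_path}")
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face found in the selected frame")
    x, y, w, h = faces[0]
    return frame[y:y + h, x:x + w]

# score = stylegan_image_detector(face_crop_from_video("vasa1_demo.mp4"))
```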

But as you can see, the drive in GenAI to make Digital Humans ever more perfect demands more and more work to perform a deterministic forensic analysis, and more and more tests have to be run in order to form an opinion.

Evil Greetings!
