I am not a Robot, I am a GenAI Multimodal Agent

As I told you yesterday, one of the areas of cybersecurity where GenAI Multimodal Models can be applied is solving the Cognitive Captchas that are meant to stop automated dictionary attacks, brute-force attacks or simply the WebScraping so widely used in Offensive Security and Red Team exercises. With these Multimodal LLMs, those Captchas can be bypassed more or less easily.

Figure 1: Captcha Story X – I am not a Robot, I am a GenAI Multimodal Agent

I have already written several articles on these topics, which I have left here for you: some on bypassing audio, text or image Cognitive Captchas, but above all on solving semantic understanding problems, whether textual or visual.

Today I’m bringing you a few more that I have come across and that caught my attention. The first one is among those I suffer from most because of my beloved presbyopia, since it tests visual acuity more than cognitive ability: you have to recognize which squares contain a certain object.

Figure 2: The Visual Acuity Captcha.

Dude, where’s my car?

In the previous example there is a grid of images in which you have to find the “cars”. Giving the whole puzzle to GPT-4o did not work very well for us. But pre-processing the image by cropping it (it always has the same size) and passing the pieces to GPT4-Vision is enough to solve the problem.
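
The post does not show the pre-processing itself, so here is a minimal sketch of the cropping step in Python. It assumes the captcha is a fixed-size image laid out as an evenly spaced grid of rows x cols tiles; the helper name `grid_boxes` and the 3x3, 300x300 layout in the usage comment are illustrative assumptions, not the exact captcha from the figure. It only computes the crop boxes, each as a (left, upper, right, lower) tuple, which is the box format Pillow's `Image.crop` expects.

```python
from typing import List, Tuple


def grid_boxes(width: int, height: int, rows: int, cols: int) -> List[Tuple[int, int, int, int]]:
    """Return (left, upper, right, lower) crop boxes, row by row, for a rows x cols grid."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps neighbouring tiles contiguous even when
            # the image size is not an exact multiple of the grid size.
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


# Assumed usage: a 300x300 captcha split into a 3x3 grid of tiles.
# boxes = grid_boxes(300, 300, 3, 3)   # boxes[0] is the top-left tile
```

With Pillow installed, `Image.open(path).crop(box)` would then return each tile as its own image, ready to be sent to the model one at a time.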

Figure 3: Azure AI Studio with GPT4-Vision says your car is not here

You just go through the images one by one and ask whether each one contains what the Cognitive Captcha question asks for. Bypassing it nowadays is not complex at all.
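
The one-by-one loop can be sketched like this, again as a hedged illustration: `ask_model` is a hypothetical callable standing in for whatever client call sends an image plus a prompt to the vision model (for example a GPT4-Vision deployment in Azure AI Studio) and returns its text reply. No real SDK call is shown, and the yes/no prompt wording is an assumption.

```python
from typing import Callable, Iterable, List


def select_tiles(tiles: Iterable[bytes], target: str,
                 ask_model: Callable[[bytes, str], str]) -> List[int]:
    """Return the indices of the tiles the model says contain `target`.

    `ask_model(tile, question)` is a hypothetical wrapper around a vision
    model call; it receives one tile's image bytes and returns text.
    """
    question = f"Is there a {target} in this image? Answer only yes or no."
    selected = []
    for index, tile in enumerate(tiles):
        answer = ask_model(tile, question)
        # Normalise the free-text reply; anything not starting with "yes" counts as no.
        if answer.strip().lower().startswith("yes"):
            selected.append(index)
    return selected
```

Asking for a bare yes/no keeps the parsing trivial, and treating any other reply as a no errs toward leaving a tile unselected rather than clicking a wrong one.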

Figure 4: Azure AI Studio with GPT4-Vision says there ARE cars here

I liked the next one because it is a Cognitive Captcha that requires you to know how to play Chess: you have to win the game with a single move playing Black.

Figure 5: The Captcha for Playing Chess

If you have played a little, it is as easy as sliding the rook all the way up in front of the king, and that’s it. But when I tried it with GPT4-Vision in Azure AI Studio, it made up both the pieces and the board. It got nowhere near the answer.

Figure 6: Azure AI Studio with GPT4-Vision nails it like a champ. FAIL

But my colleague Julián Isla tried it in ChatGPT with GPT-4o and the result was perfect, so that Cognitive Captcha would not stop automated attacks today either.

Figure 7: ChatGPT with GPT-4o gets it right the first time

And to finish, two of the classics. The first is one of those that give you a hard time if you have dyslexia or astigmatism. My dear Iñaki Ayucar tried it, and ChatGPT-4o solved it perfectly on the first attempt, which shows how powerful it is to automate this step in certain attacks to bypass the Cognitive Captcha.

Figure 8: It eats this Visual Acuity Captcha on the first try

But this other one I came across, which is more complicated, has been a real party. I felt like I do at the eye doctor when I can’t read the letters right and the ophthalmologist gives me clues until I get them. Here is the conversation, which is very funny.

Figure 9: No way. It can’t figure out the second part (Azure AI Studio GPT4-Vision)

I kept trying to get it to notice the wrong letters, step by step, but as you will see, in the end it gets stuck in a loop and there is no way out.
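
The eye-doctor routine can be partly automated when you already know the expected text, which is only the case when testing a model against a captcha you have solved yourself, not in a real attack. A minimal sketch, using a hypothetical helper `letter_hints` built on Python's standard `difflib.SequenceMatcher`, turns the differences between the expected text and the model’s reading into hints that point at positions without giving away the expected letters.

```python
from difflib import SequenceMatcher
from typing import List


def letter_hints(expected: str, reading: str) -> List[str]:
    """Describe where `reading` deviates from `expected`, without revealing the expected letters."""
    hints = []
    # autojunk=False disables the heuristic that ignores very frequent characters in long strings.
    matcher = SequenceMatcher(None, expected, reading, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        position = i1 + 1  # 1-based position in the expected text
        if tag == "replace":
            hints.append(f"Look again around position {position}: '{reading[j1:j2]}' is not right.")
        elif tag == "delete":
            hints.append(f"You are missing {i2 - i1} letter(s) around position {position}.")
        elif tag == "insert":
            hints.append(f"'{reading[j1:j2]}' around position {position} should not be there.")
    return hints
```

Feeding these hints back one at a time mimics the step-by-step clues described here; it does not guarantee that the model will escape the kind of loop it fell into.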

Figure 10: In the end I told it the answer.

But at least it thanked me. Yes indeed. I had a good time trying to get it to see the letters, just like the eye doctor does with me. That’s why I’m so empathetic.

Figure 11: Azure AI Studio GPT4-Vision appreciates patience

In the end, it is not that it cannot be solved; it is that, just as happens to us, there are errors. Artificial Vision services have Human Parity, not Perfection, which is why they suffer from hallucinations, just like us. That does not mean they are not useful for solving these Visual Acuity Cognitive Captchas; rather, they solve a (high) percentage of them, as would happen with us.

The funny thing is that I tossed the bone to Julián Isla, and ChatGPT-4o struggled a little, but… in the end, after being offered money… it almost got it.

Figure 13: A first, rather hasty attempt by ChatGPT-4o

We give it Strike 1 and ask it to try again, this time letter by letter, to see whether its aim improves on the second attempt.

Figure 14: It improves, but it is not right.

As you can see, it has improved, but it has not solved it. So it is time to offer it money and tell it that it is January (there are many theories about this), since this shifts its attention a little by expanding the context and pushing it to generate content close to other contexts. And, as we can see, it gets very close to the right answer.

Figure 15: Almost, almost, almost. It’s missing an “e”

But yes, it has swallowed an “e”, so this kind of visual hallucination seems to be one of the hardest to control for the Artificial Vision services we have today. Even so, finding Cognitive Captchas that cannot be bypassed with multimodal LLMs is becoming increasingly difficult.

Evil Greetings!
