Some former MIT researchers have a particular solution

Cleanlab proposes scoring the responses of large language models

April 25, 2024, 9:30 p.m.

Updated April 26, 2024, 8:09 a.m.

Chatbots have become one of the pillars of the rise of artificial intelligence (AI). From ChatGPT and Copilot to Claude Chat and Perplexity, these tools are everywhere. Yet however exciting they are, we should not fully trust their answers.

Just ask the lawyer who used ChatGPT to try to win a case, only to discover that the documents presented to the judge contained fabricated judicial decisions, references, and quotes. As this shows, chatbots have many virtues, but reliability is not one of them.

A possible solution to the reliability problem

A study published by a startup founded by former Google employees suggests that chatbots hallucinate at a rate of at least 3%. For many users this may be a minor problem, but the stakes change when we talk about professional use.

Tools powered by large language models (LLMs) are reaching the business world through solutions like Copilot in Office 365. If employees end up handling erroneous information, it could cause more than one headache for the firm.

Cleanlab, a startup founded by former MIT researchers, has just launched its own initiative to address this problem: a tool powered by what it calls the Trustworthy Language Model (TLM), an approach that aims to improve the reliability of responses.

TLM works as a “trust layer” that tells users how trustworthy the answer they just received is through a scoring system. The tool has been designed to work alongside models such as GPT-3.5, GPT-4, and custom company models.

The system sends the question to several models and then analyzes their responses. The answer comes back accompanied by a score between 0 and 1. In a simple test in which we asked for the square root of nine, we received the correct answer (3) with a score of 0.885.
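
The article does not detail how TLM computes its score, so the snippet below is only a minimal sketch of the general pattern it describes, under the assumption that agreement between several models is the main signal. The function names and the stub models are our own illustration, not Cleanlab's API.

```python
# Minimal sketch of a "trust layer": ask several models the same
# question and score how strongly their answers agree. Illustrative
# only; this is not Cleanlab's actual TLM scoring algorithm.
from collections import Counter
from typing import Callable, List, Tuple

def normalize(text: str) -> str:
    """Crude canonicalization so trivially different answers match."""
    return text.strip().lower().rstrip(".")

def trust_score(question: str,
                models: List[Callable[[str], str]]) -> Tuple[str, float]:
    """Return the majority answer and an agreement score in [0, 1]."""
    answers = [normalize(model(question)) for model in models]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / len(answers)

# Stub "models" standing in for real LLM calls.
models = [lambda q: "3", lambda q: "3.", lambda q: "The answer is 3"]
answer, score = trust_score("What is the square root of nine?", models)
print(answer, round(score, 2))  # -> 3 0.67
```

A real system would compare answers far more carefully (semantically rather than string-for-string), but the shape is the same: the more the models agree, the higher the score.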

Cleanlab points out that ChatGPT in its free version can get very simple things wrong. When asked how many times the letter “N” appears in the word “enter,” the OpenAI chatbot usually answers that it appears twice. We tested it ourselves, and the chatbot did indeed respond incorrectly.

The startup imagines its technology being useful in a wide range of scenarios. For instance, it could make customer service chatbots more reliable: the bot would operate automatically, but whenever a response falls below a reliability threshold, a human could be asked to step in, as the sketch below illustrates.
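
Taken literally, that workflow is just a threshold check on the score. Here is a hypothetical sketch; the 0.8 cutoff and the function names are our own illustration, not values Cleanlab specifies.

```python
# Hypothetical escalation rule for a customer service bot: answers
# scoring below a cutoff are routed to a human agent instead of being
# sent automatically. The 0.8 threshold is illustrative only.
THRESHOLD = 0.8

def handle(question: str, answer: str, score: float) -> str:
    if score < THRESHOLD:
        # Low confidence: hand the conversation off to a person.
        return f"[escalated to a human agent] {question}"
    return answer

print(handle("Can I return my order?", "Yes, within 30 days.", 0.92))
print(handle("Is shipping free to Mars?", "Yes.", 0.41))
```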

If you are an artificial intelligence enthusiast, you can try TLM on the web. The tool is also available through an API, and it comes in free open source versions as well as paid versions with additional features.
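
For reference, a call through Cleanlab's Python client looked roughly like the following around the time of launch. Treat the package, class, and field names (`cleanlab_studio`, `Studio`, `TLM`, `trustworthiness_score`) as assumptions to verify against Cleanlab's current documentation.

```python
# Assumed usage of Cleanlab's Python client; names may have changed,
# so check Cleanlab's documentation before relying on them.
from cleanlab_studio import Studio  # pip install cleanlab-studio

studio = Studio("<YOUR_API_KEY>")   # key issued by the Cleanlab web app
tlm = studio.TLM()

out = tlm.prompt("What is the square root of nine?")
print(out["response"])               # the model's answer
print(out["trustworthiness_score"])  # score between 0 and 1
```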

Images | Cleanlab | Screenshot

In Xataka | The most unexpected winner of the first great battle for AI is also the one we thought was dead: Meta

 