OpenAI wants AI to help humans train AI

One of the key ingredients that made ChatGPT a runaway success was an army of human trainers who guided the AI model behind the bot on what constitutes a good or bad response. OpenAI now says that adding even more AI to the mix, to assist those human trainers, could help make AI assistants smarter and more reliable.

First came human intelligence

In developing ChatGPT, OpenAI pioneered the use of reinforcement learning with human feedback, or RLHF. This technique uses input from human evaluators to fine-tune an AI model so that its output is more coherent, less objectionable, and more accurate. The ratings the trainers provide feed an algorithm that steers the model’s behavior. The technique has proved crucial both to making chatbots more reliable and useful and to preventing them from misbehaving.
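
The article does not give implementation details, but the widely published recipe for the preference-modeling step of RLHF looks roughly like the sketch below. It is a minimal illustration assuming a pairwise (Bradley-Terry) preference loss and toy random embeddings in place of a real language model; RewardModel, preferred, and rejected are illustrative names, not OpenAI’s code.

```python
# Minimal sketch of RLHF's reward-modeling step, under stated assumptions:
# random embeddings stand in for a language model's hidden states, and
# preferred/rejected pairs stand in for human trainers' rankings.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response embedding; higher means trainers like it more."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)

reward = RewardModel()
opt = torch.optim.Adam(reward.parameters(), lr=1e-3)

# Toy data: each row is an embedding of one model response.
preferred = torch.randn(64, 16)  # responses trainers ranked higher
rejected = torch.randn(64, 16)   # responses trainers ranked lower

for step in range(100):
    # Bradley-Terry pairwise loss: push the preferred response's score
    # above the rejected one's. This is the sense in which trainer
    # ratings "feed an algorithm" that later steers the policy model.
    loss = -torch.nn.functional.logsigmoid(
        reward(preferred) - reward(rejected)
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```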

“RLHF works very well, but it has some important limitations,” says Nat McAleese, an OpenAI researcher involved in the new work. For one thing, human feedback can be inconsistent. For another, it can be difficult even for skilled people to evaluate extremely complex outputs, such as sophisticated software code. The process can also optimize a model to produce output that seems convincing rather than output that is actually accurate.

GPT-4

OpenAI developed the new model by fine-tuning its most powerful offering, GPT-4, to assist human trainers tasked with evaluating code. The company found that the new model, dubbed CriticGPT, could catch errors that humans missed, and that human judges preferred its critiques of code 63 percent of the time. OpenAI will now look into extending the approach to areas beyond code.
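
OpenAI has not published CriticGPT’s interface, but a trainer-assist loop of the kind described could look like the hedged sketch below. It uses the public OpenAI chat completions API with a placeholder model name; the prompt, the critique_code helper, and the review flow are assumptions for illustration, not OpenAI’s pipeline.

```python
# Hedged sketch: ask a critic model to flag likely bugs in a piece of
# code before a human trainer rates it. CriticGPT is not a public API;
# "gpt-4o" below is a stand-in model, and the prompt is illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def critique_code(code: str) -> str:
    """Return a model-written list of concrete bugs for human review."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder for a CriticGPT-like critic
        messages=[
            {"role": "system",
             "content": "You are a code reviewer. List concrete bugs, "
                        "one per line, citing the offending line."},
            {"role": "user", "content": code},
        ],
    )
    return response.choices[0].message.content

snippet = "def mean(xs):\n    return sum(xs) / len(xs)"  # crashes on []
print(critique_code(snippet))
```

In the workflow the article describes, a human trainer would then accept, reject, or amend such critiques, and the resulting labels would feed back into RLHF as usual.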

“We’re just starting to work on integrating this technique into our RLHF chat stack,” McAleese explains. He notes that the approach is imperfect, since CriticGPT can make mistakes of its own, including hallucinating, but adds that the technique could help OpenAI’s models, as well as tools like ChatGPT, become more accurate by reducing errors in human training. It could also prove crucial, he says, in helping AI models become much smarter, because it may allow humans to help train AI that exceeds their own abilities: “And as models continue to improve, we suspect that people will need more help,” according to McAleese.

The new technique is one of many being developed to improve large language models and get more out of them. It is also part of an effort to ensure that AI behaves acceptably as its capabilities increase.

Earlier this month, Anthropic, a rival to OpenAI founded by former OpenAI employees, announced a more capable version of its own chatbot, called Claude, thanks to improvements in the model’s training regime and the data supplied to it. Anthropic and OpenAI have also recently introduced new ways to inspect AI models to understand how they arrive at their results and thus avoid unwanted behavior such as deception.
