OpenAI released ChatGPT, a conversational AI model based on their GPT-3.5 language model (LM). ChatGPT is fine-tuned using Reinforcement Learning from Human Feedback (RLHF) and includes a moderation filter to block inappropriate interactions.
The release was announced on the OpenAI blog. ChatGPT is trained using the same RLHF methods used to train InstructGPT, OpenAI’s instruction-following language model. RLHF uses two datasets: one of human-written example responses for supervised fine-tuning of the GPT-3.5 LM, and one of human-labeled comparisons of LM outputs for training a reward model used in reinforcement learning. OpenAI released ChatGPT to gather user feedback and to explore its limitations.
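The two datasets differ in what the human provides: a full demonstration in the first case, and only a preference judgment in the second. A minimal sketch of what individual records might look like (the field names here are illustrative assumptions, not OpenAI's actual data format):

```python
# Hypothetical record for supervised fine-tuning: a prompt paired with a
# human-written demonstration of the desired response.
sft_example = {
    "prompt": "Explain recursion in one sentence.",
    "demonstration": "Recursion is when a function solves a problem by calling itself on smaller inputs.",
}

# Hypothetical record for reward-model training: several model-generated
# completions for the same prompt, plus the human labeler's preference.
comparison_example = {
    "prompt": "Explain recursion in one sentence.",
    "completions": ["Recursion is...", "It is a loop..."],
    "preferred": 0,  # index of the completion the labeler ranked higher
}
```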
Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback…We know that many limitations remain…and we plan to make regular model updates to improve in such areas. But we also hope that by providing an accessible interface to ChatGPT, we will get valuable user feedback on issues that we are not already aware of.
GPT-3.5 is the latest in OpenAI’s GPT series of large language models. Earlier this year, OpenAI published a technical paper on InstructGPT, which attempts to reduce toxicity and hallucinations in the LM’s output by “aligning” it with the user’s intent. First, a baseline “policy” for the LM is fine-tuned on a dataset of prompts paired with human-written desired responses. Next, a reward model is trained on a dataset of LM-generated responses to a prompt, ranked by human labelers. Finally, the baseline policy is further fine-tuned via Proximal Policy Optimization (PPO) using the reward model.
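The second step turns human rankings into a scalar training signal. A common way to do this, and the one described in the InstructGPT paper, is a pairwise ranking loss: the reward model should score the human-preferred response above the rejected one, with the loss `-log(sigmoid(r_chosen - r_rejected))`. A minimal sketch of that loss (standalone math, not OpenAI's actual training code):

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss for reward-model training.

    r_chosen and r_rejected are the scalar scores the reward model assigns
    to the human-preferred and rejected responses. The loss is small when
    the chosen response scores well above the rejected one, and grows as
    the ordering is violated.
    """
    margin = r_chosen - r_rejected
    sigmoid = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(sigmoid)
```

Averaged over all labeled comparison pairs, minimizing this loss pushes the reward model to reproduce the human labelers' rankings, which PPO then optimizes against in the final step.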
Image Source: https://openai.com/blog/chatgpt/
Using this technique, OpenAI reduced GPT-3’s hallucination rate from 41% to 21%. InstructGPT also generated “about 25% fewer toxic outputs than GPT-3 when prompted to be respectful.” ChatGPT was trained using the same general method, but in the first step, the human labelers generated a dataset by writing conversations between themselves and an imaginary chatbot. The OpenAI researchers found that this introduced a bias into their training data (“longer answers that look more comprehensive”), which results in the model sometimes producing overly verbose responses.
The tech community has been actively experimenting with the model. In a Hacker News discussion about ChatGPT, several users pointed out that the model’s responses were “dull” and “more filtered” than GPT-3’s would have been. One user replied:
I understand why people are somewhat frustrated by the “safety bumpers” on this. But I would say that I am actually really impressed by the quality of those safety controls. This is an AI that seems to know what it can and cannot give a decent response to. I don’t know if that’s hard coded or trained in, but it’s really impressive when you compare it to the hallucinations that typically show up in GPT3.
On Twitter, linguist and NLP educator Rachael Tatman wondered whether OpenAI had published a technical paper about ChatGPT. AI entrepreneur Will Spagnoli replied:
They published a paper with the first [InstructGPT] model’s release which explains how they did it, and the new ChatGPT and text-davinci-003 are just more recent versions of the same thing just now they have way more labeled data from human feedback which caused the performance gains.
OpenAI has not released the code or models for ChatGPT, but a free demo is available on the web.