Training language models to follow instructions with human feedback
A landmark paper from OpenAI (the InstructGPT paper) introducing reinforcement learning from human feedback (RLHF) to align language models with human intent, the approach that underpins applications like ChatGPT.