[Ai] AI Alignment Reading Group

9 Feb 2022

Hello everyone,

We'll be meeting Friday at 2 PM PST to discuss OpenAI's recent paper "Training
language models to follow instructions with human feedback"
<https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf>.

Making language models bigger does not inherently make them better at
following a user’s intent. For example, large language models can generate
outputs that are untruthful, toxic, or simply not helpful to the user. In
other words, these models are not aligned with their users. In this
paper, we show an avenue for aligning language models with user intent on a
wide range of tasks by fine-tuning with human feedback. Starting with a set
of labeler-written prompts and prompts submitted through the OpenAI API, we
collect a dataset of labeler demonstrations of the desired model behavior,
which we use to fine-tune GPT-3 using supervised learning. We then collect
a dataset of rankings of model outputs, which we use to further fine-tune
this supervised model using reinforcement learning from human feedback
(RLHF). We call the resulting models InstructGPT. In human evaluations on
our prompt distribution, outputs from the 1.3B parameter InstructGPT model
are preferred to outputs from the 175B GPT-3, despite having 100x fewer
parameters. Moreover, InstructGPT models show improvements in truthfulness
and reductions in toxic output generation while having minimal performance
regressions on public NLP datasets. Even though InstructGPT still makes
simple mistakes, our results show that fine-tuning with human feedback is a
promising direction for aligning language models with human intent.

Anyone interested in language modeling, reinforcement learning, their
intersection, or language models that can actually follow instructions
should feel welcome to join!

Join Zoom Meeting
https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT...

Password: 961594

Phone Dial-In Information
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)

Meeting ID: 958 4326 0079

All the best,
Quintin

Pope, Quintin