
Hello everyone,

The survey results are in. Our first meeting will be at 2 PM (PST) on Friday, November 19th, and will repeat weekly.

For our first discussion, I thought we'd start with something short and optimistic: Alignment by Default <https://www.lesswrong.com/posts/Nwgdq6kHke5LY692J/alignment-by-default>. It argues that powerful models may have internal "natural abstractions" that straightforwardly represent human values. Some combination of finetuning, interpretability tools, and luck may then be enough to "wire up" the human-values representation to the model's output and thereby get an aligned model.

I'm interested in how plausible alignment by default seems to everyone, in whether there are any architecture or training modifications that would raise its odds, and in how best to manage the "wire up" step, where we get a globally aligned model out of a model with a human-values subcomponent.

Anyone who's at all interested is welcome to join!

Join Zoom Meeting
https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT...
Password: 961594

Phone Dial-In Information
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)

Meeting ID: 958 4326 0079