[Ai] Alignment Reading Group Meeting

19 Jan 2022

Hello everyone,

No one expressed issues with our meeting time, so we'll meet at 2 PM PST on
Friday as usual. We'll discuss "Eliciting latent knowledge: How to tell
if your eyes deceive you"
<https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit?usp=sharing>,
which Alex expressed interest in.

In this post, we’ll present ARC’s approach to an open problem we think is
central to aligning powerful machine learning (ML) systems:

Suppose we train a model to predict what the future will look like
according to cameras and other sensors. We then use planning algorithms to
find a sequence of actions that lead to predicted futures that look good to
us.

But some action sequences could tamper with the cameras so they show
happy humans regardless of what’s really happening. More
generally, some futures look great on camera but are actually
catastrophically bad.

In these cases, the prediction model "knows" facts (like "the camera was
tampered with") that are not visible on camera but would change our
evaluation of the predicted future if we learned them. How can we train
this model to report its latent knowledge of off-screen events?

We’ll call this problem eliciting latent knowledge (ELK). In this report
we’ll focus on detecting sensor tampering as a motivating example, but we
believe ELK is central to many aspects of alignment.

Join Zoom Meeting
https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT...

Password: 961594

Phone Dial-In Information
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)

Meeting ID: 958 4326 0079

All the best,
Quintin

Pope, Quintin