
This Friday, we'll discuss "Inaccessible information":

"Suppose that I have a great model for predicting “what will Alice say next?” I can evaluate and train this model by checking its predictions against reality, but there may be many facts this model “knows” that I can’t easily access. For example, the model might have a detailed representation of Alice’s thoughts which it uses to predict what Alice will say, *without* being able to directly answer “What is Alice thinking?” In this case, I can only access that knowledge indirectly, e.g. by asking about what Alice would say under different conditions. I’ll call information like “What is Alice thinking?” inaccessible. I think it’s very plausible that AI systems will build up important inaccessible knowledge, and that this may be a central feature of the AI alignment problem."

Post: https://www.alignmentforum.org/posts/ZyWyAJbedvEgRT2uF/inaccessible-informat...

We'll meet Friday at 1.

https://oregonstate.zoom.us/j/2739792686?pwd=VkRUeHJkYnhvTzlvZzR6YnZWNERKQT0...