This Friday, we'll discuss Paul Christiano's post "Inaccessible information":

"Suppose that I have a great model for predicting “what will Alice say next?”

I can evaluate and train this model by checking its predictions against reality, but there may be many facts this model “knows” that I can’t easily access.

For example, the model might have a detailed representation of Alice’s thoughts which it uses to predict what Alice will say, without being able to directly answer “What is Alice thinking?” In this case, I can only access that knowledge indirectly, e.g. by asking about what Alice would say under different conditions.

I’ll call information like “What is Alice thinking?” inaccessible. I think it’s very plausible that AI systems will build up important inaccessible knowledge, and that this may be a central feature of the AI alignment problem."

Post: https://www.alignmentforum.org/posts/ZyWyAJbedvEgRT2uF/inaccessible-information

We'll meet Friday at 1 on Zoom: https://oregonstate.zoom.us/j/2739792686?pwd=VkRUeHJkYnhvTzlvZzR6YnZWNERKQT09