
This Friday, we'll discuss "Inaccessible information":

"Suppose that I have a great model for predicting “what will Alice say next?” I can evaluate and train this model by checking its predictions against reality, but there may be many facts this model “knows” that I can’t easily access. For example, the model might have a detailed representation of Alice’s thoughts which it uses to predict what Alice will say, *without* being able to directly answer “What is Alice thinking?” In this case, I can only access that knowledge indirectly, e.g. by asking about what Alice would say under different conditions. I’ll call information like “What is Alice thinking?” inaccessible. I think it’s very plausible that AI systems will build up important inaccessible knowledge, and that this may be a central feature of the AI alignment problem."

Post: https://www.alignmentforum.org/posts/ZyWyAJbedvEgRT2uF/inaccessible-informat...

We'll meet Friday at 1.

https://oregonstate.zoom.us/j/2739792686?pwd=VkRUeHJkYnhvTzlvZzR6YnZWNERKQT0...