AI alignment reading group: Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges

This week, we'll be reading Rudin et al.'s "Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges":

"Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) ... (9) Characterization of the "Rashomon set" of good models; and (10) Interpretable reinforcement learning."

This is a long paper, but I'm particularly interested in discussing (9): Rashomon sets of models. In what situations should we expect black-box models (e.g. deep neural networks) to have interpretable counterparts (e.g. sparse logical models / decision trees) with similar performance? The answer to this question will help determine the competitive pressures for and against using interpretable models, and will also inform how difficult it may be to supervise the computation performed by potential future human-level ML systems.

The paper: https://arxiv.org/abs/2103.11251

We'll meet Friday at 1.

https://oregonstate.zoom.us/j/2739792686?pwd=VkRUeHJkYnhvTzlvZzR6YnZWNERKQT0...

Best,
Alex Turner