Hello everyone,

Attendance for recent reading groups has dropped off. I've been wondering if there's an issue with the timeslot or content. Please let me know if you have any conflicts with the current time (Friday at 2 PM PST). Currently, reading group content focuses strongly on interpretability. If you'd be more interested in some other aspect of alignment research, please let me know.

For this week, we'll meet at 2 PM PST on Friday. We'll be discussing "A Mathematical Framework for Transformer Circuits"

In this paper, we attempt to take initial, very preliminary steps towards reverse-engineering transformers. Given the incredible complexity and size of modern language models, we have found it most fruitful to start with the simplest possible models and work our way up from there. Our aim is to discover simple algorithmic patterns, motifs, or frameworks that can subsequently be applied to larger and more complex models. Specifically, in this paper we will study transformers with two layers or less which have only attention blocks – this is in contrast to a large, modern transformer like GPT-3, which has 96 layers and alternates attention blocks with MLP blocks.

Join Zoom Meeting
https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT09

Password: 961594

Phone Dial-In Information
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)

Meeting ID: 958 4326 0079

All the best,

Quintin