
Hello everyone,
This is just a quick reminder that the alignment reading group is meeting at 1 PM today. Anyone interested is welcome to join.
All the best, Quintin
On Feb 22, 2022, at 12:45 AM, Pope, Quintin <popeq@oregonstate.edu> wrote:
Hello everyone,
We'll be meeting again at 1 PM PST on Friday. We'll discuss the paper "Locating and Editing Factual Knowledge in GPT". Its abstract:
We investigate the mechanisms underlying factual knowledge recall in autoregressive transformer language models. First, we develop a causal intervention for identifying neuron activations capable of altering a model's factual predictions. Within large GPT-style models, this reveals two distinct sets of neurons that we hypothesize correspond to knowing an abstract fact and saying a concrete word, respectively. This insight inspires the development of ROME, a novel method for editing facts stored in model weights. For evaluation, we assemble CounterFact, a dataset of over twenty thousand counterfactuals and tools to facilitate sensitive measurements of knowledge editing. Using CounterFact, we confirm the distinction between saying and knowing neurons, and we find that ROME achieves state-of-the-art performance in knowledge editing compared to other methods. An interactive demo notebook, full code implementation, and the dataset are available at this https URL.
I look forward to meeting you then!
Join Zoom Meeting https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT...
Password: 961594
Phone Dial-In Information
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)
Meeting ID: 958 4326 0079
All the best, Quintin