
Hello everyone,

We will meet on Friday at 2 PM PST. We're continuing to explore transformer interpretability. Our next paper is "Knowledge Neurons in Pretrained Transformers" <https://arxiv.org/abs/2104.08696> (GitHub implementation: <https://github.com/EleutherAI/knowledge-neurons>). The paper shows how to identify where pretrained transformers store factual knowledge, and how to suppress, amplify, or modify that knowledge. If you're at all interested in how transformers learn and represent information, you're welcome to attend!

Abstract:
Large-scale pretrained language models are surprisingly good at recalling factual knowledge presented in the training corpus. In this paper, we explore how implicit knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons. Given a relational fact, we propose a knowledge attribution method to identify the neurons that express the fact. We find that the activation of such knowledge neurons is highly correlated with the expression of their corresponding facts. In addition, even without fine-tuning, we can leverage knowledge neurons to explicitly edit (such as update and erase) specific factual knowledge in pretrained Transformers.
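
For anyone who wants to poke at the method before Friday, here is a minimal, illustrative sketch of the paper's knowledge-attribution idea (integrated gradients over the FFN intermediate neurons of a masked language model), written against Hugging Face Transformers. The model, the layer index, the number of integration steps, the example prompt, and all variable names are my own choices for illustration; this is not the authors' code and not the API of the linked EleutherAI repo.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumption: any BERT-style masked LM; bert-base-uncased is used for illustration.
model_name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

prompt = "The capital of France is [MASK]."
answer_id = tok.convert_tokens_to_ids("paris")
enc = tok(prompt, return_tensors="pt")
mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero().item()

layer, steps = 9, 20  # illustrative layer choice and number of integration steps
ffn = model.bert.encoder.layer[layer].intermediate  # FFN intermediate (post-GELU) activations

# 1) Record the original FFN intermediate activations for this prompt.
recorded = {}
h = ffn.register_forward_hook(lambda m, i, o: recorded.update(orig=o.detach()))
with torch.no_grad():
    model(**enc)
h.remove()
orig = recorded["orig"]  # shape: (1, seq_len, intermediate_size)

# 2) Riemann approximation of integrated gradients: rerun the model with the
#    activations scaled from 0 to 1 and accumulate gradients of the answer
#    probability with respect to the scaled activations.
grad_sum = torch.zeros_like(orig)
for k in range(1, steps + 1):
    scaled = (k / steps) * orig
    scaled.requires_grad_(True)
    h = ffn.register_forward_hook(lambda m, i, o: scaled)  # override the FFN output
    logits = model(**enc).logits
    prob = torch.softmax(logits[0, mask_pos], dim=-1)[answer_id]
    model.zero_grad()
    prob.backward()
    grad_sum += scaled.grad
    h.remove()

# 3) Per-neuron attribution at the [MASK] position; the top-scoring neurons
#    are candidate "knowledge neurons" for this fact.
attribution = (orig * grad_sum / steps)[0, mask_pos]
print(attribution.topk(5))

Running this end to end only requires torch and transformers; swapping in a different prompt/answer pair, or looping over layers, is enough to reproduce the flavor of the paper's attribution experiments.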
Join Zoom Meeting: https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT...
Password: 961594

Phone Dial-In Information:
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)
Meeting ID: 958 4326 0079

All the best,
Quintin