Hello everyone,
We will meet on Friday at 2 PM PST.
We're continuing to explore transformer interpretability. Our next paper is "Knowledge Neurons in Pretrained Transformers" (GitHub implementation). The paper shows how to identify where pretrained transformers store factual knowledge, and how to suppress, amplify, or modify that knowledge. If you're at all interested in how transformers learn and represent information, you're welcome to attend!
Abstract:
Large-scale pretrained language models are surprisingly good at recalling factual knowledge presented in the training corpus. In this paper, we explore how implicit knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons. Given a relational fact, we propose a knowledge attribution method to identify the neurons that express the fact. We find that the activation of such knowledge neurons is highly correlated with the expression of their corresponding facts. In addition, even without fine-tuning, we can leverage knowledge neurons to explicitly edit (e.g., update or erase) specific factual knowledge in pretrained Transformers.
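To give a flavor of the attribution idea before Friday: the paper scores each FFN neuron by integrating the model's gradient as that neuron's activation is scaled from zero up to its observed value (an integrated-gradients-style attribution). Below is a minimal toy sketch of that computation; the probability function, activations, and weights are invented for illustration and are not the paper's actual model or code.

```python
import numpy as np

def answer_prob(activations, weights):
    """Toy stand-in for P(correct answer): sigmoid over a linear readout."""
    return 1.0 / (1.0 + np.exp(-activations @ weights))

def attribution(activations, weights, i, steps=50):
    """Riemann approximation of the integrated gradient for neuron i:
    average the gradient of answer_prob w.r.t. neuron i while scaling
    its activation from ~0 up to its observed value, then multiply by
    the observed activation."""
    total = 0.0
    for k in range(1, steps + 1):
        scaled = activations.copy()
        scaled[i] = activations[i] * k / steps
        p = answer_prob(scaled, weights)
        total += p * (1.0 - p) * weights[i]  # d(sigmoid)/d(activation_i)
    return activations[i] * total / steps

# Hypothetical FFN activations for one "relational fact" prompt.
acts = np.array([2.0, -1.0, 0.5])
w = np.array([1.5, 0.3, -0.7])
scores = [attribution(acts, w, i) for i in range(len(acts))]
# Neurons with large attribution scores are the candidate "knowledge
# neurons" for the fact; editing them is what enables update/erase.
print(scores)
```

In this toy setup the first neuron dominates the readout, so it gets by far the largest attribution score, which is exactly the signal the paper uses to nominate knowledge neurons.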