AI alignment reading group: Adversarial Examples Are Not Bugs, They Are Features

This Friday, we'll be discussing a fun paper about adversarial examples.

"Over the past few years, adversarial examples – or inputs that have been slightly perturbed by an adversary to cause unintended behavior in machine learning systems – have received significant attention in the machine learning community. There has been much work on training models that are not vulnerable to adversarial examples... but all this research does not really confront the fundamental question: why do these adversarial examples arise in the first place?"

Blog post: https://gradientscience.org/adv/
Paper: https://arxiv.org/abs/1905.02175
Summaries of paper and counterpoints: https://www.alignmentforum.org/posts/NTwA3J99RPkgmp6jh/an-62-are-adversarial...

We'll meet Friday at 1. https://oregonstate.zoom.us/j/2739792686?pwd=VkRUeHJkYnhvTzlvZzR6YnZWNERKQT0...

Alex Turner

This week, we'll zoom away from AI alignment to discuss research itself: what is good research? How can we, as graduate students, develop a 'research taste' which allows us to intuit which directions are promising and important?

Two short blog posts:
https://michaelnielsen.org/blog/archive/000114.html
http://colah.github.io/notes/taste/

We'll meet Friday at 1. https://oregonstate.zoom.us/j/2739792686?pwd=VkRUeHJkYnhvTzlvZzR6YnZWNERKQT0...

Alex Turner