
Hi all,

Our last AI seminar for this term is on Friday, March 14th.

Title: The Manchurian Classifier: Invisible Triggers and Vulnerabilities in Text Models <https://engineering.oregonstate.edu/events/ai-seminar-manchurian-classifier-invisible-triggers-and-vulnerabilities-text-models>
Speaker: Daniel Lowd, Professor of Computer Science, University of Oregon
Time: 2:00 PM
Location: KEC 1001 and Zoom
Zoom link: https://oregonstate.zoom.us/s/98357211915

Abstract: Machine learning models are increasingly deployed in critical applications, yet their vulnerability to data poisoning remains a significant security concern. In this talk, I will explore backdoor attacks on text classifiers - a form of adversarial manipulation where hidden "trigger patterns" are embedded during training, causing models to behave normally on standard inputs but produce targeted misclassifications when triggers are present. I will first introduce the concept of backdoor attacks in the text domain and demonstrate why they represent a unique challenge compared to traditional adversarial examples. Then, I will present LLMBkd, our method that leverages large language models to insert backdoor triggers as subtle stylistic variations in text. These attacks are particularly concerning as they require minimal technical expertise while offering attackers a wide range of possible trigger mechanisms that can bypass conventional inspection. Building on this foundation, I will discuss AttrBkd, our more advanced approach that achieves even greater subtlety by extracting fine-grained attributes from existing backdoor attacks. Our comprehensive human evaluations reveal a critical insight: these attacks remain highly effective while being virtually indistinguishable from normal text to human reviewers. This disconnect between human perception and automated detection metrics exposes fundamental limitations in current defensive measures. Finally, I'll discuss the current state of defenses against backdoor attacks and broader security implications for machine learning systems that process text. If time permits, I'll also introduce our ongoing work on Malicious Programming Prompt (MaPP) attacks, which demonstrate how even state-of-the-art coding assistants can be manipulated to produce security vulnerabilities through carefully crafted prompts.

Biography: Daniel Lowd is a Professor of Computer Science at the University of Oregon. His research interests include machine learning and artificial intelligence, focusing on adversarial methods and statistical relational models. He received his Ph.D. in 2010 from the University of Washington. He has received a Google Faculty Award, an ARO Young Investigator Award, and a best paper award from DEXA.

AI Seminars Link: https://engineering.oregonstate.edu/EECS/research/AI-seminars