
Dear all,

Our next AI seminar, *"Estimating Long-term Rewards by Off-policy Reinforcement Learning"* by Lihong Li, is scheduled for November 10th (tomorrow), 1-2 PM PST.

There will be *NO* graduate student Q&A session, so everyone is encouraged to ask questions during the seminar.

Note that this is a Zoom event.
Zoom Link: https://oregonstate.zoom.us/j/93591935144?pwd=YjZaSjBYS0NmNUtjQzBEdzhPeDZ5UT...

*Estimating Long-term Rewards by Off-policy Reinforcement Learning*
Lihong Li
Senior Principal Scientist
Amazon

*Abstract*: One of the core problems in reinforcement learning (RL) is estimating the long-term reward of a given policy. In many real-world applications such as healthcare, robotics and dialogue systems, running a new policy on users or robots can be costly or risky. This gives rise to the need for off-policy, or counterfactual, estimation: estimating the long-term reward of a given policy using data previously collected by another policy (e.g., the one currently deployed). This talk will describe some recent advances on this problem, for which many standard estimators suffer from an exponentially large variance (known as "the curse of horizon"). Our approach is based on a dual linear program formulation of the long-term reward, and can be extended to estimate confidence intervals.

*Bio*: Lihong Li is a Senior Principal Scientist at Amazon. He obtained a PhD degree in Computer Science from Rutgers University. After that, he held research positions at Yahoo!, Microsoft and Google, before joining Amazon. His main research interests are in reinforcement learning, including contextual bandits, and other related problems in AI. His work is often inspired by applications in recommendation, advertising, Web search and conversational systems.
Homepage: http://lihongli.github.io

*Please watch this space for future AI Seminars:*
https://eecs.oregonstate.edu/ai-events

Rajesh Mangannavar,
Graduate Student
Oregon State University

----
AI Seminar Important Reminders:
-> The AI Seminar has a strict "no electronics" and "no recordings" policy.
-> For graduate students in the AI program, attendance is strongly encouraged.