Dear all,
Our next AI seminar, *"Estimating Long-term Rewards by Off-policy
Reinforcement Learning"* by Lihong Li, is scheduled for November 10th,
1-2 PM PST.
Note that this is a Zoom event.
Zoom Link:
https://oregonstate.zoom.us/j/93591935144?pwd=YjZaSjBYS0NmNUtjQzBEdzhPeDZ5U…
*Estimating Long-term Rewards by Off-policy Reinforcement Learning*
Lihong Li
Senior Principal Scientist
Amazon
*Abstract*: One of the core problems in reinforcement learning (RL) is
estimating the long-term reward of a given policy. In many real-world
applications such as healthcare, robotics and dialogue systems, running a
new policy on users or robots can be costly or risky. This gives rise to
the need for off-policy, or counterfactual, estimation: estimate the
long-term reward of a given policy using data previously collected by
another policy (e.g., the one currently deployed). This talk will describe
some recent advances in this problem, for which many standard estimators
suffer an exponentially large variance (known as "the curse of horizon").
Our approach is based on a dual linear program formulation of the long-term
reward, and can be extended to estimate confidence intervals.
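To make the "curse of horizon" concrete, here is a minimal, illustrative sketch of trajectory-level importance sampling, the kind of standard off-policy estimator the abstract refers to. The policies, probabilities, and function names are invented for the example; this is not the speaker's dual-linear-program method, just a demonstration of how the variance of the importance weights grows with the horizon.

```python
# Illustrative sketch only: trajectory-level importance sampling (IS).
# The behavior/target probabilities below are made-up numbers, not from the talk.
import random

random.seed(0)

def is_weight(horizon, p_target=0.7, p_behavior=0.5):
    """Product of per-step probability ratios for one trajectory.

    Actions are binary and drawn from the behavior policy; the weight
    multiplies pi_target(a) / pi_behavior(a) at every step, so its
    variance can grow exponentially with the horizon.
    """
    w = 1.0
    for _ in range(horizon):
        a = 1 if random.random() < p_behavior else 0
        pi_t = p_target if a == 1 else 1 - p_target
        pi_b = p_behavior if a == 1 else 1 - p_behavior
        w *= pi_t / pi_b
    return w

def weight_variance(horizon, n=20000):
    """Empirical variance of the IS weights over n sampled trajectories."""
    ws = [is_weight(horizon) for _ in range(n)]
    mean = sum(ws) / n
    return sum((w - mean) ** 2 for w in ws) / n

for h in (1, 5, 10, 20):
    print(h, weight_variance(h))
```

Running this shows the weight variance increasing rapidly with the horizon, which is why long-horizon off-policy estimation calls for the alternative formulations discussed in the talk.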
*Bio*: Lihong Li is a Senior Principal Scientist at Amazon. He obtained a
PhD in Computer Science from Rutgers University and subsequently held
research positions at Yahoo!, Microsoft, and Google before joining Amazon.
His main research interests are in reinforcement learning, including
contextual bandits, and other related problems in AI. His work is often
inspired by applications in recommendation, advertising, Web search and
conversational systems. Homepage: http://lihongli.github.io
*Please watch this space for future AI Seminars:*
https://eecs.oregonstate.edu/ai-events
Rajesh Mangannavar,
Graduate Student
Oregon State University
----
AI Seminar Important Reminders:
-> The AI Seminar has a strict "no electronics" and "no recordings" policy.
-> For graduate students in the AI program, attendance is strongly
encouraged.
Hello everyone,
I'm looking to find a weekly timeslot for the AI alignment reading group
that works for everyone. If you're interested, please fill out this
WhenIsGood survey with your availability: http://whenisgood.net/nszizrb
The timezone is UTC-07 (Corvallis time).
To clarify, the indicated times represent the start of each one-hour
reading group meeting.
All the best,
Quintin
(I forgot to include @ai in my original message. Apologies to everyone on
the alignment mailing list who got this twice!)
Dear all,
Our next AI seminar, *"Modularity and Compositionality in Multi-Step
Robot Manipulation"* by Caelan Garrett, is scheduled for November 3rd
(tomorrow), 1-2 PM PST. It will be followed by a 30-minute Q&A session
with the graduate students.
Note that this is a Zoom event.
Zoom Link:
https://oregonstate.zoom.us/j/93591935144?pwd=YjZaSjBYS0NmNUtjQzBEdzhPeDZ5U…
*Modularity and Compositionality in Multi-Step Robot Manipulation*
Caelan Garrett
Research Scientist
NVIDIA
*Abstract:* We seek to program a robot to autonomously complete complex
tasks in a variety of real-world settings involving different environments,
objects, manipulation skills, degrees of observability, initial states, and
goal objectives. In order to successfully generalize across these settings,
we take a model-based approach to building the robot's policy, which
enables it to reason about the effects of executing different sequences
of parameterized manipulation skills. Specifically, we introduce a
general-purpose hybrid planning framework that uses streams, modules that
encode sampling procedures, to generate continuous parameter-value
candidates. We present several domain-independent algorithms that
efficiently combine streams in order to solve for parameter values that
jointly satisfy the constraints necessary for a sequence of skills to
achieve the goal. Each stream can be either engineered to perform a
standard robotics subroutine, like inverse kinematics and collision
checking, or learned from data to capture difficult-to-model behaviours,
such as pouring, scooping, and grasping. Streams are also able to represent
probabilistic inference operations, which enables our framework to plan in
belief space and intentionally select actions that reduce the robot's
uncertainty about the unknown world. We demonstrate the generality of our
approach by applying it to several real-world tabletop, kitchen, and
construction tasks and show that it can even be effective in settings
involving objects that the robot has never seen before.
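The stream idea above can be sketched in a few lines: a stream is a module that samples continuous parameter candidates, and the planner combines streams until it finds values that jointly satisfy the constraints. This is a toy illustration only, not the speaker's actual framework; the stream names and the feasibility check are invented stand-ins for real subroutines like inverse kinematics and collision checking.

```python
# Toy sketch of "streams": modules that encode sampling procedures for
# continuous parameter candidates. Names and constraints are hypothetical.
import itertools
import random

random.seed(1)

def grasp_stream():
    """Stream: sample candidate grasp angles (radians)."""
    while True:
        yield random.uniform(0.0, 3.14)

def placement_stream():
    """Stream: sample candidate placement x-coordinates."""
    while True:
        yield random.uniform(0.0, 1.0)

def feasible(grasp, placement):
    """Stand-in for a real constraint check (e.g., IK + collision checking)."""
    return grasp < 1.5 and 0.4 < placement < 0.6

def solve(max_samples=1000):
    """Draw from both streams until a jointly feasible pair is found."""
    pairs = zip(grasp_stream(), placement_stream())
    for g, p in itertools.islice(pairs, max_samples):
        if feasible(g, p):
            return g, p
    return None

print(solve())
```

The point of the design is modularity: each stream can be an engineered robotics subroutine or a learned sampler, and the planner composes them without knowing how any individual stream generates its candidates.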
*Speaker Bio:*
Caelan Garrett is a research scientist at NVIDIA's Seattle Robotics Lab
which is led by Professor Dieter Fox. He received his PhD at MIT in the
Learning and Intelligent Systems group within CSAIL where he was advised by
Professors Tomás Lozano-Pérez and Leslie Pack Kaelbling. His research is on
integrating robot motion planning, discrete AI planning, and machine
learning to flexibly and efficiently plan for autonomous mobile
manipulators operating in human environments. He recently authored the
first survey paper on integrated task and motion planning. He is a
recipient of the NSF Graduate Research Fellowship. He has previously
interned in the autonomous vehicle industry while at Optimus Ride and in
the autonomous fulfilment industry while at Amazon Robotics.
This talk will be followed on November 10th, 2021 by the next seminar,
"*Estimating Long-term Rewards by Off-policy Reinforcement Learning*"
by Lihong Li.
*Please watch this space for future AI Seminars:*
https://eecs.oregonstate.edu/ai-events
Rajesh Mangannavar,
Graduate Student
Oregon State University
----