
Hello everyone,

We'll be discussing the paper "On the Expressivity of Markov Reward" <https://arxiv.org/abs/2111.00876>.

From the abstract: Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
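To get a feel for the question before Friday, here is a small sketch I put together (my own illustration, not the paper's construction): a hypothetical two-state MDP in which we randomly sample Markov reward functions R(s, a) and check whether any of them makes exactly a given set of "acceptable" policies optimal from a uniform start distribution. The task "always take action R" is expressible and a reward is found quickly, while the paper's canonical "always take the same action" task is not expressible, so the search comes up empty. The paper itself gives polynomial-time procedures (based on linear programming) rather than this brute force, and the specific MDP, start distribution, and function names below are just for illustration.

# A minimal sketch (not the paper's algorithm) of the reward-design question on a
# tiny, hypothetical 2-state MDP: can some Markov reward R(s, a) make exactly a
# given set of "acceptable" policies optimal? Brute-force sampling stands in for
# the paper's polynomial-time construction.
import itertools
import numpy as np

GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2          # states {0, 1}, actions {L=0, R=1}
START = np.array([0.5, 0.5])        # uniform start-state distribution

def next_state(s, a):
    # Deterministic transitions: action L moves to state 0, action R to state 1.
    return 0 if a == 0 else 1

# All deterministic policies, as (action in state 0, action in state 1).
POLICIES = list(itertools.product(range(N_ACTIONS), repeat=N_STATES))

def policy_value(pi, R):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi.
    P = np.zeros((N_STATES, N_STATES))
    r = np.zeros(N_STATES)
    for s in range(N_STATES):
        a = pi[s]
        P[s, next_state(s, a)] = 1.0
        r[s] = R[s, a]
    V = np.linalg.solve(np.eye(N_STATES) - GAMMA * P, r)
    return START @ V                 # start-state value of pi

def realizes(R, good_policies):
    # True if every acceptable policy strictly outperforms every other policy.
    J = {pi: policy_value(pi, R) for pi in POLICIES}
    good = [J[pi] for pi in good_policies]
    bad = [J[pi] for pi in POLICIES if pi not in good_policies]
    return min(good) > max(bad)

def search_reward(good_policies, n_samples=20000, seed=0):
    # Randomly sample reward tables and return the first one that realizes the task.
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        R = rng.uniform(-1, 1, size=(N_STATES, N_ACTIONS))
        if realizes(R, good_policies):
            return R
    return None

# Task A: "always take action R" -- expressible; a realizing reward is found.
print("Task {RR} realizable?    ", search_reward([(1, 1)]) is not None)
# Task B: "always take the same action" -- the paper's canonical inexpressible
# task; no sampled reward separates {LL, RR} from the mixed policies.
print("Task {LL, RR} realizable?", search_reward([(0, 0), (1, 1)]) is not None)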
(Thank you to Professor Dietterich for bringing this paper to my attention.)

We'll be sure to discuss the main expressivity results, and hopefully the reward-design algorithms and the empirical study if we have time. Anyone interested in reinforcement learning or in better understanding goal-directed behavior is welcome to join! We're meeting at 2 PM PST on Friday, as usual.

Join Zoom Meeting
https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT...
Password: 961594

Phone Dial-In Information
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)
Meeting ID: 958 4326 0079

All the best,
Quintin