
Hello everyone,

We're continuing to meet at 2 PM PST on Fridays. Our next topic will be Selection Theorems: What’s the type signature of an agent?
For instance, what kind-of-thing is a “goal”? What data structures can represent “goals”? Utility functions are a common choice among theorists, but they don’t seem quite right <https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities?commentId=GyE8wvZuWcuiCaySb>. And what are the inputs to “goals”? Even when using utility functions, different models use different inputs - Coherence Theorems <https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities> imply that utilities take in predefined “bet outcomes”, whereas AI researchers often define utilities over “world states” or “world state trajectories”, and human goals seem to be over latent variables in humans’ world models <https://www.lesswrong.com/posts/gQY6LrTWJNkTv8YJR/the-pointers-problem-human-values-are-a-function-of-humans>.
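To make the ambiguity concrete, here is a minimal Haskell sketch (my own illustration; all the names are hypothetical placeholders, not taken from the linked posts) of how the single phrase "utility function" hides several different type signatures depending on what the goal's inputs are taken to be:

module GoalTypes where

-- Crude stand-ins; in a real model these would be much richer structures.
type BetOutcome = String   -- a predefined bet/lottery outcome
type WorldState = String   -- a complete state of the world
type Latent     = String   -- a latent variable inside some world model

-- Coherence theorems: utilities over predefined bet outcomes.
type UtilityOverBets         = BetOutcome   -> Double

-- Common in AI/RL: utilities over world states, or over state trajectories.
type UtilityOverStates       = WorldState   -> Double
type UtilityOverTrajectories = [WorldState] -> Double

-- The pointers problem: goals over latent variables in the agent's own world model.
type UtilityOverLatents      = Latent       -> Double

These are four genuinely different types, even though we'd casually call all of them "utility functions".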
And that’s just goals. What about “world models”? Or “agents” in general? What data structures can represent these things, how do they interface with each other and the world, and how do they embed <https://www.lesswrong.com/posts/p7x32SEt43ZMC9r7r/embedded-agents> in their low-level world? These are all questions about the type signatures of agents.
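To show what a candidate answer could even look like, here is one hypothetical "type signature of an agent" in the same Haskell sketch style (again purely illustrative, not a claim about the right answer): a world model, an update rule that revises it on observations, and a policy that reads actions off the model.

module AgentType where

-- One candidate shape for an agent, parameterized by observation, action,
-- and world-model types.
data Agent obs act model = Agent
  { worldModel :: model                  -- the agent's current world model
  , update     :: model -> obs -> model  -- how observations revise the model
  , policy     :: model -> act           -- how the model (plus implicit goal) picks actions
  }

-- One step of the agent/environment interface: observe, update, act.
step :: Agent obs act model -> obs -> (act, Agent obs act model)
step agent o =
  let m' = update agent (worldModel agent) o
      a  = policy agent m'
  in (a, agent { worldModel = m' })

Even this toy version already forces choices: where the goal lives (here it is baked into the policy), what the model ranges over, and where the boundary between agent and environment sits, which is exactly where embeddedness makes things hard.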
One general strategy for answering these sorts of questions is to look for what I’ll call Selection Theorems. Roughly speaking, *a Selection Theorem tells us something about what agent type signatures will be selected for (by e.g. natural selection or ML training or economic profitability) in some broad class of environments*. In inner/outer agency terms, it tells us what kind of inner agents will be selected by outer optimization processes.
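Stated as code rather than prose, the rough picture looks something like this toy sketch (entirely my own illustration, with a crude "average score over an environment class" standing in for the outer optimization process; nothing here is an actual theorem):

module Selection where

import Data.List (maximumBy)
import Data.Ord  (comparing)

type Environment = Int      -- stand-in for one environment in the class
type Score       = Double

-- The outer process (natural selection, ML training, market competition, ...)
-- keeps whichever candidate agent scores best on average across the class.
select :: [agent] -> (agent -> Environment -> Score) -> [Environment] -> agent
select candidates fitness envs = maximumBy (comparing avgFitness) candidates
  where
    avgFitness a = sum (map (fitness a) envs) / fromIntegral (length envs)

A Selection Theorem is then a claim of the form: for a broad class of environments and selection pressures, whatever such a selection process returns tends to have such-and-such type signature (e.g. behaves as if maximizing some expected utility).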
There are three selection theorems posts so far:
Selection Theorems: A Program For Understanding Agents <https://www.lesswrong.com/posts/G2Lne2Fi7Qra5Lbuf/selection-theorems-a-program-for-understanding-agents>
Some Existing Selection Theorems <https://www.lesswrong.com/posts/N2NebPD78ioyWHhNm/some-existing-selection-theorems>
What Selection Theorems Do We Expect/Want? <https://www.lesswrong.com/posts/RuDD3aQWLDSb4eTXP/what-selection-theorems-do-we-expect-want>

We'll be sure to discuss the first post, and hopefully the other two if we have time. Anyone interested in trying to better understand goal-directed behavior is welcome to join!

Join Zoom Meeting
https://oregonstate.zoom.us/j/95843260079?pwd=TzZTN0xPaFZrazRGTElud0J1cnJLUT...
Password: 961594

Phone Dial-In Information
+1 971 247 1195 US (Portland)
+1 253 215 8782 US (Tacoma)
+1 301 715 8592 US (Washington DC)
Meeting ID: 958 4326 0079

All the best,
Quintin