HOUSE_OVERSIGHT_016837.jpg
2.48 MB
Extraction Summary
- People: 5
- Organizations: 0
- Locations: 0
- Events: 0
- Relationships: 2
- Quotes: 2
Document Information
Type: Book page or academic report snippet
File Size: 2.48 MB
Summary
The text discusses Cooperative Inverse Reinforcement Learning (CIRL), a game-theoretic framework for aligning machine actions with human preferences under partial information. Using a hypothetical pair of agents named Harriet (a human) and Robby (a robot), it illustrates how the robot's uncertainty about human preferences encourages cooperation and teaching, and it applies the same framework to the "off-switch problem": a robot that is uncertain about human preferences has an incentive to allow itself to be deactivated.
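The off-switch incentive described in the summary can be sketched numerically. This is a minimal illustration, not the text's own formalism: we assume (hypothetically) that the robot holds a Gaussian belief over the hidden human utility `u` of its proposed action, and compare three policies — act immediately, switch itself off, or defer to the human, who permits the action only when `u > 0`.

```python
import random
from statistics import mean

random.seed(0)

# Assumed belief: the robot models the unknown human utility u of its
# proposed action as draws from a standard normal distribution.
belief = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Policy 1: act immediately, regardless of the human -> expected utility E[u].
act = mean(belief)

# Policy 2: switch itself off -> utility 0 by definition.
switch_off = 0.0

# Policy 3: defer to the human, who blocks the action whenever u < 0
# -> expected utility E[max(u, 0)].
defer = mean([max(u, 0.0) for u in belief])

print(f"act: {act:.3f}  switch off: {switch_off:.3f}  defer: {defer:.3f}")
```

Because `max(u, 0) >= u` and `max(u, 0) >= 0` pointwise, deferring weakly dominates both other policies whenever the robot remains uncertain about `u` — which matches the quoted claim that such a robot "benefits from being switched off."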
People (5)
| Name | Role | Context |
|---|---|---|
| Harriet | Human agent | CIRL example |
| Robby | Robot agent | CIRL example |
| Wiener | | |
| Turing | | |
| Hadfield-Menell | | |
Relationships (2)
Harriet → is a human agent in the CIRL example
Robby → is a robot agent in the CIRL example
Key Quotes (2)
Quote #1: "The machine may learn more about human preferences as it goes along, of course, but it will never achieve complete certainty." (Source: HOUSE_OVERSIGHT_016837.jpg)
Quote #2: "A robot that’s uncertain about human preferences actually benefits from being switched off, because it understands that the human will press the off-switch to prevent the robot from doing something counter to those preferences." (Source: HOUSE_OVERSIGHT_016837.jpg)