HOUSE_OVERSIGHT_016837.jpg


Extraction Summary

People: 5
Organizations: 0
Locations: 0
Events: 0
Relationships: 2
Quotes: 2

Document Information

Type: Book page or academic report snippet
File Size: 2.48 MB
Summary

The text discusses Cooperative Inverse Reinforcement Learning (CIRL), a framework designed to align machine actions with human preferences by casting the interaction as a partial-information game between a human and a machine. Using a hypothetical pair of agents named Harriet (the human) and Robby (the robot), it illustrates how the machine's uncertainty about the human's preferences encourages cooperation and teaching, and it applies the framework to the "off-switch problem," showing that an uncertain robot has an incentive to allow itself to be deactivated.
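The CIRL setup described above can be made concrete with a small example. The sketch below is a hypothetical illustration, not code from the source document: the two preference settings, the softmax human model, and the names (THETAS, choice_prob, robby_posterior) are all assumptions. Harriet knows which preference setting is true; Robby starts uncertain and updates its belief from her observed choice. Because Harriet is modeled as only noisily rational, Robby's posterior sharpens but never reaches complete certainty, in line with Quote #1 below.

```python
import math

# Hypothetical CIRL-flavored sketch: Harriet (human) knows the true
# preference setting; Robby (robot) holds a belief over settings and
# updates it by Bayes' rule after watching Harriet's choice.

THETAS = {
    "likes_a": {"a": 1.0, "b": 0.0},   # candidate preference settings:
    "likes_b": {"a": 0.0, "b": 1.0},   # reward each assigns to actions a, b
}
BETA = 2.0  # rationality: higher means Harriet more reliably picks her favorite

def choice_prob(theta: str, action: str) -> float:
    """P(Harriet picks `action` | preferences `theta`), softmax in reward."""
    rewards = THETAS[theta]
    z = sum(math.exp(BETA * r) for r in rewards.values())
    return math.exp(BETA * rewards[action]) / z

def robby_posterior(prior: dict, observed: str) -> dict:
    """Robby's Bayes update after observing Harriet choose `observed`."""
    unnorm = {t: p * choice_prob(t, observed) for t, p in prior.items()}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}

prior = {"likes_a": 0.5, "likes_b": 0.5}   # Robby's initial uncertainty
posterior = robby_posterior(prior, "b")    # Harriet demonstrates action b
print(posterior)  # ~{'likes_a': 0.12, 'likes_b': 0.88}: sharper, never certain
```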

People (5)

Relationships (2)

Harriet is a human agent in the CIRL example
Robby is a robot agent in the CIRL example

Key Quotes (2)

"The machine may learn more about human preferences as it goes along, of course, but it will never achieve complete certainty."
Source
HOUSE_OVERSIGHT_016837.jpg
Quote #1
"A robot that’s uncertain about human preferences actually benefits from being switched off, because it understands that the human will press the off-switch to prevent the robot from doing something counter to those preferences."
Source
HOUSE_OVERSIGHT_016837.jpg
Quote #2
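Quote #2 compresses the expected-utility argument behind the off-switch result. Below is a minimal sketch of that argument with hypothetical payoffs (none of these numbers come from the document): the robot's proposed action is worth +1 to the human if it helps and -1 if it harms, being switched off is worth 0, and the human presses the switch exactly when the action would harm.

```python
# Hypothetical off-switch game: the robot's action is worth +1 to the
# human with probability p and -1 otherwise; being off is worth 0.

def act_now(p: float) -> float:
    """Act unilaterally (off-switch disabled): E[U] = p*(+1) + (1-p)*(-1)."""
    return p - (1 - p)

def switch_self_off(p: float) -> float:
    """Shut down immediately: utility 0 whatever the true preferences."""
    return 0.0

def defer_to_human(p: float) -> float:
    """Leave the off-switch usable: the human allows the action when it
    helps (+1) and presses the switch when it would harm (0)."""
    return p * 1.0 + (1 - p) * 0.0

for p in (0.2, 0.5, 0.8):
    print(f"p={p}: act={act_now(p):+.1f}  off={switch_self_off(p):+.1f}  "
          f"defer={defer_to_human(p):+.1f}")
# Deferring weakly dominates both alternatives for every p, and strictly
# whenever 0 < p < 1: a robot uncertain about preferences gains by
# leaving the switch in the human's hands, as the quote says.
```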
