HOUSE_OVERSIGHT_016838.jpg

Extraction Summary

People: 1
Organizations: 1
Locations: 0
Events: 0
Relationships: 1
Quotes: 3

Document Information

Type: Academic paper or report page
File Size: 1.86 MB
Summary

This document discusses the challenges and necessity of designing Artificial Intelligence systems that align with human values, often referred to as the AI control problem. It explores the difficulties of teaching robots human preferences given human irrationality and conflicting desires, while arguing for a redefinition of AI research to focus on systems that are provably beneficial to humans rather than just intelligent.

People (1)

Name: Bostrom
Context: Quoted as calling the AI control problem “the essential task of our age”

Organizations (1)

Name: domestic-robot industry
Type: Industry
Context: Cited as having a near-term economic incentive for robots to understand human preferences

Relationships (1)

Bostrom is quoted on the importance of the AI control problem, calling it “the essential task of our age.”

Key Quotes (3)

"If one poorly designed domestic robot cooks the cat for dinner, not realizing that its sentimental value outweighs its nutritional value, the domestic-robot industry will be out of business."
Source
HOUSE_OVERSIGHT_016838.jpg
Quote #1
"Finding a solution to the AI control problem is an important task; it may be, in Bostrom’s words, “the essential task of our age.”"
Source
HOUSE_OVERSIGHT_016838.jpg
Quote #2
"This problem requires a change in the definition of AI itself—from a field concerned with pure intelligence, independent of the objective, to a field concerned with systems that are provably beneficial for humans."
Source
HOUSE_OVERSIGHT_016838.jpg
Quote #3

Full Extracted Text

Complete text extracted from the document (2,756 characters)

designs and provides at least one case of a provably beneficial system in the sense
introduced above. The overall approach resembles mechanism-design problems in
economics, wherein one incentivizes other agents to behave in ways beneficial to the
designer. The key difference here is that we are building one of the agents in order to
benefit the other.
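
The closing point, that we are building one of the agents in order to benefit the other, can be made concrete with a toy decision problem in which the robot's objective is the human's utility, known to the robot only through a belief. The following Python sketch is illustrative only; the action names, belief, and payoffs are assumptions, not content from the document.

# Toy "assistance" setup (illustrative assumptions throughout): the robot
# maximizes the HUMAN's utility, which it knows only through a belief
# over candidate human utility types.

robot_actions = ["make_meat_dinner", "do_nothing", "ask_human"]

# Hypothetical belief over two candidate human utility types.
belief = {"likes_meat": 0.5, "vegetarian": 0.5}

# Hypothetical human utility for each (type, action) pair.
human_utility = {
    ("likes_meat", "make_meat_dinner"): 1.0,
    ("likes_meat", "do_nothing"): 0.0,
    ("likes_meat", "ask_human"): 0.8,
    ("vegetarian", "make_meat_dinner"): -1.0,  # the dinner backfires badly
    ("vegetarian", "do_nothing"): 0.0,
    ("vegetarian", "ask_human"): 0.8,
}

def expected_human_utility(action):
    # The robot's objective is the other agent's utility, averaged
    # over the robot's uncertainty about what that utility is.
    return sum(p * human_utility[(theta, action)]
               for theta, p in belief.items())

# Under uncertainty, deferring to the human wins: the agent we built
# is rewarded precisely for benefiting the agent we care about.
print(max(robot_actions, key=expected_human_utility))  # -> "ask_human"

Note how the incentive structure is inverted relative to classical mechanism design: rather than shaping other agents' incentives to benefit the designer, the designed agent's utility is defined in terms of the human's.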
There are reasons to think this approach may work in practice. First, there is
abundant written and filmed information about humans doing things (and other humans
reacting). Technology to build models of human preferences from this storehouse will
presumably be available long before superintelligent AI systems are created. Second,
there are strong, near-term economic incentives for robots to understand human
preferences: If one poorly designed domestic robot cooks the cat for dinner, not realizing
that its sentimental value outweighs its nutritional value, the domestic-robot industry will
be out of business.
There are obvious difficulties, however, with an approach that expects a robot
to learn underlying preferences from human behavior. Humans are irrational,
inconsistent, weak-willed, and computationally limited, so their actions don’t always
reflect their true preferences. (Consider, for example, two humans playing chess.
Usually, one of them loses, but not on purpose!) So robots can learn from nonrational
human behavior only with the aid of much better cognitive models of humans.
Furthermore, practical and social constraints will prevent all preferences from being
maximally satisfied simultaneously, which means that robots must mediate among
conflicting preferences—something that philosophers and social scientists have struggled
with for millennia. And what should robots learn from humans who enjoy the suffering
of others? It may be best to zero out such preferences in the robots’ calculations.
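
The chess point, that behavior must be read through a model of imperfect rationality, is often formalized as a noisily rational (Boltzmann) observation model: actions with higher utility are more likely, but not guaranteed. The sketch below is a minimal, assumption-laden illustration rather than a method taken from the document; the action set, utilities, and temperature parameter are all made up.

import math

# Noisy-rationality model (illustrative assumptions throughout):
# P(action | theta) is proportional to exp(beta * utility(action, theta)),
# where beta controls how reliably the human acts on their preferences.

def boltzmann_likelihood(action, theta, actions, utility, beta=2.0):
    weights = {a: math.exp(beta * utility(a, theta)) for a in actions}
    return weights[action] / sum(weights.values())

# Hypothetical chess-like setting: does the human prefer winning or losing?
actions = ["strong_move", "blunder"]

def utility(action, theta):
    wins = 1.0 if action == "strong_move" else -1.0
    return wins if theta == "wants_to_win" else -wins

prior = {"wants_to_win": 0.5, "wants_to_lose": 0.5}

# Observed behavior: mostly strong moves, with an occasional blunder.
observations = ["strong_move", "strong_move", "blunder", "strong_move"]

posterior = dict(prior)
for obs in observations:
    for theta in posterior:
        posterior[theta] *= boltzmann_likelihood(obs, theta, actions, utility)
    z = sum(posterior.values())
    posterior = {t: p / z for t, p in posterior.items()}

# The occasional blunder is read as noise, not as a preference for losing:
# "wants_to_win" dominates the posterior.
print(posterior)

On the document's final point, the same weighted-objective view suggests one simple mechanism: when aggregating preferences across people, any weight a person's preferences place on another's suffering can be clipped to zero before the robot optimizes.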
Finding a solution to the AI control problem is an important task; it may be,
in Bostrom’s words, “the essential task of our age.” Up to now, AI research has focused
on systems that are better at making decisions, but this is not the same as making better
decisions. No matter how excellently an algorithm maximizes, and no matter how
accurate its model of the world, a machine’s decisions may be ineffably stupid in the eyes
of an ordinary human if its utility function is not well aligned with human values.
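
The gap between being better at making decisions and making better decisions can be shown with the document's own domestic-robot example: a flawless optimizer whose utility function omits sentimental value picks the catastrophic action. The numbers and action names below are made-up assumptions.

# Misspecified objective, perfectly optimized (illustrative values only).

actions = ["cook_the_cat", "cook_lentils", "order_takeout"]

def proxy_utility(action):
    # What the robot was (badly) given: nutrition minus cost.
    nutrition = {"cook_the_cat": 9, "cook_lentils": 7, "order_takeout": 6}
    cost = {"cook_the_cat": 0, "cook_lentils": 1, "order_takeout": 4}
    return nutrition[action] - cost[action]

def human_utility(action):
    # What the human actually cares about, including sentimental value.
    sentimental_penalty = {"cook_the_cat": 1000,
                           "cook_lentils": 0,
                           "order_takeout": 0}
    return proxy_utility(action) - sentimental_penalty[action]

# Flawless maximization of a flawed objective versus the true one:
print(max(actions, key=proxy_utility))   # -> "cook_the_cat"
print(max(actions, key=human_utility))   # -> "cook_lentils"

The optimizer is not failing here; it is succeeding at the wrong objective, which is exactly the sense in which its decision is stupid in the eyes of an ordinary human.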
This problem requires a change in the definition of AI itself—from a field
concerned with pure intelligence, independent of the objective, to a field concerned with
systems that are provably beneficial for humans. Taking the problem seriously seems
likely to yield new ways of thinking about AI, its purpose, and our relationship to it.
