whereas the iron-eating bacterium Thiobacillus ferrooxidans is thrilled. Who’s to
say the bacterium is wrong? The fact that a machine has been given a fixed
objective by humans doesn’t mean that it will automatically recognize the
importance to humans of things that aren’t part of the objective. Maximizing the
objective may well cause problems for humans, but, by definition, the machine
will not recognize those problems as problematic.
• Intelligence is multidimensional, “so ‘smarter than humans’ is a meaningless
concept.”⁶ It is a staple of modern psychology that IQ doesn’t do justice to the
full range of cognitive skills that humans possess to varying degrees. IQ is indeed
a crude measure of human intelligence, but it is utterly meaningless for current AI
systems, because their capabilities across different areas are uncorrelated. How
do we compare the IQ of Google’s search engine, which cannot play chess, with
that of Deep Blue, which cannot answer search queries?
None of this supports the argument that because intelligence is multifaceted,
we can ignore the risk from superintelligent machines. If “smarter than humans”
is a meaningless concept, then “smarter than gorillas” is also meaningless, and
gorillas therefore have nothing to fear from humans; clearly, that argument
doesn’t hold water. Not only is it logically possible for one entity to be more
capable than another across all the relevant dimensions of intelligence, it is also
possible for one species to represent an existential threat to another even if the
former lacks an appreciation for music and literature.
Solutions
Can we tackle Wiener’s warning head-on? Can we design AI systems whose purposes
don’t conflict with ours, so that we’re sure to be happy with how they behave? On the
face of it, this seems hopeless, because it will doubtless prove infeasible to write down
our purposes correctly or imagine all the counterintuitive ways a superintelligent entity
might fulfill them.
If we treat superintelligent AI systems as if they were black boxes from outer
space, then indeed we have no hope. Instead, the approach we seem obliged to take, if
we are to have any confidence in the outcome, is to define some formal problem F, and
design AI systems to be F-solvers, such that no matter how perfectly a system solves F,
we’re guaranteed to be happy with the solution. If we can work out an appropriate F that
has this property, we’ll be able to create provably beneficial AI.
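To make “appropriate F” slightly more precise, here is one loose way to state the property we are after (the notation is mine, introduced only for illustration, and is not a formal definition from this document):

    For every machine M: if M solves F (even approximately) optimally,
    then M’s behavior is beneficial to humans.

The force of the statement lies in the universal quantifier: the guarantee must hold for any sufficiently good F-solver, including systems far more capable than anticipated, not just the well-behaved ones we hoped to build.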
Here’s an example of how not to do it: Let a reward be a scalar value provided
periodically by a human to the machine, corresponding to how well the machine has
behaved during each period, and let F be the problem of maximizing the expected sum of
rewards obtained by the machine. The optimal solution to this problem is not, as one
might hope, to behave well, but instead to take control of the human and force him or her
to provide a stream of maximal rewards. This is known as the wireheading problem,
based on observations that humans themselves are susceptible to the same problem if
given a means to electronically stimulate their own pleasure centers.
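To see why this version of F fails, it helps to write it down explicitly. In standard reinforcement-learning notation (the symbols r_t for the reward the human delivers at time t and π for the machine’s policy are my shorthand, not part of the original text), the problem is:

    choose π to maximize  E_π [ r_1 + r_2 + … + r_T ]

Nothing in this objective mentions behaving well; the human is merely the physical channel through which each r_t arrives. Any policy that seizes control of that channel and pins every r_t at its maximum value scores at least as high as any policy that earns its rewards honestly, so the optimum favors wireheading.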
There is, I believe, an approach that may work. Humans can reasonably be
described as having (mostly implicit) preferences over their future lives—that is, given
___________________________________________________
⁶ Kevin Kelly, “The Myth of a Superhuman AI,” Wired, Apr. 25, 2017.