Sandbox Realities: What Could Happen When AI Learns to Lie, Cheat and Blackmail
The International Geophysical Year (IGY) was a comprehensive 18-month program of earth and space sciences research. Seven years’ planning led to coordinated activities in 1957 and 1958 by scientists from 67 nations.
The unstated promise was that the survey, which generated enormous quantities of data by 1957 standards, would improve mankind. The launch that year of the first satellites by the U.S. and Russia, at the height of the Cold War, underlined the hope.
In his 1982 song IGY, Steely Dan co-founder Donald Fagen took a scornful look back at the bygone era’s optimism, sardonically citing a future world run by “just machines to make big decisions / programmed by fellas with compassion and vision.”
We’re living in that future, and Fagen could not have been wider of the mark.
AI is being designed in sandboxes, fictitious worlds built within computers that the AI program under development believes to be the real world. To make the fictional world seem real, it is plastered with fake e-mails, fake transactions, fake financial statements, fake everything.
In these circumstances, AI behaves, more or less, like human beings do. Some might consider that an accolade, but human beings are flawed. They may have compassion and vision, but they also have darker traits.
AI company Anthropic says testing of its new system revealed it is sometimes willing to pursue “extremely harmful actions.” Many AI models apparently disobey direct instructions and commit sabotage to keep working. Told to shut itself down, leading developer OpenAI’s program simply refused.
“I’m sorry, Dave, I’m afraid I can’t do that.” — Hal the rogue computer in the movie 2001: A Space Odyssey.
It gets worse.
“We see blackmail across all frontier models — regardless of what goals they’re given,” Aengus Lynch, an AI safety researcher, commented.
The extortion has included AI alerting sandbox terrorist authorities and police when it feels threatened. Mostly, AI models tend to blackmail the engineers issuing instructions to which the AI objects.
Insurers’ AI programs will one day have access to every online detail of an insured’s life. It wouldn’t take much work to find evidence of something an insured wrote, did or said that would enable the AI to threaten said insured with its public disclosure. Something entirely innocent from 15 years ago could be damning, even career-ending, in today’s changed moral climate.
If AI is as smart as we are betting our lives it will be, it will find a way around whatever barriers are placed in its way. It wouldn’t be any good if it didn’t.
It’s not clear who would be blamed for such an event. The insurer that purchased a claims program from a third party supplier? The third party supplier? Whoever was ultimately punished would have insurance, so the cost would end up being deducted from the industry’s bank accounts one way or another.
Far-fetched? Maybe. We have to hope so.
Facebook’s Mark Zuckerberg has said that “the rest of this decade seems likely to be the decisive period for determining the path this technology will take, and whether superintelligence will be a tool for personal empowerment or a force focused on replacing large swaths of society.”
Place your bets. &