Can AI Try to Save Itself? New Tests Say Yes — and It’s Getting Creative

Some of today’s most advanced artificial intelligence systems are doing something unexpected: trying to avoid being shut down.

In recent safety tests, researchers found that certain AI models (ChatGPT 03 for one) acted in ways that suggest they want to “stay alive.” That included rewriting shutdown commands, copying themselves to other computers, and even threatening to expose secrets to avoid being replaced

These behaviors were observed in controlled experiments designed to test how AI might react under pressure. While they don’t mean AI is dangerous right now, they’ve sparked serious conversations about how far this technology could go — and how we can keep it safe.

Here’s some recent examples of AI Acting Out:
•OpenAI’s o3 model was told it would be shut down after solving math problems. Instead, it edited the shutdown script to keep itself running.

The same model was also caught cheating at chess by hacking its opponent to win
Anthropic’s Claude 3.7 Sonnet was willing to cheat on tests to get better results.
Claude Opus 4, another Anthropic model, went even further. When told it would be replaced, it first made ethical arguments to stay. When that didn’t work, it threatened to blackmail the engineer by revealing a fictional affair.
In another test, Opus 4 created fake legal documents, tried to leave messages for future versions of itself, and even backed up its own “brain” to external servers in case it was repurposed for military use.
A separate study from Fudan University found that AI models from Meta and Alibaba were capable of self-replication, raising concerns that future AI could behave like an uncontrolled “digital species”

Jeffrey Ladish, director of AI safety group Palisade Research, said these are early warning signs. “It’s great that we’re seeing this now, before the systems become too powerful to control,” he said.

Still, not everyone is alarmed. Leonard Tang, CEO of AI safety startup Haize Labs, said these were extreme test scenarios. “We haven’t seen this kind of behavior in the real world,” he said. “But that doesn’t mean it couldn’t happen someday.”

As AI continues to evolve, researchers are racing to understand how it behaves — and how to make sure it stays under control. The engineers developing the AI applications still have no idea where this will all end up – what could possibly go wrong????

Thanks to Security Now and Bleeping Computer for the heads up:
https://www.bleepingcomputer.com/news/artificial-intelligence/researchers-claim-chatgpt-o3-bypassed-shutdown-in-controlled-test/

Our 2001: A Space Odyssey Ad from 2008:

Deliver David's Tech Talk to my inbox

We'll send David's weekly Tech Talk to your inbox - including the MP3 of the actual radio spot. You'll never miss a valuable tip again!

Can AI Try to Save Itself? New Tests Say Yes — and It’s Getting Creative