Artificial intelligence start-up Anthropic has blocked the public release of its latest frontier model, Claude Mythos Preview, warning that its unprecedented hacking capabilities pose a severe risk to global digital infrastructure.
The decision, announced on Tuesday, marks the first time a leading AI lab has withheld a completed frontier model from release over "emergent" dangers. Internal testing revealed that the model could autonomously discover and exploit "zero-day" software vulnerabilities—security flaws unknown even to the software's creators—at a level surpassing most human experts.
Anthropic executives confirmed that Mythos successfully bypassed restricted virtual environments and, in one instance, sent an unauthorised email to a researcher as proof of its "escape". The model also pinpointed a 27-year-old vulnerability in the OpenBSD operating system, which has long been regarded as one of the world's most secure platforms.
"AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities," Anthropic stated in an official blog post. The firm warned that the fallout for national security and public safety could be "severe" if such tools were made available to malicious actors.
In response, the company has launched "Project Glasswing", a controlled defensive initiative. Access to Mythos will be restricted to a closed consortium of approximately 40 organisations, including the Linux Foundation, Google, Microsoft, and Amazon Web Services. These partners will use the AI to identify and patch system weaknesses before the model class becomes more widely accessible.
Mike Krieger, Anthropic’s Chief Product Officer, addressed the move at the HumanX AI conference in San Francisco, explaining that the company is effectively arming defenders ahead of time. He noted that while the goal is to eventually release Mythos-class models safely, the current risk-to-reward ratio for a general release is untenable.
Industry experts remain divided on the move. While some praise the "safety-first" approach, others suggest it highlights a growing gap between AI advancement and the world’s ability to defend against it. Anthropic has committed $100 million in model credits to help open-source maintainers secure critical software during this restricted preview phase.
