Anthropic’s Dangerous Mythos AI Model Accessed by Unauthorized Group
Leak of powerful cybersecurity tool raises questions about "Project Glasswing" safety protocols.
Anthropic’s unreleased cybersecurity powerhouse, Mythos, has reportedly been accessed by an unauthorized group of developers on the same day it was announced. The breach, confirmed by internal sources and screenshots provided to Bloomberg, represents a significant setback for the company’s "Project Glasswing" initiative, which aimed to keep the tool restricted to a handful of high-trust partners. While Anthropic maintains that its internal systems remain secure, the incident highlights the extreme difficulty of containing "frontier" models once they are shared with third-party vendors.
Key Details
The leak occurred through a third-party vendor environment where the "Mythos Preview" model was being tested. According to reports, a group of enthusiasts operating on a private Discord server managed to locate the model’s endpoint by making an "educated guess" based on the naming conventions Anthropic uses for its other production models. A member of the group, who is reportedly a contractor for a third-party vendor working with Anthropic, provided additional assistance in verifying the access.
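The reporting does not disclose the actual URL or identifiers involved, but the general technique is easy to illustrate. The sketch below is purely hypothetical: the base URL, candidate model IDs, and credential are invented, and it assumes a vendor API that returns different status codes for existing versus nonexistent models.

```python
# Hypothetical illustration of endpoint guessing via naming conventions.
# The base URL, model IDs, and credential below are invented; none of
# the real values have been disclosed.
import requests

VENDOR_BASE = "https://models.example-vendor.com/v1"  # hypothetical
CANDIDATES = [
    # Extrapolated from public naming patterns like "claude-3-5-sonnet-...".
    "claude-mythos-preview",
    "claude-mythos-preview-20250514",
    "mythos-preview",
]

for model_id in CANDIDATES:
    resp = requests.get(
        f"{VENDOR_BASE}/models/{model_id}",
        headers={"Authorization": "Bearer <vendor-test-key>"},
        timeout=10,
    )
    # Anything other than a 404 (a 200, or even a distinctive 403)
    # signals that the guessed identifier maps to a real deployment.
    if resp.status_code != 404:
        print(model_id, "->", resp.status_code)
```

The point is not the code itself but how little it takes: a handful of plausible strings and an HTTP client.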
Anthropic has officially acknowledged the reports, stating that it is investigating "unauthorized access to Claude Mythos Preview through one of our third-party vendor environments." The company emphasized, however, that there is currently no evidence that its core systems or internal infrastructure have been compromised. The group involved claims they were motivated by curiosity rather than malice, seeking to explore the unreleased model's capabilities rather than weaponize it. They provided screenshots and live demonstrations to investigative journalists as evidence of their access.
What This Means
This incident exposes the inherent fragility of "closed-door" AI safety strategies. Anthropic had marketed Mythos as too dangerous for general release, citing its specialized capability to identify and exploit vulnerabilities across every major operating system and web browser. By creating a high-value, restricted asset, Anthropic inadvertently turned the model into a "Holy Grail" for the AI-sleuthing community. The ease with which the group located the model, partly through simple pattern recognition of URL structures, suggests that even the most advanced AI labs have basic operational security blind spots.
Furthermore, the involvement of a third-party contractor underscores the "human element" as the weakest link in the AI safety chain. No matter how robust the model's alignment or the lab's internal firewalls, the necessity of sharing these tools with external partners for testing and integration creates a vastly larger attack surface.
Technical Breakdown
The unauthorized access was achieved through a combination of insider access and technical inference:
- Predictable Endpoints: The group reportedly found the model by guessing the URL structure from existing Claude 3 and Claude 3.5 naming conventions, which suggests that sensitive preview models lacked randomized or otherwise hard-to-guess endpoint identifiers (a mitigation is sketched after this list).
- Third-Party Risk: The breach originated inside a vendor's environment, highlighting how difficult it is to maintain a secure perimeter once models are shared with external partners such as Apple or government agencies.
- Model Capabilities: Mythos is designed for offensive security research, with specialized fine-tuning that lets it generate sophisticated exploit code that standard Claude models refuse under their safety guidelines.
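A straightforward mitigation for the first point is to route preview deployments through unguessable identifiers instead of reusing production naming conventions. Here is a minimal sketch, assuming a provisioning service that controls the slug; the function and registry are illustrative, not a description of Anthropic's systems.

```python
# Minimal sketch: mint an unguessable routing slug for a preview model
# instead of a predictable "claude-mythos-preview"-style name.
# Hypothetical design, not Anthropic's actual provisioning code.
import secrets

# Internal mapping from opaque slug to real model name; never exposed.
registry: dict[str, str] = {}

def preview_slug(internal_name: str) -> str:
    """Return a routing slug that reveals nothing about the model behind it."""
    token = secrets.token_urlsafe(16)  # ~128 bits of entropy, unenumerable
    registry[token] = internal_name
    return token

slug = preview_slug("claude-mythos-preview")
print(f"https://vendor.example.com/v1/models/{slug}")  # hypothetical URL
```

With roughly 128 bits of entropy per slug, the kind of pattern recognition the group relied on would have nothing to latch onto.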
Industry Impact
The breach will likely force a major re-evaluation of how "frontier" models are shared with enterprise partners. If a group of hobbyists can gain access to a model deemed a national security risk, the trust required for initiatives like Project Glasswing may evaporate. For developers and researchers, this reinforces the reality that "security by obscurity" or restricted access is an insufficient defense against a motivated and distributed community.
It also puts significant pressure on Anthropic to prove that its "Constitutional AI" frameworks can actually prevent a leaked model from being used for large-scale cyberattacks.
Looking Ahead
Anthropic is now in a race to harden its delivery infrastructure. We should expect a move toward more robust, hardware-locked access for restricted models and a possible pause in the wider rollout of Mythos to other enterprise clients. This leak serves as a stark reminder: in the AI era, once the weights are out—or even just the endpoint is exposed—the genie cannot be put back in the bottle.
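What "hardware-locked access" could mean in practice is still speculative, but one plausible pattern is challenge-response attestation: a vendor environment proves possession of a device-bound key before receiving a short-lived model token. The sketch below is purely illustrative and assumes nothing about Anthropic's actual infrastructure; in production the private key would live in an HSM or TPM rather than in process memory.

```python
# Illustrative challenge-response gate for model access.
# Hypothetical design; the private key would normally live in an HSM/TPM.
import secrets
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()      # stands in for a hardware key
enrolled_public_key = device_key.public_key()  # registered at vendor onboarding

def issue_model_token(public_key, sign_challenge) -> str:
    """Issue a short-lived token only if the caller can sign a fresh nonce."""
    challenge = secrets.token_bytes(32)          # fresh per request
    signature = sign_challenge(challenge)        # performed on the device
    try:
        public_key.verify(signature, challenge)  # raises if forged
    except InvalidSignature:
        raise PermissionError("device attestation failed")
    return secrets.token_urlsafe(32)             # opaque, expiring credential

# A legitimate vendor device signs the challenge with its enrolled key.
token = issue_model_token(enrolled_public_key, device_key.sign)
print("access granted:", token[:8] + "...")
```

Schemes like this would not stop a determined insider, but they would at least ensure that a guessed URL, on its own, yields nothing.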
The industry must now grapple with the fact that the more powerful a model is, the more likely it is to be targeted for "liberation" by those outside the official circle of trust. As we move closer to models with genuine agentic capabilities, the cost of a single leak could rise from a corporate embarrassment to a systemic catastrophe.
Source: TechCrunch
Published on ShtefAI blog by Shtef ⚡