The Mythos Model, Code Scanning, and the Other Side of the Coin
Recently there has been a lot of coverage of the Mythos model from Anthropic and its ability to examine code bases to discover vulnerabilities that could be exploited to build zero-day cybersecurity attacks. This capability appears to be a step function above prior models, both from Anthropic itself and from other providers. It is an important advancement that should rapidly yield more secure code bases for existing projects and tools, and it should become part of the development pipeline for new products.
For example, FFmpeg, a large, heavily used open source project for video encoding, had a 16-year-old vulnerability patched by Anthropic, ostensibly after Mythos examined the code base. FFmpeg is a 25+ year old toolset used for video encoding by some of the largest organizations on the Internet, such as YouTube and Netflix, with over 1.5 million lines of code and contributions from 2,400 developers. It isn't a surprise that a potentially exploitable vulnerability was present; that is common. What was surprising is that it was missed for so long in a mature code base that regularly undergoes code scanning and security passes, and that Mythos found it automatically.
This reported capability is a big step in the right direction for improving existing products and writing secure code going forward, no doubt.
While this advancement will increase our capabilities on the defense side, it should be noted that, like almost every advancement before it, this type of technology changes the playing field for cybersecurity operators and adversaries alike.
Anthropic has announced Project Glasswing, through which it has shared Mythos with select organizations, with no plans for public availability, reportedly due in part to concern over its weaponization.
It is worth highlighting several concerns that should be addressed from the offensive side.
1. Adversarial Adoption
There is a very real expectation that these tools, or others that follow rapidly, will be adopted by adversarial actors. It is reasonable to expect that they could dramatically accelerate the production of exploits by finding 'lying-in-wait' vulnerabilities in code bases, whether public open source or private. Instead of laborious human discovery and development, exploits can now be built automatically. Marrying that capability with agentic AI harnesses that orchestrate attacks is a massive lever for hackers to wield against critical infrastructure.
2. The Patch Deployment Gap
We need to be realistic about how long it takes patches for critical systems to find their way into production. It is one thing to have a patch available that addresses a vulnerability in a piece of software; it is quite another to get that patch deployed across thousands of organizations. Each has its own processes and timelines, designed to balance operational stability against security risk. We saw with Log4j the mess a large install base had to go through to address a single vulnerability. Now imagine critical patches arriving constantly, putting security teams in a bind over when to patch, with every choice carrying consequences for operations and stability. It can also, ironically, open up more channels for supply chain attacks, with unauthorized builds of common software used to trojan-horse in exploits.
3. Pressure to Burn Stockpiled Zero-Days
There is a perverse pressure now for adversarial nation states and criminal organizations to accelerate their use of stockpiled zero-day attacks. These exploits are expensive to build or purchase on the black market, and a "use it or lose it" moment may be fast approaching. As these scanning tools proliferate and begin finding and patching the very vulnerabilities those stockpiled exploits depend on, the window of opportunity for adversaries narrows, creating an incentive to deploy them before they lose their value.
4. Cost
With LLM-powered models like those provided by Anthropic and OpenAI, there is a cost for every token generated. It is also widely believed that while these models require extremely large GPU farms to run, usage is heavily subsidized for end users, funded by fundraising at levels never seen before. In the reported FFmpeg example, the token cost would likely have been in the tens of thousands of dollars, and without subsidization the true cost may even have reached the low six figures. The point is that these tools are not free: not every organization will be able to afford access at these levels, and well-funded adversaries, given their incentives, may have the advantage.
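To make the cost point concrete, here is a back-of-envelope sketch. Every number in it — the token count of a whole-codebase scan, the per-million-token prices, and the subsidy gap — is a hypothetical assumption for illustration, not a reported figure:

```python
# Back-of-envelope token-cost sketch. All figures below are hypothetical
# assumptions for illustration only, not reported numbers.

def scan_cost(total_tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a scan consuming `total_tokens` tokens."""
    return total_tokens / 1_000_000 * price_per_million

# Assume a whole-codebase scan burns ~2 billion tokens across many passes.
tokens = 2_000_000_000

subsidized = scan_cost(tokens, price_per_million=15.0)    # assumed list-price rate
unsubsidized = scan_cost(tokens, price_per_million=75.0)  # assumed true compute cost

print(f"subsidized:   ${subsidized:,.0f}")    # tens of thousands
print(f"unsubsidized: ${unsubsidized:,.0f}")  # low six figures
```

Under those assumed inputs the subsidized cost lands around $30,000 and the unsubsidized cost around $150,000, which is the shape of the gap described above: affordable for some, prohibitive for others.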
Beyond Code Scanning
Code scanning is an important part of the cybersecurity toolbox, and what is being reported from Anthropic is a dramatic improvement. It is a good opportunity to think about the other pillars of a successful cyber program.
In particular, along with solid authentication and authorization controls for identity systems and architectural approaches like zero trust, it highlights the importance of having a behavioral understanding of what is and is not expected in an environment. Autonomous anomaly detection has long been a brass ring for the industry, and it is a hard problem. It is also an area for which the current approach of LLMs and transformer technology (the same technology delivering dramatic breakthroughs in code creation and code scanning) is not well suited, for a variety of reasons.
Behavioral deviation detection needs to automatically self-learn an environment and contextually understand what is expected at a given moment in time, for a given entity, within a given environment, then dynamically predict that expectation and compare it against what is actually happening. This must happen in real time and at high scale to deliver the speed needed to prevent damage. LLMs and GPTs by their very nature require tremendous training runs across large corpora of textual data and large GPU farms to produce results. That architecture is not built for the real-time, continuous, self-learning detection this problem demands.
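As a rough illustration of the architectural contrast, behavioral deviation detection can run on lightweight streaming statistics rather than offline training runs. The sketch below is one minimal way to do it; the entity names, the metric, the thresholds, and the exponentially weighted baseline are all illustrative choices, not any particular product's method. It maintains a per-entity baseline in constant time and memory per event and flags observations that deviate sharply from what it has learned:

```python
import math

class StreamingAnomalyDetector:
    """Self-learning baseline via an exponentially weighted mean and variance.

    Each entity (user, host, service) gets its own baseline; an observation
    far from that learned baseline is flagged. Constant time and memory per
    event, so it keeps pace with a live stream -- no training runs needed.
    """

    def __init__(self, alpha: float = 0.05, threshold: float = 4.0):
        self.alpha = alpha          # how quickly the baseline adapts
        self.threshold = threshold  # z-score above which we alert
        self.baselines = {}         # entity -> (mean, variance, count)

    def observe(self, entity: str, value: float) -> bool:
        mean, var, n = self.baselines.get(entity, (value, 0.0, 0))
        # Score against the current baseline before updating it; require a
        # warm-up period (n > 30) so we don't alert on an unlearned baseline.
        std = math.sqrt(var)
        anomalous = n > 30 and std > 0 and abs(value - mean) / std > self.threshold
        # Fold the new observation into the baseline (EWMA of mean and variance).
        diff = value - mean
        mean += self.alpha * diff
        var = (1 - self.alpha) * (var + self.alpha * diff * diff)
        self.baselines[entity] = (mean, var, n + 1)
        return anomalous

detector = StreamingAnomalyDetector()
for i in range(200):
    # Normal traffic: hourly egress hovering around 100 MB for this host.
    detector.observe("db-server", [90.0, 100.0, 110.0][i % 3])
print(detector.observe("db-server", 5000.0))  # exfiltration-sized spike -> True
```

The design choice worth noting is that the model updates itself on every event, so "expected" is always defined relative to the recent past of that specific entity, which is exactly the property the paragraph above argues a training-run architecture cannot deliver in real time.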
The advancement in code scanning is welcome and overdue. But code scanning alone doesn't stop an attacker who already has access inside your environment. The industry needs to be thinking just as seriously about real-time behavioral detection, built on architectures actually suited to that problem, as it does about the next breakthrough in static analysis.
