They literally asked for it. Two days ago Amodei wrote an essay urging the government to regulate them. He explicitly cited Mythos as proof that frontier AI has acquired autonomous hacking capabilities that threaten critical infrastructure and national security.
"Mythos Preview scrambled the global cybersecurity landscape. But its broader significance is that it proves beyond doubt that AI models are now tools of global and national strategic consequence."
"The government should have the power to block or deter deployment of the model if it is determined, in light of third-party assessment, to present unacceptable risks. This power must be scoped to the above four specific risks and there must be protective measures against political favoritism or arbitrary decisions"
A third-party demonstrated that it was possible to jailbreak the safety measures of Fable to access the raw Mythos abilities. Abilities which Anthropic say are too dangerous for the public.
Edit. From David Sacks:
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious”.
David Sacks could not be further from a reliable or impartial narrator on this topic.
And before someone calls this an ad hominem, it isn’t; I am not saying he is bad or morally wrong or anything else (you are free to think that or not, as am I).
But Sacks has skin in the game. And that makes him both unreliable and partial.
I'm sure it's also a step towards requiring ID and limiting us plebeians' access to real power, keeping it for those in charge to maintain or grow their own. It's all an excuse to give us a Westworld season 3. There's probably a better example out there.
> A third-party demonstrated that it was possible to jailbreak the safety measures of Fable to access the raw Mythos abilities. Abilities which Anthropic say are too dangerous for the public.
Pressure test this assumption before getting behind this position.
That is a strawman. My contention is what you just implicitly acknowledged - there is not information put out yet to validate the quoted claim. There are claims to the contrary, as well, from Anthropic themselves.
Anthropic’s claims are as follows if you read their post:
* this is not a universal jailbreak method
* the jailbreak affords you the same capabilities you get already with other models, not Mythos.
In this situation it comes down to which party you trust more, and history would suggest this administration is very playful with the truth, especially when it comes to economically damaging the company that has become its political enemy.
The existence of a jailbreak-free LLM in 2026 is an extremely contentious claim to me. You can argue about the specifics of this exact jailbreak, but in general Pliny and Amazon both reported Mythos jailbreaks in <7 days. It seems very reasonable to expect that a well-funded state actor could achieve better results given significantly more funding, determination and, most importantly, unfettered access.
Nobody here is claiming Fable is jailbreak-free. Not Anthropic and not anyone in this thread. This was known before launch. The question remains one of degree and capabilities.
Yeah, if you're arguing that "this, according to Anthropic, existentially dangerous model has only had its safeguards partially circumvented so we shouldn't step in" ... it's hard for me to take you seriously?
Put another way, the thing we are all concerned with is the complete circumvention of safeguards that is normally possible with LLMs. If you _aren't_ arguing that this isn't possible, you're not engaging with the thing that is concerning to regulators or those discussing the regulation.
A disappointing trend is to frame the opposing argument in extreme terms rather than engaging with the substance of the assertion.
The latter portion is grandstanding about how incredulous the commenter is that someone might trust an LLM company about the strength of their harnesses' if-then-else statements for request routing.
The one I quoted, which contradicts Anthropic’s post and has no supporting evidence publicly available. That a jailbreak was found that accesses the model’s _raw_ capabilities. Something Anthropic has explained was not the case.
It is pretty clear, no? Anthropic claims that the jailbreaks they were made aware of did not access the model’s raw capability, explained that there are protections to mitigate the impact of successful jailbreaks, etc. Coming here and stating something to the contrary with zero explanation or actual evidence is the assumption.
“This power must be scoped to the above four specific risks and there must be protective measures against political favoritism or arbitrary decisions.”
Yes, and rape victims are "asking for it" by wearing short skirts. I thought we stopped with this nonsense a couple decades ago?
There's a huge difference between "we want regulation", and the government swinging its dick at random.
If the government had said, a week ago, don't release Fable? That wouldn't have gotten nearly this reaction. And the government has known that these capabilities exist since they were announced TWO MONTHS AGO.
It should be easy for a company like Anthropic to prove this beyond a doubt. Why don't they? Why don't they have a collection of prompts and side-by-side comparisons with other models showing how far ahead they are?
I think it's mainly because the difference in models at the frontier isn't "response to prompt X", but rather "coherence with 500K tokens of context and instructions in play"
Mercury-2 is amazing. I am using it frequently as the arbiter in llm-consortium
The context window is relatively small, so to make it work with larger consortiums I can construct a recursive sort-of meta consortium like this:
I've found the average output of many suboptimal models is still suboptimal, especially when it comes to judging the accuracy/correctness of the work of other models.
I recently ran some benchmarks of how well various models find security vulnerabilities, followed by tests of the judging process: whether the models found the right bug, and whether the other bugs they reported were false positives or legitimate separate bugs. A committee of good-not-great models (DeepSeek, MiMo, Gemma 4) cannot replicate the accuracy of Opus by itself. Even when all three of the other models disagreed with Opus, Opus was almost always the one that was actually right.
It's an interesting area for research. And, a model that's very fast can make a lot more attempts at a solution, and in cases where there is an unambiguous "right" solution that can be proven by some sort of static rule, "very fast" may be a useful characteristic. Small classification problems, where you need to make thousands of decisions about some specific aspect of a large corpus of data, seem like a sweet spot for a model like Mercury.
I have had a better experience with my own use. I use it every day and it rarely fails to improve tasks. Perhaps the prompts and rubrics make a difference. And finding bugs is one of the better use cases because it is essentially a search problem. As long as models are non-deterministic and there is some diversity in training data, then an ensemble that iterates on the problem is more likely to cover the ground needed to solve a problem.
Some tasks benefit from this approach more than others. There was a paper from Google on a version they made which was very similar and achieved SOTA at the time on planning and pathfinding benchmarks.
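To put rough numbers on the coverage argument, here is a back-of-the-envelope sketch. It assumes every attempt finds the bug independently with the same probability p, which is an idealization: real model outputs are correlated, and the 0.2 below is an arbitrary illustrative value, not a measurement.

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds.

    Assumption: attempts are independent and each succeeds with probability p.
    """
    return 1.0 - (1.0 - p) ** n


# Illustrative only: p = 0.2 is a made-up per-attempt success rate.
for attempts in (1, 3, 10):
    print(attempts, round(coverage(0.2, attempts), 3))
```

This only speaks to whether some member of the ensemble lands on the right answer. Picking that answer out of the pile is the judging step, which is a separate problem.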
Price is based on perceived value, not cost to produce. There is no international court of price justifications; if customers are willing to pay $X you can charge $X.
Exactly. The company should care because it drives margins. But pricing to customers should not change unless it was artificially high (competitors offer more value for same money) due to profitability concerns.
"we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).
...
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."
This is just the sales team doing their thing, applying the Law of Scarcity to drive demand.
It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1
It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5% I doubt it is a much larger model. Most perplexing.
Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as opus >=4.5, sonnet 4.5, and double the speed of opus <=4.1
Edit: Also in the system card...
"we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).
...
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."
llm-consortium: prompts multiple models in parallel, loops until confidence_threshold, and iteratively refines a response.
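The loop described above can be sketched roughly as follows. This is not the plugin's actual code: the function name `run_consortium`, the arbiter signature (returning a synthesis plus a confidence in [0, 1]), the `max_iterations` cap, and the stub models are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Model = Callable[[str], str]
# Hypothetical arbiter signature: (original prompt, answers) -> (synthesis, confidence).
Arbiter = Callable[[str, List[str]], Tuple[str, float]]


def run_consortium(
    prompt: str,
    members: List[Model],
    arbiter: Arbiter,
    confidence_threshold: float = 0.8,
    max_iterations: int = 3,
) -> Tuple[str, int]:
    """Fan out to all members in parallel, let the arbiter synthesize,
    and repeat with the synthesis fed back until the arbiter is confident."""
    current = prompt
    synthesis = ""
    rounds = 0
    for rounds in range(1, max_iterations + 1):
        with ThreadPoolExecutor(max_workers=len(members)) as pool:
            answers = list(pool.map(lambda model: model(current), members))
        synthesis, confidence = arbiter(prompt, answers)
        if confidence >= confidence_threshold:
            break
        # Feed the previous synthesis back so the next round can refine it.
        current = f"{prompt}\n\nPrevious synthesis to improve:\n{synthesis}"
    return synthesis, rounds


def stub_model(name: str) -> Model:
    """Stand-in for a real model call."""
    return lambda prompt: f"{name}: answer"


def make_stub_arbiter() -> Arbiter:
    """Stand-in arbiter whose confidence rises by 0.5 on every call."""
    calls = {"n": 0}

    def arbiter(prompt: str, answers: List[str]) -> Tuple[str, float]:
        calls["n"] += 1
        return " | ".join(answers), 0.5 * calls["n"]

    return arbiter


result, rounds_used = run_consortium(
    "Find the bug", [stub_model("a"), stub_model("b")], make_stub_arbiter()
)
print(rounds_used, result)
```

With real models, the stubs would be replaced by API calls and the confidence would have to be parsed out of the arbiter's response.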
This was inspired by a Karpathy tweet [0], and the prototype was created using another tool of mine: the LLM Plugin Generator plugin (essentially a curated collection of plugins for simonw's llm CLI used as a few-shot prompt).
The llm-model-gateway companion plugin lets you serve models from the LLM CLI as an OpenAI API. This allows you to use saved consortiums in your various clients as if they were a regular model, bringing massive parallel reasoning to any workflow.
It occurred to me at some point that a collection of parallel LLMs was not really a consortium. A consortium is a group of organizations. A group of groups. To rectify this I added support for actual consortiums, where each member of an llm-consortium can itself be a consortium of models, e.g.:
llm consortium save cns-glm-n3 -m glm-5.1 -n 3 --arbiter mercury-2
llm consortium save cns-k2-n3 -m kimi-k2.6:3 --arbiter mercury-2
llm consortium save cns-meta-glm-k2 -m cns-k2-n3 -m cns-glm-n3 --arbiter cns-k2-n3
Yes, even the arbiter/judge can be comprised of a consortium of models, bringing parallel reasoning to the task of judging parallel reasoning chains.
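The nesting works because a consortium has the same shape as a model: prompt in, text out. A minimal sketch of that idea, assuming a hypothetical `Consortium` class and stub models that only record their own name (none of this is the plugin's real implementation):

```python
from dataclasses import dataclass
from typing import Callable, List

Model = Callable[[str], str]
CALLS: List[str] = []  # records which stub models were invoked, in order


def stub(name: str) -> Model:
    """Stand-in for a real model call; logs its name and returns a tag."""
    def call(prompt: str) -> str:
        CALLS.append(name)
        return f"[{name}]"
    return call


@dataclass
class Consortium:
    """Hypothetical consortium: callable like a model, so it can be nested."""
    members: List[Model]
    arbiter: Model

    def __call__(self, prompt: str) -> str:
        answers = [member(prompt) for member in self.members]
        judging_prompt = prompt + "\n\nCandidate answers:\n" + "\n".join(answers)
        return self.arbiter(judging_prompt)


# Names mirror the saved consortiums in the commands above.
cns_glm_n3 = Consortium([stub("glm-5.1")] * 3, arbiter=stub("mercury-2"))
cns_k2_n3 = Consortium([stub("kimi-k2.6")] * 3, arbiter=stub("mercury-2"))
cns_meta = Consortium([cns_k2_n3, cns_glm_n3], arbiter=cns_k2_n3)

answer = cns_meta("Is this patch safe?")
print(answer, len(CALLS))
```

Each inner consortium costs four model calls here (three members plus the arbiter), so the meta consortium with a consortium as arbiter makes twelve. The call count grows quickly with nesting depth.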
Consortiums can also now contain groups of specialists. These custom user-defined expert characters each address the prompt from a different perspective. And a Westworld-style attribute matrix can be randomized to inject some more entropy into the process.
classifai: generates labels with approximate confidence derived from logprobs
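One common way to turn logprobs into an approximate confidence is to exponentiate the log-probability of each candidate label and renormalize over the candidates. The sketch below assumes that approach; it is not necessarily what classifai does, and the function name and example values are hypothetical.

```python
import math
from typing import Dict, Tuple


def label_confidence(label_logprobs: Dict[str, float]) -> Tuple[str, float]:
    """Pick the most likely label and return its probability, renormalized
    over the candidate labels only (assumed approach, for illustration)."""
    # Subtract the max before exponentiating for numerical stability.
    max_lp = max(label_logprobs.values())
    weights = {label: math.exp(lp - max_lp) for label, lp in label_logprobs.items()}
    total = sum(weights.values())
    best = max(weights, key=lambda label: weights[label])
    return best, weights[best] / total


# Made-up logprobs for the first token of each candidate label.
example = {"spam": math.log(0.6), "ham": math.log(0.2)}
label, confidence = label_confidence(example)
print(label, round(confidence, 2))
```

The result is only approximate: the remaining probability mass went to tokens outside the label set, and multi-token labels would need their token logprobs summed first.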
llm-alias-options: saves inference parameters such as reasoning effort with a model alias (good for setting the provider in openrouter or creating a consortium of high temperature models)
llm-prompt-json: adds a --json flag to return the llm logs object (good for getting conversation_id, or reasoning output in scripts)
llm-jina: adds support for all Jina AI specialised models and tools, such as web fetching, embedding and reranking.