Hacker News | irthomasthomas's comments

They literally asked for it. Two days ago Amodei wrote an essay urging the government to regulate them. He explicitly cited Mythos as proof that frontier AI has acquired autonomous hacking capabilities that threaten critical infrastructure and national security.

  "Mythos Preview scrambled the global cybersecurity landscape. But its broader significance is that it proves beyond doubt that AI models are now tools of global and national strategic consequence." 


  "The government should have the power to block or deter deployment of the model if it is determined, in light of third-party assessment, to present unacceptable risks. This power must be scoped to the above four specific risks and there must be protective measures against political favoritism or arbitrary decisions" 
https://darioamodei.com/post/policy-on-the-ai-exponential

A third-party demonstrated that it was possible to jailbreak the safety measures of Fable to access the raw Mythos abilities. Abilities which Anthropic say are too dangerous for the public.

Edit. From David Sacks:

  — A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.

   — In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious".

David Sacks could not be further from a reliable or impartial narrator on this topic.

And before someone calls this an ad hominem, it isn’t; I am not saying he is bad or morally wrong or anything else (you are free to think that or not, as am I).

But Sacks has skin in the game. And that makes him both unreliable and partial.


Cynically: this is an attempt to quash open source or discount model competition through regulatory capture.

I'm sure it's also a step towards requiring ID and limiting access to real power for us plebeians, keeping it for those in charge so they can maintain or grow their power. It's all an excuse to give us a Westworld season 3. There's probably a better example out there.

> A third-party demonstrated that it was possible to jailbreak the safety measures of Fable to access the raw Mythos abilities. Abilities which Anthropic say are too dangerous for the public.

Pressure test this assumption before getting behind this position.


I will certainly revisit it as more information comes out, but is it your contention that Anthropic solved jailbreaking with Mythos?

What you claim contradicts Anthropic’s statements. I assume that is the contention.

That is a strawman. My contention is what you just implicitly acknowledged - there is not information put out yet to validate the quoted claim. There are claims to the contrary, as well, from Anthropic themselves.

In the absence of information, maybe it’s better to ask which claim is more extraordinary.

That,

A. Anthropic solved the llm jailbreak problem with mythos (despite no claim to have done so on their part)

B. That a full jailbreak of mythos is possible.


That’s not what the claim is though.

Anthropic’s claims are as follows if you read their post:

* this is not a universal jailbreak method

* the jailbreak affords you the same capabilities you get already with other models, not Mythos.

In this situation the question is which party you trust more, and history suggests this administration is very playful with the truth, especially when it comes to economically damaging the company that’s become its political enemy.


There is not an absence of information.

There is information, from Anthropic, concerning the jailbreaks that motivated this action, that directly contradicts the statement.

There is just an absence of information backing the statement I responded to.

I find it so odd this is apparently so contentious a take.


The existence of a jailbreak-free LLM in 2026 is extremely contentious to me. You can argue about the specifics of this exact jailbreak, but generally Pliny and Amazon both reported Mythos jailbreaks in <7 days. It seems very reasonable to expect that a well-funded state actor could achieve better results given significantly more funding, determination and, most importantly, unfettered access.

Nobody here is claiming fable is jailbreak free. Not anthropic and not in this thread. This was known before launch. The question remains one of degree and capabilities.

Yeah, if you're arguing that "this, according to anthropic, existentially dangerous model has only had its safeguards partially circumvented so we shouldn't step in" ... it's hard for me to take you seriously?

Put another way, the thing we are all concerned with is the complete circumvention of safeguards that is normally possible with LLMs. If you _aren't_ arguing that this isn't possible, you're not engaging in discussing the thing that is concerning to regulators or those discussing the regulation.


I'm pointing out what the argument is. You were saying it is something different.

Now you add the word "complete". Anthropic IS arguing _complete_ circumvention is NOT possible.


A disappointing trend is to frame the opposing argument in extreme terms rather than engaging with the substance of the assertion.

The latter portion is grandstanding about how incredulous the commenter is that someone might trust an LLM company about the strength of their harnesses' if-then-else statements for request routing.

Why bother with an unsubstantial comment?


What assumption?

The one I quoted, which contradicts Anthropic’s post and has no supporting evidence publicly available. That a jailbreak was found that accesses the model’s _raw_ capabilities. Something Anthropic has explained was not the case.

It is pretty clear, no? Anthropic claims that the jailbreaks they were made aware of did not access the model’s raw capability, explained that there are protections to mitigate the impact of successful jailbreaks, etc. Coming here and stating something to the contrary with zero explanation or actual evidence is the assumption.

“This power must be scoped to the above four specific risks and there must be protective measures against political favoritism or arbitrary decisions.”

> They literally asked for it.

Yes, and rape victims are "asking for it" by wearing short skirts. I thought we stopped with this nonsense a couple decades ago?

There's a huge difference between "we want regulation" and the government swinging its dick at random.

If the government had said, a week ago, don't release Fable? That wouldn't have gotten nearly this reaction. And the government has known that these capabilities exist since they were announced TWO MONTHS AGO.


It should be easy for a company like Anthropic to prove this beyond a doubt. Why don't they? Why don't they have a collection of prompts and side-by-side comparisons with other models showing how far ahead they are?

I think it's mainly because the difference in models at the frontier isn't "response to prompt X", but rather "coherence with 500K tokens of context and instructions in play"

According to this, opencode and Cursor CLI perform better than Claude Code: https://x.com/kunchenguid/status/2065345999682568593

The analysis at the bottom directly contradicts the statement.

I am experimenting with LFM2.5-8B-1A and getting 250tps on a 3060

Mercury-2 is amazing. I am using it frequently as the arbiter in llm-consortium. The context window is relatively small, so to make it work with larger consortiums I can construct a recursive, sort-of-meta consortium like this:

  llm consortium save cns-glm -m glm-5.2 -n 5 --arbiter mercury-2 --judging-method rank

  llm consortium save cns-kimi -m k2.6 -n 5 --arbiter mercury-2 --judging-method rank

  llm consortium save cns-meta-glm-kimi -m cns-glm -m cns-kimi --arbiter mercury-2 --judging-method synthesis
Now when I prompt cns-meta-glm-kimi it will pick the best of five from kimi and glm before creating a synthesis from the two winners.
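
For anyone trying to picture the nesting, here is a rough Python sketch of the idea: rank within each inner group, then synthesize across the winners. It is not llm-consortium's actual code. The `Consortium` class, the toy `rank` and `synthesis` arbiters, and the stand-in models are all hypothetical; a real arbiter would be a model call rather than a string function.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical stand-in: a "model" is any callable from prompt to text.
Model = Callable[[str], str]


@dataclass
class Consortium:
    """Toy consortium: sample each member n times, let the arbiter decide."""
    members: List[Union[Model, "Consortium"]]
    n: int
    arbiter: Callable[[str, List[str]], str]

    def __call__(self, prompt: str) -> str:
        # A Consortium is itself callable, so it can be a member of another one.
        candidates = [m(prompt) for m in self.members for _ in range(self.n)]
        return self.arbiter(prompt, candidates)


def rank(prompt: str, candidates: List[str]) -> str:
    # Placeholder for a "rank" judging step: keep a single winner.
    return max(candidates, key=len)


def synthesis(prompt: str, candidates: List[str]) -> str:
    # Placeholder for a "synthesis" judging step: merge all candidates.
    return " | ".join(candidates)


# Deterministic toy models so the flow is easy to follow.
glm = lambda p: "glm:" + p
kimi = lambda p: "kimi-long:" + p

cns_glm = Consortium([glm], n=5, arbiter=rank)
cns_kimi = Consortium([kimi], n=5, arbiter=rank)
meta = Consortium([cns_glm, cns_kimi], n=1, arbiter=synthesis)

print(meta("hi"))
```

The point of the shape is that the outer arbiter only ever sees one winner per inner group, which is what keeps the arbiter's context small.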

I've found the average output of many suboptimal models is still suboptimal, especially when it comes to judging the accuracy/correctness of the work of other models.

I did some benchmarks recently of how well various models find security vulnerabilities, and then follow up testing of the judging process of whether the models found the right bug and whether other bugs it reported were false positives or legitimate other bugs. A committee of good-not-great models (DeepSeek, MiMo, Gemma 4) cannot replicate the accuracy of Opus by itself. Even when all three of the other models disagreed with Opus, Opus was almost always the one that was actually right.

It's an interesting area for research. And, a model that's very fast can make a lot more attempts at a solution, and in cases where there is an unambiguous "right" solution that can be proven by some sort of static rule, "very fast" may be a useful characteristic. Small classification problems, where you need to make thousands of decisions about some specific aspect of a large corpus of data, seems like a sweet spot for a model like Mercury.


I have had a better experience with my own use. I use it every day and it rarely fails to improve tasks. Perhaps the prompts and rubrics make a difference. And finding bugs is one of the better use cases because it is essentially a search problem. As long as models are non-deterministic and there is some diversity in training data, an ensemble that iterates on the problem is more likely to cover the ground needed to solve it.

Some tasks benefit from this approach more than others. There was a paper from Google on a very similar approach, which achieved what was then SOTA on planning and pathfinding benchmarks.

edit:

Mind Evolution paper https://deepmind.google/research/publications/122391/

(That was a month after I published llm-consortium :) https://xcancel.com/karpathy/status/1870692546969735361


Is it a larger model or just better trained? Anthropic does not actually claim it is a larger model anywhere that I can see.

If it’s not larger, it’d be tough to justify the massive price increase for using it.

Price is based on perceived value, not cost to produce. There is no international court of price justifications; if customers are willing to pay $X you can charge $X.

That, and a model can be the same size yet use a lot more compute. I guess you could think of it as intelligence per watt used, or something like that.

Exactly. The company should care because it drives margins. But pricing to customers should not change unless it was artificially high (competitors offer more value for same money) due to profitability concerns.

Opus 4.7 was smaller and people still paid 4.6 prices.

gpt-5.5 isn't larger than gpt-5.4 but costs double.


"we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."


Where is this text coming from?

[edit] -- I see that this comes from the system card -- dang merged the comments from the other discussion so that explains the confusion.


This is just the sales team doing their thing, applying the Law of Scarcity to drive demand.

It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1

It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5%, I doubt it is a much larger model. Most perplexing.


It could be a much bigger MoE model

Then it would be slower.

Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as opus >=4.5, sonnet 4.5, and double the speed of opus <=4.1

  Benchmark                     Mythos 5  Fable 5  MythosPrev  Opus 4.8  GPT-5.5  Gemini 3.1 Pro
  SWE-bench Pro                 80.3      80       77.8        69.2      58.6     54.2
  SWE-bench Ver                 95.5      95       93.9        88.6      -        80.6
  Terminal-Bench                88.0      84.3     -           82.7      83.4     -
  BrowseComp (Single-Agent)     88.0      -        87.9        84.3      84.4     85.9
  BrowseComp (Multi-Agent)      93.3      -        -           88.5      -        -
  HLE (No tools)                59.0      -        56.8        49.8      41.4     44.4
  HLE (Tools)                   64.5      -        64.7        57.9      52.2     51.4
  CharXiv Reasoning (No tools)  88.9      -        86.2        80.5      -        -
  CharXiv Reasoning (Tools)     93.5      -        92.5        89.9      -        -
  BioMystery Bench (Human)      83.9      -        82.6        80.4      -        -
  BioMystery Bench (Hard)       46.1      -        29.6        40.0      -        -
  OSWorld-Verified              85.0      85.0     85.4        83.4      78.7     76.2*
  CritPt                        28.6      -        20.9        27.1      17.7     -
  ArxivMath                     78.5      68.7     71.8        71.5      64.0     -
[0] https://news.ycombinator.com/item?id=48312633

Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."


It's announced as a revolution but when you look at those benchmarks it surely looks like an iteration.

llm-consortium: prompts multiple models in parallel, loops until confidence_threshold, and iteratively refines a response.
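
A minimal sketch of that loop, for illustration only. It assumes each model is a plain callable and that the arbiter returns a (synthesis, confidence) pair; `run_consortium`, `max_iterations` and the toy arbiter are hypothetical names, not the plugin's real interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Model = Callable[[str], str]
Arbiter = Callable[[str, List[str]], Tuple[str, float]]


def run_consortium(prompt: str, models: List[Model], arbiter: Arbiter,
                   confidence_threshold: float = 0.8,
                   max_iterations: int = 3) -> Tuple[str, float, int]:
    """Query all models in parallel, then refine until the arbiter is confident."""
    synthesis, confidence, iteration = "", 0.0, 0
    current_prompt = prompt
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        for iteration in range(1, max_iterations + 1):
            # pool.map keeps responses in the same order as the models list.
            responses = list(pool.map(lambda m: m(current_prompt), models))
            synthesis, confidence = arbiter(prompt, responses)
            if confidence >= confidence_threshold:
                break
            # Feed the last synthesis back so the next round can improve on it.
            current_prompt = f"{prompt}\n\nPrevious attempt to refine:\n{synthesis}"
    return synthesis, confidence, iteration


def make_toy_arbiter() -> Arbiter:
    # Assumption for the demo: confidence simply rises on each round.
    scores = iter([0.5, 0.7, 0.9])

    def arbiter(prompt: str, responses: List[str]) -> Tuple[str, float]:
        return " / ".join(responses), next(scores)

    return arbiter


toy_models = [lambda p: "A", lambda p: "B"]
answer, confidence, rounds = run_consortium("q", toy_models, make_toy_arbiter())
print(answer, confidence, rounds)
```

The iteration cap matters in practice: without it, an arbiter that never reaches the threshold would loop forever.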

This was inspired by a Karpathy tweet [0], and the prototype was created using another tool of mine: the LLM Plugin Generator plugin (essentially a curated collection of plugins for simonw's llm CLI, used as a few-shot prompt).

The llm-model-gateway companion plugin lets you serve models from the llm CLI as an OpenAI API. This allows you to use saved consortiums in your various clients as if they were a regular model, bringing massive parallel reasoning to any workflow.

It occurred to me at some point that a collection of parallel LLMs was not really a consortium. A consortium is a group of organizations: a group of groups. To rectify this I added support for actual consortiums, where each member of an llm-consortium can itself be a consortium of models, e.g.

llm consortium save cns-glm-n3 -m glm-5.1 -n 3 --arbiter mercury-2

llm consortium save cns-k2-n3 -m kimi-k2.6:3 --arbiter mercury-2

llm consortium save cns-meta-glm-k2 -m cns-k2-n3 -m cns-glm-n3 --arbiter cns-k2-n3

Yes, even the arbiter/judge can be comprised of a consortium of models, bringing parallel reasoning to the task of judging parallel reasoning chains.

Consortiums can also now contain groups of specialists. These custom user-defined expert characters each address the prompt from a different perspective. And a Westworld-style attribute matrix can be randomized to inject some more entropy into the process.

[0] https://xcancel.com/karpathy/status/1870692546969735361

Some other llm plugins I vibe coded:

classifai generates labels with approximate confidence derived from logprobs

llm-alias-options saves inference parameters such as reasoning effort with a model alias. (good for setting the provider in openrouter or creating a consortium of high temperature models)

llm-prompt-json adds a --json flag to return the llm logs object (good for getting conversation_id, or reasoning output in scripts)

llm-jina adds support for all jina AI specialised models and tools like web fetching, embedding and reranking.
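
On the classifai point above, this is the general way an approximate confidence can be derived from logprobs. It is a sketch of the arithmetic, not necessarily how classifai does it, and the function names and example numbers are made up for illustration.

```python
import math
from typing import Dict, List


def label_confidence(token_logprobs: List[float]) -> float:
    """Probability of the whole label: exp of the summed token logprobs."""
    return math.exp(sum(token_logprobs))


def normalized_confidences(label_logprobs: Dict[str, float]) -> Dict[str, float]:
    """Renormalize over the candidate labels only (a softmax over logprobs)."""
    peak = max(label_logprobs.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = {k: math.exp(v - peak) for k, v in label_logprobs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# A two-token label where each token had probability 0.5.
print(label_confidence([math.log(0.5), math.log(0.5)]))

# Two candidate labels that together cover only 80% of the probability mass.
print(normalized_confidences({"spam": math.log(0.6), "ham": math.log(0.2)}))
```

The number is only approximate because logprobs measure how likely the tokens were, not whether the label is correct.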


I'm quite curious about this.

I think this is similar. Unfinished. https://github.com/mattjoyce/roundtable-consensus


Great project! I often check the opinion of one model against others when doing research and a sort of consensus process would save many a c/p
