This just made any closed LLM a huge supply chain risk. Everybody was aware of this possibility, but now it actually happened. It's like having nuclear weapons vs. firing a nuclear weapon.
Especially outside the US customers are going to be very hesitant to keep adopting LLMs from US companies.
> Especially outside the US customers are going to be very hesitant to keep adopting LLMs from US companies.
Not really. There aren't any other choices, and the PRC also heavily utilizes export controls [0].
This is why sovereign AI has become important, as can be seen with EU NatSec use cases tending to use Mistral [1] and Indian governments starting to use Sarvam [2].
That said, for most commercial use cases, older generations of Opus as well as enterprise-grade GPT and Gemini are fairly good.
The distilled OSS models are alright for hobbyists but if you have actually used unrestricted and enterprise grade versions of Claude, Mythos, GPT, and Gemini (most hobbyists don't get access to these) you see how far behind the open weight models are.
Even in China, traditionally open-minded model teams like Alibaba's Qwen are looking to become more restricted given the org changes [3].
Also, corporate RFCs now demand final say on the model used, and depending on the geo this can be a dealbreaker (e.g., an American financial institution will absolutely blacklist a vendor that uses a Chinese model, and the same holds in reverse, while European defense vendors mandate sovereign EU models depending on the opportunity).
> if you have actually used unrestricted and enterprise grade versions of Claude, Mythos, GPT, and Gemini you see how far behind the open weight models are.
I really do feel like DeepSeek V4 Pro is often better than current Sonnet is, in the general case.
Opus 4.7 is a solid step above Sonnet, and Fable was a solid step above Opus 4.7. I've only had Fable for a few days, obviously, but I was decently impressed after Opus 4.8 being a downright disappointment for me (it's just too buggy; I had it go out of control 3 separate times on things Opus 4.7 never had any trouble with.) I still ran into limitations. It's not world-endingly great.
So, based on that, I think DeepSeek V4 Pro is, ignoring multi-modal capabilities, about a couple solid steps behind. Assuming model iteration will continue to decelerate, especially as Anthropic heads into IPO, I'm guessing that DeepSeek will probably be able to strike back with something further along. Of course we'll see how able and willing they are to stay open weight, but they've done well so far so, no reason to doubt them at the moment.
(There are some models that claim to be ahead of DeepSeek V4 Pro. I've tried some of them and really not been that impressed. Maybe it's a me issue.)
Now I reckon that most people just simply don't really need Mythos/Fable for most of what they do and using Mythos/Fable tokens in place of Sonnet-tier models would not make any sense. At my job we already mostly just use Sonnet as it is. I'm sure there is some cutting-edge research where you want the absolute best model available and sure, in that case, you're stuck with Anthropic for the moment.
But is that really everyone? After all, while Mythos was dominating the hype cycles, quite a lot of impressive LLM-assisted CVEs dropped that were not linked to Mythos.
Compute was constrained. There is a lot happening, especially with Chinese chips, which currently points to a massive upcoming increase in non-US capacity.
Also, the EU, Japan, SK, ASEAN, and India are not supportive of using Chinese tech after China export-controlled rare earths last year [0].
Software supply chain regulations also make utilizing Chinese software risky for ExChina players and make using ExChina tech risky for Chinese players.
Expect to see RFCs now demanding visibility into what models are used and right of refusal - this is already the norm in F1000s. Similar ones are likely to arise in the EU as well with some of the upcoming industrial policy changes being proposed.
you think it's that hard to get trade secrets from some openai or anthropic engineer if you promise them anonymity and a new, better-paid position? hell, they might even give it away for free if they think what their company is doing is morally wrong. know-how is not source code, you can't catch it with dlp or online leak scanners. you would need 24/7 combined human and electronic surveillance, and that's something even the cia reserves for top-level targets; it takes too much manpower to use it on everyone.
SMIC hired hundreds of TSMC employees and now it's a couple of years away from 3nm-equivalent chips in full production. export controls only work against poor countries with less advanced industry, like russia. china has the resources, and export controls give it the motivation. and if eu/us relations get even worse i wouldn't be surprised if the dutch government let asml start selling euv machines to everyone just to get back at trump.
If you’re talking about TSMC Arizona they aren’t fabricating at N3 until end of next year at the earliest, N2 isn’t slated until “end of decade”. I think they’re manufacturing Blackwell there which is N4 / 4nm
Exactly. Just look at what they are really useful for right now: running LLMs in feedback loops (agents) so they can try out random-ish approaches until some verification function (tests) passes.
It's like the infinite monkeys on typewriters that will type whatever you are looking for, given infinite time. LLMs are just tuned to much better odds than the monkeys are. But it's still a lot of randomness, with random results.
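A minimal sketch of the loop described in this comment: propose a candidate, run a verification function, repeat until it passes. Everything here is a toy assumption for illustration (the `passes_tests` check, the random proposer, the search space of small integers); a real agent would replace the proposer with model calls and the verifier with an actual test suite.

```python
import random
from typing import Callable, Optional, Tuple

Candidate = Tuple[int, int]  # hypothetical candidate: (a, b) for the guess f(x) = a*x + b


def passes_tests(candidate: Candidate) -> bool:
    """Toy verification function: a tiny 'test suite' for f(x) = 2x + 3."""
    a, b = candidate
    cases = [(1, 5), (2, 7), (10, 23)]
    return all(a * x + b == expected for x, expected in cases)


def make_random_proposer(seed: int) -> Callable[[], Candidate]:
    """The 'monkey': proposes coefficients uniformly at random."""
    rng = random.Random(seed)
    return lambda: (rng.randint(-5, 5), rng.randint(-5, 5))


def search_until_verified(
    propose: Callable[[], Candidate],
    verify: Callable[[Candidate], bool],
    max_attempts: int,
) -> Tuple[Optional[Candidate], int]:
    """Keep proposing until the verifier accepts or the budget runs out."""
    for attempt in range(1, max_attempts + 1):
        candidate = propose()
        if verify(candidate):
            return candidate, attempt
    return None, max_attempts


solution, attempts = search_until_verified(
    make_random_proposer(seed=0), passes_tests, max_attempts=10_000
)
print(solution, attempts)
```

The loop itself is identical whether the proposer is uniform noise or a tuned model; only the number of attempts needed changes, which is the disagreement the replies below are about.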
> It's like the infinite monkeys on typewriters that will type whatever you are looking for, given infinite time.
In the monkey example the infinite time is doing a lot of work there. The fact that LLMs can search through semantic space and find reasonably correct paths in a reasonable time is directly tied to the reason why they are valuable.
Saying "these two things are similar except one can be useful and one can't" is not a great comparison.
For me the real lesson learned isn't how "smart" LLMs are, but rather how much human work is basically reducible to repeating past work with minor variation. Humans believe they are "reasoning", but so much of the code written is just the human brain doing the same autocomplete-style work that LLMs can do now.
The point is that it's the same process with—much—better priors.
This seems like a reasonable view to me. It's surprising just how much better priors matter and how we can develop those priors by training on a bunch of text. But it also explains, or at least hints at an explanation, for why LLM capabilities are so jagged, and in such inhuman ways.
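To put rough numbers on "the same process with better priors", here is a back-of-the-envelope sketch. It assumes each symbol is produced independently and that a whole attempt only counts if every symbol is right, so the number of attempts until success is geometric with mean 1/p. The 27-symbol alphabet, the 20-symbol target, and the 0.9 per-symbol accuracy are illustrative assumptions, not measurements of any model.

```python
import math


def expected_attempts(p_correct_per_symbol: float, length: int) -> float:
    """Mean number of independent tries until one try gets every symbol right."""
    p_success = p_correct_per_symbol ** length
    return 1.0 / p_success


# A monkey picking uniformly from 26 letters plus space, 20-symbol target.
uniform = expected_attempts(1 / 27, 20)

# Same process, but a prior that picks the right symbol 90% of the time.
tuned = expected_attempts(0.9, 20)

print(f"uniform prior: {uniform:.3g} attempts, tuned prior: {tuned:.3g} attempts")
print(f"orders of magnitude saved: {math.log10(uniform / tuned):.1f}")
```

Under these assumptions the mechanism is unchanged, but the tuned prior moves the search from astronomically infeasible to a handful of tries, which is why both sides of this exchange can claim to be describing the same thing.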
It’s real weird to see people argue that LLM output is no different than random gibberish and then handwave over the fact that it’s clearly not with terms like “training”, as if a stream of random garbage is trainable.
I quite literally created and productized predictive linguistics and behavioral vectors at Google.
If you had stopped to consider what I explained, you’d understand that it’s the process of turning random garbage into increasingly acceptable outputs.
I.e., training the monkeys.
The insight you are missing is the rule of networked scale. It turns out that any reactive node, scaled enough, can form a sophisticated predictive system given reward over a training topography, even if it starts out as garbage or is literally made of monkeys.
So it is garbage. And you can turn garbage into semi-intelligence.
A human child is born with no ability to speak intelligibly. All they can do is babble. Through years of training they gain the ability to speak intelligibly and communicate in advanced ways.
The act of successful training means it’s not garbage anymore.
> So it is garbage.
This statement is ultimately meaningless and I continue to find it weird that someone who works in this space would support this view. If you fundamentally change the nature of a thing, it’s no longer that original thing. Is an HDD still random garbage after you fill it with family photos just because that’s how it starts?
If you start with a fire hose of literal sewage and install a series of filters culminating in a reverse osmosis step that pours clean drinking water out, the product is not shit even if the original input was.
I don’t believe that you can’t understand the distinction between “at one point this was garbage” and “at the present time this is still garbage”. You’re clearly smarter than that.
> but so much of the code written is just the human brain doing the same autocomplete-style work that LLMs can do now.
That's the part they are really good at. But they are really bad at taking complex decisions. Most of them are just guesses from a finite amount of solutions they were trained on, or from options they have in context.
Yes, and your ability to remember a relatively few things that happened years ago is predicated on your ability to also forget most things that happen to you - like what you had for dinner last week. Good thing we have technology to fill in the gaps.
And nothing about this makes your initial comment any less goofy. Anyone who has ever had to make a difficult decision knows more than half the battle is preparation. Where do you think complex decisions come from? Have current events left you with the impression that people just waltz into idk say the Situation Room and just big brain their way through world events? That's how the current administration seems to think the world works, with quite predictable results.
Society is already algorithmic, to optimize for humans being dumb. AI is nothing more than another advance along this continuum. No one is impressed by your ability to remember something from years ago; many if not most mammals have the same capability. Human recall is also notoriously bad in many cases - see the numerous studies on the reliability of eyewitness testimony.
AI is smart because most people are dumb. Come to terms with the fact that your anthropocentrism need not be based on a notion of intellectual supremacy and you'll be a far less tedious person to deal with.
Hmm, saying it’s random-ish is doing it a disservice. I understand it’s a stochastic process, but there’s definitely some level of understanding. Not at the level of lived experience, but usually an LLM with vision capabilities can call a spade a spade and do something useful with it. And when a verification function shows them where they are wrong, they usually come back with a better and more informed approach.
So I can’t fully see how that’s related to the infinite monkeys. A typewriting monkey doesn’t have access to a verification function. And even if it did, it would not be the original concept anymore with infinite typewriting monkeys producing the works of Shakespeare.
Nevertheless, I upvoted your comment because it’s definitely insightful.
Feedback loops certainly seem to give them some level of understanding.
Agent reads a skill file about how to use a CLI tool. It tries to use the tool but gets an error about the input format. It tries again with a different format based on the error message, and sees that command succeeded. It compares what worked to what was in the skill file and notes the difference. On future invocations it continues to use the new format.
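The retry-and-remember behaviour in this example can be sketched roughly as below. All names are hypothetical: `fake_cli` stands in for the CLI tool, `FORMATS` for the input formats the agent knows how to try, and `learned_format` for whatever note the agent keeps about what actually worked. Whether a real agent carries that note into later sessions depends on its memory mechanism, which this sketch does not model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical input formats the agent knows how to produce.
FORMATS: Dict[str, Callable[[str, str], str]] = {
    "colon": lambda key, value: f"{key}:{value}",
    "equals": lambda key, value: f"{key}={value}",
}


def fake_cli(arg: str) -> str:
    """Stand-in for the CLI tool: it only accepts key=value input."""
    if "=" not in arg:
        raise ValueError("invalid input format, expected key=value")
    return f"ok: {arg}"


class Agent:
    def __init__(self, skill_file_format: str) -> None:
        self.skill_file_format = skill_file_format  # what the skill file says
        self.learned_format: Optional[str] = None   # note taken after a mismatch
        self.attempts_log: List[str] = []

    def invoke(self, tool: Callable[[str], str], key: str, value: str) -> str:
        # Prefer what worked before; otherwise trust the skill file first.
        first = self.learned_format or self.skill_file_format
        order = [first] + [name for name in FORMATS if name != first]
        for name in order:
            arg = FORMATS[name](key, value)
            self.attempts_log.append(arg)
            try:
                result = tool(arg)
            except ValueError:
                continue  # the error says the format was wrong; try the next one
            if name != self.skill_file_format:
                self.learned_format = name  # note the difference from the skill file
            return result
        raise RuntimeError("no known input format was accepted")


agent = Agent(skill_file_format="colon")
first_result = agent.invoke(fake_cli, "region", "eu")   # fails once, then succeeds
second_result = agent.invoke(fake_cli, "region", "us")  # goes straight to the learned format
print(first_result, second_result, agent.attempts_log)
```

The second call needs only one attempt because the agent consults its note before the skill file, which is the "on future invocations" part of the example.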
They get trained before release. On general information. But they don't improve while working on very specific tasks. Every new session is like an experienced human on their first day at a new job.
Unique skills and jobs do exist. And LLMs can't gain additional knowledge "on the job" like humans can. They are generalists that can only be steered by prompts, skills and context. That's all I'm saying.
This fact is currently the most limiting factor for LLMs.
I think they were extremely scared of 4o at that point, and were scared it could trigger some horrible event. Documented cases of severe psychosis because of AI started to surface at that time.
Just imagine what would've happened if a major terrorist attack was a result of someone getting mentally ill from AI, without the safety filters recognizing the danger.
The robotic tone was probably from over-correcting the sycophantic tendencies of 4o.
I think they've brought back a "personality" of sorts to ChatGPT 5.x. I've caught it more than once explaining something to me and saying "In my personal opinion", or "I personally enjoy <thing> the most". Which is always jarring, it doesn't "personally" or "enjoy" anything. We could be discussing videogames and it tells me which games "it personally enjoys the most". Bizarre.
Because it valued human connection over factual correctness.
LLMs lack the intelligence and emotions to realize when they have to stop being friendly and supportive, because it becomes unethical to continue being supportive.
It's totally fine that Apple doesn't release this feature for EU customers. If they think they can still sell enough phones it's also fine I guess.
What's not fine is to blame the EU for the missing feature. It's damaging their brand and their reputation. Just imagine if Porsche put out a press release calling the US tariffs "un-American". That wouldn't be perceived well either.
> Tell me which company in your opinion would be in the LOUD headlines, Apple or the random 3rd party?
The world I want to live in is not the one where Apple claims responsibility for every byte of my data which passes through their products.
I think web browsers are a nice comparison here. Chrome added some nice security features (e.g. Safe Browsing) which are broadly a good thing for reducing harm from websites, but at the same time, if you go to a dodgy website and they harvest all your personal details, no one blames Chrome for that.
No doubt AIs are an interesting use case because of the sheer volume of personal data involved, but if I want to trust some other AI app like Gemini or ChatGPT with my data, then why should I be restricted from doing that?
Sorry, Apple has to be dragged, kicking and screaming, to allow app store alternatives, which they charge offensive amounts for "to ensure your security", and has draconian review rules on the App Store "to ensure your security".
Sure, 3rd party will get some shit. But if Apple neither protected me on their App Store _or_ on the app stores that they extort, what the fuck is their racket for? As long as Apple keeps this behaviour, they deserve to have their cornflakes pissed in.
"I own data that may be stored on Apple's servers. Apple's operating system syncs that data to my local computer in a secure fashion. Apple's operating system exposes low-level APIs that allow other applications to access that data. I can install third party applications. Those third party applications can access that data and may exfiltrate it to an unvetted 3rd party cloud."
Your turn, smart guy: am I talking about iOS or macOS?
I can't find the problematic statement. Of course the tariffs are a threat to the financial success of German car manufacturers, and they need to keep their investors updated.
The DMA is also threatening Apple's high profit margins. That's the whole point of the DMA.
It is multiple things at once. It's a typical antitrust law, to increase competition. Enough competition usually leads to lower prices and lower margins.
Such political statements never damage the brand for every citizen, but for some.
Tesla is a good example. Elon Musk became political and anti-EU, which resulted in irreparable damage to the Tesla brand in Europe. Not for everyone, but a big group of people would never again consider buying a Tesla. As a result Tesla lost market share in Europe.
> This is essentially a backdoor into all of your data.
No. Only if you would consider the Linux/macOS/Windows filesystem API a backdoor too. On your desktop any app with sufficient permissions can read all your data. Would you call that a backdoor?
Incompetent oligarchs. If you can't even successfully bribe the Albanian or Serbian government, you are just really bad at corruption. Those are (sadly) among the most corrupt countries in the world.
edit: the governments appear to be supportive, but obviously aren't as supportive as they could be. Probably taking the bribe and not doing as much as they could.