The biggest advantage actual developers have is access to the NDA'd vendor docs and the official SDKs. And the vendor docs are bad, and the official SDKs are a mess. Internal documentation? You'd be lucky if it's two steps above "nonexistent". It's usually just one step.
The bulk of open source development is (still?) on GitHub.
It's not like the Linux kernel isn't real. It's just that the kind of people who write Linux kernel patches and get them accepted are, in the eyes of an average open source developer, somewhere between "majestic magical creatures" and "madmen".
Old concern, but it really doesn't work that way. Genetics don't respect human ideas like "nationalities" or "borders" - the targeting you can get by selecting on single DNA variants is coarse enough to make ICBMs look like precision weapons.
Like many things of this nature, people keep bringing it up because it sounds Very Scary and Very Dystopian - not because it's worth giving an actual fuck about.
If it's the year 2126, and you have this kind of tech floating around, and you aren't equipping the entire population with artificial immune systems capable of dealing with known and unknown biological threats? You've done something wrong.
A model's capability is a function of model size, and you can only push a small, overspecialized "idiot savant" model so far before its crippling lack of size starts to bite you.
You can make a model like Composer 2.5. But Mythos 5 will beat it on capability, both at coding and at everything else. And the world is always hungry for more capabilities.
If you're running high on agentic AI and low on human oversight, paying 2x for going from 5% faults to 2% faults is a good deal.
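Whether 2x is a good deal depends on what a fault costs you. A minimal sketch, assuming each fault has a fixed cleanup cost (a made-up parameter, expressed as a multiple of the cheap model's per-task price) and ignoring everything else: the break-even point is where the price difference equals the fault-rate difference times the fault cost, which for these numbers is 1 / 0.03, or roughly 33x the base task price.

```python
def expected_cost(price_per_task: float, fault_rate: float, fault_cost: float) -> float:
    """Expected cost of one task: the up-front price plus the expected cleanup bill.

    fault_cost is a hypothetical fixed cost of catching and undoing one fault.
    """
    return price_per_task + fault_rate * fault_cost


def break_even_fault_cost(cheap_price: float, cheap_faults: float,
                          pricey_price: float, pricey_faults: float) -> float:
    """Fault cost at which both options have the same expected cost.

    Solves cheap_price + cheap_faults * F == pricey_price + pricey_faults * F for F.
    """
    if cheap_faults <= pricey_faults:
        raise ValueError("the pricier option must have the lower fault rate")
    return (pricey_price - cheap_price) / (cheap_faults - pricey_faults)


base = 1.0  # cheap model's price per task, in arbitrary units
print(f"break-even: {break_even_fault_cost(base, 0.05, 2 * base, 0.02):.1f}x base price")
for fault_cost in (10.0, 100.0):
    cheap = expected_cost(base, 0.05, fault_cost)
    pricey = expected_cost(2 * base, 0.02, fault_cost)
    print(f"fault cost {fault_cost:g}: cheap {cheap:.2f}, 2x model {pricey:.2f}")
```

With little human oversight, faults tend to get caught late and cost a lot to unwind, which pushes you past that break-even point. With a human reviewing every diff, the cheaper model may well win.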
I'm not a very smart person, so take what I say with a grain of salt.
I think the path forward will have agents that use models individually specialized for specific tasks (some might use a bigger model, some might use smaller models), and then orchestrators that are good at knowing when to use which agent type.
I've played around with this in my own tiny coding agents, for TTRPG NPCs, and even a small experiment where LLMs controlled a MUD client as an NPC that played the game with you (only 5 rooms in the experiment).
Basically, break the tasks down into chunks so you don't have to use generalist models for everything, and can choose the right model for the job.
I'm also running all of this locally, where a big generalist foundation model isn't an option and heavily quantized models don't perform well on every task. With an unlimited token budget, my solution is probably overkill.
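A minimal sketch of the routing idea, assuming each task arrives already tagged with a type. The model names (`npc-dialogue-3b`, `coder-14b`, `general-32b`) and the `fake_model` stand-in are hypothetical placeholders, not real models or a real inference API. A real orchestrator would likely classify the task itself, possibly with a small model, rather than rely on a static lookup table.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is anything that maps a prompt to a completion.
ModelFn = Callable[[str], str]


def fake_model(name: str) -> ModelFn:
    """Hypothetical stand-in for a local model: it just tags the prompt with its name."""
    return lambda prompt: f"[{name}] {prompt}"


@dataclass
class Orchestrator:
    routes: Dict[str, ModelFn]  # task type -> specialized model
    fallback: ModelFn           # generalist used when no specialist matches

    def run(self, task_type: str, prompt: str) -> str:
        model = self.routes.get(task_type, self.fallback)
        return model(prompt)


orchestrator = Orchestrator(
    routes={
        "npc_dialogue": fake_model("npc-dialogue-3b"),
        "codegen": fake_model("coder-14b"),
    },
    fallback=fake_model("general-32b"),
)

print(orchestrator.run("npc_dialogue", "Greet the player at the tavern door."))
print(orchestrator.run("planning", "Break this feature into subtasks."))
```

The point of the fallback is that unknown task types still get an answer, just from the most expensive model, so the table only has to cover the high-volume grunt work.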
"Orchestrator" pattern, "only use a big model to do big thinking, use smaller models to do grunt work" is probably what the field would converge to, eventually. Perhaps in form of "dynamic sparsity" - i.e. a family of closely related models allowing inference to transition from 1B class to 100T class on a dime, complete with something like joint KV cache.
But it's a hard pattern to pull off, so I'm not sure how soon we'll see it in action.
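The "big model for big thinking, small models for grunt work" split can be approximated today with a plain model cascade: try the cheapest tier first and escalate only when it isn't confident. This is a simpler, swapped-in technique, not the dynamic-sparsity design with a shared KV cache described above, which doesn't fit in a snippet. The tier names, the `(answer, confidence)` interface, and the 0.8 threshold are all hypothetical; getting a trustworthy confidence signal out of a real model is the hard part.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a tier maps a prompt to (answer, confidence in [0, 1]).
TierFn = Callable[[str], Tuple[str, float]]


def cascade(tiers: List[Tuple[str, TierFn]], prompt: str,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Try tiers from cheapest to most capable; stop at the first confident answer.

    Returns (tier_name, answer). The last tier's answer is accepted as-is,
    since there is nothing left to escalate to.
    """
    if not tiers:
        raise ValueError("need at least one tier")
    for name, model in tiers[:-1]:
        answer, confidence = model(prompt)
        if confidence >= threshold:
            return name, answer
    name, model = tiers[-1]
    answer, _ = model(prompt)
    return name, answer


def small_model(prompt: str) -> Tuple[str, float]:
    # Toy heuristic: only confident on short, grunt-work prompts.
    return "quick answer", (0.9 if len(prompt) < 40 else 0.3)


def big_model(prompt: str) -> Tuple[str, float]:
    # Toy stand-in for the expensive tier.
    return "careful answer", 0.95


tiers = [("small-1b", small_model), ("big-100b", big_model)]
print(cascade(tiers, "rename this variable"))
print(cascade(tiers, "design a sharded storage engine with crash-consistent compaction"))
```

Unlike the dynamic-sparsity idea, each escalation here throws away the small model's work and starts over, which is exactly the waste a shared KV cache would be meant to avoid.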
If you're claiming "AI inference is sold at a loss", it's on you to prove it.
All we have actual evidence of is this: if billed by API metrics, some users use enough AI that their subscription is sold at a loss (up to the degenerate case of usage maxed out at all times), while other users are, by the same metrics, profitable (down to the degenerate case of a forgotten $20-a-month subscription with zero usage).
We don't know how API prices relate to costs - we only have estimates. And we certainly don't know how much inference an average subscription user consumes.
If you have some sort of information that would decisively prove that, in aggregate, "AI company N is losing money on subscriptions", then show it.
Or is it you who's blinded by faith? Like some sort of AI bubble cultist? The bubble is real, you just have to believe in it?
Very well said. People are making a lot of claims when very little knowledge of the financials is public. If you actually look at the numbers, there are plenty of ways in which API revenue and forgotten subs could more than make up the difference for power users. Even if power users are getting 10-20x their sub fee in tokens, the math could still work out. Personally, I doubt more than 5% of Claude subs even approach max usage, because it requires having so many agents running all of the time.
I imagine we'll know in a few months when these companies go public.
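To make "the math could still work out" concrete, here's a back-of-envelope sketch. Every number in it is an assumption: the $20 fee comes from the forgotten-subscription example upthread, the 5% heavy-user share and 15x usage sit inside the guesses in this comment, the 0.3x usage for everyone else is made up, and `cost_to_price` (real serving cost divided by API list price) is exactly the unknown from upthread. The sign of the result flips depending on that last number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    share: float           # fraction of all subscribers in this segment
    usage_multiple: float  # monthly usage valued at API list price, as a multiple of the fee


def aggregate_margin(fee: float, segments: List[Segment], cost_to_price: float) -> float:
    """Average monthly margin per subscriber, across all segments.

    cost_to_price is the assumed ratio of actual serving cost to API list price;
    nobody outside the company knows the real value.
    """
    if abs(sum(s.share for s in segments) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    api_value = sum(s.share * s.usage_multiple * fee for s in segments)
    return fee - cost_to_price * api_value


# Hypothetical mix: 5% heavy users at 15x the fee, everyone else at 0.3x.
segments = [Segment(share=0.05, usage_multiple=15.0), Segment(share=0.95, usage_multiple=0.3)]

for ratio in (0.5, 1.0):
    print(ratio, round(aggregate_margin(20.0, segments, ratio), 2))
```

With these made-up inputs, serving at half of API list price leaves about $9.65 per subscriber per month, while serving at full list price is a small loss of about $0.70. Same usage, opposite conclusions, which is why the cost side has to be known before anyone can claim subscriptions are sold at a loss.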
I was reverse engineering a medical device, and had to do a lot of trickery to get Opus 4.5 - not even Fable/Mythos, Opus - not to trip its fucking CBRN filter.
What happened with Fable is basically what I feared when they announced those restrictions. They took the shitty Opus CBRN filter and made it even worse.
I pity the fools trying to use Anthropic AIs for anything biotech.
Opus has been fine on proteomics and bioinformatics for me. I have never seen a Claude model refuse on such grounds.
Claude is still the best IMO, but it feels like its most frustrating and grating aspects are down not to the model’s abilities, but to the increasingly heavy hand of Anthropic expressing itself within the model. Fable’s comically useless responses almost seem like a cynical marketing tweak.
“This model is so powerful we basically can’t let it do anything. How terrifying! We need more money to make it stronger. Now do you see why we should be the ones who write the regulations? We’re the Good Guy AI Company Who Will Never Ever Ever Be Unethical after all.”
As this entity gains more ground, their models become increasingly annoying to use and their little act becomes more transparent. The whole “I’m just a befuddled, ethically minded AI researcher who is perturbed by the power I unwittingly discovered, and I must warn the world” thing? Yeah, fuck off. Your twee pandering to naïve nerds and cynical technocrats is nauseating, and ordinary people can smell it a mile away. Completely repellent leadership that raises red flags for anyone with a working ability to read between the lines of both spoken language and body language. The tech company equivalent of a sex predator who plays the nice guy. Gross.
Nobody likes these companies and their models are annoying, but we’re going to put up with playing middle manager to these obnoxious programs because our jobs depend on it now, and these products are still the best on the market.
A breakthrough in tools that facilitate user-owned models and infrastructure is desperately needed for the sake of our dignity and sanity, if nothing else.
My personal suspicion is that it went "medical hardware -> high-throughput screening -> biorisk" in that old Opus case.
I like Anthropic's work, and I would be the first to argue against all the usual "it's all PR" whine. But there is a limit. And whoever made those fucking filters needs to be fired out of a cannon into the sun.
Recent image models are advancing rapidly at prompt adherence specifically, and being able to iterate on the same image is propelling them even further. Images 2.0 is the poster child of this "agentic iterative image composition" approach.
Images 2.0 isn't anywhere close to the kind of detail control I'm talking about.
It's the opposite of a skill issue. No image generator is anywhere near the ballpark of pro-level manual Photoshop or Illustrator editing for individual elements in an image.
If you don't understand this, try precisely kerning the text in a generated book cover to handle letter combinations like A and V.
This is one of the big problems with GenAI. You can do new things with it, but it's crude, Dunning-Kruger, good-enough-if-you-don't-ask-for-more creativity.
The pros can see what most people can't, and the flaws and missing features are frustrating and obvious creatively, not just in terms of production values.
We went from "AI can't generate text that isn't at least 20% typos and it always looks like shit" to "some letter combinations aren't kerned to perfection sometimes and adjusting that with prompts is hard". In a couple of generations.