ACCount37 · 2026-06-14T11:49:42 1781437782

Yeah, that's the status quo.

The biggest advantage actual developers have is access to the NDA'd vendor docs and the official SDKs. And the vendor docs are bad and the official SDKs are a mess. Internal documentation? You'd be lucky if it's two steps above "nonexistent". It's usually just one step.

ACCount37 · 2026-06-14T10:14:23 1781432063

The bulk of open source development is (still?) on GitHub.

It's not like the Linux kernel isn't real. It's just that the kind of people who write Linux kernel patches and get them accepted are, in the eyes of an average open source developer, somewhere between "majestic magical creatures" and "madmen".

ACCount37 · 2026-06-12T16:54:00 1781283240

Old concern, but it really doesn't work that way. Genetics don't respect human ideas like "nationalities" or "borders" - the targeting you can get by selecting on singular DNA variants is coarse enough to make ICBMs look like precision weapons.

Like many things of this nature, people keep bringing it up because it sounds Very Scary and Very Dystopian - not because it's worth giving an actual fuck about.

ikrenji · 2026-06-12T17:34:27 1781285667

I mean, maybe not right now, but in 100-1000 years a complicated enough "nanobot/virus" could possibly be made to target a single person.

ACCount37 · 2026-06-12T17:39:52 1781285992

If it's year 2126, and you have this kind of tech floating around, and you aren't equipping the entire population with artificial immune systems capable of dealing with known and unknown biological threats? You've done something wrong.

ACCount37 · 2026-06-12T08:15:45 1781252145

You can logit distill (full token probabilities) or one-hot distill (chat logs), or even align hidden states. All are distillation methods.
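A rough sketch of what the three variants optimize, assuming plain Python lists stand in for one position's logits and hidden vectors. The function names are made up for illustration; real training code would do this batched in a tensor library.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with optional temperature.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distill_loss(student_logits, teacher_logits, temperature=1.0):
    # Logit distillation: KL(teacher || student) over the full next-token distribution.
    # Hinton-style KD also multiplies this by temperature**2; omitted here for simplicity.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

def one_hot_distill_loss(student_logits, teacher_token):
    # One-hot distillation: cross-entropy against the single token the teacher
    # actually emitted (what you get from chat logs, with no probabilities).
    q = softmax(student_logits)
    return -math.log(q[teacher_token])

def hidden_state_loss(student_hidden, teacher_hidden):
    # Hidden-state alignment: mean squared error between activations.
    # Assumes widths already match; real setups often learn a projection first.
    return sum((s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)) / len(teacher_hidden)
```

The difference is how much signal each position carries: the logit loss sees the teacher's whole distribution, the one-hot loss sees only the sampled token, and the hidden-state loss needs access to the teacher's internals.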

ACCount37 · 2026-06-11T16:35:33 1781195733

Model inference:training compute for frontier models is estimated to be over 10:1 now.

Driven mostly by just how much inference they sell nowadays - but also by things like base model reuse.

ACCount37 · 2026-06-11T16:30:49 1781195449

"Focusing on a domain" has a hard ceiling.

A model's capability is a function of model size, and you can only push a small overspecialized "idiot savant" model so far before its limited size starts to bite you.

You can make a model like Composer 2.5. But Mythos 5 will beat it on capability, both at coding and at everything else. And the world is always hungry for more capabilities.

If you're running high on agentic AI and low on human oversight, paying x2 for going from 5% faults to 2% faults is a good deal.
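To make the "good deal" claim concrete: if every fault costs something to clean up, the pricier model wins once that cleanup cost is large enough. A minimal sketch, assuming a fixed inference cost per task and a fixed cost per fault (both hypothetical parameters, not numbers from the thread):

```python
def cost_per_task(inference_cost, fault_rate, fault_cost):
    # Expected cost of one task: the model bill plus the expected cleanup bill.
    return inference_cost + fault_rate * fault_cost

def breakeven_fault_cost(cheap_price, cheap_fault_rate, pricey_price, pricey_fault_rate):
    # Fault cost above which the pricier, more reliable model is cheaper overall.
    # Solves cheap_price + cheap_rate*F == pricey_price + pricey_rate*F for F.
    return (pricey_price - cheap_price) / (cheap_fault_rate - pricey_fault_rate)

# 1x price at 5% faults vs 2x price at 2% faults (per-task cost normalized to 1).
threshold = breakeven_fault_cost(1.0, 0.05, 2.0, 0.02)
print(f"2x model wins once a fault costs more than {threshold:.1f}x the per-task spend")
```

With those numbers the threshold is about 33x. With little human oversight, a fault that slips through can easily cost far more than 33 model calls, which is the point being made.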

jermaustin1 · 2026-06-11T17:15:44 1781198144

I'm not a very smart person, so take what I say with a grain of salt.

I think the path forward will be agents that each use a model specialized for its task (some might use a bigger model, some smaller ones), plus orchestrators that are good at knowing when to use which agent type.

I've played around with this in my own tiny coding agents, for TTRPG NPCs, and even a small experiment where LLMs controlled a MUD client as an NPC that played the game with you (only 5 rooms in the experiment).

Basically, break the tasks down into chunks so you don't have to use generalist models for everything, and can choose the right model for the job.
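A toy version of that routing idea, assuming tasks are already split into (kind, prompt) pairs. The task kinds, the "small"/"large" tiers, and the stub model functions are all hypothetical stand-ins for real local inference calls:

```python
from typing import Callable

ModelFn = Callable[[str], str]

def make_stub(name: str) -> ModelFn:
    # Stand-in for a real inference call; just tags the prompt with the model name.
    return lambda prompt: f"[{name}] {prompt}"

# Hypothetical model tiers; in practice these would wrap different local models.
MODELS: dict[str, ModelFn] = {
    "small": make_stub("small"),
    "large": make_stub("large"),
}

# Hypothetical routing table: grunt work to the small model, hard thinking to the big one.
ROUTES: dict[str, str] = {
    "npc_dialogue": "small",
    "summarize": "small",
    "plan": "large",
    "code_review": "large",
}

def orchestrate(subtasks: list[tuple[str, str]]) -> list[str]:
    results = []
    for kind, prompt in subtasks:
        tier = ROUTES.get(kind, "large")  # unknown kinds default to the big model
        results.append(MODELS[tier](prompt))
    return results

print(orchestrate([("npc_dialogue", "greet the player"), ("plan", "design room 3")]))
```

A static table is the simplest possible router; a real orchestrator would more likely ask a model to classify each subtask, but the dispatch step looks the same.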

I'm also running all of this locally, where a generalist foundation model doesn't work and heavily quantized models don't perform well on every task, so with an unlimited token budget, my solution is probably overkill.

ACCount37 · 2026-06-11T17:52:28 1781200348

"Orchestrator" pattern, "only use a big model to do big thinking, use smaller models to do grunt work" is probably what the field would converge to, eventually. Perhaps in form of "dynamic sparsity" - i.e. a family of closely related models allowing inference to transition from 1B class to 100T class on a dime, complete with something like joint KV cache.

But it's a hard pattern to pull off, so I'm not sure how soon we'll see it in action.

anthonypasq · 2026-06-11T17:08:21 1781197701

Mythos is 20x more expensive though

ACCount37 · 2026-06-11T17:37:33 1781199453

Fable 5 is listed at merely 2x the price of Opus 4.8 on OpenRouter: $10/$50 per 1M input/output tokens, vs $5/$25.
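A quick check of the 2x figure, using the per-1M-token prices quoted above. The workload size below is made up for illustration:

```python
# (input, output) USD per 1M tokens, as quoted in the comment above.
PRICES = {
    "Fable 5": (10.0, 50.0),
    "Opus 4.8": (5.0, 25.0),
}

def job_cost(model, input_tokens, output_tokens):
    # Cost of one job billed at list price.
    inp, out = PRICES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# Hypothetical job: 2M input tokens, 0.5M output tokens.
fable = job_cost("Fable 5", 2_000_000, 500_000)
opus = job_cost("Opus 4.8", 2_000_000, 500_000)
print(fable, opus, fable / opus)
```

Since both the input and the output price are exactly doubled, the ratio comes out to 2x for any input/output mix, not just this one.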

Now, Fable 5 is currently borderline unusable because of asinine filters. But I assume they'll fix this shit eventually.

anthonypasq · 2026-06-11T19:21:08 1781205668

I'm talking about it compared to Composer 2.5.

ACCount37 · 2026-06-11T16:20:51 1781194851

If you're claiming "AI inference is sold at a loss", it's on you to prove it.

All we actually have evidence of is this: some users use enough AI that their subscription, if billed by API metrics, is sold to them at a loss (up to degenerate cases: usage maxed out at all times), while other users are, by the same metrics, profitable (down to degenerate cases: a forgotten $20-a-month subscription with zero usage).

We don't know how API prices relate to costs - we only have estimates. And we certainly don't know how much inference an average subscription user consumes.

If you have some sort of information that would decisively prove that, in aggregate, "AI company N is losing money on subscriptions", then show it.

Or is it you who's blinded by faith? Like some sort of AI bubble cultist? The bubble is real, you just have to believe in it?

BosunoB · 2026-06-11T17:00:26 1781197226

Very well said. People are making a lot of claims when very little knowledge of the financials is public. If you actually look at the numbers, there are plenty of ways in which API revenue and forgotten subs could more than make up the difference for power users. Even if power users are getting 10-20x their sub fee in tokens, the math could still work out. Personally, I doubt more than 5% of Claude subs even approach max usage, because it requires having so many agents running all of the time.
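A back-of-the-envelope version of that argument, in units of one subscription fee. Every input here is an assumption: the 5% share and the 15x power-user multiple echo the comment's guesses, while the typical-user usage and especially the ratio of serving cost to API list price are unknowns (nobody outside these companies knows the latter):

```python
def blended_cost_per_sub(power_share, power_multiple, typical_multiple, cost_ratio):
    # power_share:      fraction of subscribers who are heavy users (assumed)
    # power_multiple:   heavy users' usage at API list prices, as a multiple of the fee
    # typical_multiple: the same, for everyone else (assumed)
    # cost_ratio:       actual serving cost as a fraction of API list price (unknown)
    api_value = power_share * power_multiple + (1 - power_share) * typical_multiple
    return api_value * cost_ratio

def margin_per_sub(power_share, power_multiple, typical_multiple, cost_ratio):
    # Revenue (1 fee) minus average serving cost, per subscriber.
    return 1 - blended_cost_per_sub(power_share, power_multiple, typical_multiple, cost_ratio)

print(margin_per_sub(0.05, 15, 0.5, 0.5))  # serving costs half of list price
print(margin_per_sub(0.05, 15, 0.5, 1.0))  # API priced at cost
```

The sign of the answer flips depending on `cost_ratio`, which is exactly the number that isn't public. That's why neither "sold at a loss" nor "profitable" can be settled from usage anecdotes alone.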

I imagine we'll know in a few months when these companies go public.

ACCount37 · 2026-06-11T09:10:19 1781169019

Remove the relevant data, and just enough of the data around it will remain that the AI will be able to close the gap if given relevant documentation.

Not to mention that those capabilities are inherently dual use. If you know how to write C safely, you know how to spot unsafe C.

ACCount37 · 2026-06-11T09:03:15 1781168595

I was reverse engineering a medical device, and had to do a lot of trickery to get Opus 4.5 - not even Fable/Mythos, Opus - to avoid tripping its fucking CBRN filter.

What happened with Fable is basically what I feared when they announced those restrictions. They took the shitty Opus CBRN filter and made it even worse.

I pity the fools trying to use Anthropic AIs for anything biotech.

pneumic · 2026-06-11T13:15:34 1781183734

Opus has been fine on proteomics and bioinformatics for me. I had never seen a Claude model refuse on such grounds before.

Claude is still the best IMO, but it feels like its most frustrating and grating aspects are not down to the model’s abilities, but the increasingly heavy hand of Anthropic expressing itself within the model. Fable’s comically useless responses almost seem like a cynical marketing tweak.

“This model is so powerful we basically can’t let it do anything. How terrifying! We need more money to make it stronger. Now do you see why we should be the ones who write the regulations? We’re the Good Guy AI Company Who Will Never Ever Ever Be Unethical after all.”

As this entity gains more ground, their models become increasingly annoying to use and their little act becomes more transparent. The whole “I’m just a befuddled, ethically minded AI researcher who is perturbed by the power I unwittingly discovered and I must warn the world” thing? Yeah, fuck off. Your twee pandering to naïve nerds and cynical technocrats is nauseating, and ordinary people can smell it a mile away. Completely repellent leadership that raises red flags for anyone who can still read between the lines of both spoken language and body language. The tech company equivalent of a sex predator who plays the nice guy. Gross.

Nobody likes these companies and their models are annoying, but we’re going to put up with playing middle manager to these obnoxious programs because our jobs depend on it now, and these products are still the best on the market.

A breakthrough in tools that facilitate user-owned models and infrastructure is desperately needed for the sake of our dignity and sanity, if nothing else.

ACCount37 · 2026-06-11T14:46:43 1781189203

My personal suspicion is that it went "medical hardware -> high-throughput screening -> biorisk" in that old Opus case.

I like Anthropic's work, and I would be the first to argue against all the usual "it's all PR" whine. But there is a limit. And whoever made those fucking filters needs to be fired out of a cannon into the sun.

staticman2 · 2026-06-11T12:15:33 1781180133

The filters are really bad.

Yesterday Fable refused to comment on poetry because it contained anatomy lines like:

got anotha round of acetylcholine from da boss.

ACCount37 · 2026-06-10T11:24:20 1781090660

Sounds like a skill issue?

Recent image models are advancing rapidly at prompt adherence specifically, and being able to iterate on the same image is propelling them even further, with Images 2.0 as the poster child of this "agentic iterative image composition" approach.

TheOtherHobbes · 2026-06-10T11:51:19 1781092279

Images 2.0 isn't anywhere close to the kind of detail control I'm talking about.

It's the opposite of a skill issue. No image generator is anywhere near the ballpark of pro-level manual Photoshop or Illustrator editing for individual elements in an image.

If you don't understand this, try precisely kerning the text in a generated book cover to handle letter combinations like A and V.

This is one of the big problems with GenAI. You can do new things with it, but it's crude Dunning-Kruger good-enough-if-you-don't-ask-for-more creativity.

The pros can see what most people can't, and the flaws and missing features are frustrating and obvious creatively, not just in terms of production values.

ACCount37 · 2026-06-10T12:33:01 1781094781

I fail to see anything other than a skill issue.

We went from "AI can't generate text that isn't at least 20% typos and it always looks like shit" to "some letter combinations aren't kerned to perfection sometimes and adjusting that with prompts is hard". In a couple of generations.