I had the same problem. Was using this a few months ago, had it running for weeks, and noticed that code was disappearing. I no longer use it. Aside from that, I decided there's no point anyway, considering that LLMs are great at figuring out merge conflicts.
Times are changing. The open-weight models have needed time to catch up, but they're finally at a point where we can get almost frontier-level capabilities for coding.
I just wish we had a way to actually benchmark them properly though. Still seems no one has solved the problem of software architecture, brittleness and bloat as the codebase grows. Models love to add stuff, but they rarely clean up as they go. In a perfect world they'd do both near equally as they're developing.
It would be nice if there were an "architecture quality" benchmark that distilled the essence of what it means to have a good architecture, but I suppose that's an open research question with a lot of variables? Like how is good architecture actually quantified and measured? Is there a mechanism that can be re-used across all codebases to clearly denote one that is good and one that is bad, or is it highly subjective and dependent on the lens you're looking at it from? Is there a lot more to it than just "how much refactoring effort is required to extend this in the future?"
Surely this is something that has been well researched - yet I never really hear anything about it. Makes me wonder why.
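For what it's worth, one long-standing partial answer to "how is it quantified" is coupling metrics. A minimal sketch, assuming Robert C. Martin's instability metric I = Ce / (Ca + Ce), where Ce is the number of modules a module depends on and Ca is the number of modules that depend on it. The `deps` map and module names below are made up for illustration, and this measures only one narrow aspect of architecture, not "quality" as a whole:

```python
from collections import defaultdict


def instability(deps):
    """Return {module: I} where I = Ce / (Ca + Ce).

    deps maps each module name to the set of modules it imports.
    Self-imports are ignored. A module with no edges gets 0.0.
    """
    modules = set(deps) | {d for targets in deps.values() for d in targets}

    # Ca: count how many other modules depend on each module.
    fan_in = defaultdict(int)
    for src, targets in deps.items():
        for dst in set(targets):
            if dst != src:
                fan_in[dst] += 1

    scores = {}
    for m in modules:
        ce = len(set(deps.get(m, ())) - {m})  # outgoing dependencies
        ca = fan_in[m]                        # incoming dependencies
        scores[m] = ce / (ca + ce) if (ca + ce) else 0.0
    return scores


# Hypothetical three-module codebase.
example = {"ui": {"core", "utils"}, "core": {"utils"}, "utils": set()}
scores = instability(example)
```

A score near 1 means the module depends on a lot and nothing depends on it (cheap to change), while a score near 0 means many things lean on it (expensive to change). Tracking how those numbers drift as an agent keeps adding code would be one crude proxy for the bloat problem, though it says nothing about whether the abstractions themselves are any good.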
I know the big labs like to pretend that their models are trillion-parameter. But how likely is that really to be the case when Qwen 3.6 35B A3B gets so close to their performance? Seems that with the best research applied and the best training data, they'd be able to top the charts with a 60B model quite easily.
Number of parameters doesn't make the model smarter; it just makes it know more stuff out of the box.
At some point there are diminishing returns, and your coding LLM performs worse because you encoded useless stuff like Pokemon combinations or languages you don't speak into its parameter space.
The "smartness" of the model comes from RLHF post-training, which is orthogonal to model size.
Also, if you're using an agentic harness, a much better approach is to let the model control its own context. If you ever reach a point where your coding LLM needs to know about Pokemon, just give it a web search tool and let it google the Pokemon.
Hallucination city, doing whatever it feels like and far more than what I've bargained for, and performance on par with Opus 4.8 for coding in a large production codebase. I still have far more success with GPT 5.5, it actually follows my instructions and doesn't try to automate my entire job, which allows me to build skills and pipelines around the things I actually want automated.
Interesting. I only used Fable. And not for very much time. But one thing I did notice was improved adherence. Maybe it was just improved adherence to the claude.md and the instructions from the skills in use that I was noticing.
When Jensen (Nvidia) was doing interviews at his recent public talks, he was asked something along the lines of: "Why release these new laptops which are a low margin market, if your other businesses are vastly more profitable?" and his answer was basically that if they can build the coolest and best technology and push the frontier, they will do it. It's not all about making tons of money. He seemed genuinely excited about the tech.
To me, it highlights the difference between companies like Anthropic and Nvidia, where one is clearly all about the money and power, and the other is doing it because they genuinely want to accelerate progress and make cool stuff as the driving factor. It's no surprise, therefore, that Nvidia is the world's largest open-source contributor to AI, with over 800 open-weight models.
Of course, these models run on Nvidia hardware, so they benefit from it as a company. But with that healthy mindset, they found a way to contribute that not only benefits everyone, but also benefits themselves.
Contrast that with Anthropic, which has gone in the complete opposite direction. Closing off everything, restricting everything, fearmongering about progress, regulatory capture attempts, the list goes on. I mean, they won't even agree on using AGENTS.md as a standard because CLAUDE.md is free marketing for them. That's the level of disgusting greed we are dealing with...
From a game theory perspective, the cooperative strategies tend to win. As a result, Nvidia has set themselves up for a lifetime. Anthropic, however, is playing a winner-takes-all strategy, and they're happy to see the world and the entire AI industry collapse in the process.
The proof is in the pudding though. I'm judging based on their actions, not on their words. They're making AI models and AI research widely accessible, including selling consumer grade hardware to run them locally, and to use open-weight models. They could have just gone all in on selling to Anthropic, OpenAI, and all the other big tech companies, but they aren't. Meanwhile, Anthropic is trying to price people out of the market, increasing their restrictions, cutting the latest model from subscription plans, etc.
Nvidia is not doing it out of the goodness of their hearts and love of open source. If at any point their CUDA vendor lock-in moat fails because Intel or AMD manage to get working software, they'll go back to keeping everything locked and proprietary ASAP.
Basically everything Nvidia does in open source is there to make sure their proprietary stack has a good moat and no competitor stack can catch up.
That's not really the impression I get from Anthropic, but if you have the links to back it up, I'm always willing to change my mind.
Compared to businesses like Oracle, Microsoft, or Facebook, I felt that Anthropic was more interested in progress (not to the neglect of business, since AI training is expensive at the end of the day), but maybe I've just not seen what you've seen.
This is a good idea. I've been hoping that a large player with enough social reach would create an open-source fund that everyone can contribute to, to develop a company that trains and releases open-source models at the cutting edge. We can crowdfund the training costs, and the whole world benefits.
It's the most logical solution for AI anyway, considering that it's trained on humanity's collective knowledge. It should be more of a publicly funded and publicly accessible resource, rather than something greedy tech companies distribute like crumbs while they use unlocked powers internally to clone all of our businesses and swallow the economy.
It's actually the opposite. Democratization of intelligence is the only way to stop existential threats and render them useless.
Right now, and likely forever, because biological threats can be sanctioned at a supply-chain level, the risk of AI is all digital. Fraud, phishing scams, spam, hacks, etc.
The only way we harden the world's infrastructure to the point that it can withstand attack from bad AI is if we have an abundance of access to frontier intelligence to develop countermeasures.
Otherwise, bad actors will develop these capabilities behind closed doors and use them to hold the world hostage and cause irreparable harm. There's no putting the genie back in the bottle. Good and open-access AI and the people using it are the digital immune system.
If there's an asymmetry where the bleeding edge is gated off to only a small group, and allowed to gain exponential power over the immune system's defense grid, the slightest infection will lead to the death of the host.
It's not only tenable, it is a necessity. Unless you want humanity to be enslaved in perpetuity to a single figurehead.
Bad AI is only countered by having a majority of good, open-access and open-source AI to keep it in check, where the good AI can overpower the bad. The moment you destroy that balance is the moment a bad actor gains exponential advantage and the ability to hold the whole world hostage forever.