Hacker News | germanptr's comments

I get this question a lot, and I found it hard to answer briefly, so I ended up writing a longer post about how I work:

https://www.trigosec.com/insights/mob-programming-for-one/

The short version is that I don’t let AI agents work unsupervised on my code. I treat them like participants in a mob programming session instead of autonomous developers. Different agents get different roles (implementer, reviewer, architect, security reviewer, etc.), and I stay involved throughout the process.

I also agree with your point about architecture. Generating isolated components is relatively easy; preserving and evolving the architectural boundaries across a larger codebase is much harder.

We’re still missing a good way to express and measure architectural quality. Until then, architecture-heavy work requires much closer supervision than implementation-heavy work.


> We’re still missing a good way to express and measure architectural quality

Architectural complexity[1]! There are several really good papers on this.

Unfortunately it never caught on, and we don’t have great automated tools to spit out a number. Also the majority of people just don’t care enough. Research in this field kinda died out when we invented microservices and started treating those as a silver bullet for The Architecture Problem (they’re not [2]).

[1] https://swizec.com/blog/why-taming-architectural-complexity-...

[2] https://youtu.be/y8OnoxKotPQ


> Also the majority of people just don’t care enough.

Yet! It is the next frontier, and we will need it for agents like the ones described in the post to really work.


> Yet! It is the next frontier

While researching my book I read papers from the 1980s saying this: if you get a good enough spec and define the contracts and architecture, you can just hand off implementation to juniors/offshore/etc.

So far it has not worked. Maybe this time!


Didn't even need to click the YouTube link, I knew it would be Krazam.

> The short version is that I don’t let AI agents work unsupervised on my code. I treat them like participants in a mob programming session instead of autonomous developers.

I wonder if open-source maintainers would have a leg up in defining workflows to better leverage this. Of course, open-source contributors are autonomous developers, but maybe a trick or two might transfer.


I follow a similar approach and use multiple LLMs per task. The quality improvement is surprisingly large.

Lately I’ve been experimenting with adding an explicit reward function so the models optimize for measurable output quality.

This creates a generate, critique, revise loop where candidate answers compete for a higher score. It feels promising because it reduces the amount of handholding for every task. It is also more fun because part of the review process is embedded in the scoring function, which simplifies the review effort.
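For anyone wondering what such a loop might look like, here is a minimal sketch. It assumes hypothetical `generate`, `critique`, and `revise` callables that wrap whichever models are in use, plus a `score` callable standing in for the reward function; none of these names come from the comment above, and a real setup would likely differ.

```python
from typing import Callable, List, Tuple


def refine(
    task: str,
    generate: Callable[[str], str],            # hypothetical: task -> candidate
    critique: Callable[[str, str], str],       # hypothetical: (task, candidate) -> feedback
    revise: Callable[[str, str, str], str],    # hypothetical: (task, candidate, feedback) -> candidate
    score: Callable[[str], float],             # hypothetical reward: higher is better
    n_candidates: int = 3,
    n_rounds: int = 2,
) -> Tuple[str, float]:
    """Return the highest-scoring candidate after a few critique/revise rounds."""
    # Generate: start with several independent drafts.
    candidates: List[str] = [generate(task) for _ in range(n_candidates)]

    for _ in range(n_rounds):
        revised: List[str] = []
        for cand in candidates:
            # Critique, then revise based on that feedback.
            feedback = critique(task, cand)
            new = revise(task, cand, feedback)
            # Keep the revision only if the score did not drop.
            revised.append(new if score(new) >= score(cand) else cand)
        candidates = revised

    # Candidates compete on score; the best one wins.
    best = max(candidates, key=score)
    return best, score(best)
```

Each callable can be backed by a different model, which is one way to get the "multiple LLMs per task" effect without any special tooling.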


> I follow a similar approach and use multiple LLMs per task.

Pardon my ignorance, but how would one go about doing that on, say, a standard C++ project?

I get the part where one can use Codex/Claude with an IDE and/or extension. But how does one connect two LLMs together in such a setup?


I don't have it automated, but I score on minimizing lines of code added, readability of the code, and quality of the architecture.
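If one wanted to turn those three criteria into a single number, a rough sketch could look like the following. It assumes readability and architecture are ratings in [0, 1] supplied by a reviewer (human or model), and that added lines are counted from a unified diff; the weights and the `line_budget` are made-up illustrative values, not anything the comment above specifies.

```python
from dataclasses import dataclass
from typing import Tuple


def lines_added(unified_diff: str) -> int:
    """Count added lines in a unified diff, skipping '+++' file headers.

    Simplification: an added line whose content itself starts with '++'
    would also be skipped.
    """
    return sum(
        1
        for line in unified_diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


@dataclass
class Review:
    readability: float   # assumed reviewer rating in [0, 1]
    architecture: float  # assumed reviewer rating in [0, 1]


def score(
    unified_diff: str,
    review: Review,
    line_budget: int = 200,                               # hypothetical size limit
    weights: Tuple[float, float, float] = (0.3, 0.3, 0.4),  # hypothetical weights
) -> float:
    """Combine diff size, readability, and architecture into one score in [0, 1]."""
    # Size term: 1.0 for an empty diff, falling linearly to 0.0 at line_budget.
    size = max(0.0, 1.0 - lines_added(unified_diff) / line_budget)
    w_size, w_read, w_arch = weights
    return w_size * size + w_read * review.readability + w_arch * review.architecture
```

Only the size term is mechanical here; the other two still depend on someone's judgment, which matches the "not automated" caveat.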

