For me, AI is an enabler for things you can't do otherwise (or that would take many weeks of learning). But you still need to know how to do things properly in general, otherwise the results are bad.
E.g. I've been a software architect and developer for many years, so I already know how to build software, but I'm not familiar with every language or framework. AI enabled me to write kinds of software I never learned or had time for. E.g. I recently re-implemented an Android widget that hadn't been updated for a decade by its original author. Or I fixed a bug in a Linux scanner driver. None of these could I have done properly (within an acceptable time frame) without AI. But none of these could I have done properly without my knowledge and experience either, even with AI.
Same for daily tasks at work. AI makes me faster here, but it also lets me do more. Implement tests for all edge cases? Sure, always; I saved the time elsewhere. More code reviews. More documentation. Better quality in the same (always limited) time.
I use Claude Code a lot, but one thing that really made me concerned was when I asked it about some ideas I have had which I am very familiar with. Its response was to constantly steer me away from what I wanted to do towards something else that was fine but a mediocre way to do things. It made me question how many times I've let it go off and do stuff without checking it thoroughly.
I've had quite a bit of the "tell it to do something in a certain way" problem: it does that at first, then after a few messages of corrections and pointers it forgets that constraint.
> it does that at first, then after a few messages of corrections and pointers it forgets that constraint.
Yup, most models suffer from this. Everyone is raving about million-token context windows, but none of the models can actually get past 20% of that and still give as high-quality responses as the very first message.
My whole workflow right now is basically composing prompts outside the agent, letting it run with them, and if something is wrong, restarting the conversation from zero with a rewritten prompt. None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent essentially solves it without any back and forth, precisely because of the issue you mention.
Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
Yes, agreed. I find it interesting that people say they're building these huge multi-agent workflows, since the projects I've tried it on are not necessarily huge in complexity. I've tried a variety of different things re: instructions files, etc. at this point.
So far, I haven't yet seen any demonstration of those kinds of multi-agent workflows ending up with code that doesn't fall over itself within days or weeks. Most efforts so far seem to have been focused on producing as much code as possible, as fast as possible, while what I'd like to see, if anything, is the opposite of that.
Anytime I ask for a demonstration of what the actual code looks like, when people start talking about their own "multi-agent orchestration platforms" (or whatever), they either haven't shared anything (yet), don't care at all about what the code actually looks like, and/or the code is a horrible vibeslopped mess that contains mostly nonsense.
> I call this the Groundhog Day loop
That's a strange name, why? It's more like an "iterate and improve" loop. "Groundhog Day" to me would imply doing the same thing over and over, and you're really doing something wrong if that's your experience. You need to iterate on the initial prompt if you want something better/different.
I thought "iterate and improve" was exactly what Phil did.
Call me a conspiracy theorist, and granted much of this could be attributed to the fact that the majority of code in existence is shit, but I'm convinced that these models are trained and encouraged to produce code that is difficult for humans to work on. Further driving and cementing the usage of them when you inevitably have to come back and fix it.
I don't think they would be able to build an LLM without these flaws. The problem is that an LLM cannot distinguish between sense and nonsense in a logical way. If you train an LLM on a lot of sensible material, it will try to reproduce it by matching training-material context and prompt context. The system does not work on the basis of logical principles, but it can sound intelligent.
I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context plus feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not work on logical principles.
LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.
This could be the case even without an intentional conspiracy. It's harder to give negative feedback on poor-quality code that's complicated than on poor-quality code that's simple.
Hence the feedback these models get could theoretically funnel them toward unnecessarily complicated solutions.
No clue whether any research has been done into this, just a thought off the top of my head.
Or it takes a lot of time, effort and intelligence to produce good code, and AI is not there yet…
It is a mathematical, averaging model, after all.
Mediocre is fine for many tasks. What makes a good software engineer is spotting the few places in every piece of software where mediocre is not good enough.
Yes, but in my experience this sometimes works great; other times you paint yourself into a corner and the sum total is that you still have to learn the thing, just with a less steep initial ramp. For example, I built myself a nice pipeline for converting JPEGs on disk to H.264 on disk via zero-copy nvjpeg to nvenc, with Python bindings, but I have been pulling my hair out over B-frame ordering and weird delays in playback, etc. Nothing unsolvable, but I had to learn a great deal, and when we were in the weeds, Opus was suggesting stupid quick-hack fixes that turned the tests into whack-a-mole. In the end I had to dig in and read enough to be able to ask it with the right vocabulary to make it work. Similarly with entering many novel areas: initially I get a rush because it "just works", but it really only works for the median case at first, and it's up to you to even know what to test. And AIs can be quite dismissive of edge cases, saying things like "this will not happen in most cases so we can skip it".
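To give a flavour of the B-frame issue (a simplified sketch in plain Rust, not my actual pipeline; the frame pattern and timestamp values are made up): frames come out of the encoder in decode order, but players need presentation order, so if you just play them back in the order they were emitted you get exactly those weird delays and jumps.

    // Simplified sketch: with B-frames, decode order differs from
    // presentation order, so frames must be reordered by PTS before playback.
    #[derive(Debug, Clone, Copy)]
    struct Frame {
        decode_index: usize, // order the encoder emits frames
        pts: i64,            // presentation timestamp (arbitrary units here)
    }

    fn main() {
        // Pattern I, P, B, B: the P comes out of the encoder before the two
        // B-frames that are displayed before it.
        let decode_order = vec![
            Frame { decode_index: 0, pts: 0 },    // I
            Frame { decode_index: 1, pts: 9000 }, // P (displays last in this group)
            Frame { decode_index: 2, pts: 3000 }, // B
            Frame { decode_index: 3, pts: 6000 }, // B
        ];

        // Playing frames back in decode order is what produces the "weird
        // delays"; sorting by PTS restores display order.
        let mut display_order = decode_order.clone();
        display_order.sort_by_key(|f| f.pts);

        for f in &display_order {
            println!("display pts={:>4} (emitted by encoder at {})", f.pts, f.decode_index);
        }
    }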
Yeah, knowing what words to use is half the battle. Quickly throw out a prompt like "Hey, `make build` takes five minutes, could you make it fast enough to run under 1 minute" and the agent will do some work and say "Done, now the build takes 25 seconds as we're skipping the step of building the images, use `make build INCLUDE_IMAGES=true` when you want to build with images". It's not wrong, given the prompt, but it takes a bit of getting used to how they approach things.
I'm in the same boat. I've been taking on much more ambitious projects both at work and personally by collaborating with LLMs. There are many tasks that I know I could do myself but would require a ton of trial and error.
I've found that giving the LLMs the input and output interfaces really helps keep them on rails, while still being involved in the overall process without just blindly "vibe coding."
Having the AI also help with unit tests around business logic has been super helpful, in addition to manual testing like normal. It feels like our overall velocity and code quality have been going up regardless of what some of these articles are saying.
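A minimal sketch of what I mean by "giving it the interfaces" (in Rust just for illustration; the invoice domain, names and tax rule are all made up): I pin down the types, the signature and a couple of business-logic tests, and the body is the part I let the model fill in.

    #[derive(Debug, Clone, PartialEq)]
    pub struct LineItem {
        pub unit_price_cents: u64,
        pub quantity: u32,
    }

    #[derive(Debug, PartialEq)]
    pub struct InvoiceTotal {
        pub subtotal_cents: u64,
        pub tax_cents: u64,
    }

    /// Contract handed to the agent: sum the line items, apply a flat tax rate
    /// given in basis points.
    pub fn total_invoice(items: &[LineItem], tax_rate_bps: u64) -> InvoiceTotal {
        // The kind of body I let the agent write; the types and tests are the rails.
        let subtotal_cents: u64 = items
            .iter()
            .map(|i| i.unit_price_cents * i.quantity as u64)
            .sum();
        let tax_cents = subtotal_cents * tax_rate_bps / 10_000;
        InvoiceTotal { subtotal_cents, tax_cents }
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn empty_invoice_is_zero() {
            let t = total_invoice(&[], 2_500);
            assert_eq!(t, InvoiceTotal { subtotal_cents: 0, tax_cents: 0 });
        }

        #[test]
        fn tax_is_applied_to_the_subtotal() {
            let items = [LineItem { unit_price_cents: 1_000, quantity: 2 }];
            let t = total_invoice(&items, 2_500); // 25.00% in basis points
            assert_eq!(t.subtotal_cents, 2_000);
            assert_eq!(t.tax_cents, 500);
        }
    }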
100% agree with having AI expand core testing out from my own edge-case and key tests.
I agree. I write out the sketch of what I want. With a recent embedded project in C, I gave it a list of function signatures and a high-level description and was very satisfied with what it produced. It would have taken me days to nail down the particulars of the HAL (like what kind of sleep mode I want, or precisely how to set up the WDT and ports).
I think it's also language dependent.
I imagine JavaScript can be a crap shoot. The language is too forgiving.
Rust is where I have had most success. That is likely a personal skill issue: I know we want an Arc<DashMap>, but will I remember all the foibles of accessing it? No.
But given the rigidity of the compiler and strong typing, I can focus on what the code is functionally doing, so that I'm happy with the shape/interface and function signature and the compiler is happy with the code.
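For example, a rough sketch of the kind of thing I mean (illustrative only, assuming the dashmap crate as a dependency): the compiler keeps the shape honest, and the comments are where the foibles live.

    use std::sync::Arc;
    use std::thread;

    use dashmap::DashMap;

    fn main() {
        // Shared concurrent map; Arc lets every thread hold a handle.
        let counts: Arc<DashMap<String, u64>> = Arc::new(DashMap::new());

        let handles: Vec<_> = (0..4)
            .map(|i| {
                let counts = Arc::clone(&counts);
                thread::spawn(move || {
                    // entry() holds a shard lock for the duration of the call.
                    *counts.entry(format!("worker-{}", i % 2)).or_insert(0) += 1;
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }

        // Foible: holding a Ref (e.g. from get()) while mutating the same map
        // on the same thread can deadlock, because both need the shard lock.
        // So: read, drop the guard, then write.
        let total: u64 = counts.iter().map(|e| *e.value()).sum();
        println!("total increments: {}", total);
    }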
It's quite fast work. It lets me use my high level skills without my lower level skills getting in the way.
And I'd rather rewrite the code at a mid-level than start it fresh. I agree with others that once it's a large code base, I'm too far behind in understanding the overall system to easily work on it.
That's true of human products too - someone else's code always gives me the ick.
Vanilla JavaScript is hit or miss for anything complex.
Using TypeScript works great because you can still build out the interfaces, and with IDE integrations the AIs can read the language server results, so they get all the type hints.
I agree that the AI code is usually a pretty good starting point and gets me up to speed on new features fast, rather than starting everything from scratch. I usually end up refactoring the last 10-20% manually to give it some polish, because some of the code still feels off sometimes.
Huh. I'm extremely skeptical of AI in areas where I don't have expertise, because in areas where I do have expertise I see how much it gets wrong. So it's fine for me to use it in those areas because I can catch the errors, but I can't catch errors in fields I don't have any domain expertise in.
In my case I built a video editing tool fully customized for a community of which I am a member. I could do it in a few hours. I wouldn't even have started this project otherwise, as I don't have much free time, though I have been coding for 25+ years.
I see it as empowering for building custom tooling, which need not be a high-quality, maintained project.
> Or I fixed a bug in a Linux scanner driver. None of these could I have done properly (within an acceptable time frame) without AI. But none of these could I have done properly without my knowledge and experience either, even with AI
There are some things here that folks making statements like yours often omit, and it makes me very sus about your (over)confidence. Mostly these statements talk in a short-term, business-results-oriented mode without mentioning any introspective gains (i.e. empirically supported understanding) or long-term gains (do you feel confident now in making further changes _without_ the AI, now that you have gained new knowledge?).
1. Are you 100% sure your code changes didn't introduce unexpected bugs?
1a. If they did, would you be able to tell whether they were behaviour bugs (i.e. no crashing or exceptions thrown) without the AI?
2. Did you understand why the bug was happening without the AI giving you an explanation?
2a. If you didn't, did you empirically test the AI's explanation before applying the code change?
3. Has fixing the bug improved your understanding of the driver behaviour beyond what the AI told you?
3a. Have you independently verified your gained understanding or did you assume that your new views on its behaviour are axiomatically true?
Ultimately, there are two things here: one is understanding the code change (why it is needed, why that particular implementation is better relative to others, what future improvements could be made to it), and the other is skill (has this experience boosted your OWN ability in this particular area? In other words, could you make further changes WITHOUT using the AI?).
This reminds me of people who get high and believe they have discovered amazing truths, because they FEEL it, not because they have actual evidence. When asked to write down these amazing truths while high, all you get in the notes are meaningless words. While these assistants are more amenable to being empirically tested, I don't believe most of the AI hypers (and I include you in that category) are actually approaching this with the rigour that it entails. It is likely why people often think that none of you (people writing software for a living) are experienced in or qualified to understand and apply scientific principles to build software.
Arguably, AI hypers should lead with data, not anecdotal evidence. For all the grandiose claims, empirical data obtained under controlled conditions on this particular matter is conspicuous by its absence.
Thanks for pointing these things out. I always try to learn and understand the generated code and changes. Maybe not so deeply for the Android app (since it's just my own pet project), but especially for every pull request to a project. Everyone should do this out of respect for the maintainers who review the change.
> Are you 100% sure your code changes didn't introduce unexpected bugs?
Who ever is? But I do code reviews, and I usually generate a bunch of tests along with my PRs (if the project has at least _some_ test infrastructure).
The same applies to the rest of the points. But that's only _my_ way of doing these things. I can imagine that others do it differently and that the points above are more problematic in those cases.
> I always try to learn and understand the generated code and changes
Not to be pedantic, but do you _try_ to understand, or do you _actually_ understand the changes? This suggests to me that there are instances where you don't understand the generated code on projects other than your own, which is literally my point and that of many others. And even if you did understand it, as I pointed out earlier, that's not enough. It is a low bar imo. I will continue to keep my mind open, but yours isn't a case study supporting the use of these assistants; it's the opposite.
In science, when a new idea is brought forward, it gets grilled to no end. The greater the potential, the harder the grilling. Software should be no different if its builders want to lay claim to the name "engineer". It is sad to see a field that claims to apply scientific principles to the development of software not walking the walk.
It's incredible that within two minutes of posting, this comment is already grayed out, even though it makes a number of excellent points.
I've been playing with various AI tools and homebrew setups for a long time now and while I see the occasional advantage it isn't nearly as much of a revolution as I've been led to believe by a number of the ardent AI proponents here.
This is starting to get into 'true believer' territory: you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.
AI has served me well, no doubt about that. But it certainly isn't a passe-partout and the number of times it has caused gross waste of time because it insisted on chasing some rabbit simply because it was familiar with the rabbit adds up to a considerable loss in productivity.
The scientific principle is a very powerful tool in such situations and anybody insisting on it should be applauded. It separates fact from fiction and allows us to make impartial and non-emotional evaluations of both theories and technologies.
> (...) you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.
I think that's an issue with online discussions. It barely happens to me in the real world, but it's huge on HN.
I'm overall very positive about AI, but I also try to be measured and balanced and learn how to use it properly. Yet here on HN, I always get the feeling people responding to me have decided I am a "true believer" and respond to the true believer persona in their head.
They are. And we have processes to minimize them - tests, code review, staging/preprod envs - but they are nowhere close to making us 100% sure that code is bug free - that's just way too high a bar for both AI and purely human workflows outside of a few pretty niche fields.
I think what we'll see, as AI companies collect more usage data, is that the requirement to know what you're doing will sink lower and lower. Whatever advantage we have now is transient.
Also, most of the studies cited are starting to become obsolete given AI's rapid pace of improvement.
Opus 4.5 has been a huge game changer for me since December (combined with Claude Code, which I had not used before).
Claude Code arrived this summer, if I'm not mistaken.
So I'm not sure a study from 2024, or the impact on code produced during 2024-2025, can be used to judge current AI coding capabilities.
> But you still need to know how to do things properly in general, otherwise the results are bad.
Even that could use some nuance. I'm generating presentations in interactive JS. If they work, they work - that's the result, and I extremely don't care about the details for this use case. Nobody needs to maintain them, nobody cares about the source. There's no need for "properly" in this case.
I've found this is the exact opposite of what I'd dare do with AI: things you don't understand are things you can't verify. Say you want a windowed pane for your cool project, so you ask an AI to draft a design. It looks cool and it works! Until you bring it outside, where after 30 minutes it turns into explosive shrapnel, because the model didn't understand thermal expansion, and neither did you.
Contrast this to something you do know but can't be arsed to make; you can keep re-rolling a design until you get something you know and can confirm works. Perfect, time saved.
I think AI will fail in any organisation where the business process problems are sometimes discovered during engineering. I use AI quite a lot; I recently had Claude upgrade one of our old services from HubSpot API v1 to v3 without basically any human interaction beyond the code review. I had to ask it for two changes, I think, but overall I barely stepped out of my regular work to get it done. I knew exactly what to ask of it because the IT business partners who had discovered the flaw had basically written the tasks already. Anyway, AI worked well there.
Where AI fails us is when we build new software to improve the business related to solar energy production and sale. It fails us because the tasks are never really well defined. Or even if they are, sometimes developers or engineers come up with a better way to do the business process than what was planned for. AI can write the code, but unlike an engineer it won't stop and ask whether it wouldn't be a better idea to do X first. If we only did code reviews, we would miss that step.
In a perfect organisation your BPM people would do this. In the world I live in there are virtually no BPM people, and those who know the processes are too busy to really deal with improving them. Hell... sometimes their processes are changed and they don't realize it until their results are measurably better than they used to be. So I think it depends a lot on the situation. If you've got people breaking up processes, improving them and then describing each little bit in decent detail, then I think AI will work fine; otherwise it's probably not the best place to go full vibe.
> AI can write the code, but unlike an engineer it won't stop and ask whether it wouldn't be a better idea to…
LLMs combine two dangerous traits simultaneously: they are non-critical about suboptimal approaches and they assist unquestioningly. In practice that means doing dumb things a lazy human would refuse because they know better, and then following those rabbit holes until they run out of imaginary dirt.
My estimation is that that combination undermines their productivity potential unless the application is very structured. Considering the excess and escalating costs of dealing with issues the further they arise from the developer's workstation (by factors of approximately 20x, 50x, and 200x+ as you get out through QA and into customer environments, IIRC), you don't need many screw-ups to make the effort net negative.
One benefit of AI could be to build quick prototypes to discover what processes are needed for users to try out different approaches before committing to a full high quality project.
> business process problems are sometimes discovered during engineering
This deserves a blog post all on its own. OP you should write one and submit it. It's a good counterweight to all the AI optimistic/pessimistic extremism.
> it won't stop and ask whether it wouldn't be a better idea to do X first
Then don't ask it to write code? If you ask any recent high quality model to discuss options, tradeoffs, design constraints, refine specs it will do it for you until you're sick and tired of it finding real edge cases and alternatives. Ask for just code and you'll get just code.
They are way better at code-related tasks than design or strategy ones. Anything involving users or business strategy or a "why" is vague and misguided, they have no insight.
To be fair, they're primed to write code, even when you don't ask for it. I explicitly tell Claude "do not write code" when I don't want any, otherwise it'll spit some out just to say hello (world).
You need to be in plan mode. Not only can it not change code, its interaction with you is quite different. It will surface issues and ask you for choices.
The more I read people saying that Claude is failing, the more I realize this is 90% a user problem. This is just an example, but I see it often.
Claude has a mode specifically for what you're talking about; it (Opus 4.5) is actually very good at planning and working through a design without coding. It's called plan mode.
Listen, if you aren't constantly hitting shift-tab or esc-esc during complex problems, and you're then struggling when it isn't working for you, RTFM; you'll get further and better results.
> Unlike their human counterparts, who would escalate a requirements gap to product when necessary, coding assistants are notorious for burying those requirement gaps within hundreds of lines of code
This is the kind of argument that seems true on the surface, but isn't really. An LLM will do what you ask it to do! If you tell it to ask questions and poke holes into your requirements and not jump to code, it will do exactly that, and usually better than a human.
If you then ask it to refactor some code, identify redundancies, put this or that functionality into a reuseable library, it will also do that.
Those critiques of coding assistants are really critiques of "pure vibe coders" who don't know anything and just try to output yet another useless PDF parsing library before they move on to other things.
I hear your pushback, but I think that's his point:
Even seasoned coders using plan mode are funneled towards "get the code out" when experience shows that the final code is a tiny part of the overall picture.
The entire experience should be reorganized so that the code is almost an afterthought, and the requirements, specs, edge cases, tests, etc. are the primary part.
This has always been the businessman's dream: write requirements, and coding becomes mindless work. But requirements and specs can never cover every small detail. Code itself is the spec; business people just don't want to write it. If you handle all edge cases and limitations in the spec, and then do the same in the code, you are just writing the code twice.
This also completely ignores the fact that PMs and business teams are generating specs with AI too, so it's slop covered by more slop, with no actual specific details until you reach the code level.
It will not in fact always do what you ask it because it lacks any understanding, though the chat interface and prolix nature of LLMs does a good job at hiding that.
It’s like in Anthropic’s own experiment. People who used AI to do their work for them did worse than the control group. But people who used AI to help them understand the problem, brainstorm ideas, and work on their solution did better.
The way you approach using AI matters a lot, and it is a skill that can be learned.
It's not just about asking questions, it's about asking the right questions. Can AI push back and decline a completely stupid request? PMs and business people don't really know the limitations of the software and almost always think adding more features is better. With AI you will be shipping the 90% of features that were never needed, adding to bloat and making the product go off the rails quicker.
> There’s a name for misalignment between business intent and codebase implementation: technical debt.
I wish we'd stop redefining this term. Technical debt is a shortcut agreed upon with the business to get something out now and fix later, and the fix will cost more than the original. It is entirely in line with business intent.
Talking and typing feel far more productive than staring and thinking, and there is a cumulative effect of those breaks to check Reddit while something is generating.
Humans are notoriously bad at estimating time use with different subjective experiences and show excessive weighting of the tail ends of experiences and perceived repetitious tasks. Making something psychologically more comforting and active, particularly if you can activate speech, will distort people’s sense of time meaningfully.
The current hype around LLMs makes me think of misapplied ORMs in medium-scale projects. The tool is chosen early to save hours of boring typing and a certain kind of boring maintenance, but deep into the project what do we see? Over and over, days are spontaneously lost to incidental complexity and arbitrary tool constraints. And with the schedule slipping, it's too much work to address the root issue, so band-aids get put on band-aids, and we start seeing weeks slip down the drain.
Subjective time accounting and excessive aversion to specific conceptual tasks create premature optimizations whose effects become omnipresent over time. All the devs in the room agreed they wanted to avoid some work on day 1, but the accounting shows a big time commitment resulting from that immediate desire. Feelings aren't stopwatches.
[Not hating on ORMs, just misusing tools for weeks to save a couple hours - every day ain’t Saturday - right tool for the job.]
The article they are referring to is a 404, but based on the URL it was published a bit more than a year ago. That's quite a long time in a field that is evolving so rapidly and that even the pioneers are still figuring out.
Yes, that's true, because as a developer you have to check whether the "generated" code meets your standards and whether it handles all the edge cases you see.
When you are an experienced developer and you "struggle" to write some code manually, that is an important warning indicator about the project architecture - that something is wrong in it.
For such cases I like to step back and think about redesign/refactor.
When coding goes smoothly and some "unpredicted" customer changes can be added easily into the project, that is the best indicator that the architecture is fine.
It's even simpler than that. "Reading code is harder than writing code" has been repeated for decades and everyone agrees.
When you use AI to generate your code, instead of you writing it and then someone else reviewing it, there are two people reviewing it (you and the reviewer), which obviously takes longer.
I think that the premise is wrong (and the title is very clickbaity, but we will ignore that it doesn’t really match the article and the “conclusion”): coding agents are “solving” at least one problem, which is to massively expand the impact of senior developers _that can use them effectively_.
Everything else is just hype and people “holding it wrong”.
The writeup is a bit contrived in my opinion. And sort of misrepresenting what users can do with tools like Claude Code.
Most coding assistant tools are flexible to applying these kinds of workflows, and these sorts of workflows are even brought up in Anthropic's own examples on how to use Claude Code. Any experienced dev knows that the act of specifically writing code is a small part of creating a working program.
this concept of bottlenecking on code review is definitely a problem.
Either you (a) don't review the code, (b) invest more resources in review or (c) hope that AI assistance in the review process increases efficiency there enough to keep up with code production.
But if none of those work, all AI assistance does is bottleneck the process at review.
If companies truly believed more code equals more productivity, then they would remove all code review from their process and let ICs ship AI-generated code that they "review" as the prompter directly to prod.
I have found that using Cursor to write in Rust what I previously would write as a shell or Python or jq script was rather helpful.
The datasets are big, and having the scripts written in a performant language to process them saves non-trivial amounts of time, like waiting just 10 minutes versus an hour.
The initial code style in the scripts was rather ugly, with a lot of repeated code. But with enough prompting (which I now reuse), the generated code became sufficiently readable and reasonable to quickly check that it is indeed doing what was required, and it can be manually altered.
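For a sense of what I mean, the scripts are roughly this shape (an illustrative sketch, not the real thing; the file name and field are made up, and it assumes serde_json as a dependency): stream a large JSONL file and aggregate one field, the kind of job I'd previously have handed to jq.

    use std::collections::HashMap;
    use std::fs::File;
    use std::io::{BufRead, BufReader};

    use serde_json::Value;

    fn main() -> std::io::Result<()> {
        let file = File::open("events.jsonl")?; // hypothetical input file
        let reader = BufReader::new(file);

        let mut counts: HashMap<String, u64> = HashMap::new();
        for line in reader.lines() {
            let line = line?;
            if line.trim().is_empty() {
                continue;
            }
            // Skip malformed lines instead of aborting the whole run.
            let Ok(v) = serde_json::from_str::<Value>(&line) else {
                continue;
            };
            if let Some(kind) = v.get("event_type").and_then(Value::as_str) {
                *counts.entry(kind.to_string()).or_insert(0) += 1;
            }
        }

        for (kind, n) in &counts {
            println!("{kind}\t{n}");
        }
        Ok(())
    }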
But prompting it to make non-trivial changes to an existing code base was a time sink. It took too much time to explain and correct the output. And critically, the prompts cannot be reused.
Same, though I lately discovered some rough edges in Rust with LLMs. Sticking a working app into a from-scratch container image seems particularly problematic, even if you give it the hint that it needs to link statically.
The requirements gap point is underrated. AI guesses where a human would ask.
By the time you catch it in review, you've already wasted the time you saved -_-
Some of the conclusions remind me of the "ha ha only serious" joke that most people (obviously not the Monks themselves) had about Perl: "write-only code". Maybe some of the lessons learnt about how to maintain Perl code might be applicable in this space?
First you must accept that engineering elegance != market value. Only certain applications and business models need the crème de la crème of engineers.
LLMs have been hollowing out the mid and lower end of engineering, but have not eroded the highest end. Otherwise the LLM companies wouldn't pay for talent; they'd just use their own LLMs.
I'm going to give an example involving software with multiple processes.
Humans can imagine scenarios where a process can break. Claude can also do it, but only when the breakage happens from inside the process, and only if you specify it. It cannot identify future issues from a separate process unless you specifically describe that external process, the fact that it could interact with our original process, and the ways in which it can interact.
Identifying these is the skill of a developer. You could say you can document all these cases and let the agent do the coding, but here's the kicker: you only get to know these issues once you've started coding by hand. You go through the variables and function calls and suddenly remember that a process elsewhere changes or depends on these values.
Unit tests could catch them in a decently architected system, but those tests need to be defined by the one coding it. And if the architect himself is using AI, because why not, it's doomed from the start.
So, your point is that programmers identify the unexpected edge cases through the act of taking their time writing the code by hand. From my experience, it takes a proficient developer to actually plan their code around future issues from separate processes.
I think that it's mistaken to think that reasoning while writing the code is at all a good way to truly understand what your code is doing. (Without implying that you shouldn't write it by hand or reason about it.) You need to debug and test it thoroughly either way, and basically be as sceptical of your own output as you'd be of any other person's output.
Thinking that writing the code makes you understand it better can cause more issues than thinking that even if you write the code, you don't really know what it's doing. You are merely typing out the code based on what you think it should be doing, and reasoning against that hypothesis. Of course, you can be better or worse at constructing the correct mental model from the get go, and keep updating it in the right direction while writing the code. But it's a slippery slope, because it can also go the other way around.
A lot of bugs that take unreasonably long for junior-to-mid-level engineers to find seem to happen because they trust their own mental model of the code too much without verifying it thoroughly, create a hypothesis for the bug in their head without verifying it thoroughly, then get lost trying to reason about a made-up version of whatever is causing the bug, only to conclude that their original hypothesis was completely wrong.
> From my experience, it takes a proficient developer to actually plan their code around future issues from separate processes.
And it takes even more experience to know when not to spend time on that.
Way too many codebases are optimised for 1M DAU and see like 100 users for the first year. All that time optimising and handling edge cases could've been spent on delivering features that bring in more users and thus more money.
I keep hearing this but I don’t understand. If inelegant code means more bugs that are harder to fix later, that translates into negative business value. You won’t see it right away which is probably where this sentiment is coming from, but it will absolutely catch up to you.
Elegant code isn’t just for looks. It’s code that can still adapt weeks, months, years after it has shipped and created “business value”.
It's a trade-off. The gnarly thing is that you're trading immediate benefits for higher maintenance costs and decreased reliability over time, which makes it a tempting one to keep taking. Sure, there will be negative business value, but later, and right now you can look good by landing the features quicker. It's FAFO with potentially many reporting quarters between the FA and the FO.
This trade-off predates LLMs by decades. I've been fortunate to have a good and fruitful career being the person companies hire when they're running out of road down which to kick the can, so my opinion there may not be universal, mind you.
Sometimes "elegance" just makes shit hard to read.
Write boring code[0], don't go for elegance or cool language features. Be as boring and simple as possible, repeat yourself if it makes the flow clearer than extracting an operation to a common library or function.
This is the code that "adapts" and can be fixed 3 years after the elegant coder has left for another greenfield unicorn where they can use the latest paradigms.
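A toy example of the trade-off (exaggerated, names made up): instead of one clever generic helper that every output format funnels through, two boring functions that repeat a three-line loop but each read top to bottom.

    // "Boring" version: a little repetition, zero indirection. The alternative
    // would be something like a generic render<D: Delimiter>() helper that
    // every format has to squeeze through.
    fn render_csv(rows: &[(String, u32)]) -> String {
        let mut out = String::from("name,count\n");
        for (name, count) in rows {
            out.push_str(&format!("{name},{count}\n"));
        }
        out
    }

    fn render_tsv(rows: &[(String, u32)]) -> String {
        let mut out = String::from("name\tcount\n");
        for (name, count) in rows {
            out.push_str(&format!("{name}\t{count}\n"));
        }
        out
    }

    fn main() {
        let rows = vec![("widgets".to_string(), 3), ("gadgets".to_string(), 5)];
        print!("{}", render_csv(&rows));
        print!("{}", render_tsv(&rows));
    }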
People sometimes conflate inelegance with buggy code. Market fit and value matter more than code elegance, but bugs are still not acceptable even in your MVP. Actually, I think buggy software, especially if those bugs destroy the user experience, will kill products. It's not 2010 anymore. There is a lot of less-buggy software out there and attention spans are narrower than before.
> I keep hearing this but I don’t understand. If inelegant code means more bugs that are harder to fix later, that translates into negative business value.
That's a rather short-sighted opinion. Ask yourself how "inelegant code" finds its way into a codebase, even with working code review processes.
The answer more often than not is what's typically referred to as tech debt driven development. Meaning, sometimes a hacky solution with glaring failure modes left unaddressed is all it takes to deliver a major feature in a short development cycle. Once the feature is out, it becomes less pressing to pay off that tech debt because the risk was already assumed and the business value was already created.
Later you stumble upon a weird bug in your hacky solution. Is that bug negative business value?
You not only stumble upon a weird bug in your hacky solution that takes engineering weeks to debug, but your interfaces are fragile so feature velocity drops (bugs reproduce, and unless you address the reproduction rate you end up only fixing bugs), and things are so tightly coupled that every two-line change is now a multi-week rewrite.
Look at e.g. Facebook. That site has not shipped a feature in years, and every time they ship something it takes years to make it stable again. A year or so ago Facebook recognized that decades of fighting abuse had led them nowhere, and instead of fixing the technical side they just modified policies to openly allow fake accounts :D Facebook is 99% moltbook bot-to-bot traffic at this point and they cannot do anything about it.
Ironically, this is a good argument against code quality: if you manage to become large enough to become a monopoly, you can afford to fix tech debt later. In reality, there is one unicorn for every ten thousand of startups that crumbled under their own technical debt.
> You not only stumble upon a weird bug in your hacky solution that takes engineering weeks to debug, but your interfaces are fragile so feature velocity drops (bugs reproduce, and unless you address the reproduction rate you end up only fixing bugs), and things are so tightly coupled that every two-line change is now a multi-week rewrite.
I don't think you fully grasp the issue you're discussing. Things don't happen in a vacuum, and your hypothetical "fragile interfaces" that you frame as being a problem are more often than not a lauded solution to quickly deliver a major feature.
The calling card of junior developers is looking at a project and complaining it's shit. Competent engineers understand tradeoffs and the importance of creating and managing technical debt.
This has always been true. I just don’t see how AI makes accumulating tech debt more attractive, as the original poster seems to be implying. If anything it seems to make things worse. At least when you write shit code by hand you know it so you can remember to go back to it, keep it in mind as a potential source of bugs. But YOLO from AI and you probably have no idea.
Of course a bug is negative business value. Perhaps the benefit of shipping faster was worth the cost of introducing bugs, but that doesn't make it not a cost.
Because the entire codebase is crap, each user encounters a different bug. So now all your customers are mad, but they're all mad for different reasons, and support is powerless to do anything about it. The problems pile up but they can't be solved without a competent rewrite. This is a bad place to be.
And at some level of sloppiness you can get load bearing bugs, where there’s an unknown amount of behavior that’s dependent on core logic being dead wrong. Yes, I’ve encountered that one…
Once you gain some professional experience working with software development, you'll understand that that's exactly how it goes.
I think you are failing to understand the "soft" in "software". Changing software is trivial. All software has bugs, but the only ones being worked on are those which are a) deemed worthy of being worked on, b) have customer impact.
> So now all your customers are mad, but they’re all mad for different reasons, and support is powerless to do anything about it.
That's not how it works. You are somehow assuming software isn't maintained. What do you think software developers do for a living?
Nothing I just described was hypothetical. I’ve been the developer on the rewrite crew, the EM determining if there’s anything to salvage, and the client with a list of critical bugs that aren’t getting fixed and ultimately killed the contract.
If you haven’t seen anything reach that level of tech debt with active clients, well, lucky you.
If you can see the future and know no-one will ever encounter it, maybe not. But in the real world you presumably think there's some risk (unless no-one is using this codebase at all - but in that case the whole thing has negative business value, since it's incurring some cost and providing no benefit).
Well, it takes time to assess and adapt, and large organizations need more time than smaller ones. We will see.
In my experience the limiting factor is making the right choices. I've got a customer with the usual backlog of features. There are some very important issues in the backlog that stay in the backlog and are never picked for a sprint. We're doing small bug fixes, but not the big ones. We're doing new features that are partly useless because of the outstanding bugs that prevent customers from fully using them. AI can make us code faster, but nobody is using it to sort issues by importance.
> nobody is using it to sort issues by importance
True, and I'd add the reminder that AI doesn't care. When it makes mistakes it pretends to be sorry.
Simulated emotion is dangerous IMHO, it can lead to undeserved trust. I always tell AI to never say my name, and never use exclamation points or simulated emotion. "Be the cold imperfect calculator that you are."
When it was giving me compliments for noticing things it had failed to, I had to put a stop to that. Very dangerous. When business decisions or important technical decisions are made by an entity that is literally incapable of caring, but pretends to like a sociopath, that's when trouble brews.
> LLMs have been hollowing out the mid and lower end of engineering, but have not eroded the highest end. Otherwise the LLM companies wouldn't pay for talent; they'd just use their own LLMs.
The talent isn't used for writing code anymore, though. They're used for directing, which an LLM isn't very good at, since it has limited real-world experience, limited interaction with other humans, and limited goals of its own.
OpenAI has said they're slowing down hiring drastically because their models are making them that much more productive. Codex itself is being built by Codex. Same with Claude Code.
Source: Trust me, bro. A company selling an AI model telling others their AI model is so good that it's building itself. What could possibly motivate them to say that?
Remember a few years ago when Sam Altman said we had to pause AI development for 6 months because otherwise we would have the singularity and it would end the world? Yeah, about that...
Based on my experience using Claude opus 4.5, it doesn't really even get functionality correct. It'll get scaffolding stuff right if you tell it exactly what you want but as soon as you tell it to do testing and features it ranges from mediocre to worse than useless.
So basically - "ai" - actually llms - are decent at what they are trained at - producing plausible text with a bunch of structure and constraints - and a lot of programming, boring work emails, reddit/hn comments, etc can fall into that. It still requires understanding to know when that diverges from something useful, it still is just plausible text, not some magic higher reasoning.
Are they something worth using up vast amounts of power and restructuring all of civilisation around? No
Are they worth giving more power to megacorps over? No
It's like tech doesn't understand consent, and then partly it's the classic case of "disrupting X": thinking that because you know how to solve something in maths, CS or physics, you can suddenly solve stuff in a completely different field.
Isn't this proposal a close match for the approach OpenSpec is taking? (Possibly other SDD tool kits too; I'm just familiar with this one.) I spend way more time making my spec artifacts (proposal, design, spec, tasks) than I do in code review. During generation of each of these artifacts the code is referenced, which surfaces at least some of the issues that are purely architectural.
meh piece, don't feel like I learned anything from it. Mainly words around old stats in a rapidly evolving field, and then trying to pitch their product
tl;dr content marketing
There is this super interesting post in new about agent swarms and how the field is evolving towards formal verification like airlines, or how there are ideas we can draw on. Anyway, imo it should be on the front page over this piece.
"Why AI Swarms Cannot Build Architecture"
An analysis of the structural limitations preventing AI agent swarms from producing coherent software architecture
> meh piece, don't feel like I learned anything from it.
That's fine. I found the leading stats interesting. If coding assistants slowed down experienced developers while creating a false sense of development speed, then that should be thought-provoking. Also, nearly half of the code churned out by coding assistants having security issues. That's tough.
Perhaps it's just me, but that's in line with my personal experience, and I rarely see those points being raised.
> There is this super interesting post in new about agent swarms and how (...)
That's fine. Feel free to submit the link. I find it far more interesting to discuss the post-rose-tinted-glasses view of coding agents. I don't think it makes any sense at all to laud promises of formal verification when the same technology right now is unable to avoid introducing security vulnerabilities.
Those stats are from before the current generation of models and agent tools; they are almost certainly out of date, things are now different and will continue to evolve.
We're still learning to crawl, haven't gotten to walking yet
Wondering why this is on the front page? There is hardly any new insight, other than a few minutes of exposure to a greenish glow that makes everything look brownish after you close the page.
For me, AI is an enabler for things you can't do otherwise (or that would take many weeks of learning). But you still need to know how to do things properly in general, otherwise the results are bad.
E.g. I'm a software architect and developer for many years. So I know already how to build software but I'm not familiar with every language or framework. AI enabled me to write other kind of software I never learned or had time for. E.g. I recently re-implemented an android widget that has not been updated for a decade by it's original author. Or I fixed a bug in a linux scanner driver. None of these I could have done properly (within an acceptable time frame) without AI. But also none of there I could have done properly without my knowledge and experience, even with AI.
Same for daily tasks at work. AI makes me faster here, but also makes me doing more. Implement tests for all edge cases? Sure, always, I saved the time before. More code reviews. More documentation. Better quality in the same (always limited) time.
I use Claude Code a lot but one thing that really made me concerned was when I asked it about some ideas I have had which I am very familiar with. It's response was to constantly steer me away from what I wanted to do towards something else which was fine but a mediocre way to do things. It made me question how many times I've let it go off and do stuff without checking it thoroughly.
I've had quite a bit of the "tell it to do something in a certain way", it does that at first, then a few messages of corrections and pointers, it forgets that constraint.
> it does that at first, then a few messages of corrections and pointers, it forgets that constraint.
Yup, most models suffer from this. Everyone is raving about million tokens context, but none of the models can actually get past 20% of that and still give as high quality responses as the very first message.
My whole workflow right now is basically composing prompts out of the agent, let them run with it and if something is wrong, restart the conversation from 0 with a rewritten prompt. None of that "No, what I meant was ..." but instead rewrite it so the agent essentially solves it without having to do back and forth, just because of this issue that you mention.
Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
Yes, agreed. I find it interesting that people are saying they're building these huge multi-agent workflows since the projects I've tried it on are not necessarily huge in complexity. I've tried variety of different things re: isntructions files, etc. at this point.
So far, I haven't yet seen any demonstration of those kind of multi-agent workflows ending up with code that won't fall down over itself in some days/weeks. Most efforts so far seems to have to been focusing on producing as much code as possible, as fast as possible, while what I'd like to see, if anything, is the opposite of that.
Anytime I ask for demonstration of what the actual code looks like, when people start talking about their own "multi-agent orchestration platforms" (or whatever), they either haven't shared anything (yet), don't care at all about how the code actually is and/or the code is a horrible vibeslopped mess that contains mostly nonsense.
I call this the Groundhog Day loop
That's a strange name, why? It's more like a "iterate and improve" loop, "Groundhog Day" to me would imply "the same thing over and over", but then you're really doing something wrong if that's your experience. You need to iterate on the initial prompt if you want something better/different.
I thought "iterate and improve" was exactly what Phil did.
Call me a conspiracy theorist, and granted much of this could be attributed to the fact that the majority of code in existence is shit, but im convinced that these models are trained and encouraged to produce code that is difficult for humans to work on. Further driving and cementing the usage of then when you inevitably have to come back and fix it.
I don't think they would be able to have an LLM withouth the flaws. The problem is that an LLM cannot make a distinction between sense and nonsense in the logical way. If you train an LLM on a lot of sensible material, it will try to reproduce it by matching training material context and prompt context. The system does not work on the basis of logical principles, but it can sound intelligent.
I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context + feedback to as training. This enables more convincing responses for finer nuances of context, but it still does not work on logical principles.
LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.
This could be the case even without an intentional conspiracy. It's harder to give negative feedback to poor quality code that's complicated vs. poor quality code that's simple.
Hence the feedback these models get could theoretically funnel them to unnecessarily complicated solutions.
No clue has any research been done into this, just a thought OTTOMH.
Or it takes a lot of time effort and intelligence to produce good code and IA is not there yet…
It is a mathematical, averaging model after all
Mediocre is fine for many tasks. What makes a good software engineer is that he spots the few places in every software where mediocre is not good enough.
Yes but in my experience this sometimes works great, other times you paint yourself in a corner and the sun total is that you still have to learn the thing, just the initial ram is less steep. For example I build my self a nice pipeline for converting jpegs on disk to h264 on disk via zero-copy nvjpeg to nvenc, with python bindings but have been pulling out my hair over bframe ordering and weird delays in playback etc. Nothing u solvable but I had to learn a great deal and when we were in the weeds, Opus was suggesting stupid hack quick fixes that made a whack a mole with the tests. In the end I had to lead e Pugh and read enough to be able to ask it with the right vocabulary to make it work. Similarly with entering many novel areas. Initially I get a rush because it "just works" but it really only works for the median case initially and it's up to you to even know what to test. And AIs can be quite dismissive of edge cases like saying this will not happen in most cases so we can skip it etc.
Yeah, knowing what words to use is half the battle. Quickly throw away a prompt like "Hey, `make build` takes five minutes, could you make it fast enough to run under 1 minute" and the agent will do some work and say "Done, now the build takes 25 seconds as we're skipping the step of building the images, use `make build INCLUDE_IMAGES=true` when you want to build with images". It's not wrong, given the prompt, but takes a bit to get used to how they approach things.
I'm in the same boat. I've been taking on much more ambitious projects both at work and personally by collaborating with LLMs. There are many tasks that I know I could do myself but would require a ton of trial and error.
I've found giving the LLMs the input and output interfaces really help keep them on rails, while still being involved in the overall process without just blindly "vibe coding."
Having the AI also help with unit tests around business logic has been super helpful in addition to manual testing like normal. It feels like our overall velocity and code quality has been going up regardless of what some of these articles are saying.
100% agree with AI expanding core testing from my own edge and key tests.
I agree, I write out the sketch of what I want. With a recent embedded project in C I gave it a list of function signatures and high level description and was very satisfied with what it produced. It would have taken me days to nail down the particulars of the HAL (like what kind of sleep do I want what precisely is the way to setup the WDT and ports).
I think it's also language dependent.
I imagine JavaScript can be a crap shoot. The language is too forgiving.
Rust is where I have had most success. That is likely a personal skill issue, I know we want a Arc<DashMap>, will I remember all the foibles of accessing it? No.
But given the rigidity of the compiler and strong typing I can focus on what the code functionally is doing, that in happy with the shape/interface and function signature and the compiler is happy with the code.
It's quite fast work. It lets me use my high level skills without my lower level skills getting in the way.
And id rather rewrite the code at a mid-level then start it fresh, and agree with others once it's a large code base then in too far behind in understanding the overall system to easily work on it. That's true of human products too - someone elses code always gives me the ick.
Vanilla javascript is hit or miss for anything complex.
Using Typescript works great because you can still build out the interfaces and with IDE integrations the AIs can read the language server results so they get all the type hints.
I agree that the AI code is usually a pretty good starting point and gets me up to speed for new features fast rather than starting everything from scratch. I usually end up refactoring the last 10-20% manually to give it some polish because some of the code still feels off some times.
Huh. I'm extremely skeptical of AI in areas where I don't have expertise, because in areas where I do have expertise I see how much it gets wrong. So it's fine for me to use it in those areas because I can catch the errors, but I can't catch errors in fields I don't have any domain expertise in.
In my case I built a video editing tool fully customized for a community of which I am a member. I could do it in a few hours. I wouldn't have even started this project as I don't have much free time, though I have been coding for 25+ years.
I see it empowering to build custom tooling which need not be a high quality maintenance project.
> Or I fixed a bug in a linux scanner driver. None of these I could have done properly (within an acceptable time frame) without AI. But also none of there I could have done properly without my knowledge and experience, even with AI
There are some things here that folks making statements like yours often omit and it makes me very sus about your (over)confidence. Mostly these statements talk in a business short-term results oriented mode without mentioning any introspective gains (see empirically supported understanding) or long-term gains (do you feel confident now in making further changes _without_ the AI now that you have gained new knowledge?).
1. Are you 100% sure your code changes didn't introduce unexpected bugs?
1a. If they did, would you be able to tell if they where behaviour bugs (ie. no crashing or exceptions thrown) without the AI?
2. Did you understand why the bug was happening without the AI giving you an explanation?
2a. If you didn't, did you empirically test the AI's explanation before applying the code change?
3. Has fixing the bug improved your understanding of the driver behaviour beyond what the AI told you?
3a. Have you independently verified your gained understanding or did you assume that your new views on its behaviour are axiomatically true?
Ultimately, there are 2 things here: one is understanding the code change (why it is needed, why that particular implementation is better relative to others, what improvements could be made to it in the future), and the other is skill (has this experience boosted your OWN ability in this particular area? in other words, could you make further changes WITHOUT using the AI?).
This reminds me of people that get high and believe they have discovered these amazing truths. Because they FEEL it, not because they have actual evidence. When asked to write down these amazing truths while high, all you get in the notes are meaningless words. While these assistants are more amenable to being empirically tested, I don't believe most of the AI hypers (including you in that category) are actually approaching this with the rigour it entails. It is likely why people often think that none of you (people writing software for a living) are experienced in or qualified to understand and apply scientific principles to build software.
Arguably, AI hypers should lead with data, not with anecdotal evidence. For all the grandiose claims, empirical data obtained under controlled conditions on this particular matter is conspicuous by its absence.
Thanks for pointing these things out. I always try to learn and understand the generated code and changes. Maybe not so deep for the android app (since it's just my own pet project). But especially for every pull request to a project. Everyone should do this out of respect to the maintainers who review the change.
> Are you 100% sure your code changes didn't introduce unexpected bugs?
Who ever is? But I do code reviews and I usually generate a bunch of tests along with my PRs (if the project has at least _some_ test infrastructure).
Same applies for the rest of the points. But that's only _my_ way to do these things. I can imagine that others do it a different way and that the points above are then more problematic.
> I always try to learn and understand the generated code and changes
Not to be pedantic but, do you _try_ to understand? Or do you _actually_ understand the changes? This suggests to me that there are instances where you don't understand the generated code on projects other than your own, which is literally my point and that of many others. And even if you did understand it, as I pointed out earlier, that's not enough. It is a low bar imo. I will continue to keep my mind open, but yours isn't a case study supporting the use of these assistants but the opposite.
In science, when a new idea is brought forward, it gets grilled to no end. The greater the potential, the harder the grilling. Software should be no different if the builders want to lay claim to the name "engineer". It is sad to see a field that claims to apply scientific principles to the development of software not walking the walk.
It's incredible that within two minutes after posting this comment is already grayed out whereas it makes a number of excellent points.
I've been playing with various AI tools and homebrew setups for a long time now and while I see the occasional advantage it isn't nearly as much of a revolution as I've been led to believe by a number of the ardent AI proponents here.
This is starting to get into 'true believer' territory: you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.
AI has served me well, no doubt about that. But it certainly isn't a passe-partout and the number of times it has caused gross waste of time because it insisted on chasing some rabbit simply because it was familiar with the rabbit adds up to a considerable loss in productivity.
The scientific principle is a very powerful tool in such situations and anybody insisting on it should be applauded. It separates fact from fiction and allows us to make impartial and non-emotional evaluations of both theories and technologies.
> (...) you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.
I think that's an issue with online discussions. It barely happens to me in the real world, but it's huge on HN.
I'm overall very positive about AI, but I also try to be measured and balanced and learn how to use it properly. Yet here on HN, I always get the feeling people responding to me have decided I am a "true believer" and respond to the true believer persona in their head.
Why would you ever, outside flight and medical software, care about being 100% sure that the change did not introduce any bugs?
Because bugs are bad. Fixing one bug but accidentally introducing three more is such a pattern it should have a name.
They are. And we have processes to minimize them - tests, code review, staging/preprod envs - but they are nowhere close to making us 100% sure that code is bug-free - that's just way too high a bar for both AI and purely human workflows outside of a few pretty niche fields.
When you use AI to 'fix' something you don't actually understand the chances of this happening go up tremendously.
I propose "the whack-a-hydra" pattern
Hehe, yes, very apt. It immediately gives the right mental image.
Because why would you make something broken when you could make something not broken?
Because being 100% sure is way too high a bar outside of a few niche fields.
> 1. Are you 100% sure your code changes didn't introduce unexpected bugs?
How often have you written code and been 100% sure your code didn't introduce ANY bugs?
Seriously, for most of the code out there who cares? If it's in a private or even public repo, it doesn't matter.
I think that as AI companies collect more usage data, the requirement of knowing what you're doing will sink lower and lower. Whatever advantage we have now is transient.
Also, most of the studies cited are becoming obsolete given AI's rapid pace of improvement. Opus 4.5 has been a huge game changer for me (combined with CC, which I had not used before) since December. Claude Code arrived this summer if I'm not mistaken.
So I'm not sure a study from 2024, or the impact on code produced during 2024-2025, can be used to judge current AI coding capabilities.
Agreed, this space moves so fast, 2024 feels like light-years away in terms of capabilities.
> But you still need to know how to do things properly in general, otherwise the results are bad.
Even that could use some nuance. I'm generating presentations in interactive JS. If they work, they work - that's the result, and I extremely don't care about the details for this use case. Nobody needs to maintain them, nobody cares about the source. There's no need for "properly" in this case.
I've found this is the exact opposite of what I'd dare do with AI: things you don't understand are things you can't verify. Consider you want a windowed pane for your cool project, so you ask an AI to draft a design. It looks cool and it works! Until you bring it outside, where after 30 minutes it turns into explosive shrapnel, because the model didn't understand thermal expansion, nor did you.
Contrast this to something you do know but can't be arsed to make; you can keep re-rolling a design until you get something you know and can confirm works. Perfect, time saved.
I think AI will fail in any organisation where the business process problems are sometimes discovered during engineering. I use AI quite a lot; I recently had Claude upgrade one of our old services from HubSpot API v1 to v3 without basically any human interaction beyond the code review. I had to ask it for two changes I think, but overall I barely had to step out of my regular work to get it done. I did know exactly what to ask of it, because the IT business partners who had discovered the flaw had basically written the tasks already. Anyway. AI worked well there.
Where AI fails us is when we build new software to improve the business related to solar energy production and sale. It fails us because the tasks are never really well defined. Or even if they are, sometimes developers or engineers come up with a better way to do the business process than what was planned for. AI can write the code, but it doesn't refuse to write the code without first being told why it wouldn't be a better idea to do X first. If we only did code-reviews then we would miss that step.
In a perfect organisation your BPM people would do this. In the world I live in there are virtually no BPM people, and those who know the processes are too busy to really deal with improving them. Hell... sometimes their processes are changed and they don't realize until their results are measurably better than they used to be. So I think it depends a lot on the situation. If you've got people breaking up processes, improving them and then describing each little bit in decent detail, then I think AI will work fine; otherwise it's probably not the best place to go full vibe.
> AI can write the code, but it doesn't refuse to write the code without first being told why it wouldn't be a better idea to…
LLMs combine two dangerous traits simultaneously: they are non-critical about suboptimal approaches and they assist unquestioningly. In practice that means doing dumb things a lazy human would refuse because they know better, and then following those rabbit holes until they run out of imaginary dirt.
My estimation is that that combination undermines their productivity potential without very structured application. Considering the excess and escalating costs of dealing with issues as they arise further from the developer's workstation (by factors of approximately 20x, 50x, and 200x+ as you get out through QA and into customer environments, IIRC), you don't need many screw-ups to make the effort net negative.
One benefit of AI could be to build quick prototypes to discover what processes are needed for users to try out different approaches before committing to a full high quality project.
Assuming the prototypes are functional.
> business process problems are sometimes discovered during engineering
This deserves a blog post all on its own. OP you should write one and submit it. It's a good counterweight to all the AI optimistic/pessimistic extremism.
> but it doesn't refuse to write the code without first being told why it wouldn't be a better idea to do X first
Then don't ask it to write code? If you ask any recent high quality model to discuss options, tradeoffs, design constraints, refine specs it will do it for you until you're sick and tired of it finding real edge cases and alternatives. Ask for just code and you'll get just code.
They are way better at code-related tasks than design or strategy ones. Anything involving users or business strategy or a "why" is vague and misguided, they have no insight.
To be fair, they're primed to write code, even when you don't ask for it. I explicitly tell Claude "do not write code" when I don't want any, otherwise it'll spit some out just to say hello (world).
You need to be in plan mode. Not only can it not change code, its interaction with you is quite different. It will surface issues and ask you for choices.
"Here is some python code that printf's the travel recommendations you asked for, btw"
The more I read people saying that Claude is failing, the more I realize this is 90% a user problem. This is just an example, but I see it often.
Claude has a mode specifically for what you're talking about; it is actually very good (Opus 4.5) at planning and going through design without coding. It's called plan mode.
Listen, if you aren't constantly hitting shift-tab or esc-esc during complex problems and you're struggling when it isn't working for you, rtfm - you'll get further and better results.
> Unlike their human counterparts who would and escalate a requirements gap to product when necessary, coding assistants are notorious for burying those requirement gaps within hundreds of lines of code
This is the kind of argument that seems true on the surface, but isn't really. An LLM will do what you ask it to do! If you tell it to ask questions and poke holes into your requirements and not jump to code, it will do exactly that, and usually better than a human.
If you then ask it to refactor some code, identify redundancies, put this or that functionality into a reusable library, it will also do that.
Those critiques of coding assistants are really critiques of "pure vibe coders" who don't know anything and just try to output yet another useless PDF parsing library before they move on to other things.
I hear your pushback, but I think that's his point:
Even seasoned coders using plan mode are funneled towards "get the code out" when experience shows that the final code is a tiny part of the overall picture.
The entire experience should be reorganized so that the code is almost an afterthought, and the requirements, specs, edge cases, tests, etc. are the primary part.
It has always been the businessman's dream to write requirements and have coding become mindless work, but requirements and specs can never cover every small detail. Code itself is the spec; business people just don't wanna write it. If you handle all edge cases and limitations in the spec, and then do the same in the code, you are just writing the code twice.
This also completely ignores the fact that PMs and business teams are generating specs by AI too, so it's slop covered by more slop and has no actual specific details until you reach the code level.
It will not in fact always do what you ask it because it lacks any understanding, though the chat interface and prolix nature of LLMs does a good job at hiding that.
It’s like in Anthropic’s own experiment. People who used AI to do their work for them did worse than the control group. But people who used AI to help them understand the problem, brainstorm ideas, and work on their solution did better.
The way you approach using AI matters a lot, and it is a skill that can be learned.
It's not just about asking questions, it's about asking the right questions. Can AI push back and decline a completely stupid request? PMs & business people don't really know the limitations of the software and almost always think adding more features is better. With AI you will be shipping 90% of features that were never needed, thus adding to bloat & making the product go off the rails quicker.
> There’s a name for misalignment between business intent and codebase implementation: technical debt.
I wish we'd stop redefining this term. Technical debt is a shortcut agreed upon with the business to get something out now and fix later, and the fix will cost more than the original. It is entirely in line with business intent.
Exactly. The quote is a great definition of a bug, not debt
Software is not a liability, it's an asset. If you make it for less then it has a shorter shelf-life. Tech debt is a nonsense term to begin with.
"Experienced developers were 19% slower when using AI coding assistants—yet believed they were faster (METR, 2025)"
Anecdotally I see this _all the time_...
Talking and typing feels far more productive than staring and thinking, and there is a cumulative effect of those breaks to check Reddit while something is generating.
Humans are notoriously bad at estimating time use with different subjective experiences and show excessive weighting of the tail ends of experiences and perceived repetitious tasks. Making something psychologically more comforting and active, particularly if you can activate speech, will distort people’s sense of time meaningfully.
The current hype around LLMs is making me think about misapplied ORMs in medium-scale projects... The tool is chosen early to save hours of boring typing and a certain kind of boring maintenance, but deep into the project what do we see? Over and over, days are spontaneously being lost to incidental complexity and arbitrary tool constraints. And with the schedule slipping it's too much work to address the root issue, so band-aids get put on band-aids, and we start seeing weeks slip down the drain.
Subjective time accounting and excessive aversion to specific conceptual tasks creates premature optimizations whose effects become omnipresent over time. All the devs in the room agreed they want to avoid some work day 1, but the accounting shows a big time commitment resulting from that immediate desire. Feelings aren’t stopwatches.
[Not hating on ORMs, just misusing tools for weeks to save a couple hours - every day ain’t Saturday - right tool for the job.]
This is actually amazing, isn't it? we are just 21% away from becoming faster then?
Also I don't even care about speed, since I've managed to get soooo much work done which I would not even have wanted to start working on manually.
The article they are referring to is a 404, but based on the URL it was published a bit more than a year ago. That's quite a long time in a field that is evolving so rapidly and that even the pioneers are still figuring out.
It's not from a year ago, just 6 months ago: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
Yes, that's true, because as a developer you have to check whether the "generated" code meets your standards and whether it handles all the edge cases you see.
When you are an experienced developer and you "struggle" to write some code manually, that is an important warning indicator about the project architecture - that something is wrong in it.
For such cases I like to step back and think about a redesign/refactor. When coding goes smoothly and some "unpredicted" customer changes can be added easily into the project, that is the best indicator that the architecture is fine.
That's my humble human opinion ;)
It's even simpler than that. "Reading code is harder than writing code" has been repeated for decades and everyone agrees.
When you use AI to generate your code, instead of you writing it and then someone else reviewing it, there are two people reviewing it (you and the reviewer), which obviously takes longer.
I think that the premise is wrong (and the title is very clickbaity, but we will ignore that it doesn’t really match the article and the “conclusion”): coding agents are “solving” at least one problem, which is to massively expand the impact of senior developers _that can use them effectively_.
Everything else is just hype and people “holding it wrong”.
I really wonder how you people manage to ignore the many research studies that have come out and prove this wrong.
“You people” is a loaded term in US culture. And for what you know, I might just be an AI :)
The writeup is a bit contrived in my opinion. And sort of misrepresenting what users can do with tools like Claude Code.
Most coding assistant tools are flexible enough to support these kinds of workflows, and these sorts of workflows are even brought up in Anthropic's own examples of how to use Claude Code. Any experienced dev knows that the act of specifically writing code is a small part of creating a working program.
This concept of bottlenecking on code review is definitely a problem.
Either you (a) don't review the code, (b) invest more resources in review or (c) hope that AI assistance in the review process increases efficiency there enough to keep up with code production.
But if none of those work, all AI assistance does is bottleneck the process at review.
Also the thought of my job becoming more code review than anything else is enough to turn me into a carpenter.
If companies truly believed more code equals more productivity, they would remove all code review from their process and let ICs ship AI-generated code that they "review" as the prompter directly to prod.
You mean to staging, right? Even non-AI code can't be trusted at the "straight to prod on Friday evening" level.
I have found that using Cursor to write in Rust what I previously would write as a shell or Python or jq script was rather helpful.
The datasets are big and having the scripts written in a performant language to process them saves non-trivial amounts of time, like waiting just 10 minutes versus an hour.
The initial code style in the scripts was rather ugly, with a lot of repeated code. But with enough prompting (which I could reuse), the generated code became sufficiently readable and reasonable that I could quickly check it is indeed doing what was required, and it can be manually altered.
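For a rough idea of the kind of script this replaces (a hedged sketch, not the actual code: it assumes newline-delimited JSON on stdin, the `serde_json` crate, and a made-up "status" field):

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

use serde_json::Value; // parse each line as arbitrary JSON

fn main() -> io::Result<()> {
    // Tally record counts per "status" field - the kind of job
    // I'd previously have done with a shell pipeline around jq.
    let mut counts: HashMap<String, u64> = HashMap::new();

    for line in io::stdin().lock().lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        // Skip malformed lines instead of aborting the whole run.
        let record: Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue,
        };
        let status = record
            .get("status")
            .and_then(Value::as_str)
            .unwrap_or("unknown")
            .to_string();
        *counts.entry(status).or_insert(0) += 1;
    }

    for (status, n) in &counts {
        println!("{}\t{}", status, n);
    }
    Ok(())
}
```

Run it as something like `./tally < records.jsonl` (names invented) - the same shape as the old `jq | sort | uniq -c` pipeline, just much faster on big files.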
But prompting it to do non-trivial changes to existing code base was a time sink. It took too much time to explain/correct the output. And critically the prompts cannot be reused.
Same, though lately I discovered some rough edges in Rust with LLMs. Sticking a working app into a from-scratch container image seems particularly problematic, even if you give it the hint that it needs to link statically.
> Experienced developers were 19% slower when using AI coding assistants—yet believed they were faster
One paper is sure doing a lot of leg work these days...
You know, anecdotally...
When I first picked up an agentic coding assistant I was very interested in the process and paid way more attention to it than necessary.
Quickly, I caught myself treating it like a long compilation and getting up to get a coffee and had to self correct this behavior.
I wonder how much novelty of the tech and workflow plays into this number.
The requirements gap point is underrated. AI guesses where a human would ask. By the time you catch it in review, you've already wasted the time you saved -_-
I always stop reading when I see someone citing that METR study
Some of the conclusions remind me of the "ha ha only serious" joke that most people (obviously not the Monks themselves) had about Perl: "write-only code". Maybe some of the lessons learnt about how to maintain Perl code might be applicable in this space?
I barely use ai as a coding assistant. I use it as a product owner. Works wonders. Especially in this age of clueless product owners.
A Calculator won't increase your creativity directly but it will free resources that you can allocate to creativity!
First you must accept that engineering elegance != market value. Only certain applications and business models need the crème de le crème of engineers.
LLM has been hollowing out the mid and lower end of engineering. But has not eroded highest end. Otherwise all the LLM companies wouldn’t pay for talent, they’d just use their own LLM.
It's not just about elegance.
I'm going to give an example of a software with multiple processes.
Humans can imagine scenarios where a process can break. Claude can also do it, but only when the breakage happens from inside the process, and only if you specify it. It cannot identify future issues from a separate process unless you specifically describe that external process, the fact that it could interact with our original process, and the ways in which it can interact.
Identifying these is the skill of a developer. You could say you can document all these cases and let the agent do the coding, but here's the kicker: you only get to know about these issues once you start coding by hand. You go through the variables and function calls and suddenly remember a process elsewhere changes or depends on these values.
Unit tests could catch them in a decently architected system, but those tests need to be defined by the one coding it. Also, if the architect himself is using AI, because why not, it's doomed from the start.
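A toy illustration of the kind of cross-process assumption I mean (heavily simplified, all names invented): two "processes" only stay in agreement on a wire format because somebody who knew about both sides wrote the test down.

```rust
// Writer "process": serializes a status as "<code>:<message>".
fn write_status(code: u32, message: &str) -> String {
    format!("{}:{}", code, message)
}

// Reader "process", possibly written months later by someone else:
// splits at the first ':' and parses the code.
fn read_status(line: &str) -> Option<(u32, String)> {
    let (code, message) = line.split_once(':')?;
    Some((code.parse::<u32>().ok()?, message.to_string()))
}

#[cfg(test)]
mod tests {
    use super::*;

    // The shared format is an implicit contract between writer and reader;
    // this test is the only place it is written down. An agent that never
    // saw the reader would not know this test needs to exist.
    #[test]
    fn message_containing_a_colon_round_trips() {
        let line = write_status(200, "ok: cache warm");
        let (code, message) = read_status(&line).unwrap();
        assert_eq!(code, 200);
        assert_eq!(message, "ok: cache warm");
    }
}
```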
So, your point is that programmers identify the unexpected edge cases through the act of taking their time writing the code by hand. From my experience, it takes a proficient developer to actually plan their code around future issues from separate processes.
I think that it's mistaken to think that reasoning while writing the code is at all a good way to truly understand what your code is doing. (Without implying that you shouldn't write it by hand or reason about it.) You need to debug and test it thoroughly either way, and basically be as sceptical of your own output as you'd be of any other person's output.
Thinking that writing the code makes you understand it better can cause more issues than thinking that even if you write the code, you don't really know what it's doing. You are merely typing out the code based on what you think it should be doing, and reasoning against that hypothesis. Of course, you can be better or worse at constructing the correct mental model from the get go, and keep updating it in the right direction while writing the code. But it's a slippery slope, because it can also go the other way around.
A lot of bugs that take unreasonably long for junior-mid level engineers to find, seem to happen because: They trust their own mental model of the code too much without verifying it thoroughly, create a hypothesis for the bug in their own head without verifying it thoroughly, then get lost trying to reason about a made up version of whatever is causing the bug only to come to the conclusion that their original hypothesis was completely wrong.
> From my experience, it takes a proficient developer to actually plan their code around future issues from separate processes.
And it takes even more experience to know when not to spend time on that.
Way too many codebases are optimised for 1M DAU and see like 100 users for the first year. All that time optimising and handling edge cases could've been spent on delivering features that bring in more users and thus more money.
Agreed. Overengineering and premature optimization are the root of all crud.
I keep hearing this but I don’t understand. If inelegant code means more bugs that are harder to fix later, that translates into negative business value. You won’t see it right away which is probably where this sentiment is coming from, but it will absolutely catch up to you.
Elegant code isn’t just for looks. It’s code that can still adapt weeks, months, years after it has shipped and created “business value”.
It's a trade-off. The gnarly thing is that you're trading immediate benefits for higher maintenance costs and decreased reliability over time, which makes it a tempting one to keep taking. Sure, there will be negative business value, but later, and right now you can look good by landing the features quicker. It's FAFO with potentially many reporting quarters between the FA and the FO.
This trade-off predates LLMs by decades. I've been fortunate to have a good and fruitful career being the person companies hire when they're running out of road down which to kick the can, so my opinion there may not be universal, mind you.
Sometimes "elegance" just makes shit hard to read.
Write boring code[0], don't go for elegance or cool language features. Be as boring and simple as possible, repeat yourself if it makes the flow clearer than extracting an operation to a common library or function.
This is the code that "adapts" and can be fixed 3 years after the elegant coder has left for another greenfield unicorn where they can use the latest paradigms.
[0] https://berthub.eu/articles/posts/on-long-term-software-deve...
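To make "boring" concrete, a small sketch (the discount rules and names are invented; it only illustrates the trade-off): the repeated version is longer, but whoever fixes it in three years can read each rule on its own.

```rust
// "Clever": one generic helper that every call site has to decode.
fn discounted(price: f64, kind: &str) -> f64 {
    price * (match kind { "vip" => 0.8, "staff" => 0.5, _ => 1.0 })
}

// "Boring": repeat yourself a little; each rule reads on its own
// and can change later without touching the others.
fn vip_price(price: f64) -> f64 {
    price * 0.8 // VIPs get 20% off
}

fn staff_price(price: f64) -> f64 {
    price * 0.5 // staff pay half
}

fn regular_price(price: f64) -> f64 {
    price // no discount
}

fn main() {
    println!("clever: {}", discounted(100.0, "vip"));
    println!("boring: vip {}, staff {}, regular {}",
        vip_price(100.0), staff_price(100.0), regular_price(100.0));
}
```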
People sometimes conflate inelegance with buggy code, where the market fit and value matter more than code elegance. Bugs still are not acceptable even in your MVP. Actually I think buggy software especially if those bugs destroy user experience, will kill products. It’s not 2010 anymore. There are a lot of less buggy software out there and attention spans are narrower than before.
edit: typo
> I keep hearing this but I don’t understand. If inelegant code means more bugs that are harder to fix later, that translates into negative business value.
That's a rather short-sighted opinion. Ask yourself how "inelegant code" finds its way into a codebase, even with working code review processes.
The answer more often than not is what's typically referred to as tech debt driven development. Meaning, sometimes a hacky solution with glaring failure modes left unaddressed is all it takes to deliver a major feature in a short development cycle. Once the feature is out, it becomes less pressing to pay off that tech debt because the risk was already assumed and the business value was already created.
Later you stumble upon a weird bug in your hacky solution. Is that bug negative business value?
You not only stumble upon a weird bug in your hacky solution that takes engineering weeks to debug, but your interfaces are fragile so feature velocity drops (bugs reproduce and unless you address reproduction rate you end up fixing bugs only) and things are so tightly coupled that every two line change is now multi-week rewrite.
Look at e.g. facebook. That site has not shipped a feature in years and every time they ship something it takes years to make it stable again. A year or so ago facebook recognized that decades of fighting abuse led them nowhere and instead of fixing the technical side they just modified policies to openly allow fake accounts :D Facebook is 99% moltbook bot-to-bot traffic at this point and they cannot do anything about it. Ironically, this is a good argument against code quality: if you manage to become large enough to become a monopoly, you can afford to fix tech debt later. In reality, there is one unicorn for every ten thousand startups that crumbled under their own technical debt.
> You not only stumble upon a weird bug in your hacky solution that takes engineering weeks to debug, but your interfaces are fragile so feature velocity drops (bugs reproduce and unless you address reproduction rate you end up fixing bugs only) and things are so tightly coupled that every two line change is now multi-week rewrite.
I don't think you fully grasp the issue you're discussing. Things don't happen in a vacuum, and your hypothetical "fragile interfaces" that you frame as being a problem are more often than not a lauded solution to quickly deliver a major feature.
The calling card of junior developers is looking at a project and complaining it's shit. Competent engineers understand tradeoffs and the importance of creating and managing technical debt.
This has always been true. I just don’t see how AI makes accumulating tech debt more attractive, as the original poster seems to be implying. If anything it seems to make things worse. At least when you write shit code by hand you know it so you can remember to go back to it, keep it in mind as a potential source of bugs. But YOLO from AI and you probably have no idea.
Of course a bug is negative business value. Perhaps the benefit of shipping faster was worth the cost of introducing bugs, but that doesn't make it not a cost.
If a bug is present but there is no one who encounters it, is it negative business value?
That’s not how this goes.
Because the entire codebase is crap, each user encounters a different bug. So now all your customers are mad, but they're all mad for different reasons, and support is powerless to do anything about it. The problems pile up but they can't be solved without a competent rewrite. This is a bad place to be.
And at some level of sloppiness you can get load bearing bugs, where there’s an unknown amount of behavior that’s dependent on core logic being dead wrong. Yes, I’ve encountered that one…
> That’s not how this goes.
Once you gain some professional experience working with software development, you'll understand that that's exactly how it goes.
I think you are failing to understand the "soft" in "software". Changing software is trivial. All software has bugs, but the only ones being worked on are those which are a) deemed worthy of being worked on, b) have customer impact.
> So now all your customers are mad, but they’re all mad for different reasons, and support is powerless to do anything about it.
That's not how it works. You are somehow assuming software isn't maintained. What do you think software developers do for a living?
Nothing I just described was hypothetical. I’ve been the developer on the rewrite crew, the EM determining if there’s anything to salvage, and the client with a list of critical bugs that aren’t getting fixed and ultimately killed the contract.
If you haven’t seen anything reach that level of tech debt with active clients, well, lucky you.
If you can see the future and know no-one will ever encounter it, maybe not. But in the real world you presumably think there's some risk (unless no-one is using this codebase at all - but in that case the whole thing has negative business value, since it's incurring some cost and providing no benefit).
Perhaps this was never actually true. Did anyone do an A/B test with messy code vs beautiful code?
OT: I applaud your correct use of the grave accent, however minor nitpick: crème in French is feminine, therefore it would be “la”.
There's an interesting aside about the origin of the phrase in Leslie Claret's Integral Principles of the Structural Dynamics of Flow
https://youtu.be/ca27ndN2fVM?si=hNxSY6vm0g-Pt7uR
Well, it takes time to assess and adapt, and large organizations need more time than smaller ones. We will see.
In my experience the limiting factor is making the right choices. I've got a customer with the usual backlog of features. There are some very important issues in the backlog that stay in the backlog and are never picked for a sprint. We're doing small bug fixes, but not the big ones. We're doing new features that are in part useless because of the outstanding bugs that prevent customers from fully using them. AI can make us code faster but nobody is using it to sort issues by importance.
> nobody is using it to sort issues for importance
True, and I'd add the reminder that AI doesn't care. When it makes mistakes it pretends to be sorry.
Simulated emotion is dangerous IMHO, it can lead to undeserved trust. I always tell AI to never say my name, and never use exclamation points or simulated emotion. "Be the cold imperfect calculator that you are."
When it was giving me compliments for noticing things it failed to, I had to put a stop to that. Very dangerous. When business decisions or important technical decisions are made by an entity that literally is incapable of caring, but instead pretends to like a sociopath, that's when trouble brews.
OpenAI has said they're slowing down hiring drastically because their models are making them that much more productive. Codex itself is being built by Codex. Same with Claude Code.
Source: Trust me, bro. A company selling an AI model telling others their AI model is so good that it's building itself. What could possibly motivate them to say that?
Remember a few years ago when Sam Altman said we had to pause AI development for 6 months because otherwise we would have the singularity and it would end the world? Yeah, about that...
Claude Code creator is saying it too. He doesn't code anymore.
I personally don't code manually anymore either so I'm inclined to believe them.
Based on my experience using Claude opus 4.5, it doesn't really even get functionality correct. It'll get scaffolding stuff right if you tell it exactly what you want but as soon as you tell it to do testing and features it ranges from mediocre to worse than useless.
So basically, "AI" (actually LLMs) is decent at what it is trained at: producing plausible text with a bunch of structure and constraints, and a lot of programming, boring work emails, reddit/hn comments, etc. can fall into that. It still requires understanding to know when that diverges from something useful; it is still just plausible text, not some magic higher reasoning.
Are they something worth using up vast amounts of power and restructuring all of civilisation around? No
Are they worth giving more power to megacorps over? No
It's like tech doesn't understand consent, and partially it's the classic case of "disrupting x": thinking that because you know how to solve something in maths, CS, or physics, you can suddenly solve stuff in a completely different field.
LLMs are over-indexed.
>>> The jury is out on the effectiveness of AI use in production, and it is not a pretty picture.
Errrrr…. false.
I’ll stop reading right there thanks I think I know what’s coming.
Isn't this proposal closely matching with the approach OpenSpec is taking? (Possibly other SDD tool kits, I'm just familiar with this one). I spend way more time in making my spec artifacts (proposal, design, spec, tasks) than I do in code review. During generation of each of these artifacts the code is referenced and surfaces at least some of the issues which are purely architecture based.
Hard to take it seriously when it opens with this note: `48% of AI-generated code contains security vulnerabilities (Apiiro, 2024)`?
Really? 2024? That was forever ago in LLM coding. Before tool calling, reasoning, and larger context windows.
It is like saying YouTube couldn’t exist because too many people were still on dial up.
https://www.wiz.io/blog/exposed-moltbook-database-reveals-mi...
meh piece, don't feel like I learned anything from it. Mainly words around old stats in a rapidly evolving field, and then trying to pitch their product
tl;dr content marketing
There is this super interesting post in /new about agent swarms and how the field is evolving towards formal verification like airlines, or at least how there are ideas we can draw on there. Anyway, imo it should be on the front page over this piece:
"Why AI Swarms Cannot Build Architecture"
An analysis of the structural limitations preventing AI agent swarms from producing coherent software architecture
https://news.ycombinator.com/item?id=46866184
> meh piece, don't feel like I learned anything from it.
That's fine. I found the leading stats interesting. If coding assistants slowed down experienced developers while creating a false sense of development speed, that should be thought-provoking. Also, nearly half of the code churned out by coding assistants having security issues - that's a tough one.
Perhaps it's just me, but that's in line with my personal experience, and I rarely see those points being raised.
> There is this super interesting post in new about agent swarms and how (...)
That's fine. Feel free to submit the link. I find it far more interesting to discuss the post-rose-tinted-glasses view of coding agents. I don't think it makes any sense at all to laud promises of formal verification when the same technology right now is unable to avoid introducing security vulnerabilities.
> found the leading stats interesting
They are from before the current generation of models and agent tools; they are almost certainly out of date, the numbers are different now, and they will continue to evolve.
We're still learning to crawl, haven't gotten to walking yet
> Feel free to submit the link
I did, or someone else did, it's the link in the post you replied to
Wondering why this is on the front page? There is hardly any new insight, other than a few minutes of exposure to a greenish glow that makes everything look brownish after you close that page.
I upvoted because I’m very keen for more teams to start trying to solve this problem and release tools and products to help.
Context gathering and refinement is the biggest issue I have with product development at the moment.