In my large enterprise world, AI adoption hasn't made it outside of the development teams - only developers have access to GitHub Copilot.
Code takes 6-12 months to make it from commit to production. Development speed was never the bottleneck; it's all the other processes that take time: infra provisioning, testing, sign-offs, change management, deployment scheduling etc.
AI makes these post-development bottlenecks worse. Changes are now piling up at the door waiting to get on a release train.
Large enterprises need to learn how to ship software faster if they want to lock in ROI on their token spend. Unshipped code is a liability, not an asset.
> Development speed was never the bottleneck; it's all the other processes that take time: infra provisioning, testing, sign-offs, change management, deployment scheduling etc.
So much of management (both middle and executive) still treats software as if it were an assembly line: "We make software just like how Ford makes cars." Code as a product.
Which isn't to say that most software development isn't woefully inefficient, but the important bits aren't even considered. "The Work" is seen as writing code, not the research that goes into knowing what code has to be written.
And for AI marketing, this is almost a videogame-esque weak spot. Microsoft proclaims "50% faster code!" and every management fool thinks "50% faster product; 50% faster money!"
> Large enterprises need to learn how to ship software faster if they want to lock in ROI on their token spend.
It's going to be a disaster once ROI is demanded. Right now everyone is fine with not measuring it; investors are drunk on hype, and nobody within the company actually wants to admit that properly measuring software development productivity is almost impossible.
But the hype won't last forever. Sooner or later investors will see the "$2M spend" and demand "$4M net profit", and that's not going to materialize.
Copilot and Claude won't be tackling the real bottlenecks. They're not going to dredge up decade-old institutional knowledge, they won't figure out whether code looks bad because it is bad or because it solves a specific undocumented problem, and they won't anticipate future uses.
Code just isn't the product. Not the real work. Really, if your codebase is in a healthy state, it's often a literally free output of the design and research processes. By the time you've refined "our procurement team finds the search hard to use" into a practical ticket, the React component for the appropriate search filters has basically already been written; typing out the code is just a short formality. Asking Copilot would turn a 10-minute job into a 5-minute job. Real impressive, were it not for the 6 hours of meetings and phone calls that went into it.
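>Sooner or later investors will see the "$2M spend" and demand "$4M net profit", and that's not going to materialize.
I think this is probably going to happen at the same time that the providers start really jacking up token prices to extract all the value they can.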
Almost certainly. Software firms are pretty bad at self-evaluation, and they're profitable enough that capitalism won't force them to do it either.
Right now the subscriptions are still in the range of reasonable business expenses, but pretty soon they'll have to jump, and $200/month/seat subscriptions turning into $2,000/month/seat subscriptions is going to get even very badly run companies to re-evaluate.
It's worse than that. Developers themselves are drunk. They'll be cut off from tools right when they no longer understand the underlying code they're responsible for.
We're already there, even. I know of a company that was doubling their Codex spend and hitting the cap week over week, until finally management had enough and stopped increasing it. Then they maxed out on credits and had a week of no Codex. A large percentage of the engineers loudly refused to work for the rest of the week. They had been managing the Codex that was managing the codebase, and they were totally incapable of dealing with its output without it.
> Large enterprises need to learn how to ship software faster
They haven't even learned that "less code is better" yet; I wouldn't hold my breath waiting for them to suddenly learn "more advanced" things like that before they learn the basics.
More code means more support and more maintenance. If your team is already overloaded or if it's going to be reduced because of AI, things are going to get tough.
My bigger concern is actually that, if a company isn't careful, the bloat (complexity, amount of code, other artifacts etc.) will just balloon and largely cancel out any gains.
Feedback is often only considered once something is already on fire (financially, functionally, or literally).
That’s the game plan for the AI companies: once companies have massive codebases of critical AI-generated code and a skeleton crew of prompt engineers, they’re going to be locked into the AI product to develop anything new.
They’re not even selling shovels, they’re selling subscriptions for shovels.
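Yep.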
I would argue that any sufficiently large system reaches a point where more code is in fact the opposite of what it needs.
Nutrition and calories are only useful up to a point, and then we get diminishing and, later on, negative returns.
Even though it is not the best analogy, because we are describing two different systems, it helps put a mental model around the fact that churning out more is often less.
Side note: I got feedback from a customer today that while our documentation is complete and very detailed, they find it too overwhelming. It turns out a few bullet points that get the idea across are better than a five-page document. Now it seems obvious.
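Seeing this too. Machines are great at pumping out content.
Tl;drs, quick references / quickstarts / cheat sheets, and FAQs are also some things they're great at generating.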
Like in that comic strip[0], where one side uses AI to inflate his bullet points to make the email look better and more substantial, and the other side uses AI to summarize it back down to bullet points.
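[0] https://marketoonist.com/2023/03/ai-written-ai-read.html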
This probably happens a billion times a day. I shudder to think of the cost of it, especially knowing that LLMs aren't great at summarizing, nor are they flawless at expanding information.
It's trickling in slowly to non-dev teams. I'm consulting with a large fintech on enterprise-wide AI adoption at the moment, and I'm seeing the same parallels: you have power users who reap disproportionate rewards from it, and then you have the "tab complete" crowd that copy-pastes things into the prompt.
This was a huge motivation behind me trying to design an AI automation platform that comes "batteries included". I also think a lot of orgs, even engineering orgs, do not know how to configure basic things like Claude plugin repositories into their installs.
Same here, but instead of all developers having access to GitHub Copilot, a select few devs have access to an internal proxy that goes to Amazon Bedrock, where we get "400 requests" per week to Claude Sonnet :))))
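What's old is new again [0][1][2]
The Theory of Constraints - AI Era
[0] https://en.wikipedia.org/wiki/Theory_of_constraints [1] https://www.goodreads.com/book/show/113934.The_Goal [2] https://www.goodreads.com/en/book/show/17255186-the-phoenix-...
But then how will all of the know-nothing management types get their fingers in the pie?
"release train" ... "learn how to ship software faster"
SAFe is poison.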
It's good to know your experience mirrors mine. Developers are moving faster, but the rest of the organization is holding them back because processes and decisions still rely on other parts of the org. Has anyone else observed the same?
Organizations "born in AI" appear to buck this trend for obvious reasons (no legacy org. to deal with). My two cents.
We have a "two timelines" approach going on and I'm curious if others are seeing the same. There are official "Engineering-supported" services. There development speed is not the bottleneck. Engineers demand clean requirements that take forever to show up. Testing and deployment scheduling also take forever post-development. Important people are so fed-up that they've started hiring people to vibe code and develop services without going through Engineering. Code is shipped much faster here but technical debt accumulates rapidly. The important people are beginning to hire Data Scientists who sit outside of the Tech org to manage the AI code. It's all very interesting.
Especially when it waits a month and all the effort is either irrelevant or incompatible with the latest changes that finally got through. So much token wastage to top off the recent chaos. Hopefully it improves just as fast as it materialised.
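Which is why there's currently a gold rush of "Enterprise AI" startups which implement / offer agents to enterprise businesses.
Do you work in my company? :)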
I've been saying this since day 1 of LLMs: even a 99% reduction in development time means almost nothing for the speed of delivering whole projects at our company. And we are introducing a generator of code that semi-randomly performs poorly right where the perf bottlenecks are, and that fills the codebase with... sometimes questionable solutions. Sure, one has to check the results all the time, but then the time is spent on code reviews instead -- not much less of it than actual development, which is way more fulfilling, rewarding, and career-boosting.
Now, I understand there are many more scenarios where gains are more realistic and sometimes huge, but that certainly isn't my current workplace. So I use it sparingly, so as not to atrophy my skill set, but work estimates are so far the same and nobody questions that.
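The Mythical Man Month should really be mandatory reading for anyone working in software… and I don't mean reading a Claude summary.
Sounds like the typical ServiceNow paralysis. The "Mother May I" model.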
The post hits the nail on the head with the messy middle. There is simply no motivation to develop this sort of intelligence loop as a dev who has their own responsibilities that their job depends on. Management can ask as nicely as they want, but I’m not going to selflessly share my productivity gains with the broader company for free. I might share a tool if it’s useful. All the learning of how to wrangle AI or set up agents is better kept to myself if there is no recognition for sharing.
My company set up a “prompt of the week” award and brown-bag sessions to help spread adoption. We also have teams meant to develop these workflows. Clearly, they set these events up to play it off as their own productivity. Without a real (read “monetary”) incentive or job security, the risk and cost of spreading the knowledge falls squarely on the developer.
It kinda wracks my brain that a lot of people don't think this way. For example, way before the current state of AI, I wrote my own CLI to make aspects of my job easier and to make automation scripts easier to write; some colleagues noticed my tool and said I should share it, and my diplomatically worded answer is no. I don't share it with anyone because of the negative return in both supporting it and everyone else being able to be as productive as I am. Moreover, leadership will not recognize my ingenuity as an asset, hence no added job security. No way am I going to help my company out of the goodness of my heart, only to potentially be let go anyway in the near future.
If developers are worried about their jobs with the way the market currently is, they should treat their personal workflows as trade secrets. My example was not specific to AI, but it applies just as much to AI workflows. In a worker's market, it was sometimes fun to share that kind of knowledge with an organization. In an employer's market, they can pay me if they want access to my personal choices.
I don't think this way because I like to collaborate. If a colleague can benefit from a tool I made I'm proud to save them time. I also think your attitude doesn't pass the golden rule: would you like to work on a team full of people like you?
I sadly have to agree with this. In a collaborative "give and take" world, sharing is good. In an environment that only takes, all you have left is your own intellectual property. It is your most vital asset, worth protecting. It shouldn't be like this, but it is.
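What are your thoughts on open source? Seems like the same problem writ large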
I love open source, but you are correct in identifying it as a very similar problem, though it's more a problem with software licensing than with source code being publicly available. Usually the argument is made that FOSS ends up as free labor, which is true in a lot of ways, but I see FOSS devaluing software as a whole. When software is open and libre, that sends a psychological signal that the software isn't that valuable. There would still be FOSS in a world where even projects like React charged a licensing fee to big organizations, but in that case there would be more choice between YOLOing with free software or paying for quality software; as token expenses have proven, many companies could absolutely pay for the latter many times over. In terms of open source specifically, however, companies get a bit of a loophole in that their own employees (or LLM of choice) can be "inspired" by the source code and clone aspects of commercial software. This has the effect of devaluing the skill of individual software engineers to that of glorified script kiddies.
It sucks to treat the workplace as adversarial, but we unfortunately have to as long as companies have the zero-sum mindset of "wow, everyone is so productive and we're achieving so much, why do we have so many people again?"
And I'm not a "at work we're a family!" guy, but I wish we could just be excellent at our jobs and share it with each other without worrying if I'm digging my own grave.
>but I’m not going to selflessly share my productivity gains with the broader company for free.
If your employer is expecting that you selflessly share your time for free, you’re getting fucked. Most people are paid to do their job. They are, of course, then expected to work for their employers while on the clock.
I may not have been clear. My job is not AI development. I have features to deliver. The ask from the employer is to add the AI knowledge sharing on top of that. They don’t pay for that, and when layoffs come, it wouldn’t save me from missed deliverables.
I refuse to use LLMs and don't have a job, so I'm just some guy.
What I find strange about this is that in 2020 nobody would be this openly cynical and selfish about, say, good Python idioms, a useful emacs configuration, git shortcuts, etc. This attitude of "your job is to deliver value for the customer, anything else is a distraction, and if you share your hard-earned value-delivery techniques with others then you are a sucker" - this is new, and very disconcerting.
I understand there's not much we can do to stop the cyberpunk dystopia, but do we have to leap in head-first?
As a Systems Analyst, retired three years now, I feel bad for my younger colleagues. In 2023 I was one of the first on my team to use AI to untangle some legacy Perl code that did something mission-critical, whose original author had long ago left and apparently hadn't understood anything about actually commenting code or writing documentation. We were all in awe of this new technology that got us out of a bind. But more and more it looks less like a tool that is available to you and more like something that is being _done_ to you. Nobody asked for this.
At what point are inspiration and thought just devalued and worthless in the name of doing things instantly? The work has no soul.
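> Where is the ROI for the 2 mio € we paid Anthropic last year?
The CEO has a YouTube-style platinum token plaque for their office.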
AI by itself isn’t that useful. An agent forgets and makes enough mistakes that you have to check all its work, which can be net productivity negative.
It really comes into its own when you treat it as a tool that can build other tools. For example, having it build tools that force it to keep going until its work reaches a certain quality, or that run compliance checks on its outputs and tell it where it needs to fix things. Then, and only then, can you trust its work.
Right now most current roles & workflows are designed around wrangling the tools you’re given to do a certain job. In that regime AI can only slide in at the edges.
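Concretely, a minimal sketch of such a gate (pytest/ruff here are stand-ins for whatever objective checks your project runs, and `ask_agent` is a hypothetical hook for whatever model or agent CLI you use):

```python
# Sketch of an agent quality-gate loop; `ask_agent` is a hypothetical hook,
# and pytest/ruff stand in for your project's actual objective checks.
import subprocess

def ask_agent(task: str, feedback: str = "") -> None:
    """Placeholder for the agent call that edits the repo (hypothetical)."""
    ...  # e.g. shell out to your agent CLI with the task and prior feedback

def run_checks() -> tuple[bool, str]:
    """Run the objective checks; return (passed, combined output)."""
    reports = []
    for cmd in (["pytest", "-q"], ["ruff", "check", "."]):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        reports.append(proc.stdout + proc.stderr)
        if proc.returncode != 0:
            return False, "\n".join(reports)
    return True, "\n".join(reports)

def agent_with_gate(task: str, max_attempts: int = 5) -> bool:
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        ask_agent(task, feedback=feedback)
        passed, report = run_checks()
        if passed:
            return True  # work meets the gate; now it is worth reviewing
        # Feed the failing output back so the next attempt targets real defects.
        feedback = f"Attempt {attempt} failed checks:\n{report}"
    return False  # budget exhausted; a human has to look at it
```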
It's just ass to work in this area now. At the company where I work, the bosses let everyone use it, even non-developers. I really want to quit and work in another field, but unfortunately where I live a starting salary can't pay the rent, and I'm getting old.
Great article. The part that stood out to me is the shift in how organizations define work.
In the old model, performance and OKRs were anchored in disciplines, job titles, and role-specific expectations. In the AI era, those boundaries are starting to collapse. The deeper issue is psychological and organizational: people are constantly negotiating the line between “this is my job” and “this is not my responsibility.”
That creates a key adoption problem: what is the upside of being visibly recognized as an expert AI user? If people learn that I can do faster, better, and more cross-functional work, why would I reveal that unless the company also creates a clear system for recognition, compensation, or career growth?
Eventually, whoever is responsible for fixing prod incidents and maintaining the system has ownership. And I agree that’s pretty messy in a world where agents are crossing those boundaries. Will the AI engineer with their horde of agents be responsible for keeping everything running? I really doubt it, but we will see.
If they create a system to compensate expert AI users, wouldn't that career have a built-in problem? Anyone enticed by the new career's existence who integrates the expert's advice on the company's particulars with an approach that is weeks more modern is basically putting the expert in the role of the domain expert being eliminated.
The part I push back on is the idea that expertise is easy to learn in just a few weeks.
Take Andrej Karpathy as an example. Even if I knew exactly what tools he uses and what his workflow looks like, I still would not be able to produce anything close to what he can produce in a few weeks. And he is not standing still either—he is evolving at the same time.
A lot of real expertise is not in the visible/system-able workflow. It is in someone’s experience, taste, judgment, and wisdom. You can copy the artifact, but you cannot easily copy the thinking behind it: the principles, the decision-making, and the ability to apply those principles across many different/subtle situations.
But I do agree with the concern behind the argument. People may worry that sharing what they know could weaken their own position. And the more uncomfortable question is about peers: if someone’s role can be “retired” because others absorbed their knowledge and skills, then it is hard not to ask, “Am I next?”
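Well, that's fine until your teammate does all of those things by default and gaps show up between them and the rest of the team.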
The problem of failed learning existed long before AI. The information comes from multiple sources like Slack conversations, customer support cases, and sales calls, but there is no process in place for distilling patterns from it. Speed of development wasn't the issue; figuring out which features should be developed is the issue. With AI, it takes three days to deliver new features instead of three weeks, but you may still spend those three days on something not worth building.
The loop closes only when customers' insights have a proper place to go, where duplicates can be filtered out and priorities set. This isn't happening for many teams, and the acceleration of code generation will only make things worse.
This is a great article. It helps you realize that the feedback loop is the goal, but it won't just happen, and traditional methodologies don't really support it. Has anyone here found a good way to get teams in a company to focus on the loop instead of productivity hacks?
TIL my $company has used the same consultants as this guy. We started with Training and Champions, to Leadership/Lab/Crowd with a CoE/brown bags.
We are definitely struggling with the same issues the author describes, but even worse, the leaders down at the Crowd level have some perverse need to achieve reuse across their teams, rather than letting their Crowd experiment. One team does something interesting, and we must stop and get that thing out to all teams in that group, so everyone “benefits”. This is a scarcity mindset, which made sense pre-AI, when code was costly and ideas were more valuable.
At the same time, everyone not only has to do their work, they need to be 25% more efficient from AI (new KPIs), and so their own learnings slow to a halt, and the team with the cool idea has to give presentations instead of hacking.
On the first part of the article, I believe it describes how individual productivity gains do not seem to translate to business / larger-scale productivity. I think this is expected; individual developer productivity, code volume, LOC/day was never a valuable metric on a company scale. Number of delivered features might be one, but ultimately, revenue, customer growth, etc. are.
While I do believe higher developer productivity can lead to faster reacting to market forces or more A/B testing, that won't necessarily lead to a successful business. Because ultimately it rarely is the software that's the issue there.
It’s been helpful for me to look at the promise of AI by comparing with the dotcom boom. Lots of similarities.
But the internet was a simpler concept for businesses: basically, you can now sell to people from their computers. AI’s promise is what? It can approximate reasoning about things? This is a much more challenging implementation puzzle to truly solve.
I don’t know that I’ve seen anything of real substance outside coding tasks yet.
My biggest gripe with language models is that technical and conceptual discussions which used to unfold organically (with people having to think about what others wrote and decide what to reply themselves) have now turned into AI slop avalanches, with participants just copy-pasting obviously generated text into the discussion. And those texts are always very long and super weird to respond to, because they are usually correct enough overall that you can't just reject them, but are flawed all over the place: missing the point, lacking depth where it would be important, skipping over important steps. This is a huge time waste. Funnily, many people have no idea how obvious it is that their texts are generated, and rubbish at that.
The Hub captures decisions humans made. It can't see the ones they didn't, which is most of what AI ships. You instrument the deliberate half and inherit the undeliberate one.
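Once people try to increase quality instead of speed, they will see how powerful LLMs are. Everything else is just a sales pitch by Nvidia and friends.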
Even if LLMs write more buggy code, they can still raise software quality in the short to medium term by letting you clear out a lot of the backlog of bugs and UI issues that are known but never had enough priority to get fixed.
Debugging and developing first fixes is also one of the spaces where current LLMs are the biggest force multipliers, especially if you have reproduction cases the LLM can test on its own.
But long-term it might look very different as more and more of the code becomes LLM-written.
> "Where is the ROI for the 2 mio € we paid Anthropic last year?"
The bias in the assumptions here is absolutely bonkers.
Problem: GenAI is not generating any visible return on investment.
"Solution": rearrange your entire development organization around the technology and start inventing new tooling.
What's entirely obvious is that the point of such articles is not the stuff they purportedly discuss, but the normalization of assumptions those discussions are based on.
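LLMs can't fail, they can only be failed ... by you!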
One more point I noticed: since AI adoption is being promoted by companies, collaboration between developers could suffer. Why wait for a more experienced developer to have the time to explain some aspect of the codebase to you (and at the same time confess your ignorance), when AI can do it right away in a competent-sounding way (and most of the time it will probably be right, too)?
That already happens here. I am an old dev who was the go-to guy for people with certain business and technical questions. Not anymore (which is part good, as I'm interrupted much less, and part bad, as sometimes they regard a wrong answer as truth).
You could vibe yourself up an AMA tool where people can submit questions, an agent goes to work on them, and then the question and agent answer sit in a queue waiting for you to review and weigh in.
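A minimal sketch of that queue, assuming a hypothetical `draft_answer` hook into whatever agent backend you use:

```python
# Sketch of the AMA review-queue idea (illustrative; `draft_answer` is a
# hypothetical call into whatever agent backend you use).
from dataclasses import dataclass, field

def draft_answer(question: str) -> str:
    """Placeholder for a real agent call (hypothetical)."""
    return f"[agent draft for: {question}]"

@dataclass
class Item:
    question: str
    draft: str             # the agent's immediate answer
    expert_note: str = ""  # the human expert's later correction/confirmation
    reviewed: bool = False

@dataclass
class AmaQueue:
    items: list[Item] = field(default_factory=list)

    def submit(self, question: str) -> Item:
        # The asker gets the agent's draft immediately; expert review
        # happens asynchronously, whenever the go-to person has time.
        item = Item(question=question, draft=draft_answer(question))
        self.items.append(item)
        return item

    def pending(self) -> list[Item]:
        return [i for i in self.items if not i.reviewed]

    def review(self, item: Item, note: str) -> None:
        # Stored corrections could later be fed back to the agent as grounding.
        item.expert_note = note
        item.reviewed = True
```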
Coworkers are demonstrating that they value immediacy (and possibly some combination of embarrassment about their question, or social anxiety about asking someone else) over accuracy. Not only does that still require the coworker to review the question, it also loses immediacy vs. an LLM; it might even take longer before rogerthis gets around to reviewing the queue.
I'm pretty sure this is the best idea I've ever heard of for this technology. You should build that tool and it should become mandatory throughout the tech world.
Can we get some enabling legislation? A UN resolution perhaps?
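Despite the snark, I’ll engage.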
The “get an immediate agent answer, then a human expert’s fast follow” is, I think, a great idea for many domains. Imagine if you could get legal advice this way: the agent will have already explained the basics, and the human expert just has to provide corrections. Way less typing by humans.
Also, the corrections are now documented and could become future grounding for the agent.
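Absolutely zero snark. I'm serious. (About the serious part; obviously not the joke part.)
> a great idea for many domains
I completely agree. This is a great idea. If you don't do something with it I'm stealing it. ;-)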
This seems like more work (and much more cognitive load) than simply being available to answer questions as they arise. Nothing is more exhausting than fact-checking LLM slop.
I think you hit the nail on the head: it's probably right, most of the time. Or maybe 89% right, 91% of the time.
The more I use AI, the more mistakes I see. I've noticed others see these same mistakes, correct them, and then when queried say "Oh, it gets it right all of the time!" No, having to point out "you got this wrong, re-write that last bit" isn't "getting it right". And it's not that the code is overtly wrong; it's subtle. Not using a function correctly, not passing something through that it should (and the default happens to just work -- during testing), and more. LLMs are great at subtle bugs.
So moving forward with this isolation you mention ensures that maybe the guy in the company, the 'answer guy' about a thing, never actually appears. Maybe he doesn't even get to know his own code well enough to be the answer guy.
And so when an LLM writes a weird routine, instead of being able to say "No, re-write that last bit", you'll have to shrug and say "the code looks fine, right?", because you, and the answer guy, if he exists, don't know the code well enough to see the subtle mistakes.
I noticed that when I was implementing a build pipeline for a project. My changes introduced a runtime bug (I only tested that the thing was building), but then another developer broke the pipeline while fixing the runtime bug. While it was my failure to introduce the runtime bug, I don’t think I can publish a fix for a bug without investigating why it appeared in the first place. Code is all about assumptions and contracts, and if something that was working breaks, that means something else has changed and you need to be aware of it.
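In a large codebase it's probably next to impossible to get people who fully understand the code to explain it to you with unerring accuracy.
AI can get a pretty good picture, near instantly, whenever you need it.
It's not just competent-sounding; it is reasonably competent, and certainly very useful for tasks like that.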
Dev/team member isolation: not a great environment to build in.
Gone are the days of mandatory corporate "synergy" and after-work bar gatherings to promote "team building."
AI is showing people in the tech industry that they're just interchangeable cogs. AI is bringing the offshored Indian work environment to Silicon Valley.
There are some improvements in coding and developer speed, but more broadly in the enterprise, AI is just producing a lot of slop that folks are getting fed up with.
AI content has a look and feel people sense immediately.
It’s amazing to see how quickly things shifted from “wow this is so cool, AI is going to change everything” to folks calling out “you lazy bum, this just looks like some slop you threw together with AI… let’s get some real thinking please.”
We are firmly heading into “trough of disillusionment” territory on the hype cycle.
Path dependencies between invention and utilization... are complicated and hard to fathom.
Our mental models of developments like the industrial revolution, literacy, printing or suchlike tend to be a lot more straightforward than how things play out in practice.
When a bottleneck is eliminated... you tend to shortly find the next bottleneck.
Meanwhile, there is an underlying assumption everyone seems to make that "more software, more value" is the basic reality. But... I'm skeptical.
To-do lists, wishlists, buglists, and roadmaps may be full of stuff, but...
Visa or Salesforce have already exploited all their immediate "more software, more money" opportunities.
The ones in a position to easily leverage AI are upstarts. They're starting with nothing. No code. No features. No software. With AI, presumably, they can produce more software and make value.
Also... I think overextended market rationalism leads people to see everything as an industrial revolution... which in real life is much more the exception.
The networked personal computing revolution put a PC on every desk. It digitized everything. Do we have way better administration for less cost? Not really. Most administrations have grown.
Did law fundamentally change due to digital efficiency? No. Not really.
If you work on a terrible enterprise codebase... it's very possible that software quality/quantity isn't actually that important to your organization.
> There is another pressure building underneath all this. AI usage will become more visibly metered. The current enterprise feeling of “everyone has access, don’t worry too much about the bill” will not hold forever, at least not in the form people are getting used to. ...
> I do not want to make this a cost panic story, that would be the least interesting way to think about “rented intelligence”. The question is not how to minimize token spend in the abstract, any more than the question of software delivery was ever how to minimize keystrokes.
If tokens were as cheap as keystrokes (that is, effectively free), then "How do we minimize token spend?" wouldn't be a question anyone asks. It's because keystrokes are effectively free that you only ask "How do we minimize the number of keys pressed during the software development process?" if you're looking for an entertaining weekend project. If keystrokes cost as much per unit of work done as the (currently heavily subsidized) tokens from OpenAI and Anthropic, you'd see a lot of focus on golfing everything under the sun, all the damn time.
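A back-of-envelope sketch of that comparison; every number in it is an illustrative assumption, not a quoted price:

```python
# Back-of-envelope: cost per developer task, keystrokes vs. tokens.
# Every figure here is an illustrative assumption, not a real price quote.

KEYSTROKES_PER_TASK = 5_000   # assumed: hand-typing a small feature
TOKENS_PER_TASK = 500_000     # assumed: an agent loop burning ~0.5M tokens
COST_PER_KEYSTROKE = 0.0      # keystrokes are effectively free
COST_PER_MTOKEN = 10.0        # assumed blended $/million tokens

hand_typed = KEYSTROKES_PER_TASK * COST_PER_KEYSTROKE
agent_run = TOKENS_PER_TASK / 1_000_000 * COST_PER_MTOKEN

# $0.00 per task -> nobody bothers golfing keystrokes.
print(f"hand-typed task: ${hand_typed:.2f}")
# $5.00 per task -> at scale, people start golfing tokens.
print(f"agent-run task:  ${agent_run:.2f}")
```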
I think if these companies had first adopted local models, with fewer tokens out and the learners getting to watch the tokens get made, there'd be a lot more understanding.
> one team uses Copilot as autocomplete and calls it a day. Another team runs Claude Code in tight loops, with tests, reviews, and constant steering. A product owner suddenly prototypes real software instead of mocking screens in Figma. A senior engineer delegates a root-cause analysis to an agent and comes back to the valid solution in under an hour; this would’ve taken him two weeks without AI. A junior person produces polished code but has no idea which architectural assumptions got smuggled into the system. A support team quietly turns recurring tickets into workflow automation, because they know exactly where the work hurts and nobody in the Center of Excellence ever asked the right question.
This is just sales copy for various AI companies, laundered through an "influencer". It might as well be the CIA sending their article to be published in Daily Post Nigeria, so that the NYT can quote it as "sources".
The title is just clickbait. The rest of the content is fluffy bunnies and rainbows. It's all summed up as "continue to consume product, but remember to also do X". Sales copy + HBR MBA bait.
The closest thing to an honest, less-than-rosy example is the "junior person" who has no idea about the code they committed.
What about the "senior person" who has no idea about the code they committed? What about the CISO who doesn't understand that pasting proprietary documents willy nilly into the LLM's gaping maw might have legal/security/common sense implications, and that it is his job to set policy on such behavior? What about the middle manager who doesn't even try to retain the most experienced dev in the company because "we don't need the headcount anymore, now that Claude is so fast"? What about the company eating its own seed corn because every single junior position has been eliminated and there are no plans for the future anymore? What about the filesystem developer who fell in love with his chatbot girlfriend and is crashing out on Discord?
Oh wait, scratch that last one. He left the company and is crashing out on his own.
Indeed. Any developer who has used Copilot knows you can't rely on it 100%.
The post's head image immediately bothered me. Copilot's strength is not in patching the SDLC but in speeding up the catching of typos and minor oversights. If you use it as an integral part of the SDLC, it causes problems immediately. So why posit the strawman? Marketing.
In my large enterprise world, AI adoption hasn't made it outside of the development teams - only developers have access to Github Copilot.
Code takes 6-12 months to make it from commit to production. Development speed was never the bottleneck; it's all the other processes that take time: infra provisioning, testing, sign-offs, change management, deployment scheduling etc.
AI makes these post-development bottlenecks worse. Changes are now piling up at the door waiting to get on a release train.
Large enterprises need to learn how to ship software faster if they want to lock in ROI on their token spend. Unshipped code is a liability, not an asset.
> Development speed was never the bottleneck; it's all the other processes that take time: infra provisioning, testing, sign-offs, change management, deployment scheduling etc.
So much of Management (both mid and executive) still considers Software as if it were an assembly line; "We make software just like how Ford makes cars". Code as a product.
Which isn't to say that most software development isn't woefully inefficient, but the important bits aren't even considered. "The Work" is seen as being writing code, not the research that goes into knowing what code has to be written.
And for AI marketing, this is almost a videogame-esque weakspot. Microsoft proclaims "50% faster code!" and every management fool thinks "50% faster product; 50% faster money!"
> Large enterprises need to learn how to ship software faster if they want to lock in ROI on their token spend.
It's going to be a disaster once ROI is demanded. Right now everyone is fine with not measuring it; Investors are drunk on hype and nobody within the company actually wants to admit that properly measuring software development productivity is almost impossible.
But the hype won't last forever. Sooner or later investors will see the "$2M spend" and demand "$4M net profit", and that's not going to materialize.
Copilot and Claude won't be tackling the real bottlenecks. They're not going to dredge up decade old institutional knowledge, they won't figure out whether code looks bad because it is bad or because it solves a specific undocumented problem, they won't anticipate future uses.
Code just isn't the product. Not the real work. Really, if your codebase is in a healthy state, it's often a literally free output of the design and research processes. By the time you've refined "our procurement team finds the search hard to use" into a practical ticket, the React component for the appropriate search filters has basically already been written, writing up the code is just a short formality. Asking Copilot would turn a 10 minute job into a 5 minute job. Real impressive, were it not for the 6 hours of meetings and phone calls that went into it.
>Sooner or later investors will see the "$2M spend" and demand "$4M net profit", and that's not going to materialize.
I think this is probably going to happen at the same time that the providers start really jacking up token prices to extract all the value they can.
Almost certainly. Software firms are pretty bad at self-evaluation and they're profitable enough that Capitalism won't force them to do it either.
Right now the subscriptions are still in the range of reasonable business expenses, but pretty soon they'll have to jump and $200/month/seat subscriptions turning into $2000/month/seat subscriptions is going to get even very badly ran companies to re-evaluate.
It's worse than that. Developers themselves are drunk. They'll be cut off from tools right when they no longer understand the underlying code they're responsible for.
We're already here even. I know of a company that was doubling their Codex spend and hitting the cap week over week and finally they had enough and stopped increasing. Then they maxed out on credits and had a week of no Codex. A large percentage of the engineers loudly refused to work for the rest of the week. They were managing the Codex managing the codebase and were totally incapable of dealing with its output without it.
> Large enterprises need to learn how to ship software faster
They haven't even learned that "less code is better" yet, I wouldn't hold my breathe waiting for them to suddenly learn "more advanced" things like that before they learn the basics.
More code means more support and more maintenance. If your team is already overloaded or if it's going to be reduced because of AI, things are going to get tough.
My bigger concern is actually that, if a company isn't careful, the bloat (complexity, amount of code, other artifacts etc.) will just balloon and largely cancel out any gains.
Feedback is often only considered once something is already on fire (financially, functionally, or literally).
That’s the game plan for the AI companies: once companies have massive codebases of critical AI generated code and a skeleton crew of prompt engineers they’re going to be locked in to the AI product to develop anything new.
They’re not even selling shovels, they’re selling subscriptions for shovels.
Yep.
I would argue that any sufficiently large system reaches a point where more code is in fact the opposite of what it needs.
Nutrition and calories are only useful up-to a point and then we have diminishing and later on negative returns.
Even-tough it is not the best analogy because we are describing two different system, it helps put a mental model around the fact that churning more is often less.
Side Note: A got a feedback from a customer today that while our documentation is complete and very detailed, they find it to be too overwhelming. It turns out having a few bullet points to get the idea across it better than 5 page document. Now it is obvious.
Seeing this too. Machines are great at pumping out content.
Tl;dr's, quick references / QuickStarts / cheat sheets and FAQs are also some things they're great at generating.
Like in that comic strip[0], where one side uses AI to inflate his bullet points to make it look better and have more content in the email, then other side uses AI to summarize it to bullet points.
[0] https://marketoonist.com/2023/03/ai-written-ai-read.html
This happens a probably a billion times a day. I shudder to think of the cost of it. Especially after knowing how LLMs aren't great at summarizing nor are they flawless at expanding information
Its trickling in slowly to non dev teams. Im consulting with a large fin-tech on Enterprise-wide AI adoption at the moment, and I'm seeing the same parallels though: you have power users that reap disproportionate rewards from it, and then you have the "tab complete" crowd that copy paste things into the prompt.
This was a huge motivation behind me trying to design an AI automation platform that comes "batteries included". I also think a lot of orgs, even engineering orgs do not know how to configure basic things like Claude plugin repositories into their installs.
Same here, but instead of the developers having access to Github Copilot, some selected few devs have access to some internal proxy, that goes to Amazon bedrock, where we have "400 request" per week to Claude Sonet :))))
What's old is new again [0][1][2]
The Theory of Constraints - AI Era
[0] https://en.wikipedia.org/wiki/Theory_of_constraints [1] https://www.goodreads.com/book/show/113934.The_Goal [2] https://www.goodreads.com/en/book/show/17255186-the-phoenix-...
But then how will all of the know-nothing management types get their fingers in the pie?
"release train" ... "learn how to ship software faster"
SAFe is poison.
It's good to know your experience mirrors mine. Developers are moving faster, but the rest of the organization is holding them back because processes and decisions still rely on other parts of the org. Has anyone else observed the same?
Organizations "born in AI" appear to buck this trend for obvious reasons (no legacy org. to deal with). My two cents.
We have a "two timelines" approach going on and I'm curious if others are seeing the same. There are official "Engineering-supported" services. There development speed is not the bottleneck. Engineers demand clean requirements that take forever to show up. Testing and deployment scheduling also take forever post-development. Important people are so fed-up that they've started hiring people to vibe code and develop services without going through Engineering. Code is shipped much faster here but technical debt accumulates rapidly. The important people are beginning to hire Data Scientists who sit outside of the Tech org to manage the AI code. It's all very interesting.
Especially when it waits a month and all the effort is either irrelevant or incompatible with latest changes that finally got through. So much token wastage to top off the recent chaos. Hopefully it improves just as fast as it materialised.
Which is why there's currently a gold rush of "Enterprise AI" startups which implement / offer agents to enterprise businesses.
Do you work in my company? :)
I kept saying this since Day 1 of llms - even 99% of development reduction means almost nothing in our company in speed of delivery of whole projects. And we are introducing generator of code that semi-randomly has poor performance when they have perf bottlenecks and fills the codebase with... sometimes questionable solutions. Sure, one has to check the results all the time, but then time is spent on code reviews, not much less than actual (way more fulfilling, rewarding and career-boosting) development.
Now I understand there are many more scenarios where gains are more realistic and sometimes huge, but it certainly ain't my current working place. So I use it sparingly to not atrophy my skillset but work estimates are so far the same and nobody questions that.
The Mythical Man Month should really be a mandatory reading for anyone working in software… and I don’t mean reading a Claude summary
Sounds like the typical ServiceNow paralysis. The “Mother May I” model.
The post hits the nail on the head with the messy middle. There is simply no motivation to develop this sort of intelligence loop as a dev who has their own responsibilities which their job depend on. Management can ask as nicely as they want, but I’m not going to selflessly share my productivity gains with the broader company for free. I might share a tool if it’s useful. All the learning of how to wrangle AI or set up agents is better kept to myself if there is no recognition for sharing.
My company set up a “prompt of the week” award and brown-bag sessions to help spread adoption. We also have teams meant to develop these workflows. Clearly, they set these events up to play it off as their own productivity. Without a real (read “monetary”) incentive or job security, the risk and cost of spreading the knowledge falls squarely on the developer.
It kinda racks my brain how a lot of people don't think this way. For example, way before the current state of AI, I wrote my own CLI to make aspects of my job easier and easier to write scripts to automate; some colleagues have noticed my tool and said I should share it, and my diplomatically worded answer is no. I don't share it with anyone because of the negative return in both supporting it and everyone else being able to be as productive as I am. Moreover, leadership will not recognize my ingenuity as an asset, hence no added job security. No way am I going to help my company out of the goodness of my heart to be potentially let go anyway in the near future.
If developers are worried about their jobs with the way the market currently is, they should treat their personal workflows as trade secrets. My example was not specific to AI, but it applies just as much to AI workflows. In a worker's market, it was sometimes fun to share that kind of knowledge with an organization. In an employer's market, they can pay me if they want access to my personal choices.
I don't think this way because I like to collaborate. If a colleague can benefit from a tool I made I'm proud to save them time. I also think your attitude doesn't pass the golden rule: would you like to work on a team full of people like you?
I sadly have to agree with this. In a collaborative "give and take" world sharing is good. In an environment that takes only, all you have left is your own intellectual property. It is your own most vital asset worth protecting. Shouldn't be like this, but it is.
What are your thoughts on open source? Seems like the same problem writ large
I love open source, but you are correct in identifying it as a very similar problem, though it's more a problem with software licensing than source code being publically available. Usually the argument is made that FOSS ends up as free labor, which is true in a lot of ways, but I see FOSS devaluing software as a whole. When software is open and libre, that sends a psychological signal that the software isn't that valuable. There would still be FOSS in a world where even projects like React charged a licensing fee to big organizations, but in that case there would be more choice between YOLO with free software or paying for quality software; as token expenses have proven, many companies could absolutely pay for the latter many times over. In terms of specifically open source, however, companies get a bit of a loophole in that their own employees (or LLM of choice) can be "inspired* by the source code and clone aspects of commercial software. This has the effect of devaluing the skill of individual software engineers to being glorified script kiddies.
It sucks to treat the workplace as adversarial, but we unfortunately have to as long as companies have the zero-sum mindset of "wow, everyone is so productive and we're achieving so much, why do we have so many people again?"
And I'm not a "at work we're a family!" guy, but I wish we could just be excellent at our jobs and share it with each other without worrying if I'm digging my own grave.
>but I’m not going to selflessly share my productivity gains with the broader company for free.
If your employer is expecting that you selflessly share your time for free, you’re getting fucked. Most people are paid to do their job. They are, of course, then expected to work for their employers while on the clock.
May not have been clear. My job is not AI development. I have features to deliver. The ask from employer is to add the AI knowledge sharing on top of it. They don’t pay for that. When layoffs come, it wouldn’t save me from missed deliverables.
I refuse to use LLMs and don't have a job, so I'm just some guy.
What I find strange about this is that in 2020 nobody would be this openly cynical and selfish about, say, good Python idioms, a useful emacs configuration, git shortcuts, etc. This attitude of "your job is to deliver value for the customer, anything else is a distraction, and if you share your hard-earned value-delivery techniques with others then you are a sucker" - this is new, and very disconcerting.
I understand there's not much we can do to stop the cyberpunk dystopia, but do we have to leap in head-first?
As a 3 year retired Systems Analyst I feel bad for my younger colleagues. In 2023 I was one of the first in my team to use AI to untangle some legacy code that did something mission critical with Perl and whose original author had long ago left and apparently didn't understand anything about actually commenting code or documentation. We were all in awe of this new technology that got us out of a bind. But more and more it looks less like a tool that is available to you instead of something that is being _done_ to you. Nobody asked for this.
At what point is inspiration and thought just devalued and worthless in the name of doing things instantly. The work has no soul.
> Where is the ROI for the 2 mio € we paid Anthropic last year?
The CEO has a youtube style platinum token plaque for their office.
AI by itself isn’t that useful. An agent forgets and makes enough mistakes that you have to check all its work, which can be net productivity negative.
It really comes into its own when you treat it as a tool that can build other tools. For example, having it build tools that force it to keep going until its work reaches a certain quality, or runs compliance checks on its outputs and tells it where it needs to fix things. Then and only then, can you trust its work.
Right now most current roles & workflows are designed around wrangling the tools you’re given to do a certain job. In that regime AI can only slide in at the edges.
It's just ass to work in this area now. In the company I work, the bosses let everyone use it, even non-developers. I really want to quit and work in another area but unfortunately where I live a beginning salary can't pay a rent and I'm getting old
Great article. The part that stood out to me is the shift in how organizations define work.
In the old model, performance and OKRs were anchored in disciplines, job titles, and role-specific expectations. In the AI era, those boundaries are starting to collapse. The deeper issue is psychological and organizational: people are constantly negotiating the line between “this is my job” and “this is not my responsibility.”
That creates a key adoption problem: what is the upside of being visibly recognized as an expert AI user? If people learn that I can do faster, better, and more cross-functional work, why would I reveal that unless the company also creates a clear system for recognition, compensation, or career growth?
Eventually whoever is responsible to fix prod incidents and maintain has the ownership. And I agree that’s pretty messy in a world where agents are crossing those boundaries. Will the AI engineer with their horde of agents be responsible to keep everything running? I really doubt so, but we will see
If they create a system to compensate expert AI users wouldn't that career have a problem in that anyone (enticed by the new careers existence and) integrating their advice on any company particulars with a (weeks) more modern approach is basically putting them in the role of domain expert being eliminated.
The part I push back on is the idea that expertise is easy to learn in just a few weeks.
Take Andrej Karpathy as an example. Even if I knew exactly what tools he uses and what his workflow looks like, I still would not be able to produce anything close to what he can produce in a few weeks. And he is not standing still either—he is evolving at the same time.
A lot of real expertise is not in the visible/system-able workflow. It is in someone’s experience, taste, judgment, and wisdom. You can copy the artifact, but you cannot easily copy the thinking behind it: the principles, the decision-making, and the ability to apply those principles across many different/subtle situations.
But I do agree with the concern behind the argument. People may worry that sharing what they know could weaken their own position. And the more uncomfortable question is about peers: if someone’s role can be “retired” because others absorbed their knowledge and skills, then it is hard not to ask, “Am I next?”
Well, that's fine until your teammate does all of those things by default and gaps show up between them and the rest of the team.
The problem of failed learning existed long before AI. The information comes from multiple sources like Slack conversations, customer support cases, and sales calls, but there is no process in place for filtering out patterns. Speed of development wasn't the issue. Figuring out which features should be developed is the issue. With AI, it takes three days to deliver new features instead of three weeks, but you still may spend those three days on something not worth building.
The loop closes only when customers' insights have a proper place to go, where duplicates can be filtered out and priorities set. This isn't happening for many teams, and the acceleration of code generation will only make things worse.
This is a great article. It helps you realize that the feedback loop is the goal but it won't just happen and traditional methodologies don't really support it. Has anyone here found a good way that promotes teams in a company to focus on the loop instead of productivity hack?
TIL my $company has used the same consultants as this guy. We started with Training and Champions, to Leadership/Lab/Crowd with a CoE/brown bags.
We are definitely struggling with the same issues author describes, but even worse the leaders down at the Crowd level have some perverse need to achieve reuse across their teams, rather than letting their Crowd experiment. One team does something interesting, we must stop and get that thing out to all teams in that group, so everyone “benefits”. This is a scarcity mindset, which made sense pre-AI where code was costly and ideas were more valuable.
At the same time, everyone not only has to do their work, they need to be 25% more efficient from AI (new KPIs), and so their own learnings slow to a halt, and the team with the cool idea has to give presentations instead of hacking.
On the first part of the article, I believe it describes how individual productivity gains do not seem to translate to business / larger scale productivity. I think this is expected; individual developer productivity, code volume, LOC/day never was a valuable metric on a company scale. Number of delivered features might be one, but ultimately, revenue and customer growth etc are.
While I do believe higher developer productivity can lead to faster reacting to market forces or more A/B testing, that won't necessarily lead to a successful business. Because ultimately it rarely is the software that's the issue there.
It’s been helpful for me to look at the promise of AI by comparing with the dotcom boom. Lots of similarities.
But the internet was a simpler concept for businesses. Basically it was you can now sell to people from their computers. AI’s promise is what? It can approximate reasoning about things? This is much more challenging implementation puzzle to truly solve.
I don’t know that I’ve seen anything of real substance outside coding tasks yet.
My biggest gripe with language models is that technical and conceptual discussions which used to be led organically (with people having to think about what others wrote and decide what to reply themselves) now turned into AI slop avalanches with participants just copy pasting obviously generated text into the discussion. And those texts are always very long and super weird to respond to because they usually are overall correct enough so you can't just reject them but are flawed all over the place, missing the point, lacking depth where it would be important, skipping over important steps. This is a huge time waste. Funnily, many people have no idea how obvious it is that their texts are generated and rubbish at that.
The Hub captures decisions humans made. It can't see the ones they didn't, which is most of what AI ships. You instrument the deliberate half and inherit the undeliberate one.
Once people try to increase quality instead of speed they will see how LLMs are powerful. Everything else is just sales pitch by Nvidia and friends.
Even if LLMs write more buggy code they can still bring up software quality in the short to medium term by allowing you to clear out a lot of the backlog of bugs and UI issues that are known but never had enough priority to be fixed
Debugging and developing first fixes is also one of the spaces where current LLMs are the biggest force multipliers. Especially if you have reproduction cases the LLM can test on its own
But long-term it might look very different as more and more of the code becomes LLM written
> "Where is the ROI for the 2 mio € we paid Anthropic last year?"
The bias in the assumptions here is absolutely bonkers.
Problem: GenAI is not generating any visible return on investment.
"Solution": rearrange your entire development organization around the technology and start inventing new tooling.
What's entirely obvious is that the point of such articles is not the stuff they purportedly discuss, but the normalization of assumptions those discussions are based on.
LLMs can't fail, they can only be failed ... by you!
One more point I noticed: since AI adoption is being promoted by companies, collaboration between developers could suffer. Why wait for a more experienced developer to have the time to explain some aspect of the codebase to you (and at the same time confess your ignorance), when AI can do it right away in a competent-sounding way (and most of the time it will probably be right, too)?
That already happens here. I am old dev who was the goto guy for people with certain business and technical questions. Not anymore (which is part good, as I'm interrupted much less, and part bad, as sometimes they regard the wrong answer as truth).
You could vibe yourself up an AMA tool where people submit questions, an agent goes to work on them, and then the question and the agent's answer sit in a queue waiting for you to review and weigh in.
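Something like this, as a minimal sketch (the `agent_answer` call is a hypothetical stand-in for your LLM of choice):

    from dataclasses import dataclass

    def agent_answer(text: str) -> str:
        return "(draft)"  # hypothetical: call the agent here

    @dataclass
    class Question:
        text: str
        agent_draft: str = ""
        expert_note: str = ""   # the expert's eventual weigh-in
        reviewed: bool = False

    queue: list[Question] = []

    def submit(text: str) -> Question:
        q = Question(text=text, agent_draft=agent_answer(text))
        queue.append(q)
        return q  # the asker gets the agent draft immediately

    def review(q: Question, note: str) -> None:
        # The expert works through the queue when they have time.
        q.expert_note, q.reviewed = note, True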
Coworkers are demonstrating that they value immediacy, and possibly some combination of embarrassment about their question or social anxiety about asking someone, over accuracy. Not only does that queue still require the question to be reviewed, and lose immediacy versus asking an LLM directly, but it might take even longer before rogerthis gets around to reviewing it.
I'm pretty sure this is the best idea I've ever heard of for this technology. You should build that tool and it should become mandatory throughout the tech world.
Can we get some enabling legislation? A UN resolution perhaps?
Despite the snark I’ll engage.
The "get an immediate agent answer, then a human expert's fast follow-up" is, I think, a great idea for many domains - imagine if you could get legal advice this way; the agent will already have explained the basics and the human expert just has to provide corrections - way less typing by humans.
Also, the corrections are now documented and could become future grounding for the agent.
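Continuing the hypothetical sketch above: reviewed corrections could be stored and prepended to future prompts on similar questions (naive word overlap here; a real tool would use embeddings):

    corrections: list[tuple[str, str]] = []  # (question, expert correction)

    def record(q: Question) -> None:
        if q.reviewed and q.expert_note:
            corrections.append((q.text, q.expert_note))

    def grounded_prompt(text: str) -> str:
        words = set(text.lower().split())
        related = [
            f"Q: {q}\nExpert correction: {c}"
            for q, c in corrections
            if words & set(q.lower().split())
        ]
        return "\n\n".join(related + [text])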
Absolutely zero snark. I'm serious. (About the serious part; obviously not the joke part.)
> a great idea for many domains
I completely agree. This is a great idea. If you don't do something with it I'm stealing it. ;-)
This seems like more work (and much more cognitive load) than simply being available to answer questions as they arise. Nothing is more exhausting than fact-checking LLM slop.
I think you hit the nail on the head: it's probably right, most of the time. Or maybe 89% right, 91% of the time.
The more I use AI, the more mistakes I see. I've noticed others see these same mistakes, correct them, and then when queried say "Oh, it gets it right all of the time!" No, having to point out "you got this wrong, re-write that last bit" isn't "getting it right". And the code isn't overtly wrong; it's subtle. Not using a function correctly, not passing something through that it should (and the default happens to just work -- during testing), and more. LLMs are great at subtle bugs.
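A hypothetical illustration of that "default happens to just work" failure mode:

    from dataclasses import dataclass
    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    @dataclass
    class Event:
        occurred_at: datetime
        tz: ZoneInfo  # the customer's timezone

    def render_timestamp(ts: datetime, tz: ZoneInfo = ZoneInfo("UTC")) -> str:
        return ts.astimezone(tz).isoformat()

    def report_row(event: Event) -> str:
        # Subtle bug: event.tz is never passed through, so every row
        # silently renders in UTC. The test fixtures all happen to use
        # UTC, so the suite stays green.
        return render_timestamp(event.occurred_at)

    e = Event(datetime(2025, 1, 1, tzinfo=timezone.utc), ZoneInfo("Asia/Tokyo"))
    print(report_row(e))  # 2025-01-01T00:00:00+00:00 -- Tokyo was ignored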
So moving forward with this isolation you mention ensures that maybe the guy in the company, the "answer guy" about a thing, never actually appears. Maybe he doesn't even get to know his own code well enough to be the answer guy.
And so when an LLM writes a weird routine, instead of being able to say "No, re-write that last bit", you'll have to shrug and say "the code looks fine, right?", because you, and the answer guy, if he exists, don't know the code well enough to see the subtle mistakes.
I noticed that when I was implementing a build pipeline for a project. My changes introduced a runtime bug (I had only tested that the thing was building), but then another developer broke the pipeline while fixing the runtime bug. While it was my failure to introduce the runtime bug, I don't think I can publish a fix for a bug without investigating why the bug appeared in the first place. Code is all about assumptions and contracts, and if something that was working breaks, that means something else has changed and you need to be aware of it.
In a large codebase it's probably next to impossible to get people who fully understand the code to explain it to you with unerring accuracy.
AI can get a pretty good picture, near instantly, whenever you need it.
It’s not just competent-sounding, it is reasonably competent, and certainly very useful for tasks like that.
That's a valid point. Dev/team member isolation is not a great environment to build in.
> Dev/team member isolation is not a great environment to build in
Gone are the days of mandatory corporate "synergy" and after-work bar gatherings to promote "team building."
AI is showing people in the tech industry that they're just interchangeable cogs. AI is bringing the offshored Indian work environment to Silicon Valley.
There are some improvements in coding and developer speed, but more broadly across the enterprise, AI is just producing a lot of slop that folks are getting fed up with.
AI content has a look and feel people sense immediately.
It’s amazing to see how quickly things shifted from “wow this is so cool, AI is going to change everything” to folks calling out “you lazy bum, this just looks like some slop you threw together with AI… let’s get some real thinking please.”
We are firmly heading into “trough of disillusionment” territory on the hype cycle.
Path dependencies between invention and utilization... are complicated and hard to fathom.
Our mental models of developments like the industrial revolution, literacy, printing or suchlike tend to be a lot more straightforward than how things play out in practice.
When a bottleneck is eliminated... you tend to shortly find the next bottleneck.
Meanwhile, there is an underlying assumption everyone seems to make that "more software, more value" is the basic reality. But... I'm skeptical.
To do lists, wishlists, buglists and road maps may be full of stuff but...
Visa or Salesforce have already exploited all their immediate "more software, more money" opportunities.
The ones in a position to easily leverage AI are upstarts. They're starting with nothing: no code, no features, no software. With AI, presumably, they can produce more software and create value.
Also... I think overextended market rationalism leads people to see everything as an industrial revolution...which irl is much more of an exception.
The networked personal computing revolution put a PC on every desk. It digitized everything. Do we have far better administration for less cost? Not really. Most administrations have grown.
Did law fundamentally change due to digital efficiency? No. Not really.
If you work on a terrible enterprise codebase... it's very possible that software quality/quantity isn't actually that important to your organization.
> There is another pressure building underneath all this. AI usage will become more visibly metered. The current enterprise feeling of “everyone has access, don’t worry too much about the bill” will not hold forever, at least not in the form people are getting used to. ...
> I do not want to make this a cost panic story, that would be the least interesting way to think about “rented intelligence”. The question is not how to minimize token spend in the abstract, any more than the question of software delivery was ever how to minimize keystrokes.
If tokens were as cheap as keystrokes (that is, effectively free), then "How do we minimize token spend?" wouldn't be a question anyone asks. It's because keystrokes are effectively free that you only ask "How do we minimize the number of keys pressed during the software development process?" if you're looking for an entertaining weekend project. If keystrokes cost as much per unit of work done as the (currently heavily subsidized) tokens from OpenAI and Anthropic, you'd see a lot of focus on golfing everything under the sun, all the damn time.
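Back-of-envelope, with every number an assumption rather than a quote:

    tokens_per_dev_per_day = 5_000_000   # assumed heavy agent workflow
    usd_per_million_tokens = 15.0        # assumed blended API rate
    working_days_per_month = 21

    daily = tokens_per_dev_per_day / 1e6 * usd_per_million_tokens   # $75/day
    monthly = daily * working_days_per_month                        # ~$1,575/dev/month
    print(f"${monthly:,.0f} per developer per month, vs. keystrokes at ~$0")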
I think that if these companies had first adopted local models, with lower token output, and the people learning had gotten to watch the tokens get generated, there'd be a lot more understanding.
> one team uses Copilot as autocomplete and calls it a day. Another team runs Claude Code in tight loops, with tests, reviews, and constant steering. A product owner suddenly prototypes real software instead of mocking screens in Figma. A senior engineer delegates a root-cause analysis to an agent and comes back to the valid solution in under an hour; this would’ve taken him two weeks without AI. A junior person produces polished code but has no idea which architectural assumptions got smuggled into the system. A support team quietly turns recurring tickets into workflow automation, because they know exactly where the work hurts and nobody in the Center of Excellence ever asked the right question.
This is just sales copy for various AI companies, laundered through an "influencer". It might as well be the CIA sending their article to be published in Daily Post Nigeria, so that the NYT can quote it as "sources".
The title is just clickbait. The rest of the content is fluffy bunnies and rainbows. It all sums up as "continue to consume product, but remember to also do X". Sales copy + HBR MBA bait.
The closest thing to an honest, less-than-rosy example is the "junior person" who has no idea about the code they committed.
What about the "senior person" who has no idea about the code they committed? What about the CISO who doesn't understand that pasting proprietary documents willy nilly into the LLM's gaping maw might have legal/security/common sense implications, and that it is his job to set policy on such behavior? What about the middle manager who doesn't even try to retain the most experienced dev in the company because "we don't need the headcount anymore, now that Claude is so fast"? What about the company eating its own seed corn because every single junior position has been eliminated and there are no plans for the future anymore? What about the filesystem developer who fell in love with his chatbot girlfriend and is crashing out on Discord?
Oh wait, scratch that last one. He left the company and is crashing out on his own.
Carry on, then.
Indeed. Any developer who has used Copilot knows you can't rely on it 100%. The post's head image immediately bothered me. Copilot's strength is not in patching the SDLC but in speeding up the catching of typos and minor oversights. If you use it as an integral part of the SDLC, it causes problems immediately. So why posit the strawman? Marketing.