I find the contrast between two narratives around technology use so fascinating:
1. We advocate automation because people like Brenda are error-prone and machines are perfect.
2. We disavow AI because people like Brenda are perfect and the machine is error-prone.
These aren't contradictions because we only advocate for automation in limited contexts: when the task is understandable, the execution is reliable, the process is observable, and the endeavour tedious. The complexity of the task isn't a factor - it's complex to generate correct machine code, but we trust compilers to do it all the time.
In a nutshell, we seem to be fine with automation if we can have a mental model of what it does and how it does it in a way that saves humans effort.
So, then - why don't people embrace AI with thinking mode as an acceptable form of automation? Can't the C-suite in this case follow its thought process and step in when it messes up?
I think people still find AI repugnant in that case. There's still a sense of "I don't know why you did this and it scares me", despite the debuggability, and it comes from the autonomy without guardrails. People want to be able to stop bad things before they happen, but with AI you often seem able to do so only after the fact.
Narrow AI, AI with guardrails, AI with multiple safety redundancies - these don't elicit the same reaction. They seem to be valid, acceptable forms of automation. Perhaps that's what the ecosystem will eventually tend to, hopefully.
It's not as black-and-white as "Brenda good, AI bad". It's much more nuanced than this.
When it comes to (traditional) coding, for the most part, when I program a function to do X, every single time I run that function from now until the heat death of the sun, it will always produce Y. Forever! When it does, we understand why, and when it doesn't, we also can understand why it didn't!
When I use AI to perform X, every single time I run that AI from now until the heat death of the sun it will maybe produce Y. Forever! When it does, we don't understand why, and when it doesn't, we also don't understand why!
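To make the contrast concrete, here's a toy sketch in Python with made-up names (nothing from the thread): a pure function stands in for "traditional coding", and a hypothetical LLM call stands in for the "maybe Y" case.

```python
# Toy illustration of the contrast above. `tax` stands in for any
# traditionally coded function; `llm` is a hypothetical model call.

def tax(amount: float, rate: float = 0.2) -> float:
    """Same inputs, same output, every run, forever."""
    return round(amount * rate, 2)

assert tax(1000.0) == 200.0   # holds today and at the heat death of the sun

# A hypothetical llm("compute 20% tax on 1000") sampled at temperature > 0
# may return "200", "200.00", "roughly 200", or something else entirely on
# each call - and the interface gives you no account of why.
```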
We know that Brenda might screw up sometimes but she doesn't run at the speed of light, isn't able to produce a thousand lines of Excel Macro in 3 seconds, doesn't hallucinate (well, let's hope she doesn't), can follow instructions etc. If she does make a mistake, we can find it, fix it, ask her what happened etc. before the damage is too great.
In short: when AI does anything at all, we only have, at best, a rough approximation of why it did it. With Brenda, it only takes a couple of questions to figure it out!
Before anyone says I'm against AI, I love it and am neck-deep in it all day when programming (not vibe-coding!) so I have a full understanding of what I'm getting myself into but I also know its limitations!
> When I use AI to perform X, every single time I run that AI from now until the heat death of the sun it will maybe produce Y. Forever! When it does, we don't understand why, and when it doesn't, we also don't understand why!
To make this even worse, it may even produce Y just enough times to seem reliable, and then it is unleashed without supervision, running thousands or millions of times, wreaking havoc by producing Z in a large number of places.
Exactly. Fundamentally, I want my computer's computations to be deterministic, not probabilistic. And, I don't want the results to arbitrarily change because some company 1,500 miles away from me up-and-decided to "train some new model" or whatever it is they do.
A computer program should deliver reliable, consistent output if it is consistently given the same input. If I wanted inconsistency and unreliability, I'd ask a human to do it.
Brenda also needs to put food on the table. If Brenda is 'careless' and messes up, we can fire Brenda; because of this, Brenda tries not to be careless (also other emotions). However, I cannot deprive an AI model of pay because it messed up.
It is even worse, in a sense: it is not either, it is not neither, it is not even both, as variations of Brenda exist throughout the multiverse in all shapes and forms, including one that can troubleshoot her own formulas with ease and accuracy.
But you are absolutely right about one thing. Brenda can be asked and, depending on her experience, she might give you a good idea of what might have happened. LLMs still seem to not have that 'feature'.
Machine reliability does the same thing the same way every time. If there's an error on some input, it will always make that error on that input, and somebody can investigate it and fix it, and then it will never make that error again.
Human reliability does the job even when there are weird variances or things nobody bothered to check for. If the printer runs out of paper, the human goes to the supply cabinet and gets out paper and if there is no paper the human decides whether to run out right now and buy more paper or postpone the print job until tomorrow; possibly they decide that the printing doesn't need to be done at all, or they go downstairs and use a different printer... Humans make errors but they fix them.
LLMs are not machine reliable and not human reliable.
> If the printer runs out of paper, the human goes to the supply cabinet and gets out paper and if there is no paper the human decides
Sure, these humans exist, but the others, the ones I happen to encounter every day unfortunately, go into broken mode immediately when something is unexpected. Today I ordered something they had run out of, and the girl behind the counter just stared into the deep, not having a clue what to do or say. Or yesterday at dinner, the PoS terminal (on batteries) ran out of power when I tried to pay. The guy just walked off and went outside for a smoke. I stood there waiting to pay. The owner apologized and fixed it after a while, but what I am saying is: the employee who runs out of paper and then finds and puts more paper in is not very... common... in the real world.
I was brought up on the refrain of "aren't computers silly, they do exactly what you tell them to do to the letter, even if it's not what you meant". That had its roots in computers mostly being programmable BASIC machines.
Then came the apps and notifications, and we had to caveat "... when you're writing programs". Which is a diminishing part of the computer experience.
And now we have to append "... unless you're using AI tools".
The distinction is clear to technical people. But it seems like an increasingly niche and alien thing from the broader societal perspective.
I think we need a new refrain, because with the AI stuff it increasingly seems "computers do what they want, don't even get it right, but pretend that they did."
We have absolutely descended, and rapidly, into “computers do whatever the fuck they want and there’s nothing you can do about it” in the past 5 years, and gen AI is only half of the problem.
The other half comes from how incredibly opinionated and controlling the tech giants have become. Microsoft doesn't even ALLOW consent on Windows (yes or maybe later), Google is doing all it can to turn the entire internet into a Chrome-only experience, and Apple had to be fought for an entire decade to allow users to place app icons wherever they want on their Home Screen.
There is no question that the overly explicit, quirky paradigm of the past was better for almost everyone. It allowed for user control and user expression, but apparently those concepts are bad for the wallet of big tech, so they have to go. Generative AI is just the latest and biggest nail in the coffin.
We have come a LONG way from the "Where do you want to go today?" of the 90s. Now, it's "You're going where we tell you that you can go, whether you like it or not!"
If you think programs are predictable, I have a bridge to sell you.
The only relevant metric here is how often each thing makes mistakes. Programs are the most reliable, though far from 100%; humans are much less reliable than that; and LLMs are around the level of humans, depending on the humans and the LLM.
Programs can be very close to 100% reliable when made well.
In my life, I've never seen `sort` produce output that wasn't properly sorted. I've never seen a calculator come up with the wrong answer when adding two numbers. I have seen filesystems fail to produce the exact same data that was previously written, but this is something that happens once in a blue moon, and the process is done probably millions of times a day on my computers.
There are bugs, but bugs can be reduced to a very low level with time, effort, and motivation. And technically, most bugs are predictable in theory, they just aren't known ahead of time. There are hardware issues, but those are usually extremely rare.
Nothing is 100% predictable, but software can get to a point that's almost indistinguishable.
> And technically, most bugs are predictable in theory, they just aren't known ahead of time.
When we're talking about reliability, it doesn't matter whether a thing can be reliable in theory, it matters whether it's reliable in practice. Software is unreliable, humans are unreliable, LLMs are unreliable. To claim otherwise is just wishful thinking.
RE: the calculator screenshot - it's still reliable because the same answer will be produced for the same inputs every time. And the behavior, though possibly confusing to the end user at times, is based on choices made in the design of the system (floating point vs integer representations, rounding/truncating behavior, etc). It's reliable deterministic logic all the way down.
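For readers who haven't seen it, the classic floating-point example illustrates exactly that "confusing to the end user but deterministic" behaviour; a small Python sketch:

```python
# Surprising from a decimal point of view, but the *same* surprise on
# every run and on every IEEE-754-conforming machine - i.e. debuggable.
print(0.1 + 0.2)           # 0.30000000000000004
print(0.1 + 0.2 == 0.3)    # False

# If the calculator's designer wanted decimal behaviour instead, that is
# an equally deterministic design choice:
from decimal import Decimal
print(Decimal("0.1") + Decimal("0.2"))   # 0.3
```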
There are other narratives going on in the background, though, both called out by the article and implied, including:
Brenda probably has annual refresher courses on GAAP, while her exec and the AI don't.
Automation is expected to be deterministic. The outputs can be validated for a given input. If you need automation beyond Excel functions, writing a Power Automate flow or recording an Office Script is sufficient and reliable as automation while being cheaper than AI. Can you validate AI as deterministic? This is important for accounting. Maybe you want some thinking around how to optimize a business process, but not for following them.
Brenda as the human-in-the-loop using AI will be much more able than her exec. Will Brenda + AI be better (or more valuable considering the cost of AI) than Brenda alone? That's the real question, I suppose.
AI in many aspects of our life is simply not good right now. For a lot of applications, AI is perpetually just a few years away from being as useful as you describe. If we get there, great.
"Thinking mode" only provides the illusion of debuggability. It improves performance by generating more tokens which hopefully steer the context towards one more likely to produce the desired response, but the tokens it generates do not reflect any sort of internal state or "reasoning chain" as we understand it in human cognition. They are still just stochastic spew. You have no more insight into why the model generates the particular "reasoning steps" it does than you do into any other output, and neither do you have insight into why the reasoning steps lead to whatever conclusion it comes to. The model is much less constrained by the "reasoning" than we would intuit for a human - it's entirely capable of generating an elaborate and plausible reasoning chain which it then completely ignores in favor of some invisible built-in bias.
I'm always amused when I see comments saying, "I asked it why it produced that answer, and it said...." Sorry, you've badly misunderstood how these things work. It's not analyzing how it got to that answer. It's producing what it "thinks" the response to that question should look like.
I don't understand why generative AI gets a pass at constantly being wrong, but an average worker would be fired if they performed the same way. If a manager needed to constantly correct you or double check your work, you'd be out. Why are we lowering the bar for generative AI?
My kneejerk reaction is the sunk cost fallacy (AI is expensive), but I'm pretty sure it's actually because businesses have spent the last couple of decades doing absolutely everything they can to automate as many humans out of the workforce as possible.
I've been trying to open my mind and "give AI a chance" lately. I spent all day yesterday struggling with Claude Code's utter incompetence. It behaves worse than any junior engineer I've ever worked with:
- It says it's done when its code does not even work, sometimes when it does not even compile.
- When asked to fix a bug, it confidently declares victory without actually having fixed the bug.
- It gets into this mode where, when it doesn't know what to do, it just tries random things over and over, each time confidently telling me "Perfect! I found the error!" and then waiting for the inevitable response from me: "No, you didn't. Revert that change".
- Only when you give it explicit, detailed commands, "modify fade_output to be -90," will it actually produce decent results, but by the time I get to that level of detail, I might as well be writing the code myself.
To top it off, unlike the junior engineer, Claude never learns from its mistakes. It makes the same ones over and over and over, even if you include "don't make XYZ mistake" in the prompt. If I were an eng manager, Claude would be on a PIP.
Recently I've used Claude Code to build a couple TUIs that I've wanted for a long time but couldn't justify the time investment to write myself.
My experience is that I think of a new feature I want, I take a minute or so to explain it to Claude, press enter, and go off and do something else. When I come back in a few minutes, the desired feature has been implemented correctly with reasonable design choices. I'm not saying this happens most of the time, I'm saying it happens every time. Claude makes mistakes but corrects them before coming to rest. (Often my taste will differ from Claude's slightly, so I'll ask for some tweaks, but that's it.)
The takeaway I'm suggesting is that not everyone has the same experience when it comes to getting useful results from Claude. Presumably it depends on what you're asking for, how you ask, the size of the codebase, how the context is structured, etc.
> Learning to use Claude Code (and similar coding agents) effectively takes quite a lot of work.
I've tried to put in the work. I can even get it working well for a while. But then all of a sudden it is like the model suffers a massive blow to the head and can't produce anything coherent anymore. Then it is back to the drawing board, trying all over again.
It is exhausting. The promise of what it could be is really tempting fruit, but I am at the point that I can't find the value. The cost of my time to put in the work is not being multiplied in return.
> Did you have it creating and running automated tests as it worked?
Yes. I work in a professional capacity. This is a necessity regardless of who (or what) is producing the product.
You don't have a human to manage. The relationship is completely one-sided; you can query a generative AI at 3 in the morning on New Year's Eve. This entity has no emotions to manage and no interests of its own.
There's cost.
There's an implicit promise of improvement over time.
There's the domain of expertise being inhumanly wide. You can ask about cookies right now, then about 12th-century France, then about biochemistry.
The fact that an average worker would be fired if they performed the same way is what the human actually competes with. They have responsibility, which is not something AI can offer. If it were the case that, say, Anthropic actually signed contracts stating that they are liable for any mistakes, then humans would be absolutely toast.
It’s much cheaper than Brenda (superficially, at least). I’m not sure a worker that costs a few dollars a day would be fired, especially given the occasional brilliance they exhibit.
How much compute costs is it for the AI to do Brenda's job? Not total AI spend, but the fraction that replaced Brenda. That's why they'd fire a human but keep using the AI.
> Because it doesn’t have to be as accurate as a human to be a helpful tool.
I disagree. If something can't be as accurate as a (good) human, then it's useless to me. I'll just ask the human instead, because I know that the human is going to be worth listening to.
So now you don't have to pay people to do their actual work, you assign the work to ML ("AI") and then pay the people to check what it generated. That's a very different task, menial and boring, but if it produces more value for the same amount of input money, then it's economical to do so.
And since checking the output is often a lower skilled job, you can even pay the people less, pocketing more as an owner.
It's not even greater trust. It's just passive trust. The thing is, Brenda is her own QA department. Every good Brenda is precisely good because she checks her own work before shipping it. AI does not do this. It doesn't even fully understand the problem/question sometimes, yet provides a smart, definitive-sounding answer. It's like the doctor on The Simpsons: if you can't tell he's a quack, you probably would follow his medical advice.
That’s definitely the hype. But I don’t know if I agree. I’m essentially a Brenda in my corporate finance job and so far have struggled to find any useful scenarios to use AI for.
I once thought this could build me a Gantt chart, because that's an annoying task in Excel. I had the data. When I asked it to help me: "I can't do that but I can summarize your data". Not helpful.
Any type of analysis is exactly what I don’t want to trust it with. But I could use help actually building things, which it wouldn’t do.
Also, Brendas are usually fast. Having them use a tool like AI that can't be fully trusted just slows them down. So IMO, we haven't proven the AI variable in your equation is actually a positive value.
I can't speak to finance. In programming, it can be useful but it takes some time and effort to find where it works well.
I have had no success in using it to create production code. It's just not good enough. It tends to pattern-match the problem in somewhat broad strokes and produce something that looks good but collapses if you dig into it. It might work great for CRUD apps but my work is a lot more fiddly than that.
I've had good success in using it to create one-off helper scripts to analyze data or test things. For code that doesn't have to be good and doesn't have to stand the test of time, it can do alright.
I've had great success in having it do relatively simple analysis on large amounts of code. I see a bug that involves X, and I know that it's happening in Y. There's no immediately obvious connection between X and Y. I can dig into the codebase and trace the connection. Or I can ask the machine to do it. The latter is a hundred times faster.
The key is finding things where it can produce useful results and you can verify them quickly. If it says X and Y are connected by such-and-such path and here's how that triggers the bug, I can go look at the stuff and see if that's actually true. If it is, I've saved a lot of time. If it isn't, no big loss. If I ask it to make some one-off data analysis script, I can evaluate the script and spot-check the results and have some confidence. If I ask it to modify some complicated multithreaded code, it's not likely to get it right, and the effort it takes to evaluate its output is way too much for it to be worthwhile.
I'd agree. Programming is a solid use case for AI. Programming is a part of my job, and hobby too, and that's the main place where I've seen some value with it. It still is not living up to the hype but for simple things, like building a website or helping me generate the proper SQL to get what I want - it helps and can be faster than writing by hand. It's pretty much replaced StackOverflow for helping me debug things or look up how to do something that I know is already solved somewhere and I don't want to reinvent. But, I've also seen it make a complete mess of my codebase anytime I try to build something larger. It might technically give me a working widget after some vibe coding, but I'm probably going to have to clean the whole thing up manually and refactor some of it. I'm not certain that it's more efficient than just doing it myself from the start.
Every other facet of the world that AI is trying to 'take over' is not programming. Programming is writing text, which AI is good at. It's using references to other code, which AI has been specifically trained on. Etc. It makes sense that that use case is coming along well. Everything else, not even close IMO. Unless it's similar. It's probably great at helping people draft emails and finish their homework. I don't have those pain points.
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation?
"Thinking" mode is not thinking, it's generating additional text that looks like someone talking to themselves. It is as devoid of intention and prone to hallucinations as the rest of LLM's output.
> Can't the C-suite in this case follow its thought process and step in when it messes up?
That sounds like manual work you'd want to delegate, not automation.
The promise of AI is that it lets you "skip the drudgery of thinking about the details" but sometimes that is exactly what you don't want. You want one or more humans with experience in the business domain to demonstrate they have thought about the details very carefully. The spreadsheet computes a result but its higher purpose is a kind of "proof" this thinking was done.
If the actual thinking doesn't matter and you just need some plausible numbers that look the part (also a common situation), gen ai will do that pretty well.
We need to stop using AI as an umbrella term. It's worth remembering that LLMs can't play chess and that the best chess models, like Leela Chess Zero, use deep neural networks.
Generative AI - which the world now believes is AI, is not the same as predictive / analytical AI.
It’s fairly easy to demonstrate this by getting ChatGPT to generate a new relatively complex spreadsheet then asking it to analyze and make changes to the same spreadsheet.
The problem we have now is uninformed people believing AI is the answer to everything… if not today then in the near future. Which makes it more of a religion than a technology.
Which may be the whole goal …
> Successful people create companies. More successful people create countries. The most successful people create religions.
Ok yep, fair. My comment was about using copilot-ish tech to generate plausible looking spreadsheets.
The kind of things that a domain expert Brenda knows that ChatGPT doesn't know (yet) are like:
- There are 3 vendors a, b, c who all look similar on paper, but vendor c always tacks on weird extra charges that take a lot of angry phone calls to sort out.
- By volume or weight it looks like you could get 100 boxes per truck, but for industry-specific reasons only 80 can legally be loaded.
- Hyper-specific details about real estate compliance in neighbouring areas that mean buildings that look similar on paper are in fact very different.
A good Brenda can understand the world around her as it actually is, she is a player in it and knows the "real" rules rather than operating from general understanding and what people have bothered to write down.
That automation you cite in your #1 is advocated for because it is deterministic and, with effort, fairly well understood (I have countless scripts solidly running for years).
I don't disavow AI, but like the author, I am not thrilled that the masses of excel users suddenly have access to Copilot (gpt4). I've used Copilot enough now to know that there will be huge, costly mistakes.
> We disavow AI because people like Brenda are perfect and the machine is error-prone.
I don't think that is the message here. The message is that Brenda might know what she is doing, and maybe AI helps her.
> She's gonna birth that formula for a financial report and then she's gonna send that financial report
The problem is people who might not know what they are doing
> he would have sent it back to Brenda but he's like oh I have AI and AI is probably like smarter than Brenda and then the AI is gonna fuck it up real bad
Because AI outputs sound so confident, many people feel like experts. Rather than involve Brenda to debug the issue, the C-suite might say: I believe! I can do it too. AI FTW!
Even when people advocate automation, especially in areas like finance, because people are error-prone, there is always a human in the loop whose job is to double-check the automation. The day this human finds errors in the machine, there is going to be a lot of noise. And if that day happens to be a quarterly or yearly closing/reporting, there is going to be hell to pay once closing/reporting is done. Both the automation and the developer are going to be hauled up (obviously I am exaggerating here).
Would you be willing to guarantee that some automation process will never mess up, and if/when it does, compensate the user with cash?
For a compiler, with a given set of test suites, the answer is generally yes, and you could probably find someone willing to insure you for a significant amount of money, that a compilation bug will not screw up in a such a large way that it will affect your business.
For an LLM, I have a hard time believing that anyone will be willing to provide that same level of insurance.
If an LLM company said "hey, use our product, it works 100% of the time, and if it does fuck up, we will pay up to a million dollars in losses", I bet a lot of people would be willing to use it. I do not believe any sane company will make that guarantee at this point, outside of extremely narrow cases with lots of guardrails.
That's why a lot of ai tools are consumer/dev tools, because if they fuck up, (which they will) the losses are minimal.
I feel like it comes down to predictability and overall trust and confidence. AI is still very fucky, and for people that don't understand the nuances, it definitely will hallucinate and potentially cause real issues. It is about as happy as a Linux rm command to nuke hours of work. Fortunately these tools typically have a change log you can undo, but still.
Also Brenda is human and we should prioritize keeping humans in jobs, but with the way shit is going that seems like a lost hope. It's already over.
The “Brenda” example is a lumped sum fallacy where there is an “average” person or phenomenon that we can benchmark against. Such a person doesn't exist, leading to these dissonant, contradictory dichotomies.
The fact of the matter is that there are some people who can hold lots of information in their head at once. Others are good at finding information. Others still are proficient at getting people to help them. Etc. Any of these people could be tasked with solving the same problem and they would leverage their actual, particular strengths rather than some nebulous “is good or bad at the task” metric.
As it happens, nearly all the discourse uses this lumped sum fallacy, leading to people simultaneously talking past one another while not fundamentally moving the discussion forward.
I see where you are coming from but in my head, Brenda isn't real.
She represents the typical domain-experts that use Excel imo. They have an understanding of some part of the business and express it while using Excel in a deterministic way: enter a value of X, multiply it by Y and it keeps producing Z forever!
You can train AI to be a better domain expert. That's not in question. However, with AI you introduce a dice roll: it may not multiply X and Y to get Z... it might get something else. Sometimes. Maybe.
If your spreadsheet is a list of names going on the next annual accounts department outing then the risk is minimal.
If it's your annual accounts that the stock market needs to work out billion dollar investment portfolios, then you are asking for all the pain that it will likely bring.
Humans, legacy algorithmic systems, and LLMs have different error modes.
- Legacy systems typically have error modes where integrations or user interface breaks in annoying but obvious ways. Pure algorithms calculating things like payroll tend to be (relatively) rigorously developed and are highly deterministic.
- LLMs have error modes more similar to humans than legacy systems, but more limited. They're non-deterministic, make up answers sometimes, and almost never admit they can't do something; sometimes they make pure errors in arithmetic or logic too.
- Humans have even more unpredictable error modes; on top of the errors encountered in LLMs, they also have emotion, fatigue, org politics, demotivation, misaligned incentives, and so on. But because we've been dealing with working with other humans for ten thousand years, we've gotten fairly good at managing each other... but it's still challenging.
LLMs probably need a mixture of "correctness tests" (like evals/unit tests) and "management" (human-in-the-loop).
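A minimal sketch of what such a "correctness test" could look like - purely illustrative, and `call_llm` is a placeholder for whatever model client is actually in use. Because the system is non-deterministic, each case is run several times and a pass rate is reported rather than a single pass/fail:

```python
# Hypothetical eval harness: fixed cases, repeated trials, pass rates.
CASES = [
    ("What is 2 + 2? Answer with just the number.", "4"),
    ("What is the capital of France? Answer with one word.", "Paris"),
]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your actual model client here")

def run_eval(trials: int = 10) -> None:
    for prompt, expected in CASES:
        passes = sum(expected in call_llm(prompt) for _ in range(trials))
        print(f"{prompt!r}: {passes}/{trials} passed")

# The "management" half is then a human reviewing the failures before
# anything ships - the human-in-the-loop mentioned above.
```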
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation
Mainly because generative AI _is not automation_. Automation is built on a fixed ruleset; it's predictable, reliable, and actually saves time. Generative AI... is whatever it is; it is definitely not automation.
I feel like you've squashed a 3D concern (automations at different levels of the tech stack) into a 2D observation (global concerns about automations).
Human determinism, as elastic as it might be, is still different than AI non-determinism. Especially when it comes to numbers/data.
AI might be helpful with information but it's far less trustable for data.
The big problem with AI in back-office automation is that it will randomly decide to do something different than it had been doing. Meaning that it could be happily crunching numbers accurately in your development and launch experience, then utterly drop the ball after a month in production.
While humans have the same risk factors, human-oriented back-office processes involve multiple rounds of automated/manual checks which are extremely laborious. Human errors in spreadsheets have particular flavors, such as a forgotten cell, a mistyped number, or reading from the wrong file/column. Humans are pretty good at catching these errors, as they produce either completely wrong results when the columns don't line up, or a typo'd number that is completely out of distribution.
An AI may simply decide to hallucinate realistic column values rather than extracting its assigned input. Or hallucinate a fraction of column values. How do you QA this? You can't guarantee that two invocations of the AI won't hallucinate the same values, you can't guarantee that a different LLM won't hallucinate different values. To get a real human check, you'd need to re-do the task as a human. In theory you can have the LLM perform some symbolic manipulation to improve accuracy... but it can still hallucinate the reasoning traces etc.
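One cheap, mechanical answer to "how do you QA this?" is to check that every value the model claims to have extracted literally exists in the source data; it catches invented values, though not values copied from the wrong row. A hedged sketch - the function and sample data are made up for illustration:

```python
def flag_hallucinated(extracted: list[str], source_column: list[str]) -> list[str]:
    """Return extracted values that appear nowhere in the source column."""
    allowed = set(source_column)
    return [value for value in extracted if value not in allowed]

source = ["1200.00", "875.50", "310.25"]
llm_output = ["1200.00", "875.50", "315.25"]      # last value was invented
print(flag_hallucinated(llm_output, source))       # ['315.25']
```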
If a human decided to make up accounting numbers on one out of every 10,000 accounting requests, they would likely be charged with fraud. Good luck finding the AI hallucinations at the equivalent rate before some disaster occurs. Likewise, how do you ensure the human Excel operator doesn't get pressured into certifying the AI's numbers when the "don't get fired this week" button is sitting right there in their Excel app? How do you avoid the race to the bottom where the "star" employee is the one certifying the AI results without thorough review?
I'm bullish on AI in backoffice, but ignoring the real difficulties in deployment doesn't help us get there.
By the same fascination, do computers become more complex to enhance people? or do people get more complex with the use of computers? Also, do computers allow people to become less skilled and inefficient? or do less skilled and inefficient people require the need for computers?
The vector of change is acceptable in one direction and disliked in another. People become greater versions of themselves with new tech. But people also get dumber and less involved because of new tech.
I'm disappointed that my human life has no value in a world of AI. You can retort with "ah but you'll be entertained and on super-drugs so you won't care!", but I would further retort that I'd rather live in a universe where I can contribute something, no matter how small.
The current generation of AI tools augment humans, they don't replace them.
One of the most under-rated harms of AI at the moment is this sense of despair it causes in people who take the AI vendors at their word ("AGI! Outperform humans at most economically valuable work!")
Automation implies determinism. It reliably gives you the same predictable output for a given input, over and over again.
AI is non-deterministic by design. You never quite know for sure what it's going to give you. Which is what makes it powerful. But also makes it higher risk.
The reason is oftentimes fairly simple, certain people have their material wealth and income threatened by such automation, and therefore it's bad (an intellectualized reason is created post-hoc)
I predict there will actually be a lot of work to be done on the "software engineering" side w.r.t. improving reliability and safety as you allude to, for handing off to less than sentient bots. Improved snapshot, commit, undo, quorum, functionalities, this sort of thing.
The idea that the AI should step into our programs without changing the programs whatsoever around the AI is a horseless carriage.
Co-pilot and AI has been shoved at the Microsoft Stack in my org for months. Most of the features were disabled or hopelessly bad. It’s cheaper for Microsoft to push this junk and claim they’re doing something, it’s going to improve their stock far more than not doing it, even though it’s basically useless currently.
Another issue is that my org disallows AI transcription bots. It’s a legit security risk if you have some random process recording confidential info because the person was too busy to attend the meeting and take notes themselves. Or possibly they just shirk off the meetings and have AI sit in.
I still find the Copilot transcripts orders of magnitude worse than something like Wispr Flow; they tend to hallucinate constantly and do not adapt to a company's context (which Copilot has access to...). I am talking about acronyms of products/teams, names of people (even when they are in the call), etc.
Can anyone familiar with the technical details shed light on why this is so?
Is it because of a globally trained model (as opposed to one trained/tweaked on context-specific data), or because of using different classes of models?
Neither copilot nor flow can natively handle audio to my understanding, so there is already a transcription model converting it to text that then GPT tries to summarise.
It could be they simply use a mediocre transcription model. Wispr is amazing but would hurt their pride to use a competitor.
But I feel it's more likely the experience is: GPT didn't actually improve on the raw transcription, it just made it worse. Especially as any mis-transcribed words may trip it up and make it misunderstand while making the summary.
If I can choose between a potentially confused and misunderstood summary, and a badly spellchecked (flipped words) raw transcription, I would trust the latter.
The worst part is seeing it creep into the developer stack in places where it should not be.
I am all for nice completion in VS, or help deciphering compiler errors, but let's do this AI push with some restraint.
Also, what I really dislike is the prompt interface. AI integrations have to feel like a natural, transparent part of the workflow, not trying to put everything into a tiny chat window.
And while we're at it, can we please improve voice recognition?
“There are two Brendas - their job is to make spreadsheets in the Finance department. Well, not quite - they add the months and categories to empty spreadsheets, then they ask the other departments to fill in their sales numbers every month so it can be presented to management.
“The two Brendas don’t seem to talk, otherwise they would realize that they’re both asking everyone for the same information, twice. And they’re so focused on their little spreadsheet worlds that neither sees enough of the bigger picture to say, ‘Wait… couldn’t we just automate this so we don’t need to do this song and dance every month? Then we wouldn’t need two people in different parts of the company compiling the same data manually.’
“But that’s not what Brenda was hired for. She’s a spreadsheet person, not a process fixer. She just makes the spreadsheets.”
We need fewer Brendas, and more people who can automate away the need for them.
With respect, you probably only see that bit of Finance, but that doesn't mean it is all Brenda does.
At least half of the work in my senior Finance team involves meeting people in operations to find out what they are planning to do and to analyse the effects, and present them to decision makers to help them understand the consequences of decisions. For an AI to help, someone would have to trigger those conversations in the first place and ask the right questions.
The rest of the work involves tidying up all the exceptions that the automation failed on.
Meanwhile copilot in Excel can't even edit the sheet you are working on. If you say to it, 'give me a template for an expense claim' it will give you a sheet to download... probably with #REF written in where the answers should be.
I work in corporate finance and these issues are certainly present. However, they are almost always known and determined low priority to have a better process built. Finance processes are nearly always a non priority as a pure cost center/overhead there’s not many companies that want to invest in improving the situation, they’ll limp along with minimal investment even once big and profitable.
That said, every finance function is different and it may be unknown to them that you’re being asked for some data multiple times. If you’re enduring this process, I’m of the opinion you’re equally at fault. Suggest a solution that will be easier on you. As it’s possible they don’t even know it’s happening. In the case provided, email to all relevant finance people “Here’s a link to a shared workbook. I’ll drop the numbers here monthly, please save the link and get the data directly from that file. Thanks!” Problem solved. Until you don’t follow through which is what causes most finance people to be constantly asking for data/things. So be kind and also set yourself a monthly recurring reminder on your calendar and actually follow through.
And they've all been burned by enterprise finance products which were sold to solve exactly that problem.
Only different companies were all sold different enterprise finance products, but they need to communicate with each other (or themselves after mergers), so it all gets manually copied into Excel and emailed around each month.
Also an acceptable solution. This is usually where the next step is to have a BI-type person just create a report for finance. There are many reasons, but what ends up happening is different people filtering/retrieving the data differently, causing inconsistencies.
But finance usually prefers on-demand access, so the communication feedback loop of asking for stuff is not well liked; I'm sure they appreciate this middle step too.
There are many cases where there’s no easy way to give access to the data and a human in the loop is required. In that case, do the shared workbook thing I mentioned as a starting point at least. It may evolve from there.
And then you end up with a team of five people, each three times as expensive as Brenda, and what used to be an email now takes a sprint and has to go through a ticket system.
Then you end up with a report that goes out automatically every month to leadership pulled directly from the Salesforce data, along with a real time dashboard anyone in the org can look at, broken down by team, vertical, and sales volume.
It's not what you had in mind, but that's what you get. Because automation, integration, and AI are currently garbage -- Salesforce, Netsuite, doesn't matter. They don't do the magic that they promise. Because process is still very much a human problem, not a computational one.
We need more Brendas (those whom the Excel goddesses come and kiss on the forehead) and fewer people who are disrespectful of Brendas. The example in this post is someone giving more respect to AI than to Brenda.
But then you need someone to maintain/look after that automation, and they'll be more expensive than two Brendas
And now if one of the Brendas wants to change their process slightly, add some more info, they can't just do it anymore. They have to have a three way discussion with the other Brenda, the automation guy and maybe a few managers. It will take months. So then its likely better for Brenda to just go back to using her spreadsheet again, and then you've got an automated process that no longer meets peoples needs and will be a faff to update.
People’s reaction to this varies based on the Brendas they’ve worked with. Some are given a specific task to do with their spreadsheets every week and have to just do as they are told even if they can see it’s not a good process. Others are secretly the brains of the company – the only one who really sees the whole picture. And a good number of Brendas are the company owner doing her best with the only tool she’s had the time to learn.
No, I’m suggesting that she is ineffective exactly because she stays in her box.
She should be replaced with someone who says, "this box doesn't need to be here… there is a better way of doing things."
NOT to be confused with the junior engineer who comes into a project and says it’s garbage and suggests we rewrite it from scratch in ${hotLanguage} because they saw it on a blog somewhere.
I'm not sure why you're going to the mat for hanging onto redundant people putting numbers in spreadsheets.
At large companies in particular, there are far too many people who simply turn their widgets - this was the entire point of the tech revolution.
Think about how many bookkeepers were needed before Excel. Someone could have made your exact same argument (but it’s just the latest gimmick!) about Excel 30 years ago. And yet, technology will make businesses more efficient whether people stand in its way or not.
Even at a small company of one or two, QuickBooks will reduce the amount of bookkeepers and accountants needed. TurboTax will further reduce that.
We will need fewer people in the future maintaining their Excel spreadsheets, and more people building the automation for those processes.
The change averse will always find reasons not to adapt - they will create their own obsolescence.
(inb4 but it’s way more expensive to pay developers to automate!)
That's a pretty specific example when there are a lot of good "spreadsheet people" out there who do a lot more than spreadsheets (maybe they had to write SQL queries or scripts to get those numbers), but commonly need to simplify things down to a spreadsheet or power point for upper management. I'm not saying you should have multiple people doing redundant work, but this style isn't entirely dumb.
What would this be replaced by? Some kind of large SAP like system that costs millions of dollars and requires a dozen IT staff to maintain?
Fair - I was creating a straw man mostly to make a point. The people I’m thinking aren’t running SQL queries or scripts, they’re merely collection points for data.
So one good BI developer who knows Tableau and Salesforce and Excel and SQL can replace those pure collection points with a better process, but they can also generate insight into the data because they have some business understanding from being close to the teams, which is what my hypothetical Brenda can’t do.
In my example, Brenda would be asking sales leaders to enter in their data instead of going into Salesforce herself because she doesn’t know that tool / side of the company well enough.
I was making the point that, contrary to the article, the Brendas I know aren’t touched by the Excel angels, they’re just maintaining spreadsheets that we probably shouldn’t have anyway.
I think that is a fair point too. The person that builds the Tableau dashboard could just send Brenda a screenshot once a month and that saves everyone time.
A screenshot of a Tableau dashboard is possibly the most dangerous form of internal data communication there is, because it entirely removes any chance of digging into that dashboard and figuring out what queries created it and spotting the incorrect assumptions they made along the way.
A hill I will die on is that business analytics need "view source" or they aren't worth the pixels they are rendered with.
Y'know why people don't automate their jobs? It's not a skill issue it's an incentives issue.
If you do your job, you get paid periodically. If you automate your job, you get paid once for automating it and then nothing, despite your automation constantly producing value for the company.
To fix this, we need to pay people continually for their past work as long as it keeps producing value.
It's a large human behavior question for me: the notion of work, value, economy, efficiency... all muddied in there.
- When I was younger I used to work small jobs; as a nerd, I could use software better than the legacy employees. During the 3 months I was there, I found their tools were scriptable, so I did just that. I produced 10x more with 2x less mental effort (I just "copilot" my script before it commits actual changes), all that for minimum wage. And I was happy like a puppy, free to race as far as I wanted, designing the script to fit exactly the needs of an operator.
(Side note: the legacy employees were pissed because my throughput increased the rate of things they had to do. I didn't foresee that, and when I offered to help them so they wouldn't have to work more, they were just pissed at me.)
- Later I became a legit software engineer. I'm now paid a lot, all things considered, to talk to the manager of legacy employees like the above, to produce some mediocre web app that will never match the employees' needs because of all the middle layers and cost pressure, which also means I'm tired because I'm not free to improve things and I have to obey the customer...
So for 6x more money you get a lot less (if you deliver; sometimes projects get canned before shipping).
Yes, but notice what you are describing are all negative incentives.
When automation produces value for the company, the people automating it should capture a chunk of that value _as a matter of course_.
Even if you argue that you can then negotiate better compensation:
1) That is uncertain and delayed reward - and only if other people feel like it, it's not automatic.
2) The reward stops if you get fired or leave, despite the automation still producing value - you are also basically incentivized to build stuff that requires constant maintenance. Imagine you spend a man-month building the automation and then leave, it then requires a man-month of maintenance over the next 5 years. At the end of the 5 years, you should still be getting 50% of the reward.
That mirrors my experience as well. LLMs get instantly confused in real world scenarios in Excel and confidently hallucinate millions in errors
If you look at the demos for these it’s always something that is clean and abundantly available in training data. Like an income statement. Or a textbook example DCF. Or my personal fav „here is some data show me insights“. Real world excel use looks nothing like that.
I’m getting some utility out of them for some corporate tasks but zilch in excel space.
As somebody with non-existent experience with Excel, I could totally see myself getting a lot of value out of LLMs, if nothing else then simply for telling me what's possible, what functions and patterns exist at all etc.
Well, if you do it once then yes, but if you automate this process it is different. E.g. I do this with YouTube videos, because watching a 14-minute video versus reading a 30-second summary is a time saver. I still watch some videos fully, but many of them are not worth it.
So in summary, I think it was just part of an automated process (maybe), or it will become one in the future.
Why spend two minutes typing (and realistically longer than that, if I want to capture the exact transcript I would need to keep hitting pause and play and correcting myself) when I can spend ten seconds pasting a URL into my terminal and then dragging and dropping the resulting file onto the MacWhisper window?
I actually transcribed the whole TikTok which was about 50% longer than what I quoted, then edited it down to the best illustrative quote.
There's also a free version that just uses Whisper. I recommend giving it a go, it's a very well constructed GUI wrapper. I use it multiple times a week, and I've run Whisper on my machine in other less convenient ways in the past.
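For anyone curious what that looks like without the GUI: MacWhisper wraps OpenAI's open-source Whisper model, and a rough script-level equivalent uses the openai-whisper package (the filename below is a placeholder; the audio would already have been downloaded, e.g. with yt-dlp):

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("base")      # "base" is fast; larger models are more accurate
result = model.transcribe("video.mp3")  # placeholder filename for the downloaded audio
print(result["text"])                   # the plain-text transcript
```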
This reminds me of a friend whose company ran a daily perl script that committed every financial transaction of the day to a database. Without the script, the company could literally make no money irrespectively of sales because this database was one piece in a complex system for payment processor interoperability.
The script ran in a machine located at the corner of a cubicle and only one employee had the admin password. Nobody but a handful of people knew of the machine's existence, certainly not anyone in middle management and above. The script could only be updated by an admin.
Copilot may be good, but sure as hell doesn't know that admin password.
Everywhere I’ve ever worked has had that mission critical box.
At one of my jobs we had a server rack with UPS, etc, all the usual business. On the floor next to it was a dell desktop with a piece of paper on it that said “do not turn off”. It had our source control server in it, and the power button didn’t work. We did eventually move it to something more sensible but we had that for a long time
Yeah. I mean, someone else would _eventually_ figure it out. There wasn't full disk encryption or anything on it, so if the guy got hit by a bus and the machine turned off, we probably would have just imaged the disk and got it running in a VM.
An old colleague and friend used to print out a 30 page perl script he wrote to do almost exactly this in this scenario. A stapled copy could always be found on his dining room table.
That sounds pretty bad. Not a great argument against AI: "Our employees have created such a bad mess that AI wont work because only they know how the mess they created works".
> "Our employees have created such a bad mess that AI wont work because only they know how the mess they created works".
This is an ironclad argument against fully replacing employees with AI.
Every single organization on Earth requires the people who were part of creating the current mess to be involved in keeping the organization functioning.
Yes you can improve the current mess. But it's still just a slightly better mess and you still need some of the people around who have been part of creating the new mess.
Just run a thought experiment: every employee in a corporation mysteriously disappear from the face of the Earth. If you bring in an equal number of equally talented people the next day to run it, but with no experience with the current processes of the corporation, how long will it take to get to the same capability of the previous employees?
Excel is the most popular programming environment in the universe. It has optimized the five minute out of the box experience so well that grade schoolers can use it.
Other than that, it is pretty horrible for coding.
Excel is programming. Spreadsheets have been full of bugs for decades. How is Brenda any different from a developer? Why are people scared when the LLM might affect their dollar calculations, and less bothered when it affects their product?
Many fears of “AI mucking it up” could be mitigated with an ability to connect a workbook to a git repository. Not for data, but for VBA, cell formulas, and cell metadata. When you can encapsulate the changes a contributor (in this case co-pilot) makes into a commit, you can more easily understand what changes it/they made.
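As a rough sketch of the idea (not an existing Excel feature): you can already dump the formula layer of a workbook to a plain-text file that lives in git, so any contributor's changes - Copilot's included - show up as an ordinary diff. This uses openpyxl, which exposes formulas when data_only is left false; exporting VBA would need separate tooling and isn't covered here.

```python
from openpyxl import load_workbook

def dump_formulas(xlsx_path: str, out_path: str) -> None:
    """Write every cell formula as 'Sheet!A1<TAB>=FORMULA' for git tracking."""
    wb = load_workbook(xlsx_path, data_only=False)
    with open(out_path, "w") as out:
        for ws in wb.worksheets:
            for row in ws.iter_rows():
                for cell in row:
                    if isinstance(cell.value, str) and cell.value.startswith("="):
                        out.write(f"{ws.title}!{cell.coordinate}\t{cell.value}\n")

dump_formulas("report.xlsx", "report.formulas.txt")   # then: git diff report.formulas.txt
```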
Brenda has been getting slower over the years (as we all have), but soon the boss will learn that it was a small price to pay for knowing well how to keep such a house of cards from collapsing.
And then the boss will make the decision to outsource her job, to a company that promises the use of AI to make finance better, and faster, and while Brenda is in the unemployment line, someone else thousands of miles away is celebrating a new job
Excel is the “beast that drives the ENTIRE economy” and he’s worried about Brenda from the finance department losing her job because then her boss will get bad financial reports
I suppose the person that wrote that has no idea Excel is just an app builder where you embed data together with code.
You know that we have Excel because computers didn't understand column names in databases, so data extraction needed to be done by humans. Humans then designed those little apps in Excel to massage the data.
Well, now an agent can read the boss saying "gimme the sales from last month", and the agent doesn't need Excel for that, because it can query the database itself, massage the data itself using Python, and present the data itself with HTML or PNGs.
So, we are in the process of automating Brenda AND excel away.
Also, finance departments are a very small part of Excel users. Just think: everywhere people need small programs, Excel is there.
In most cases where the excel spreadsheet is business critical, the spreadsheet _is_ the database. These companies aren’t using an erp system. They are directly entering inventory and sales numbers in the spreadsheet.
The post is clearly hyperbole; obviously the sole issue being brought up isn't 'Brenda losing her job may be bad for the company'. You're being facetious.
You missed this bit “.. and then the AI is gonna fuck it up real bad and he won't be able to recognize it because he doesn't understand because AI hallucinates.”
The underlying assumption is that Brenda generally does her job pretty well. Human errors exist but usually peers/managers (or the person who did it) can identify and correct them reliably.
If we have to compare LLM’s against people who are bad at their jobs in order to highlight their utility we’re going the wrong direction.
There are a lot of underlying assumptions: Brenda, the woman, is accurate and trustworthy and has mastered an accurate and trustworthy technology; the upper manager, the male, will introduce error by not understanding that the technology he brings to bear on the situation is hallucinatory. The woman is lower in status and pay than the male. The woman is necessary to the functioning of "the economy" and "capitalism," while the man threatens those. There are a lot of unsubtle political undertones on TikTok.
Don't be like that. I work at a Fortune 500, and Brenda wants that Copilot in Excel because it can help her achieve so much more. What is "so much more", you ask? Brenda and her C-suite can define it, but they know for sure Copilot in Excel will lead to enormous time savings.
Coding agents are useful and good and real products because when they screw up, things stop working almost always before they can do damage. Coding agents are flawed in ways that existing tools are good at catching, never mind the more obvious build and runtime errors.
Letting AI write your emails and create your P&L and cash flow projections doesn't have to run the gauntlet of tools that were created to stop flawed humans from creating bad code.
Fair. I've been using the coding agent in Android Studio Canary to do exploratory code in Dart/Flutter and using ATProto. Low stakes, but higher productivity is a significant benefit. It's a daily surprise how brilliant it is at some things and how abysmal at others.
Using AI does not absolve you from the responsibility of doing it correctly. If you use AI, then you had better have the skills to have done the job yourself, and so have the ability to check that the AI did things correctly.
You can still save time, but perhaps not as much as you think, because you need to check the AI's work thoroughly.
Another cheese that will affect the outcome of major tournaments; not a good look for Microsoft.
It's like the XLOOKUP situation all over again: yet another move aimed at the casual audience, designed to bring in the party gamers and make the program an absolute mess competitively.
I agree - having watched many people use Excel over the years, I'd say people often overestimate their skills. I see three categories of Excel users. First there are the people that are intimidated by it and stay away from any task involving Excel. Second are the people that know a little bit (a few basic formulas) and overestimate their skills because they only compare themselves to the first group. And the third group are the actual power users but know to keep that quiet because otherwise they become the "excel person" and have to fix every sheet that has issues.
I don't know if AI is going to make any of the above better or worse. I expect the only group to really use it will be that second group.
I have seen lots and lots of different uses for Excel in my line of work:
- password database
- script to automatically rename jpeg files
- game
- grocery lists
- Bookkeeping (and trying not to get caught for fraud for several years, because the monthly spending limit is $5000 and $4999 a month is below that...)
- embed/collect lots of Word documents
- coloring book
- Minecraft processes
- Resume database
- ID scans
The mismatch between what people not on it think TikTok is like and what it's actually like (once you get the algo tuned to your taste) is pretty crazy.
But then the "new user" experience is so horrific in terms of the tacky default content it serves you that I'm not surprised so many people don't get past it.
"the sweat from Brenda's brow is what allows us to do capitalism."
The CEO has been itching to fire this person and nuke her department forever. She hasn't gotten the hint with the low pay or long hours, but now Copilot creates exactly the opening the CEO has been looking for.
At some point, a publicly-listed company will go bankrupt due to some catastrophic AI-induced fuck-up. This is a massive reputational risk for AI platforms, because ego-defensive behaviour guarantees that the people involved will make as much noise as they can about how it's all the AI's fault.
I don't find comments along the lines of 'those people over there are bad' to be interesting, especially when I agree with them. My comment is about why it'll go wrong for them.
I see the inverse of that happening: every critical decision will incorporate AI somehow. If the decision was good, the leadership takes credit. If something terrible happens, blame it on the AI. I think it's the part no one is saying out loud. That AI may not do a damn useful thing, but it can be a free insurance policy or surrogate to throw under the bus when SHTF.
This works at most one time. If you show up to every board meeting and blame AI, you’re going to get fired.
This is true if you blame a bad vendor, or something you don’t even control like the weather. Your job is to deliver. If bad weather is the new norm, you better figure out how to build circus tents so you can do construction in the rain. If your AI call center is failing, you better hire 20 people to answer phones.
I'm actually not that worried about this, because again I would classify this as a problem that already exists. There are already idiots in senior management who pass off bullshit and screw things up. There are natural mechanisms to cope with this, primarily business reputation: if you're one of those idiots, people very quickly start discounting what you're saying. They might not know how you're wrong, but they learn very quickly that you can't be trusted to self-check.
I'm not saying that this can't happen and that it's not bad. Take a look at nudge theory - the UK government created an entire department and spent enormous amounts of time and money on what they thought was a free lunch - that they could just "nudge" people into doing the things they wanted. So rather than actually solving difficult problems, the UK government embarked on decades of pseudo-intellectual self-aggrandizement. The entire basis of that decades-long debacle was bullshit data and fake studies. We didn't need AI to fuck it up, we managed it perfectly well by ourselves.
Nudge theory isn't useless, it's just not anything like as powerful as money or regulation.
It was taken up by the UK government at that time because the government was, unusually, a coalition of two quite different parties, and thus found it hard to agree to actually use the normal levers of power.
This is transparent nonsense. People are very very happy to introduce errors into excel spreadsheets without any help from AI.
Financial statements are correct because of auditors who check the numbers.
If you have a good audit process then errors get detected even if AI helped introduce them. If you aren't doing a good audit then I suspect nobody cares whether your financial statement is correct (anyone who did would insist on an audit).
> If you have a good audit process then errors get detected even if AI helped introduce them. If you aren't doing a good audit then I suspect nobody cares whether your financial statement is correct (anyone who did would insist on an audit).
Volume matters. The single largest problem I run into: AI can generate slop faster than anyone can evaluate it.
Both are valid concerns, no need to decide. Take the USA: they are currently led by a patently dumb president who fucks up the global economy, and at the same time they are powerful enough to do so!
For a more serious example, consider the Paperclip Problem[0] for a very smart system that destroys the world due to very dumb behaviour.
The paperclip problem is a bit hand-wavey about intelligence. It is taken as a given that unlimited intelligence would automatically win, presumably because it could figure out how to do literally anything.
But let's consider real life intelligence:
- Our super geniuses do not take over the world. It is the generationally wealthy who do.
- Super geniuses also have a tendency to be terribly neurotic, if not downright mentally ill. They can have trouble functioning in society.
- There is no thought here about different kinds of intelligence and the roles they play. It is assumed there is only one kind, and AI will have it in the extreme.
To be clear, I don't think the paperclip scenario is a realistic one. The point was that it's fairly easy to conceive an AI system that's simultaneously extremely savant and therefore dangerous in a single domain, yet entirely incapable of grasping the consequences or wider implications of its actions.
None of us knows what an actual, artificial intelligence really looks like. I find it hard to draw conclusions from observing human super geniuses, when their minds may have next to nothing in common with the AI. Entirely different constraints might apply to them—or none at all.
Having said all that, I'm pretty sceptical of an AI takeover doomsday scenario, especially if we're talking about LLMs. They may turn out to be good text generators, but not the road to AGI. But it's very hard to make accurate predictions in either direction.
A computer program should deliver reliable, consistent output if it is consistently given the same input. If I wanted inconsistency and unreliability, I'd ask a human to do it.
It's not arbitrary ... your precise and deterministic, multi-year, financial analysis needs to be corrected every so often for left-wing bias.
/s ffs
Brenda also needs to put food on the table. If Brenda is 'careless' and messes up, we can fire Brenda; because of this, Brenda tries not to be careless (among other emotions). However, I cannot deprive an AI model of pay because it messed up.
You might be looking for the word “accountability”
It is even worse, in a sense. It is not either. It is not neither. It is not even both, as variations of Brenda exist throughout the multiverse in all shapes and forms, including one that can troubleshoot her own formulas with ease and accuracy.
But you are absolutely right about one thing. Brenda can be asked and, depending on her experience, she might give you a good idea of what might have happened. LLMs still seem to not have that 'feature'.
No contradiction here:
When we say “machine”, we mean deterministic algorithms and predictable mechanisms.
Generative AI is neither of those things (in theory it is deterministic but not for any practical applications).
If we order by predictability:
Quick Sort > Brenda > Gen AI
There are two kinds of reliability:
Machine reliability does the same thing the same way every time. If there's an error on some input, it will always make that error on that input, and somebody can investigate it and fix it, and then it will never make that error again.
Human reliability does the job even when there are weird variances or things nobody bothered to check for. If the printer runs out of paper, the human goes to the supply cabinet and gets out paper and if there is no paper the human decides whether to run out right now and buy more paper or postpone the print job until tomorrow; possibly they decide that the printing doesn't need to be done at all, or they go downstairs and use a different printer... Humans make errors but they fix them.
LLMs are not machine reliable and not human reliable.
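To make the machine half of that concrete: a deterministic bug reproduces on demand, so once someone finds it, a fix plus a test pins it down for good. A toy sketch (the function and test are invented for illustration):

    # A deterministic bug: it fails on the same input every single time.
    def monthly_total(amounts):
        total = 0
        for a in amounts[1:]:   # bug: silently skips the first entry
            total += a
        return total

    # Once someone spots it, a regression test pins the fix in place forever.
    def test_monthly_total():
        assert monthly_total([100, 200, 300]) == 600  # fails today, passes after the fix

    # Fix: iterate over amounts instead of amounts[1:], and this error never comes back.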
> If the printer runs out of paper, the human goes to the supply cabinet and gets out paper and if there is no paper the human decides
Sure, these humans exist, but the others, the ones I unfortunately happen to encounter every day, go into broken mode immediately when something is unexpected. Today I ordered something they had run out of, and the girl behind the counter just stared into the deep, not having a clue what to do or say. Or yesterday at dinner, the PoS terminal (on batteries) ran out of power when I tried to pay. The guy just walked off and went outside for a smoke. I stood there waiting to pay. The owner apologized and fixed it after a while, but what I'm saying is: the employee who runs out of paper and then finds and puts in more paper is not very... common... in the real world.
I was brought up on the refrain of "aren't computers silly, they do exactly what you tell them to do to the letter, even if it's not what you meant". That had its roots in computers mostly being programmable BASIC machines.
Then came the apps and notifications, and we had to caveat "... when you're writing programs". Which is a diminishing part of the computer experience.
And now we have to append "... unless you're using AI tools".
The distinction is clear to technical people. But it seems like an increasingly niche and alien thing from the broader societal perspective.
I think we need a new refrain, because with the AI stuff it increasingly seems "computers do what they want, don't even get it right, but pretend that they did."
We have absolutely descended, and rapidly, into “computers do whatever the fuck they want and there’s nothing you can do about it” in the past 5 years, and gen AI is only half of the problem.
The other half comes from how incredibly opinionated and controlling the tech giants have become. Microsoft doesn't even ALLOW consent on Windows (yes or maybe later), Google is doing all it can to turn the entire internet into a Chrome-only experience, and Apple had to be fought for an entire decade before it allowed users to place app icons wherever they want on their Home Screen.
There is no question that the overly explicit quirky paradigm of the past was better for almost everyone. It allowed for user control and user expression, but apparently those concepts are bad for the wallet of big tech so they have to go. Generative AI is just the latest biggest nail in the coffin.
We have come a LONG way from the "Where do you want to go today?" of the 90s. Now, it's "You're going where we tell you that you can go, whether you like it or not!"
Flash-backs to dial-up and making sure I had my list of websites written down and ready for when I connected.
Pop culture characters like Lt. Commander Data seem anachronistic now.
It was Second Technician Arnold Judas Rimmer, BSc., SSc. all along.
If you think programs are predictable, I have a bridge to sell you.
The only relevant metric here is how often each thing makes mistakes. Programs are the most reliable, though far from 100%, humans are much less than that, and LLMs are around the level of humans, depending on the humans and the LLM.
When a human makes a mistake, we call it a mistake. When a human lies, we call it a lie. In both cases, we blame the human.
When LLM does the same, we call it hallucination and blame the human.
Programs can be very close to 100% reliable when made well.
In my life, I've never seen `sort` produce output that wasn't properly sorted. I've never seen a calculator come up with the wrong answer when adding two numbers. I have seen filesystems fail to produce the exact same data that was previously written, but this is something that happens once in a blue moon, and the process is done probably millions of times a day on my computers.
There are bugs, but bugs can be reduced to a very low level with time, effort, and motivation. And technically, most bugs are predictable in theory, they just aren't known ahead of time. There are hardware issues, but those are usually extremely rare.
Nothing is 100% predictable, but software can get to a point that's almost indistinguishable.
> Programs can be very close to 100% reliable when made well.
This is a tautology.
> I've never seen a calculator come up with the wrong answer when adding two numbers.
https://imgz.org/i6XLg7Fz.png
> And technically, most bugs are predictable in theory, they just aren't known ahead of time.
When we're talking about reliability, it doesn't matter whether a thing can be reliable in theory, it matters whether it's reliable in practice. Software is unreliable, humans are unreliable, LLMs are unreliable. To claim otherwise is just wishful thinking.
RE: the calculator screenshot - it's still reliable because the same answer will be produced for the same inputs every time. And the behavior, though possibly confusing to the end user at times, is based on choices made in the design of the system (floating point vs integer representations, rounding/truncating behavior, etc). It's reliable deterministic logic all the way down.
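For example, the classic floating point surprise looks wrong to a human but is perfectly repeatable - the same answer on every run, on every machine that follows IEEE 754 (a quick Python illustration):

    from decimal import Decimal

    # "Wrong"-looking, but deterministic: this result never varies between runs.
    print(0.1 + 0.2)           # 0.30000000000000004
    print(0.1 + 0.2 == 0.3)    # False

    # If exactness matters (e.g. money), the deterministic fix is a different representation.
    print(Decimal("0.1") + Decimal("0.2"))   # 0.3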
> If we order by predictability:
> Quick Sort > Brenda > Gen AI
Those last two might be the wrong way round.
There are other narratives going on in the background, though, both called out by the article and implied, including:
Brenda probably has annual refresher courses on GAAP, while her exec and the AI don't.
Automation is expected to be deterministic. The outputs can be validated for a given input. If you need automation beyond Excel functions, writing a Power Automate flow or recording an Office Script is sufficient and reliable as automation while being cheaper than AI. Can you validate AI as deterministic? This is important for accounting. Maybe you want some thinking around how to optimize a business process, but not for following one.
Brenda as the human-in-the-loop using AI will be much more able than her exec. Will Brenda + AI be better (or more valuable considering the cost of AI) than Brenda alone? That's the real question, I suppose.
AI in many aspects of our life is simply not good right now. For a lot of applications, AI is perpetually just a few years away from being as useful as you describe. If we get there, great.
"Thinking mode" only provides the illusion of debuggability. It improves performance by generating more tokens which hopefully steer the context towards one more likely to produce the desired response, but the tokens it generates do not reflect any sort of internal state or "reasoning chain" as we understand it in human cognition. They are still just stochastic spew. You have no more insight into why the model generates the particular "reasoning steps" it does than you do into any other output, and neither do you have insight into why the reasoning steps lead to whatever conclusion it comes to. The model is much less constrained by the "reasoning" than we would intuit for a human - it's entirely capable of generating an elaborate and plausible reasoning chain which it then completely ignores in favor of some invisible built-in bias.
I'm always amused when I see comments saying, "I asked it why it produced that answer, and it said...." Sorry, you've badly misunderstood how these things work. It's not analyzing how it got to that answer. It's producing what it "thinks" the response to that question should look like.
> We disavow AI because people like Brenda are perfect and the machine is error-prone.
No, no. We disavow AI because our great leaders inexplicably trust it more than Brenda.
I don't understand why generative AI gets a pass at constantly being wrong, but an average worker would be fired if they performed the same way. If a manager needed to constantly correct you or double check your work, you'd be out. Why are we lowering the bar for generative AI?
My kneejerk reaction is the sunk cost fallacy (AI is expensive), but I'm pretty sure it's actually because businesses have spent the last couple of decades doing absolutely everything they can to automate as many humans out of the workforce as possible.
I've been trying to open my mind and "give AI a chance" lately. I spent all day yesterday struggling with Claude Code's utter incompetence. It behaves worse than any junior engineer I've ever worked with:
- It says it's done when its code does not even work, sometimes when it does not even compile.
- When asked to fix a bug, it confidently declares victory without actually having fixed the bug.
- It gets into this mode where, when it doesn't know what to do, it just tries random things over and over, each time confidently telling me "Perfect! I found the error!" and then waiting for the inevitable response from me: "No, you didn't. Revert that change".
- Only when you give it explicit, detailed commands, "modify fade_output to be -90," will it actually produce decent results, but by the time I get to that level of detail, I might as well be writing the code myself.
To top it off, unlike the junior engineer, Claude never learns from its mistakes. It makes the same ones over and over and over, even if you include "don't make XYZ mistake" in the prompt. If I were an eng manager, Claude would be on a PIP.
Recently I've used Claude Code to build a couple TUIs that I've wanted for a long time but couldn't justify the time investment to write myself.
My experience is that I think of a new feature I want, I take a minute or so to explain it to Claude, press enter, and go off and do something else. When I come back in a few minutes, the desired feature has been implemented correctly with reasonable design choices. I'm not saying this happens most of the time, I'm saying it happens every time. Claude makes mistakes but corrects them before coming to rest. (Often my taste will differ from Claude's slightly, so I'll ask for some tweaks, but that's it.)
The takeaway I'm suggesting is that not everyone has the same experience when it comes to getting useful results from Claude. Presumably it depends on what you're asking for, how you ask, the size of the codebase, how the context is structured, etc.
Learning to use Claude Code (and similar coding agents) effectively takes quite a lot of work.
Did you have it creating and running automated tests as it worked?
> Learning to use Claude Code (and similar coding agents) effectively takes quite a lot of work.
I've tried to put in the work. I can even get it working well for a while. But then all of a sudden it is like the model suffers a massive blow to the head and can't produce anything coherent anymore. Then it is back to the drawing board, trying all over again.
It is exhausting. The promise of what it could be is really tempting fruit, but I am at the point that I can't find the value. The cost of my time to put in the work is not being multiplied in return.
> Did you have it creating and running automated tests as it worked?
Yes. I work in a professional capacity. This is a necessity regardless of who (or what) is producing the product.
yOu'Re HoLdInG iT wRoNg
If a worker could be right 50% of the time, get paid 1 cent to write a 5000-word essay on a random topic, and do it in less than 30 seconds, then I think managers would be fine hiring that worker at that rate as well.
5000 half-right words is worthless output. That can even lead to negative productivity.
great, now who are you paying to sort the right output from the wrong output?
There's a variety of reasons.
You don't have a human to manage. The relationship is completely one-sided, you can query a generative AI at 3 in the morning on new years eve. This entity has no emotions to manage and no own interests.
There's cost.
There's an implicit promise of improvement over time.
There's the domain of expertise being inhumanly wide. You can ask about cookies right now, then about XII century France, then about biochemistry.
The fact that an average worker would be fired if they performed the same way is what the human actually competes with. They have responsibility, which is not something AI can offer. If it were the case that, say, Anthropic actually signed contracts stating that they are liable for any mistakes, then humans would be absolutely toast.
It’s much cheaper than Brenda (superficially, at least). I’m not sure a worker that costs a few dollars a day would be fired, especially given the occasional brilliance they exhibit.
How much compute costs is it for the AI to do Brenda's job? Not total AI spend, but the fraction that replaced Brenda. That's why they'd fire a human but keep using the AI.
Brenda has been kissed on her forehead by the Excel goddess herself. She is irreplaceable.
(More seriously, she also has 20+ years of institutional knowledge about how the company works, none of which has ever been captured anywhere else.)
Because it doesn’t have to be as accurate as a human to be a helpful tool.
That is precisely why we have humans in the loop for so many AI applications.
If [AI + human reviewer to correct it] is some multiple more efficient than [human alone], there is still plenty of value.
> Because it doesn’t have to be as accurate as a human to be a helpful tool.
I disagree. If something can't be as accurate as a (good) human, then it's useless to me. I'll just ask the human instead, because I know that the human is going to be worth listening to.
Autopilot in airplanes is a good example to disprove that.
Good in most conditions. Not as good as a human. Which is why we still have skilled pilots flying planes, assisted by autopilot.
We don’t say “it’s not as good as a human, so stuff it.”
We say, “it’s great in most conditions. And humans are trained how to leverage it effectively and trained to fly when it cannot be used.”
Because it's much cheaper.
So now you don't have to pay people to do their actual work, you assign the work to ML ("AI") and then pay the people to check what it generated. That's a very different task, menial and boring, but if it produces more value for the same amount of input money, then it's economical to do so.
And since checking the output is often a lower skilled job, you can even pay the people less, pocketing more as an owner.
It’s not even greater trust. It’s just passive trust. The thing is, Brenda is her own QA department. Every good Brenda is precisely good because she checks her own work before shipping it. AI does not do this. It doesn’t even fully understand the problem/question sometimes, yet provides a smart, definitive-sounding answer. It’s like the doctor on The Simpsons: if you can’t tell he’s a quack, you probably would follow his medical advice.
Brenda + AI > Brenda
That’s definitely the hype. But I don’t know if I agree. I’m essentially a Brenda in my corporate finance job and so far have struggled to find any useful scenarios to use AI for.
I once thought it could build me a Gantt chart, because that’s an annoying task in Excel. I had the data. When I asked it to help me: “I can’t do that, but I can summarize your data.” Not helpful.
Any type of analysis is exactly what I don’t want to trust it with. But I could use help actually building things, which it wouldn’t do.
Also, Brenda’s are usually fast. Having them use a tool like AI that can’t be fully trusted just slows them down. So IMO, we haven’t proven the AI variable in your equation is actually a positive value.
I can't speak to finance. In programming, it can be useful but it takes some time and effort to find where it works well.
I have had no success in using it to create production code. It's just not good enough. It tends to pattern-match the problem in somewhat broad strokes and produce something that looks good but collapses if you dig into it. It might work great for CRUD apps but my work is a lot more fiddly than that.
I've had good success in using it to create one-off helper scripts to analyze data or test things. For code that doesn't have to be good and doesn't have to stand the test of time, it can do alright.
I've had great success in having it do relatively simple analysis on large amounts of code. I see a bug that involves X, and I know that it's happening in Y. There's no immediately obvious connection between X and Y. I can dig into the codebase and trace the connection. Or I can ask the machine to do it. The latter is a hundred times faster.
The key is finding things where it can produce useful results and you can verify them quickly. If it says X and Y are connected by such-and-such path and here's how that triggers the bug, I can go look at the stuff and see if that's actually true. If it is, I've saved a lot of time. If it isn't, no big loss. If I ask it to make some one-off data analysis script, I can evaluate the script and spot-check the results and have some confidence. If I ask it to modify some complicated multithreaded code, it's not likely to get it right, and the effort it takes to evaluate its output is way too much for it to be worthwhile.
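To make "one-off helper script" concrete, here is the sort of throwaway thing I mean - the file name and columns are invented for the example, and the output is easy to spot-check by hand:

    # Throwaway script: sanity-check a CSV export while chasing a bug report.
    # Nothing here needs to be good or survive past today.
    import pandas as pd

    df = pd.read_csv("orders_export.csv")           # hypothetical export
    print(df["status"].value_counts())              # any unexpected status values?

    dupes = df[df.duplicated(subset="order_id", keep=False)]
    print(f"{len(dupes)} rows share an order_id")   # duplicates that might explain the bug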
I'd agree. Programming is a solid use case for AI. Programming is a part of my job, and hobby too, and that's the main place where I've seen some value with it. It still is not living up to the hype but for simple things, like building a website or helping me generate the proper SQL to get what I want - it helps and can be faster than writing by hand. It's pretty much replaced StackOverflow for helping me debug things or look up how to do something that I know is already solved somewhere and I don't want to reinvent. But, I've also seen it make a complete mess of my codebase anytime I try to build something larger. It might technically give me a working widget after some vibe coding, but I'm probably going to have to clean the whole thing up manually and refactor some of it. I'm not certain that it's more efficient than just doing it myself from the start.
Every other facet of the world that AI is trying to 'take over', is not programming. Programming is writing text, what AI is good at. It's using references to other code, which AI has been specifically trained on. Etc. It makes sense that that use case is coming along well. Everything else, not even close IMO. Unless it's similar. It's probably great at helping people draft emails and finish their homework. I don't have those pain points.
Yes but:
By my measurement, AI < 0
“Let’s deploy something as error-prone as Brad, or more so, at infinite scale across our organisation”
Brenda has years (hopefully) of institutional knowledge and transferrable skills.
"hmm, those sales don't look right, that profit margin is unusually high for November"
"Last time I used vlookup I forgot to sort the column first"
"Wait, Bob left the company last month, how can he still be filing expenses"
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation?
"Thinking" mode is not thinking, it's generating additional text that looks like someone talking to themselves. It is as devoid of intention and prone to hallucinations as the rest of LLM's output.
> Can't the C-suite in this case follow its thought process and step in when it messes up?
That sounds like manual work you'd want to delegate, not automation.
The promise of AI is that it lets you "skip the drudgery of thinking about the details" but sometimes that is exactly what you don't want. You want one or more humans with experience in the business domain to demonstrate they have thought about the details very carefully. The spreadsheet computes a result but its higher purpose is a kind of "proof" this thinking was done.
If the actual thinking doesn't matter and you just need some plausible numbers that look the part (also a common situation), gen ai will do that pretty well.
We need to stop using AI as an umbrella term. It’s worth remembering that LLMs can’t play chess and that the best chess models, like Leela Chess Zero, use deep neural networks.
Generative AI - which the world now believes is AI, is not the same as predictive / analytical AI.
It’s fairly easy to demonstrate this by getting ChatGPT to generate a new relatively complex spreadsheet then asking it to analyze and make changes to the same spreadsheet.
The problem we have now is uninformed people believing AI is the answer to everything… if not today then in the near future. Which makes it more of a religion than a technology.
Which may be the whole goal …
> Successful people create companies. More successful people create countries. The most successful people create religions.
— Sam Altman - https://blog.samaltman.com/successful-people
Ok yep, fair. My comment was about using copilot-ish tech to generate plausible looking spreadsheets.
The kind of things that a domain expert Brenda knows that ChatGPT doesn't know (yet) are like:
There are 3 vendors a, b, c who all look similar on paper but vendor c always tacks on weird extra charges that take a lot of angry phone calls to sort out.
By volume or weight it looks like you could get 100 boxes per truck but for industry specific reasons only 80 can legally be loaded.
Hyper specific details about real estate compliance in neighbouring areas that mean buildings that look similar on paper are in fact very different.
A good Brenda can understand the world around her as it actually is, she is a player in it and knows the "real" rules rather than operating from general understanding and what people have bothered to write down.
That automation you cite in your #1 is advocated for because it is deterministic and, with effort, fairly well understood (I have countless scripts solidly running for years).
I don't disavow AI, but like the author, I am not thrilled that the masses of excel users suddenly have access to Copilot (gpt4). I've used Copilot enough now to know that there will be huge, costly mistakes.
> We disavow AI because people like Brenda are perfect and the machine is error-prone.
I don't think that is the message here. The message is that Brenda might know what she is doing, and maybe AI helps her.
> She's gonna birth that formula for a financial report and then she's gonna send that financial report
The problem is people who might not know what they are doing
> he would have sent it back to Brenda but he's like oh I have AI and AI is probably like smarter than Brenda and then the AI is gonna fuck it up real bad
Because AI outputs sound so confident, many people feel like experts. Rather than involve Brenda to debug the issue, the C-suite might say: I believe! I can do it too. AI FTW!
Even when people advocate automation, especially in areas like finance, because people are error-prone, there is always a human in the loop whose job is to double-check the automation. The day this human finds errors in the machine, there is going to be a lot of noise. And if that day happens to be a quarterly or yearly closing/reporting, there is going to be hell to pay once the closing/reporting is done. Both the automation and the developer are going to be hauled up (obviously I am exaggerating here).
The issue is reliability.
Would you be willing to guarantee that some automation process will never mess up, and if/when it does, compensate the user with cash?
For a compiler, with a given set of test suites, the answer is generally yes, and you could probably find someone willing to insure you for a significant amount of money that a compilation bug will not screw up in such a large way that it affects your business.
For an LLM, I have a hard time believing that anyone will be willing to provide that same level of insurance.
If a LLM company said "hey use our product, it works 100% of the time, and if it does fuck up, we will pay up to a million dollars in losses" I bet a lot of people would be willing to use it. I do not believe any sane company will make that guarantee at this point, outside of extremely narrow cases with lots of guardrails.
That's why a lot of ai tools are consumer/dev tools, because if they fuck up, (which they will) the losses are minimal.
I feel like it comes down to predictability and overall trust and confidence. AI is still very fucky, and for people that don't understand the nuances, it definitely will hallucinate and potentially cause real issues. It is about as happy as a Linux rm command to nuke hours of work. Fortunately these tools typically have a change log you can undo, but still.
Also Brenda is human and we should prioritize keeping humans in jobs, but with the way shit is going that seems like a lost hope. It's already over.
The “Brenda” example is a lumped sum fallacy where there is an “average” person or phenomenon that we can benchmark against. Such a person doesn't exist, leading to these dissonant, contradictory dichotomies.
The fact of the matter is that there are some people who can hold lots of information in their head at once. Others are good at finding information. Others still are proficient at getting people to help them. Etc. Any of these people could be tasked with solving the same problem and they would leverage their actual, particular strengths rather than some nebulous “is good or bad at the task” metric.
As it happens, nearly all the discourse uses this lumped sum fallacy, leading to people simultaneously talking past one another while not fundamentally moving the discussion forward.
I see where you are coming from but in my head, Brenda isn't real.
She represents the typical domain-experts that use Excel imo. They have an understanding of some part of the business and express it while using Excel in a deterministic way: enter a value of X, multiply it by Y and it keeps producing Z forever!
You can train AI to be a better domain expert. That's not in question. However, with AI you introduce a dice roll: it may not multiply X and Y to get Z... it might get something else. Sometimes. Maybe.
If your spreadsheet is a list of names going on the next annual accounts department outing then the risk is minimal.
If it's your annual accounts that the stock market needs to work out billion dollar investment portfolios, then you are asking for all the pain that it will likely bring.
> You can train AI to be a better domain expert. That's not in question.
I think that very much is in question.
Humans, legacy algorithmic systems, and LLMs have different error modes.
- Legacy systems typically have error modes where integrations or user interface breaks in annoying but obvious ways. Pure algorithms calculating things like payroll tend to be (relatively) rigorously developed and are highly deterministic.
- LLMs have error modes more similar to humans than legacy systems, but more limited. They're non-deterministic, make up answers sometimes, and almost never admit they can't do something; sometimes they make pure errors in arithmetic or logic too.
- Humans have even more unpredictable error modes; on top of the errors encountered in LLMs, they also have emotion, fatigue, org politics, demotivation, misaligned incentives, and so on. But because we've been working with other humans for ten thousand years, we've gotten fairly good at managing each other... though it's still challenging.
LLMs probably need a mixture of "correctness tests" (like evals/unit tests) and "management" (human-in-the-loop).
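A minimal sketch of what such a correctness test could look like - the ask_llm callable and the pass threshold are placeholders, not any real API:

    # Sketch of an eval: run known prompts against expected answers, track the pass rate,
    # and route the batch to a human instead of shipping it blindly when the rate drops.
    CASES = [
        ("What is 5% of 2400?", "120"),
        ("Revenue 1000 minus cost 250 leaves?", "750"),
    ]

    def run_eval(ask_llm, threshold=0.95):
        passed = sum(1 for prompt, expected in CASES if expected in ask_llm(prompt))
        rate = passed / len(CASES)
        if rate < threshold:
            print(f"pass rate {rate:.0%} is below {threshold:.0%}: send batch to human review")
        return rate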
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation
Mainly because generative AI _is not automation_. Automation runs on a fixed ruleset: predictable, reliable, and an actual time saver. Generative AI... is whatever it is; it is definitely not automation.
I feel like you've squashed a 3D concern (automations at different levels of the tech stack) into a 2D observation (global concerns about automations).
Human determinism, as elastic as it might be, is still different than AI non-determinism. Especially when it comes to numbers/data.
AI might be helpful with information but it's far less trustable for data.
In my opinion there's a big difference between deterministic and nondeterministic automation.
Non deterministic vs deterministic automation
The big problem with AI in back-office automation is that it will randomly decide to do something different than it had been doing. Meaning that it could be happily crunching numbers accurately in your development and launch experience, then utterly drop the ball after a month in production.
While humans carry the same risk factors, human-oriented back-office processes involve multiple rounds of automated/manual checks, which are extremely laborious. Human errors in spreadsheets have particular flavors, such as a forgotten cell, a mistyped number, or reading from the wrong file/column. Humans are pretty good at catching these errors, as they produce either completely wrong results when the columns don't line up, or a typo'd number that is completely out of distribution.
An AI may simply decide to hallucinate realistic column values rather than extracting its assigned input. Or hallucinate a fraction of column values. How do you QA this? You can't guarantee that two invocations of the AI won't hallucinate the same values, you can't guarantee that a different LLM won't hallucinate different values. To get a real human check, you'd need to re-do the task as a human. In theory you can have the LLM perform some symbolic manipulation to improve accuracy... but it can still hallucinate the reasoning traces etc.
If a human decided to make up accounting numbers on one out of every 10000 accounting requests, they would likely be charged with fraud. Good luck finding AI hallucinations at the equivalent rate before some disaster occurs. Likewise, how do you ensure the human Excel operator doesn't get pressured into certifying the AI's numbers when the "don't get fired this week" button is sitting right there in their Excel app? How do you avoid the race to the bottom where the "star" employee is the one certifying the AI results without thorough review?
I'm bullish on AI in backoffice, but ignoring the real difficulties in deployment doesn't help us get there.
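One partial answer to the QA question above is to never trust the extraction alone: have deterministic code cross-check the AI's output against the source it claims to have copied from. A rough sketch, assuming the task was a straight column extraction (file and column names are invented):

    # Cross-check an AI-extracted column against the source it was supposed to copy.
    # Deterministic code does the verification, so a hallucinated value can't "agree with itself".
    import pandas as pd

    source = pd.read_csv("source_ledger.csv")     # hypothetical input data
    extracted = pd.read_csv("ai_extracted.csv")   # what the model produced

    assert len(extracted) == len(source), "row count changed during extraction"

    mismatches = extracted.loc[extracted["amount"].values != source["amount"].values]
    if not mismatches.empty:
        print(f"{len(mismatches)} rows differ from the source; flag for human review")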
This misunderstands complexity entirely:
> The complexity of the task isn't a factor - it's complex to generate correct machine code, but we trust compilers to do it all the time.
By the same fascination, do computers become more complex to enhance people? or do people get more complex with the use of computers? Also, do computers allow people to become less skilled and inefficient? or do less skilled and inefficient people require the need for computers?
The vector of change is acceptable in one direction and disliked in another. People become greater versions of themselves with new tech. But people also get dumber and less involved because of new tech.
I'm disappointed that my human life has no value in a world of AI. You can retort with "ah but you'll be entertained and on super-drugs so you won't care!", but I would further retort that I'd rather live in a universe where I can contribute something, no matter how small.
The current generation of AI tools augment humans, they don't replace them.
One of the most under-rated harms of AI at the moment is this sense of despair it causes in people who take the AI vendors at their word ("AGI! Outperform humans at most economically valuable work!")
I mean you answer your own question.
Automation implies determinism. It reliably gives you the same predictable output for a given input, over and over again.
AI is non-deterministic by design. You never quite know for sure what it's going to give you. That is what makes it powerful. But it also makes it higher risk.
The reason is oftentimes fairly simple, certain people have their material wealth and income threatened by such automation, and therefore it's bad (an intellectualized reason is created post-hoc)
I predict there will actually be a lot of work to be done on the "software engineering" side w.r.t. improving reliability and safety as you allude to, for handing off to less than sentient bots. Improved snapshot, commit, undo, quorum, functionalities, this sort of thing.
The idea that the AI should step into our programs without changing the programs whatsoever around the AI is a horseless carriage.
Copilot and AI have been shoved into the Microsoft stack at my org for months. Most of the features were disabled or hopelessly bad. It’s cheaper for Microsoft to push this junk and claim they’re doing something; it’s going to improve their stock far more than not doing it, even though it’s basically useless currently.
Another issue is that my org disallows AI transcription bots. It’s a legit security risk if you have some random process recording confidential info because the person was too busy to attend the meeting and take notes themselves. Or possibly they just shirk off the meetings and have AI sit in.
Transcription is arguably one of the most useful enterprise AI tools available. But I sure as heck wouldn't trust the cloud with it.
I still find the Copilot transcripts orders of magnitude worse than something like Wispr Flow; they hallucinate constantly and do not adapt to a company's context (which Copilot has access to...). I am talking about acronyms of products/teams, names of people (even when they are in the call), etc.
Can anyone familiar with the technical details shed light on why this is so?
Is it because of a globally trained model (as opposed to one tweaked on context-specific data), or because of using different classes of models?
Neither Copilot nor Flow can natively handle audio, to my understanding, so there is already a transcription model converting it to text that GPT then tries to summarise.
It could be that they simply use a mediocre transcription model. Wispr is amazing, but it would hurt their pride to use a competitor.
But I feel it's more likely that the experience is: GPT didn't actually improve on the raw transcription, just made it worse. Especially as any mis-transcribed words may trip it up and make it misunderstand while producing the summary.
If I can choose between a potentially confused and misunderstood summary, and a badly spellchecked (flipped words) raw transcription, I would trust the latter.
Yeah, I didn't even think about advanced meeting summary bots. Just raw word-for-word transcription, please. Wispr is pretty great.
It is notoriously unreliable
The worst part is seeing it creep into the developer stack in places where it should not be.
I am all for nice completion in VS, or help deciphering compiler errors, but let's do this AI push with some restraint.
What I also really dislike is the prompt interface; AI integrations should feel like a natural, transparent part of the workflow, not an attempt to cram everything into a tiny chat window.
And while we're at it, can we please improve voice recognition?
Hmmm the Brendas I know look a little different.
“There are two Brendas - their job is to make spreadsheets in the Finance department. Well, not quite - they add the months and categories to empty spreadsheets, then they ask the other departments to fill in their sales numbers every month so it can be presented to management.
“The two Brendas don’t seem to talk, otherwise they would realize that they’re both asking everyone for the same information, twice. And they’re so focused on their little spreadsheet worlds that neither sees enough of the bigger picture to say, ‘Wait… couldn’t we just automate this so we don’t need to do this song and dance every month? Then we wouldn’t need two people in different parts of the company compiling the same data manually.’
“But that’s not what Brenda was hired for. She’s a spreadsheet person, not a process fixer. She just makes the spreadsheets.”
We need fewer Brendas, and more people who can automate away the need for them.
With respect, you probably only see that bit of Finance, but doesn't mean that is all Brenda does.
At least half of the work in my senior Finance team involves meeting people in operations to find out what they are planning to do and to analyse the effects, and present them to decision makers to help them understand the consequences of decisions. For an AI to help, someone would have to trigger those conversations in the first place and ask the right questions.
The rest of the work involves tidying up all the exceptions that the automation failed on.
Meanwhile copilot in Excel can't even edit the sheet you are working on. If you say to it, 'give me a template for an expense claim' it will give you a sheet to download... probably with #REF written in where the answers should be.
I work in corporate finance and these issues are certainly present. However, they are almost always known and determined low priority to have a better process built. Finance processes are nearly always a non priority as a pure cost center/overhead there’s not many companies that want to invest in improving the situation, they’ll limp along with minimal investment even once big and profitable.
That said, every finance function is different and it may be unknown to them that you’re being asked for some data multiple times. If you’re enduring this process, I’m of the opinion you’re equally at fault. Suggest a solution that will be easier on you. As it’s possible they don’t even know it’s happening. In the case provided, email to all relevant finance people “Here’s a link to a shared workbook. I’ll drop the numbers here monthly, please save the link and get the data directly from that file. Thanks!” Problem solved. Until you don’t follow through which is what causes most finance people to be constantly asking for data/things. So be kind and also set yourself a monthly recurring reminder on your calendar and actually follow through.
And they've all been burned by enterprise finance products which were sold to solve exactly that problem.
Only different companies were all sold different enterprise finance products, but they need to communicate with each other (or themselves after mergers), so it all gets manually copied into Excel and emailed around each month.
I’ve just set the finance people up with read only access to our data source, and they now can poke through it themselves.
Also an acceptable solution. This is usually where the next step is have a BI type person just create a report for finance. Many reasons but what will end up is different people are filtering/retrieving the data differently causing inconsistencies.
But Usually finance is always preferring on demand access so the communication feedback loop of asking for stuff is not well liked so I’m sure they appreciate this middle step too.
There are many cases where there’s no easy way to give access to the data and a human in the loop is required. In that case, do the shared workbook thing I mentioned as a starting point at least. It may evolve from there.
And then you end up with a team of five people, each three times as expensive as Brenda, and what used to be an email now takes a sprint and has to go through a ticket system.
That’s not what I had in mind.
Then you end up with a report that goes out automatically every month to leadership pulled directly from the Salesforce data, along with a real time dashboard anyone in the org can look at, broken down by team, vertical, and sales volume.
Why are people so attached to manual process?
Because when one exec ask: "Why is that?" the room goes silent.
It's not what you had in mind, but that's what you get. Because automation, integration, and AI are currently garbage -- Salesforce, Netsuite, doesn't matter. They don't do the magic that they promise. Because process is still very much a human problem, not a computational one.
> We need fewer Brendas...
We need more Brendas (the ones the Excel goddesses come and kiss on the forehead) and fewer people who are disrespectful of Brendas. The example in this post is someone giving more respect to AI than to Brenda.
But then you need someone to maintain/look after that automation, and they'll be more expensive than two Brendas
And now if one of the Brendas wants to change their process slightly, add some more info, they can't just do it anymore. They have to have a three way discussion with the other Brenda, the automation guy and maybe a few managers. It will take months. So then its likely better for Brenda to just go back to using her spreadsheet again, and then you've got an automated process that no longer meets peoples needs and will be a faff to update.
"We need fewer Brendas, and more people who can automate away the need for them."
True... I have an on-staff data engineer for the purpose. But not all companies (especially in the SMB space) have that luxury.
People’s reaction to this varies based on the Brendas they’ve worked with. Some are given a specific task to do with their spreadsheets every week and have to just do as they are told even if they can see it’s not a good process. Others are secretly the brains of the company – the only one who really sees the whole picture. And a good number of Brendas are the company owner doing her best with the only tool she’s had the time to learn.
> But that’s not what Brenda was hired for.
Are you suggesting that Brenda should stay in her box?
No, I’m suggesting that she is ineffective exactly because she stays in her box.
She should be replaced with someone who says, “this box doesn’t need to be here… there is a better way of doing things.”
NOT to be confused with the junior engineer who comes into a project and says it’s garbage and suggests we rewrite it from scratch in ${hotLanguage} because they saw it on a blog somewhere.
> She should be replaced with someone who says, “this box doesn’t need to be here… there is a better way of doing things.”
The article is about this kind of Brenda.
It may not be what you meant to say, but it's exactly what you are saying where ${hotLanguage} is the latest automation platform or AI gimmick.
I’m not sure why you’re going to the mat for hanging onto redundant people putting numbers in spreadsheets.
At large companies in particular, there are far too many people who simply turn their widgets - this was the entire point of the tech revolution.
Think about how many bookkeepers were needed before Excel. Someone could have made your exact same argument (but it’s just the latest gimmick!) about Excel 30 years ago. And yet, technology will make businesses more efficient whether people stand in its way or not.
Even at a small company of one or two, QuickBooks will reduce the amount of bookkeepers and accountants needed. TurboTax will further reduce that.
We will need fewer people in the future maintaining their Excel spreadsheets, and more people building the automation for those processes.
The change averse will always find reasons not to adapt - they will create their own obsolescence.
(inb4 but it’s way more expensive to pay developers to automate!)
That's a pretty specific example when there are a lot of good "spreadsheet people" out there who do a lot more than spreadsheets (maybe they had to write SQL queries or scripts to get those numbers), but commonly need to simplify things down to a spreadsheet or power point for upper management. I'm not saying you should have multiple people doing redundant work, but this style isn't entirely dumb.
What would this be replaced by? Some kind of large SAP like system that costs millions of dollars and requires a dozen IT staff to maintain?
Fair - I was creating a straw man mostly to make a point. The people I’m thinking aren’t running SQL queries or scripts, they’re merely collection points for data.
So one good BI developer who knows Tableau and Salesforce and Excel and SQL can replace those pure collection points with a better process, but they can also generate insight into the data because they have some business understanding from being close to the teams, which is what my hypothetical Brenda can’t do.
In my example, Brenda would be asking sales leaders to enter in their data instead of going into Salesforce herself because she doesn’t know that tool / side of the company well enough.
I was making the point that, contrary to the article, the Brendas I know aren’t touched by the Excel angels, they’re just maintaining spreadsheets that we probably shouldn’t have anyway.
I think that is a fair point too. The person that builds the Tableau dashboard could just send Brenda a screenshot once a month and that saves everyone time.
A screenshot of a Tableau dashboard is possibly the most dangerous form of internal data communication there is, because it entirely removes any chance of digging into that dashboard and figuring out what queries created it and spotting the incorrect assumptions they made along the way.
A hill I will die on is that business analytics need "view source" or they aren't worth the pixels they are rendered with.
You've lost the plot and are just trauma dumping.
Y'know why people don't automate their jobs? It's not a skill issue it's an incentives issue.
If you do your job, you get paid periodically. If you automate your job, you get paid once for automating it and then nothing, despite your automation constantly producing value for the company.
To fix this, we need to pay people continually for their past work as long as it keeps producing value.
it's a large human behavior question for me, the notion of work, value, economy, efficiency .. all muddied in there
- i used to work small jobs when i was younger; as a nerd, i could use software better than the legacy employees. during a 3-month stint i found their tools were scriptable, so i scripted them. i produced 10x more with half the mental effort (i just reviewed what my script was about to do before it committed actual changes), all for minimum wage. and i was happy like a puppy, free to go as far and as fast as i wanted, designing the script to fit exactly the needs of an operator.
- later i became a legit software engineer. i'm now paid a lot, all things considered, to talk to the managers of legacy employees like the above and produce some mediocre web app that will never match the employees' needs because of all the middle layers and cost pressure. which also means i'm tired, because i'm not free to improve things and i have to obey the customer... so for 6x more money you get a lot less (if you deliver at all; sometimes projects get canned before shipping)
Not always:
If you don’t automate it:
1a) your company keeps you hanging on forever maintaining the same widget until the end of time
OR
1b) more likely, someone realizes your job should be automated and lays you off at some point down the road
If you do automate it
2a) your company thanks you then fires you
OR
2b) you are now assigned to automate more stuff as you’ve proven that you are more valuable to the company than just maintaining your widget
————
2b is really the safest long-term position for any employee, I think. It’s not foolproof, as 2a can happen.
But I’d rather be in box 2 than box 1 any day of the week if we’re talking long term employment potential.
Yes, but notice what you are describing are all negative incentives.
When automation produces value for the company, the people automating it should capture a chunk of that value _as a matter of course_.
Even if you argue that you can then negotiate better compensation:
1) That is an uncertain and delayed reward, and it only happens if other people feel like it; it's not automatic.
2) The reward stops if you get fired or leave, despite the automation still producing value - you are also basically incentivized to build stuff that requires constant maintenance. Imagine you spend a man-month building the automation and then leave, it then requires a man-month of maintenance over the next 5 years. At the end of the 5 years, you should still be getting 50% of the reward.
My knee jerk reaction is to disagree, but on second thought, I’m open to hearing the argument.
What would that look like in practice?
Not every topic on HN needs a contrarian's hot take.
Well that wasn’t very nice.
Do you have anything to say other than, “I don’t need to hear what you have to say”?
That mirrors my experience as well. LLMs get instantly confused in real world scenarios in Excel and confidently hallucinate millions in errors
If you look at the demos for these it’s always something that is clean and abundantly available in training data. Like an income statement. Or a textbook example DCF. Or my personal fav „here is some data show me insights“. Real world excel use looks nothing like that.
I’m getting some utility out of them for some corporate tasks but zilch in excel space.
As somebody with non-existent experience with Excel, I could totally see myself getting a lot of value out of LLMs, if nothing else then simply for telling me what's possible, what functions and patterns exist at all etc.
This quote is pulled from a TikTok, I recommend watching the whole thing here: https://www.tiktok.com/@belligerentbarbies/video/75683800086...
(I pulled the quote by using yt-dlp to grab the MP4 and then running that through MacWhisper to generate a transcript.)
It's a little over two paragraphs. Seems like it would have been simpler just to... type it out?
Well if you do it once then yes, but if you automate this process it is different. E.g. I do this with YouTube videos, because reading a 30-second summary instead of watching a 14-minute video is a time saver. I still watch some videos fully, but many of them are not worth it.
So in summary I think it was just part of an automated process (maybe), or it will become one in the future.
Why spend two minutes typing (and realistically longer than that, if I want to capture the exact transcript I would need to keep hitting pause and play and correcting myself) when I can spend ten seconds pasting a URL into my terminal and then dragging and dropping the resulting file onto the MacWhisper window?
I actually transcribed the whole TikTok which was about 50% longer than what I quoted, then edited it down to the best illustrative quote.
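For anyone who wants to wire this up without a GUI, here's a rough sketch of the same grab-and-transcribe pipeline using the open-source yt-dlp and whisper Python packages instead of MacWhisper (which is a GUI app). The URL is a placeholder and ffmpeg is assumed to be installed:

```python
# Sketch of a download-then-transcribe pipeline.
# Assumes: pip install yt-dlp openai-whisper, plus ffmpeg on the PATH.
from yt_dlp import YoutubeDL
import whisper

VIDEO_URL = "https://www.tiktok.com/@example/video/123"  # placeholder, not the real clip

# 1. Download the video to a local MP4 file.
with YoutubeDL({"outtmpl": "clip.mp4", "format": "mp4"}) as ydl:
    ydl.download([VIDEO_URL])

# 2. Transcribe it locally; whisper extracts the audio via ffmpeg.
model = whisper.load_model("base")   # pick a larger model for better accuracy
result = model.transcribe("clip.mp4")
print(result["text"])
```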
Where's the fun in that? :D
We choose to automate these things, not because they are easy, but because they are an interesting problem to solve
But then you would need a Brenda. AI can write the automation script for you.
You... could have given the job to Brenda instead, unless the irony was the point?
The global economy isn't going to crash if I make a mistake with the transcript.
That's how it starts.
I can see that MacWhisper uses parakeet v2 as the model (although it allows choosing another model).
Is MacWhisper a $60 GUI for a Python script that just runs the model?
> Is MacWhisper a $60 GUI for a Python script that just runs the model?
Yes, a large genre of MacOS apps are "Native GUI wrappers around OSS scripts"
A lot of MacOS itself is this.
Which is incredibly valuable. The OSS script has zero value to someone who doesn't know it exists or doesn't understand how to run it.
There's also a free version that just uses Whisper. I recommend giving it a go, it's a very well constructed GUI wrapper. I use it multiple times a week, and I've run Whisper on my machine in other less convenient ways in the past.
This reminds me of a friend whose company ran a daily perl script that committed every financial transaction of the day to a database. Without the script, the company could literally make no money irrespective of sales, because this database was one piece in a complex system for payment processor interoperability.
The script ran in a machine located at the corner of a cubicle and only one employee had the admin password. Nobody but a handful of people knew of the machine's existence, certainly not anyone in middle management and above. The script could only be updated by an admin.
Copilot may be good, but sure as hell doesn't know that admin password.
If your mission critical process sits on some on-site box that no-one knows about, copilot being good or not is the least of your problems.
Everywhere I’ve ever worked has had that mission critical box.
At one of my jobs we had a server rack with UPS, etc, all the usual business. On the floor next to it was a dell desktop with a piece of paper on it that said “do not turn off”. It had our source control server in it, and the power button didn’t work. We did eventually move it to something more sensible but we had that for a long time
with only one person on earth being able to access it? so if that person is hit by a car everything goes down?
yeah. I mean, someone else would _eventually_ figure it out. There wasn't full disk encryption or anything on it, so if the guy got hit by a bus and the machine turned off, we probably would have just imaged the disk and got it running in a VM.
But we didn't (and nobody was hit by a bus)
Pretty much
An old colleague and friend used to print out a 30 page perl script he wrote to do almost exactly this in this scenario. A stapled copy could always be found on his dining room table.
Was the printed copy a backup system or casual reading?
Yes.
<3 inclusive or.
That sounds pretty bad. Not a great argument against AI: "Our employees have created such a bad mess that AI won't work because only they know how the mess they created works".
> "Our employees have created such a bad mess that AI wont work because only they know how the mess they created works".
This is an ironclad argument against fully replacing employees with AI.
Every single organization on Earth requires the people who were part of creating the current mess to be involved in keeping the organization functioning.
Yes you can improve the current mess. But it's still just a slightly better mess and you still need some of the people around who have been part of creating the new mess.
Just run a thought experiment: every employee in a corporation mysteriously disappears from the face of the Earth. If you bring in an equal number of equally talented people the next day to run it, but with no experience with the current processes of the corporation, how long will it take them to reach the same capability as the previous employees?
That is the luxury of theory.
Yes, most situations are terrible compared to what they would be if an expert were present to perfect them.
Except if there isn’t an expert, and there’s a normal person, how do they know the output is right?
not sure I get your point?
This sort of gimmick is not going to help anyone keeping their job.
Sadly, nah. It works.
Excel is the most popular programming environment in the universe. It has optimized the five minute out of the box experience so well that grade schoolers can use it.
Other than that, it is pretty horrible for coding.
Excel is programming. Spreadsheets have been full of bugs for decades. How is Brenda any different from a developer? Why are people scared when the LLM might affect their dollar calculations, and less bothered when it affects their product?
+100 this. Programmers who work in Excel (and never even dream of calling themselves programmers) are still programmers.
Many fears of “AI mucking it up” could be mitigated with an ability to connect a workbook to a git repository. Not for data, but for VBA, cell formulas, and cell metadata. When you can encapsulate the changes a contributor (in this case co-pilot) makes into a commit, you can more easily understand what changes it/they made.
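There's no built-in workbook-to-git bridge that I know of, but as a rough sketch of the idea: dump each sheet's formulas to a plain-text file and commit that alongside the workbook, so a co-pilot's edits show up as an ordinary diff. This assumes openpyxl and a local git repo; the file names are hypothetical, and VBA modules would need a separate export step:

```python
# Rough sketch: serialize a workbook's cell formulas/values to text so
# changes can be reviewed as ordinary git diffs. File names are hypothetical.
import subprocess
from openpyxl import load_workbook

def dump_formulas(xlsx_path: str, out_path: str) -> None:
    # data_only=False (the default) keeps formula strings like "=SUM(A1:A3)"
    wb = load_workbook(xlsx_path)
    with open(out_path, "w", encoding="utf-8") as out:
        for sheet in wb.sheetnames:
            out.write(f"## {sheet}\n")
            for row in wb[sheet].iter_rows():
                for cell in row:
                    if cell.value is not None:
                        out.write(f"{cell.coordinate}\t{cell.value}\n")

dump_formulas("forecast.xlsx", "forecast.formulas.txt")   # hypothetical workbook
subprocess.run(["git", "add", "forecast.formulas.txt"], check=True)
subprocess.run(["git", "commit", "-m", "Snapshot formulas before Copilot edit"], check=True)
```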
Brenda has been getting slower over the years (as we all have), but soon the boss will learn that it was a small price to pay for her knowing how to keep such a house of cards from collapsing.
And then the boss will make the decision to outsource her job to a company that promises to use AI to make finance better and faster, and while Brenda is in the unemployment line, someone else thousands of miles away is celebrating a new job.
We are seeing AI deployed in the US, but it's actually Indians. They are not better, but they are cheaper. They are probably worse, but they are cheaper.
Excel is the “beast that drives the ENTIRE economy” and he’s worried about Brenda from the finance department losing her job because then her boss will get bad financial reports
I suppose the person that wrote that has no idea Excel is just an app builder where you embed data together with code.
You know that we have Excel because computers didn’t understand column names in databases, so data extraction needed to be done by humans. Humans then designed those little apps in Excel to massage the data.
Well, now an agent can read the boss saying “gimme the sales from last month”, and the agent doesn’t need Excel for that, because it can query the database itself, massage the data itself using Python, and present the data itself with HTML or PNGs.
So, we are in the process of automating Brenda AND excel away.
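A rough sketch of what that agent-side step could look like, with a hypothetical SQLite sales table and pandas standing in for the "massage the data" part (table, column, and file names are all illustrative):

```python
# Minimal sketch: query the database directly and emit an HTML report,
# no spreadsheet in the loop. Database, table, and column names are hypothetical.
import sqlite3
import pandas as pd

conn = sqlite3.connect("company.db")       # hypothetical database file
df = pd.read_sql_query(
    """
    SELECT region, SUM(amount) AS total_sales
    FROM sales
    WHERE sale_date >= date('now', 'start of month', '-1 month')
      AND sale_date <  date('now', 'start of month')
    GROUP BY region
    ORDER BY total_sales DESC
    """,
    conn,
)
df.to_html("last_month_sales.html", index=False)   # "present the data itself with HTML"
```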
Also, finance departments are a very small part of Excel users. Just think of everywhere people need small programs: Excel is there.
In most cases where the Excel spreadsheet is business critical, the spreadsheet _is_ the database. These companies aren’t using an ERP system. They are directly entering inventory and sales numbers in the spreadsheet.
The post is clearly hyperbole; obviously the sole issue being brought up isn't "Brenda losing her job may be bad for the company". You're being facetious.
Found the person who hasn’t seen excel in the real world.
Excel - whatever its origin story - is the actual Swiss Army knife of the tech world.
There’s easily a few billion people who use excel. There is a reason it survives.
20+% of the world population uses Excel? Any citations on that?
You missed this bit “.. and then the AI is gonna fuck it up real bad and he won't be able to recognize it because he doesn't understand because AI hallucinates.”
Brendas have fucked it up multiple times, by themselves or because their boss demanded it.
The underlying assumption is that Brenda generally does her job pretty well. Human errors exist but usually peers/managers (or the person who did it) can identify and correct them reliably.
If we have to compare LLM’s against people who are bad at their jobs in order to highlight their utility we’re going the wrong direction.
There are a lot of underlying assumptions: Brenda, the woman, is accurate and trustworthy and has mastered an accurate and trustworthy technology; the upper manager, the male, will introduce error by not understanding that the technology he brings to bear on the situation is hallucinatory. The woman is lower in status and pay than the male. The woman is necessary to the functioning of "the economy" and "capitalism," while the man threatens those. There are a lot of unsubtle political undertones on TikTok.
I was focused on a particular element but sure
Good luck with that.
Don't be like that. I work at a Fortune 500 and Brenda wants that Copilot in Excel because it can help her achieve so much more. What is "so much more", you ask? Neither Brenda nor her C-suite can define it, but they know for sure Copilot in Excel will lead to enormous time savings.
I've partied with Brenda on the weekends, and let me tell you... SOMETIMES Brenda hallucinates.
But never during work hours. The woman's a saint M-F.
It's verifier's law.
Coding agents are useful and good and real products because when they screw up, things stop working almost always before they can do damage. Coding agents are flawed in ways that existing tools are good at catching, never mind the more obvious build and runtime errors.
Letting AI write your emails and create your P&L and cash flow projections doesn't have to run the gauntlet of tools that were created to stop flawed humans from creating bad code.
Nah, I've seen them screw up in all sorts of ways that would fail in some conditions and not others. You're way too optimistic about this.
Fair. I've been using the coding agent in Android Studio Canary to do exploratory coding in Dart/Flutter using ATProto. Low stakes, but higher productivity is a significant benefit. It's a daily surprise how brilliant it is at some things and how abysmal at others.
Don't worry, in Teams it bothers me just one time a day, and with the click of a button it's gone... For another whole day.
Using AI does not absolve you of the responsibility of doing it correctly. If you use AI, then you had better have the skills to have done the job yourself, and thus the ability to check that the AI did things correctly.
You can still save time, but perhaps not as much as you think, because you need to check the AI's work thoroughly.
10 billion dollars is probably going to be spent on automating Excel; it’s going to happen.
There needs to be a financial equivalent of The Mythical Man-Month.
There are plenty of things that play the role.
The problem is that people ignore them.
Let it all crash and burn
another cheese that will affect the outcome of major tournaments, not a good look for microsoft
it's like the xlookup situation all over again, yet another move aimed at the casual audience, designed to bring in the party gamers and make the program an absolute mess competitively
"You know who's not hallucinating?
Brenda"
I don't know about that. There could be lots of interesting ways Brenda can (be convinced to) hallucinate.
I agree - having watched many people use Excel over the years, I'd say people often overestimate their skills. I see three categories of Excel users. First there are the people that are intimidated by it and stay away from any task involving Excel. Second are the people that know a little bit (a few basic formulas) and overestimate their skills because they only compare themselves to the first group. And the third group are the actual power users but know to keep that quiet because otherwise they become the "excel person" and have to fix every sheet that has issues.
I don't know if AI is going to make any of the above better or worse. I expect the only group to really use it will be that second group.
I have seen lots and lots of different uses for Excel in my line of work:
- password database
- script to automatically rename jpeg files
- game
- grocery lists
- bookkeeping (and trying not to get caught for fraud for several years, because the monthly spending limit is $5000 and $4999 a month is below that...)
- embedding/collecting lots of Word documents
- coloring book
- Minecraft processes
- resume database
- ID scans
Simon posting tiktok quotes on his blog was not on my 2025 bingo card.
This isn't the first: https://simonwillison.net/2025/Aug/8/pearlmania500/ and https://simonwillison.net/2024/Jul/29/dealing-with-your-ai-o...
Also this fun diversion into Occlupanids: https://simonwillison.net/2024/Dec/8/holotypic-occlupanid-re...
A lot of people complain that the internet isn't as weird and funny as it used to be. The weird and funny stuff is all on TikTok!
The mismatch between what people not on it think TikTok is like and what it's actually like (once you get the algo tuned to your taste) is pretty crazy.
But then the "new user" experience is so horrific in terms of the tacky default content it serves you that I'm not surprised so many people don't get past it.
"the sweat from Brenda's brow is what allows us to do capitalism."
The CEO has been itching to fire this person and nuke her department forever. She hasn't gotten the hint with the low pay or long hours, but now Copilot creates exactly the opening the CEO has been looking for.
I'm more shocked that someone is using TikTok to speak things that actually make sense instead of mindless memes.
Excel doesn't need AI to ruin your work: https://www.science.org/content/article/one-five-genetics-pa...
At some point, a publicly-listed company will go bankrupt due to some catastrophic AI-induced fuck-up. This is a massive reputational risk for AI platforms, because ego-defensive behaviour guarantees that the people involved will make as much noise as they can about how it's all the AI's fault.
That will never happen, AI cannot be allowed to fail, so we'll be paying for that AI bail-out.
Do you really want these kinds of companies to succeed? Let them burn, tbh.
I don't find comments along the lines of 'those people over there are bad' to be interesting, especially when I agree with them. My comment is about why it'll go wrong for them.
Make sure you’re not part of the kindling, then.
I see the inverse of that happening: every critical decision will incorporate AI somehow. If the decision was good, the leadership takes credit. If something terrible happens, blame it on the AI. I think it's the part no one is saying out loud. That AI may not do a damn useful thing, but it can be a free insurance policy or surrogate to throw under the bus when SHTF.
This works at most one time. If you show up to every board meeting and blame AI, you’re going to get fired.
This is true if you blame a bad vendor, or something you don’t even control like the weather. Your job is to deliver. If bad weather is the new norm, you better figure out how to build circus tents so you can do construction in the rain. If your AI call center is failing, you better hire 20 people to answer phones.
I'm actually not that worried about this, because again I would classify this as a problem that already exists. There are already idiots in senior management who pass off bullshit and screw things up. There are natural mechanisms to cope with this, primarily in business reputation - if you're one of those idiots who does this people very quickly start just discounting what you're saying, they might not know how you're wrong, but they learn very quickly to discount what you're saying because they know you can't be trusted to self-check.
I'm not saying that this can't happen or that it isn't bad. Take a look at nudge theory: the UK government created an entire department and spent enormous amounts of time and money on what they thought was a free lunch, believing they could just "nudge" people into doing the things they wanted. So rather than actually solving difficult problems, the UK government embarked on decades of pseudo-intellectual self-aggrandizement. The entire basis of that decades-long debacle was bullshit data and fake studies. We didn't need AI to fuck it up, we managed it perfectly well by ourselves.
Nudge theory isn't useless, it's just not anything like as powerful as money or regulation.
It was taken up by the UK government at that time because the government was, unusually, a coalition of two quite different parties, and thus found it hard to agree to actually use the normal levers of power.
This NY Times opinion piece by Loewenstein and Ubel makes some good arguments along these lines: https://web.archive.org/web/20250906130827/https://www.nytim...
It looks like the OP is thinking that AI causing errors in spreadsheets is going to make the whole economy collapse.
When tools break, people stop using them before they sink the ship. If AI is that terrible at spreadsheets, people will just revert to Brenda.
And it's not like spreadsheets have no errors right now.
This is transparent nonsense. People are very very happy to introduce errors into excel spreadsheets without any help from AI.
Financial statements are correct because of auditors who check the numbers.
If you have a good audit process then errors get detected even if AI helped introduce them. If you aren't doing a good audit then I suspect nobody cares whether your financial statement is correct (anyone who did would insist on an audit).
It’s like calling out the county to inspect the home you built but when they arrive it’s a bouncy castle.
> If you have a good audit process then errors get detected even if AI helped introduce them. If you aren't doing a good audit then I suspect nobody cares whether your financial statement is correct (anyone who did would insist on an audit).
Volume matters. The single largest problem I run into: AI can generate slop faster than anyone can evaluate it.
Brendas hallucinate all the time.
320784788
Nay-sayers need to decide whether they fear AI because AI is dumb and will fuck up, or because AI is smart and will take over.
Silly calling Simon a nay-sayer.
Are you a fanatic that thinks anyone saying that there are any limitations to current models = nay-sayer?
Like if someone says they wouldn't wanna get a heart transplant operation done purely by GPT5, are they a nay-sayer or is that just reflecting reality?
Simon Willison is definitely not a nay-sayer.
Both are valid concerns, no need to decide. Take the USA: they are currently led by a patently dumb president who fucks up the global economy, and at the same time they are powerful enough to do so!
For a more serious example, consider the Paperclip Problem[0] for a very smart system that destroys the world due to very dumb behaviour.
[0]: https://cepr.org/voxeu/columns/ai-and-paperclip-problem
The paperclip problem is a bit hand-wavey about intelligence. It is taken as a given that unlimited intelligence would automatically win, presumably because it could figure out how to do literally anything.
But let's consider real life intelligence:
- Our super geniuses do not take over the world. It is the generationally wealthy who do.
- Super geniuses also have a tendency to be terribly neurotic, if not downright mentally ill. They can have trouble functioning in society.
- There is no thought here about different kinds of intelligence and the roles they play. It is assumed there is only one kind, and AI will have it in the extreme.
To be clear, I don't think the paperclip scenario is a realistic one. The point was that it's fairly easy to conceive an AI system that's simultaneously extremely savant and therefore dangerous in a single domain, yet entirely incapable of grasping the consequences or wider implications of its actions.
None of us knows what an actual, artificial intelligence really looks like. I find it hard to draw conclusions from observing human super geniuses, when their minds may have next to nothing in common with the AI. Entirely different constraints might apply to them—or none at all.
Having said all that, I'm pretty sceptical of an AI takeover doomsday scenario, especially if we're talking about LLMs. They may turn out to be good text generators, but not the road to AGI. But it's very hard to make accurate predictions in either direction.
Everything is now about verification.
AI may be able to spit out an Excel sheet or formula. But if it can’t be verified, so what?
And here’s my analogy for thinking about debugging an Excel sheet: you can debug most corporate Excel sheets with a calculator.
But when AI is spitting out Excel sheets (when the program is making smaller programs), what is the calculator in this analogy?
Are we going to be using Excel sheets to debug the output of AI?
I think this is the inherent limiter to the uptake of AI.
There’s only so much intellectual / experiential / training depth present.
And now we’re going to be training even fewer people.
At the end of the day, I (and my customers) need something to work.
But failing that - I will settle for someone to blame.
Brenda handles a lot of blame. Is OpenAI going to step into that gap?