LLMs, kind of like Bill Bryson's books, are great at presenting "information" that seems completely plausible, authoritative, and convincing to the reader. But when you actually do know the truth about a subject, you realize how completely full of crap they too often are. And somehow after being given a patently counterfactual response to one query, we just blindly continue to take their responses to other queries as having value.
At the moment, I find them to be the perfect tool to get started with learning about something. I don't expect it to tell me everything I need to know or to even be right, but if I ask ChatGPT or another LLM a question about a subject I'm not familiar with then it will at least use a bunch of terminology that I didn't have in my vocabulary before starting.
For example, I just bought a 1990 Miata and I want to install a couple of rocker switches in the dash to individually control the pop-up headlights. I have enough circuits knowledge to safely change outlets and light switches, but I didn't know about relays. I asked ChatGPT how to add these switches and it immediately mentioned buying DPDT switches and tying the OEM relay into an SPDT relay. It may have gotten the actual circuit diagram completely wrong, but now I know exactly what to read up on.
Now let me ask you the more fundamental question... did this do you any better than if you had searched for a YouTube video or some other source? Would this video from 2016 be relevant? This may not be the right video, but my approach for DIY in the last 10-20 years has been to hit YouTube up.
https://www.youtube.com/watch?v=77q9KtjnNTU
I'm trying to gauge whether LLMs are truly expanding our capabilities in a fundamental way or are really just another way to search for answers without going to google or a library.
For obscure things, it's often very hard to find videos like that, and the videos vary greatly in quality. ChatGPT helped me fix my washing machine and my dryer yesterday with perfect advice, walking me through every step. Those are both projects I would've made a half-assed attempt at in the past before throwing my hands up and calling someone to do them.
I wonder if that can be attributed to search engines and search fields on various websites being intentionally worsened in order to push specific content and ads.
Google search and Youtube search used to almost always get you what you were looking for. Now you have to fight with it to maybe get what you are looking for because of all the sponsored ads.
Completely not related to any LLM usage, but welcome to the world of NA Miata ownership! I think you'll find that with just general maintenance it'll treat you very well -- my '91 is the most reliable car in the drive, and by far the most whimsical. (I just got back from a Miata errand trip in the pouring rain -- why did I drive the Miata? Winter is very soon, and it gets put away for ~3 months -- so at this time of year, every possible trip is a Miata trip!)
I don't quite disagree but this comparison is typically unfair, because when you really know about a subject you tend to ask way more difficult questions than about other subjects, so of course the LLMs are gonna struggle more. If you ask really basic questions they will regurgitate well known bachelor-level knowledge and look good. What do I know about biology anyway? about silos for grain storage? any passable answer is enough to wow me on those topics. But on the topics I really know about, I never ask the basics.
Perhaps a similar issue is at root in both scenarios. I know this is highly simplified but...
Bill writes a book with general information about the short history of everything. The book is meant to appeal to a broader audience so the material must sometimes sacrifice accuracy in the interest of moving the reader along to other new ideas.
That's fine for a general understanding of a subject, but as the OP stated, when an expert looks closer, they see holes in the book. They're upset that Bill hasn't updated his book.
An LLM is trained on data. We now realize early LLMs were trained on publicly available data from sites like Reddit, Stack Overflow, and probably Hacker News. Any data on these sites became boilerplate for current LLM responses.
With widespread adoption of LLMs, the quality of sites like Stack Overflow and Reddit has declined, due to a lack of trust and engagement from real people, and everyone realizing where all those network traffic costs were going.
Companies shift LLM training from public data to feedback from LLM users. Any 'cutting edge' or poorly documented use cases are heavily hallucinated (or contain data that mixes version-specific solutions).
Chaos becomes the norm and whoever has the best trained LLM service with the highest quality feedback will eventually swallow up all other services.
> like Bill Bryson's books, are great at presenting "information" that seems completely plausible, authoritative, and convincing to the reader. But when you actually do know the truth about a subject, you realize how completely full of crap
Wow, I have a couple Bill Bryson books on my reading list, can you share some examples of that?
Honestly I wouldn't worry about it. He's a wonderful writer, the problem is that he doesn't let reality get in the way of a good story. Just classify them with the rest of the fiction-non-fiction books and enjoy the journey. If you ever find yourself asking "wow is that true?" then it probably isn't.
I'm not sure this is something I really worry about. Whenever I use an LLM I feel dumber, not smarter; there's a sensation of relying on a crutch instead of having done the due diligence of learning something myself. I'm less confident in the knowledge and less likely to present it as such. Is anyone really cocksure on the basis of LLM received knowledge?
> As a ChatGPT user I notice that I’m often left with a sense of certainty.
They have almost the opposite effect on me.
Even with knowledge from books or articles I've learned to multi-source and question things, and my mind treats the LLMs as a less reliable averaging of sources.
> Is anyone really cocksure on the basis of LLM received knowledge?
I work for a company with an open source product, and the number of support requests we get from people who ask the chatbot to do their config and then end up with something nonfunctional is quite significant. It goes as far as users complaining our API is down because the chatbot hallucinated the endpoint.
LLMs do love to make up endpoints and parameters, but I have found that ones with web access are pretty good at copy/pasting configs if they can find them, so it might be worth a few minutes of exploring what people are actually finding that's causing it to make up an endpoint. I have not (yet!) seen an instance where making something easier for LLMs to parse didn't also help human comprehension.
I remember back when I was in secondary school, something commonly heard was
"Don't just trust wikipedia, check it's resources, because it's crowdsourced and can be wrong".
Now, almost 2 decades later, I rarely hear this stance and I see people relying on wikipedia as an authoritative source of truth. i.e, linking to wikipedia instead of the underlying sources.
In the same sense, I can see that "Don't trust LLMs" will slowly fade away and people will blindly trust them.
> "Don't just trust wikipedia, check it's resources, because it's crowdsourced and can be wrong"
This comes from decades of teachers misremembering what the rule was, and eventually it morphed into the Wikipedia specific form we see today - the actual rule is that you cannot cite an encyclopaedia in an academic paper. full stop.
Wikipedia is an encyclopaedia and therefore should not be cited.
Wikipedia is the only encyclopaedia most people have used in the last 20 years, therefore Wikipedia = encyclopaedia in most people's minds.
There's nothing wrong with using an encyclopaedia for learning or introducing yourself to a topic (in fact this is what teachers told students to do). And there's nothing specifically wrong about Wikipedia either.
I remember all of our encyclopedias being decades out of date growing up. My parents bought a set of Encyclopedia Britannica in 1976 or something like that, so by the time I was reading the encyclopedia for research on papers in the late 90s and early 00s, it was without a doubt less factual than even the earliest incarnation of Wikipedia was.
Either way, you are correct, we weren't allowed to cite any encyclopedia, but they were meant to be jumping off points for papers. After Wikipedia launched when I was in 9th grade, we weren't allowed to even look at it (blocked from school computers).
The original rule when I was a lad (when wikipedia was a baby) was, "don't trust stuff on the internet, especially Wikipedia where people can change it at will."
Today they might have better trust for Wikipedia-- and I know I use it as a source of truth for a lot of things-- but back in my day teachers were of the opinion that it couldn't be trusted. This was for like middle and high school, not college or university, so we would cite encyclopedias and that sort of thing, since we weren't reading cutting edge papers back then (maybe today kids read them, who knows).
Edit: Also, I think the GP comment was proven correct by all of the replies claiming that Wikipedia was never controversial. It was very clear to everyone my age, back when Wikipedia was created/founded, that teachers didn't trust the internet or Wikipedia at the time.
There was a period of time where Wikipedia was more scrutinized than print encyclopedias because people did not understand the power of having 1000s of experts and the occasional non-expert editing an entry for free instead of underpaying one pseudo-expert. They couldn't comprehend how an open source encyclopedia would even work, or trust that humans could effectively collaborate on the task. They imagined that 1000s of self-interested chaos monkeys would spend all of their energy destroying what 2-3 hard-working people had spent hours creating, instead of the inverse. Humans are very pessimistic about other humans. In my experience, when humans are given the choice to cooperate or fight, most choose to cooperate.
All of that said, I trust Wikipedia more than I trust any LLMs but don't rely on either as a final source for understanding complex topics.
> the power of having 1000s of experts and the occasional non-experts editing an entry
When Wikipedia was founded, it was much easier to change articles without notice. There may not have been 1000s of experts at the time, like there are today. There are also other things Wikipedia does to ensure articles are accurate today that it may not have done, or been able to do, decades ago.
I am not making a judgment of Wikipedia, I use it quite a bit. I am just stating that it wasn't trusted when it first came out specifically because it could be changed by anyone. No one understood it then, but today I think people understand that it's probably as trustworthy as, or more so than, a traditional encyclopedia is/was.
> In my experience when humans are given the choice to cooperate or fight, most choose to cooperate.
Personally, my opinion of human nature falls somewhere in the middle of those two extremes.
I think when humans are given the choice to cooperate or fight, most choose to order a pizza.
A content creator I used to follow was fond of saying "Chill out, America isn't headed towards another civil war. We're way too fat and lazy for that."
Sure but I hope you get my point. Fighting takes effort, cooperation takes effort. Most people have other things to worry about and don't care about whatever it is you're fighting or cooperating over. People aren't motivated enough to try and sabotage the wikipedia articles of others. Even if they could automate it. There's just nothing in it for them.
> "They imagined that 1000s of self-interested chaos monkeys would spend all of their energy destroying what 2-3 hard working people has spent hours creating instead of the inverse."
Isn't that exactly what happens on any controversial Wikipedia page?
There aren't that many controversial topics at any given time. One of Wikipedia's solutions was to lock pages until a controversy subsided. Perma-controversy has been managed in other ways, like avoiding the statement of opinion as fact, the use of clear and uncontroversial language, using discussion pages to hash out acceptable and unacceptable content, competent moderators... Rage burns itself out, and people get bored with vandalism.
More colloquially, people would say that Wikipedia could not be trusted because "anyone can edit the pages or write whatever they want."
Of course that's demonstrative of the genetic fallacy. Anyone can write or publish a book, too. So it always comes down to "how can you trust information?" That's where individual responsibility to think critically comes in. There's not really anything you can do about the fact that a lot of people will choose to not think.
You gave some good context I missed - The (even) more technical (read: pretentious) explanation is that it's a tertiary source. As a general rule of thumb secondary sources are preferred over primary sources, but both are acceptable in the right academic context.
I do understand the "latest version" argument, and it is a weakness, but it's also a double-edged sword - it means Wikipedia can also be more up-to-date than (almost) any other source for the information. That's why I say there's "nothing specifically wrong about Wikipedia either": it can be held in similar regard to other tertiary sources and encyclopaedias - with all the problems that come with those.
Encyclopedias are tertiary sources, compilations of information generated by others. They are neither sources of firsthand information (primary sources) nor original analysis (secondary sources). You can't cite encyclopedias because there's nothing to cite. The encyclopedia was not the first place the claim was made, even if it was the first place you happened to read it. You don't attribute a Wayne Gretzky quote to Michael Scott no matter how clearly he told you Wayne Gretzky said it.
There are also years of Wikipedia proving to be more accurate than anything in print, and rarely (and not for very long) misrepresenting source materials. For LLMs to get that same respect they would have to pull off all of the same reassuring qualities.
> Now, almost 2 decades later, I rarely hear this stance and I see people relying on wikipedia as an authoritative source of truth. i.e, linking to wikipedia instead of the underlying sources.
That's a different scenario. You shouldn't _cite wikipedia in a paper_ (instead you should generally use its sources), but it's perfectly fine in most circumstances to link it in the course of an internet argument or whatever.
There’s also the fact that both Wikipedia and LLMs are non-stationary. The quality of wikipedia has grown immensely since its inception and LLMs will get more accurate (if not explicitly “smarter”)
I think you would need a complicated set of metrics to claim something like "improved" that wasn't caveated to death. An immediate conflict being total number of articles vs impressions of articles labeled with POV biases. If both go up has the site improved?
I find I trust Wikipedia less these days, though still more than LLM output.
I can't think of a better accidental metric than that!
I'll go ahead and speculate that the number of incoherent sentences per article has gone down substantially over the last decade, probably due to the relevant tooling getting better over the same period.
> I can see that "Don't trust LLMs" will slowly fade away and people will blindly trust them.
That's already happening. I don't even think we had a very long "Don't trust LLMs" phase, if we did it was very short.
The "normies" already trust whatever they spit out. At leadership meetings at my work, if I say anything that goes against the marketing hype for LLMs, such as talking about "Don't trust LLMs", it's met with eye rolls and I'm not forward thinking enough, blah blah.
Management-types have 100% bought into the hype and are increasingly more difficult to convince otherwise.
I can’t speak to your specific experience, but I do some of this kind of eye-rolling when people bring short term limitations on LLMs into long term strategy.
I’m reminded of when people at work assured me the internet was never going to impact media consumption because 28.8kbps is not nearly enough for video.
A big reason for this is that Wikipedia's source is often a book or a journal article that is either offline or behind an academic paywall. Checking the source is effectively impossible without visiting a college campus's library. The likelihood that the cited information is wrongly summarizing the contents is low enough and the cost is high enough that doing so regularly would be irrational.
A bigger problem in this respect with Wikipedia is it often cites secondary sources hidden behind an academic fire/paywall. It very often cites review articles, and some of these aren't necessarily entirely accurate.
I don't think the cases are really the same. With Wikipedia, people have learned to trust that the probability of the information being at least reasonably good is pretty high, because there's an editing crucible around it and the ability to correct misinformation surgically. No one can hotpatch an LLM in 5 minutes.
The best LLM powered solutions are as little LLM and as much conventional search engine / semantic database lookups and handcrafted coaxing as possible. But even then, the conversational interface is nice and lets you do less handcrafting in the NLP department.
Using Perplexity or Claude in "please source your answer" mode is much more like a conventional search engine than looking up data embedded in 5 trillion (or whatever) parameters.
Except when they glaringly get things wrong, like "character X on show Y said catchphrase Z", and two queries produce two different values of X, one right, one wrong. The more I use Gemini summaries for things I know a bit about, the worse my opinion of them.
It wasn't just Wikipedia, which was a relatively recent addition to the web, everything online was a 'load of rubbish'.
In turn-of-the-century boomer world, reality was what you saw on TV. If you saw something with your own eyes that contradicted the world view presented by the media, then one's eyes were to be disbelieved. The only reputable sources of news were the mainstream media outlets. The only credible history books would be those with reviews from the mainstream media, with anything else just being the 'ramblings of a nutter'.
In short, we built a beautiful post-truth world and now we are set on outsourcing our critical thinking to LLMs.
This captures my experience quite well. I can "get a lot more done," but it's not really me doing the things, and I feel like a bit of a fraud. And as the workday and the workweek roll on, I find myself needing to force myself to look things up and experiment rather than just asking the LLM. It's quite clear that LLMs will make most people more dependent. People with better discipline I think will really benefit in big ways, and you'll see this become a new luxury belief; the disciplined geniuses around us will genuinely be perplexed why people are saying that LLMs have made them less capable, much in the same way they wonder why people can't just limit their drug use recreationally.
It's appropriate to think this way with LLM output because LLMs are still terrible some significant portion of the time. If you don't actually know what you're doing, you have no way to distinguish between their output being correct and their output merely being able to pass the tests you can think of.
As a software developer, your job is to understand code and business constraints so you can solve problems the way most appropriate for the situation. If you aren't actually keeping up with those constraints as they change through time, you're not doing your job. And yeah, that's a kind of fraud. Maybe it's more on yourself than your employer most of the time, but... It's your job. If you don't want to do it, maybe it's more respectful of your own time, energy, and humanity to move on.
I've been thinking about that comparison as well. A common fantasy is that civilization will collapse and the guy who knows how to hunt and start a fire will really excel. In practice, this never happens and he's sort of left behind unless he also has other skills relevant to the modern world.
And, for instance, I have barely any knowledge of how my computer works, but it's a tool I use to do my job. (and to have fun at home.)
Why are these different than using LLMs? I think at least for me the distinction is whether or not something enables me to perform a task, or whether it's just doing the task for me. If I had to write my own OS and word processor just to write a letter, it'd never happen. The fact that the computer does this for me facilitates my task. I could write the letter by hand, but doing it in a word processor is way better. Especially if I want to print multiple copies of the letter.
But for LLMs, my task might be something like "setting up apache is easy, but I've never done it so just tell me how do it so I don't fumble through learning and make it take way longer." The task was setting up Apache. The task was assigned to me, but I didn't really do it. There wasn't necessarily some higher level task that I merely needed Apache for. Apache was the whole task! And I didn't do it!
Now, this will not be the case for all LLM-enabled tasks, but I think this distinction speaks to my experience. In the previous word processor example, the LLM would just write my document for me. It doesn't allow me to write my document more efficiently. It's efficient, only in the sense that I no longer need to actually do it myself, except for maybe to act as an editor. (and most people don't even do much of that work) My skill in writing either atrophies or never fully develops since I don't actually need to spend any time doing it or thinking about it.
In a perfect world, I use self-discipline to have the LLM show me how to set up Apache, then take notes, and then research, and then set it up manually in subsequent runs; I'd have benefited from learning the task much more quickly than if I'd done it alone, but also used my self-discipline to make sure I actually really learned something and developed expertise as well. My argument is that most people will not succeed in doing this, and will just let the LLM think for them.
> But for LLMs, my task might be something like "setting up apache is easy, but I've never done it so just tell me how do it so I don't fumble through learning and make it take way longer." The task was setting up Apache. The task was assigned to me, but I didn't really do it. There wasn't necessarily some higher level task that I merely needed Apache for. Apache was the whole task! And I didn't do it!
To play devil's advocate: Setting up Apache was your task. A) Either it was a one-off that you'll never have to do again, in which case it wasn't very important that you learn the process inside and out, or b) it is a task you'll have to do again (and again), and having the LLM walk you through the setup the first time acts as training wheels (unless you just lazily copy & paste and let it become a crutch).
I frequently have the LLM walk me through an unfamiliar task and, depending on several factors such as whether I expect to have to do it again soon, the urgency of the task, and my interest and/or energy at the moment, I will ask the LLM follow-up questions, challenge it on far-fetched claims, investigate alternative techniques, etc. Execute one command at a time, once you've understood what it's meant to do, what the program you're running does, how its parameters change what it does, and so on, and let the LLM help you get the picture.
The alternative is to try to piece together a complete picture of the process from official documentation like tutorials and user manuals, disparate bits of information in search results, possibly wrong and/or incomplete information from Q&A forums, and muddle through lots of trial and error. Time-consuming, labor-intensive, and much less efficient at giving you a broad-strokes idea of how the whole thing works.
I much prefer the back-and-forth with the LLM and think it gives me a better understanding of the big picture than the slow and frustrating muddling approach.
You don't just set up Apache to have Apache running? You set it up to serve web content! It is middleware, it is not in and of itself useful?
Isn't setting up Apache robbing yourself of the opportunity to learn about writing your own HTTP server? In C? And what a bad idea that is?
The LLM helping you configure a web server is no different than the web server helping you serve HTTP instead of implementing a web server from scratch. You've just (seemingly?) arbitrarily decided your preferred abstraction layer is where "real work" happens.
Okay, maybe LLMs might disappear tomorrow and so for some reason the particular skill of configuring Apache will become useful again, maybe! But I'm already using brainpower to memorize phone numbers in case my smartphone contacts disappear, so maybe I won't have room for those Apache configs ;-)
I remember seeing a tweet a while back that talked about how modernity separated work from physicality, and now you have to do exercise on purpose. I think the Internet plus car-driven societies had done something similar to being social, and LLMs are doing something to both thinking and the kind of virtue that enables one to master a craft.
So, while it's an imperfect answer that I haven't really nailed down yet, maybe the answer is just to realize this and make sure we're doing hard things on purpose sometimes. This stuff has enabled free time, we just can't use it to doomscroll.
>Internet plus car-driven societies had done something similar to being social,
That's an interesting take on the loneliness crisis that I had not considered. I think you're really onto something. Thanks for sharing. I don't want to dive into this topic too much since it's political and really off-topic for the thread, but thank you for suggesting this.
Radio and especially TV also had large social effects. People used to play cards, instruments, and other social things before TV. Then household TV watching maxed out at 9 hours/day in 2010 (5 hr/day in 1950). (I would like to know the per-person viewing figures, and these are from Nielsen, who would want higher numbers.) [1]
Cars help people be social in my world. I would say that riding on a train in your own bubble with strangers is not a social activity, but others would disagree.
I don't have any great answer. But when I think about this for myself, I realize there are different kinds of abstraction that qualitatively change the nature of the work.
I don't want my software developer's experience to turn into a real estate developer's experience. I don't want to go from being a technical knowledge worker to a financier or contract negotiator. I've realized I was never in it for the outcome. I was in it for the exploration and puzzles.
Similarly, I don't want to become a "Hollywood producer" cliche. This caricature was a common joke earlier in my tech career in Southern California. We detested the idea of becoming a "tech" person acting like a Steve Martin parody of a Hollywood wheeler-dealer: someone sitting in a cafe, pitching ideas that were nothing more than a reference to an existing work with an added gimmick or casting change.
To me, that caricature combines two negative aspects. One is the heavily derivative and cynical nature. The other is the stratospheric abstraction level, where folks at this level see themselves as visionaries rather than just patrons of someone else doing all the creative work.
I don't want to be a patron of an LLM or other black box.
I would say that with a computer you're using a tool to take care of mundane details and speed up the mechanics of tasks in your life. Such as writing a document, or playing a game. I can't think of a way I would be seriously disadvantaged by not having the ability to hand-write an essay or have games I can readily play without a computer. Computers are more like tools in the way a hammer is a tool. I don't mind being totally dependent on a computer for those tasks in the same way I don't mind that I need a hammer anytime I want to drive a nail.
But for many people, LLMs replace critical thinking. They offer the allure of outsourcing planning, research, and generating ideas. These skills seem more fundamental to me, and I would say there's definitely a loss somehow of one's humanity if you let those things atrophy to the point you become utterly dependent on LLMs.
I mostly agree with this. LLMs are just another tool, and we've learned how to use and adapted to using many other tools throughout our history just fine.
With the caveat that our field in particular is one of the few that require continuous learning and adaptation, so tech workers in a way are better predisposed to this line of thinking and to tool adoption without some of the potential harmful side effects.
To pick on spell check, it has been shown that we can develop a dependency on it and thereby lose our own ability to spell and reason about language. But is that a bad thing? I don't know.
What I do know is humans have been outsourcing our thinking for a long time. LLMs are another evolution in that process, just another way to push off cognitive load onto a tool like we've done with stone tablets, books, paper notes, digital notes, google, etc.
> Is anyone really cocksure on the basis of LLM received knowledge?
Some people certainly seem to be. You see this a lot on webforums; someone spews a lot of confident superficially plausible-looking nonsense, then when someone points out that it is nonsense, they say they got it from a magic robot.
I think this is particularly common for non-tech people, who are more likely to believe that the magic robots are actually intelligent.
Most of the time it feels like a crutch to me. There have been a few moments where it unlocked deep motivation (by giving me a feel for the size of a solution based on ChatGPT output), and one time a research project where whatever crazy idea I threw at it, it would imagine what that would entail in terms of semantics, and then I was inspired even more.
The jury is still out on what value these things will bring.
If you feel dumber, it’s because you’re using the LLM to do raw work instead of using it for research. It should be a google/stackoverflow replacement, not a really powerful intellisense. You should feel no dumber than using google to investigate questions.
I don't think this is entirely accurate. If you look at this: https://www.media.mit.edu/publications/your-brain-on-chatgpt..., it shows that search engines do engage your brain _more_ than LLM usage. So you'll remember more through search engine use (and crawling the web 'manually') than by just prompting a chatbot.
I find that it is terrible for research, and hallucinates 25% to 90% of its references.
If you tell it to find something and give it a detailed description of what you're looking for, it will pretend like it has verified that that thing exists, and give you a bulletpoint lecture about why it is such an effective and interesting thing that 1) you didn't ask for, and 2) is really it parroting your description back to you with embellishments.
I thought I was going to be able to use LLMs primarily for research, because I have read an enormous number of things (books, papers) in my life, and I can't necessarily find them again when they would be useful. Trying to track them down through LLMs is rarely successful and always agonizing, like pulling teeth that are constantly lying to you. A surprising outcome is that I often get so frustrated by the LLM and so detailed in how I'm complaining about its stupid responses that I remind myself of something that allows me to find the reference on my own.
I have to suspect that people who find it useful for research are researching things that are easily discoverable through many other means. Those are not the things that are interesting. I totally find it useful to find something in software docs that I'm too lazy to look up myself, but it's literally saving me 10 minutes.
I used to get this same feeling during lectures in uni. Often the information was presented well and, along with some clear examples, everything seemed to make perfect sense.
It wasn't until working through practice problems later, on my own, did it become clear how much detail I was missing.
Now consider what's happening to the learning process of the (rather large) subset of current college students choosing to replace that struggle for detailed understanding with LLM queries.
I've heard experts comment on this from the other side, that they'll give a quick layperson's soundbite about their subject of expertise that doesn't defensibly lay out all the possible exceptions and edge cases and weirdness for reasons of time and audience interest and then they'll be inundated with comments calling them a liar and accused of falsifying things or not actually understanding the subject.
Missing detail and being confidently wrong are two different things though?
Edit: Claude told me the other day that my entire building might have to be demolished due to a slight bow in my newly poured stem wall. I uploaded a photo etc. and it was like, "yes, this is a serious structural issue blah blah blah". The inspector came to look at it and literally laughed that I was worried about it.
> LLMs should not be seen as knowledge engines but as confidence engines.
The thing I like best about LLMs is when I ask a question about some technical problem and it tells me that it is a KNOWN problem. It thus gives me confidence that I don't need to spend time looking for a solution where there is no good solution. Just go around it somehow. It lets me know I'm not the only person with this problem. And that way it gives me confidence that I'm not stupid, the problem is a real problem.
As an example I was working with WebStorm and tried to find a way to make the Threads-tab the default tab shown when debugger opens. AI told me there is no way it knows about. Good, problem solved, solved by finding out there is no solution.
This is the kind of stuff AI lies about all the time. I can get it to tell me "That is some good insight, and is a known issue..." with things I make up out of thin air.
There’s a gap that LLMs are trying to fill in such cases, which is that there’s too much information that we can possibly hope to make sense of in a lifetime. Just as it’s possible to compute something incorrectly with a calculator, you can definitely be led astray by an LLM, which is why I am surprised that people think these models are good enough to replace humans at work. The only thing which makes sense is to both raise the bar for publishing, and to only take published works seriously. If something isn’t published, then authors should provide code to demonstrate the effect they’re describing.
> which is why I am surprised that people think these models are good enough to replace humans at work.
There are a lot of office jobs that I'd fit into the category of "bullshit jobs." They may serve some purpose in the huge bureaucracy of enterprises, but the day-to-day ultimately boils down to managing someone's calendar and sending emails.
Quite a few people at my work have now started using Copilot for their emails. It's obviously AI (at least to me), and yet, the content and formatting are an improvement over what they were sending before.
So much of the marketing hype on LLMs is about how it'll replace all the engineering work (the MBA's wet dream, to replace all the expensive labor). In reality, I think it's more capable of replacing non-tech labor and middle management.
An LLM can send out an email to the team and analyze a project check-in faster, and better, than some overpaid middle manager can. I have no doubts an LLM could probably serve the role of a project management office, or a business analyst.
Sure, there should still be a human in the loop for now, but you need far, far fewer humans in those roles than previously.
I go back and forth on the idea that some jobs are bullshit; maybe I haven't been exposed to enough industries or workplaces. Every place I worked definitely didn't have bullshit jobs to hand out as adult daycare, but I can see how some places can become bloated because an overambitious middle manager wants to say they manage X number of people on their resume. So there are bullshit jobs in that there are people who aren't being utilized correctly, but in that case I'd say there are no bullshit jobs, just bullshit leadership or managers.
Yeah, I agree with that, and a more accurate description than mine.
The people in those roles are being mismanaged/misutilized rather than the job itself being bullshit.
I've seen the bloating first hand though, and you're right, that's usually what leads to those jobs. Some department overhires to pad the resume of some middle manager, and now you have a team that's so severely overstaffed that each individual contributor has maybe 2 to 3 hours of actual work to do in a day.
I've seen this! Following some Math and Physics subreddits it's a regular occurrence for a new submitter to come in and post some 40 pages of incomprehensible bullshit and claim that they developed a unifying theory of physics with ChatGPT and that ChatGPT has told them it's a breakthrough in the field. Of course that used to happen regularly before LLMs but not nearly as often.
Including the former CEO of Uber. I’m somewhat curious what these people even think they’ve discovered, what outstanding problem they think they’ve actually solved… but I’m not curious enough to actually dig through their slop.
I ascribe the effect of LLMs as similar to reading the newspaper, when I learn about something I have no knowledge base in I come away feeling like I learned a lot. When I interact with a newspaper or LLM in an area where I have real domain expertise I realize they don’t know what they are talking about - which is concerning about the information I get from them about topics I don’t have that high level of domain expertise.
And why stop at newspapers, it's been a while since one could say books have any integrity, pretty much anyone can get anything into print these days. From political shenanigans to self help books designed to confirm people's biases to sell more units. Video's by far the hardest to fake but that's changing as well.
Regardless of what media you get your info from you have to be selective of what sources you trust. It's more true today than ever before, because the bar for creating content has never been lower.
The problem is that LLM output is so incredibly confident in tone. It really sounds like you're talking to an expert who has years of experience and has done the research for you - and tech companies push this angle quite hard.
That's bad when their output can be complete garbage at times.
I think eventually humans are going to need to disregard confidence as any kind of indicator of quality, which will be very difficult. Something about us is hardwired to believe a confident delivery of words.
It makes me really sad how Google pushes this technology that is simply flat-out wrong sometimes. I forget what exactly I searched for, but I searched for a color model that Krita supports, hoping to get the online documentation as the first result, and under several YouTube thumbnails the AI overview was telling me that Krita doesn't support that color model and you need a plugin for it. Under the AI overview was the search result I was looking for about that color model in Krita.
And worst of all, it's not even consistent: I tried the same searches again and I couldn't get the same answer, so it just randomly decides to assert complete nonsense sometimes, while other times it gives the right answer or says something completely unrelated.
It's really been a major negative in my search experience. Every time I search for something I can't be sure that it's actually quoting anything verbatim, so I need to check the sources anyway. Except it's much harder to find the link to the source with these AIs than it is to just browse the verbatim snippets in a simple list of search results. So it's just occupying space with something that is simply less convenient.
The AI is also indiscriminate with what "sources" it chooses. Even deep research mode in gemini.
You can go through and look at the websites it checked, and it's 80% blogspam with no other sources cited on said blog.
When I'm manually doing a Google search, I'm not just randomly picking the first few links; I'm deliberately filtering for credible domains or articles, not just picking whatever random marketing blog SEO'd its way to the top.
Sorry Gemini, an Advertorial from Times of India is not a reliable source for what I'm looking for. Nor is this xyz affiliate marketing blog stuffed to the brim with ads and product placement.
Some of that is due to that's probably 90% of the internet, but weren't these things trained on huge amounts of books, and published peer-reviewed works? Where are those in the sources?
It's trained on them, yes. But is it trained to prefer them as sources when doing web search?
The distinction is rather important.
We have a lot of data that teaches LLMs useful knowledge, but data that teaches LLMs complex and useful behaviors? Far less represented in the natural datasets.
It's why we have to do SFT, RLHF and RLVR. It's why AI contamination in real world text datasets, counterintuitively, improves downstream AI performance.
The next time you're working on your car google bolt torque specs and cross reference the shit their "AI" says with the factory shop manual. Hilarity ensues.
I feel like when I talk to someone and they tell me a fact, that fact goes into a kind of holding space, where I apply a filter of 'who is this person telling me this thing, and what does that make the thing they're telling me'. There's how well I know them, there's the other beliefs I know they have, there's their professional experience and their personal experience. That fact then gets marked as 'probably a true fact' or 'Mark believes in aliens'.
When I use ChatGPT I do the same before I've asked for the fact: how common is this problem? How well known is it? How likely is it that ChatGPT both knows it and can surface it? Afterwards I don't feel like I know something, I feel like I've got a faster broad idea of what facts might exist and where to look for them, a good set of things to investigate, etc.
> Afterwards I don't feel like I know something, I feel like I've got a faster broad idea of what facts might exist and where to look for them, a good set of things to investigate, etc.
Can you cite a specific example where this happened for you? I'm interested in how you think you went from "broad idea" to building actual knowledge.
Sure. I wanted to tile my bathroom; from ChatGPT I learned about laser levels, ledger boards, and levelling spacers (I'd only seen those cross-corner ones before).
I guess. I also used it to check the side effects of coming off prednisolone, and it gave me some areas to look at. I've used it a bunch to check out things around kidney transplants, and everything I've verified has been correct.
> The mental motion of “I didn’t really parse that paragraph, but sure, whatever, I’ll take the author’s word for it” is, in my introspective experience, absolutely identical to “I didn’t really parse that paragraph because it was bot-generated and didn’t make any sense so I couldn’t possibly have parsed it”, except that in the first case, I assume that the error lies with me rather than the text. This is not a safe assumption in a post-GPT2 world. Instead of “default to humility” (assume that when you don’t understand a passage, the passage is true and you’re just missing something) the ideal mental action in a world full of bots is “default to null” (if you don’t understand a passage, assume you’re in the same epistemic state as if you’d never read it at all.)
The important part of this is the "I feel like" bit. There's a fair and growing body of research showing that the "fact" is more durable in your memory than the context, and that over time, across a lot of information, you will lose some of the mappings and integrate things you "know" to be false into your model of the world.
This more closely fits our models of cognition anyway. There is nothing really very like a filter in the human mind, though there are things that feel like them.
Maybe, but then that's the same whether I talk to ChatGPT or a human, isn't it? Except with ChatGPT I can instantly verify what I'm looking for, whereas with a human I can't do that.
I wouldn't assume that it's the same, no. For all we knock them, unconscious biases seem to get a lot of work done; we do all know real things that we learned from other unreliable humans, somehow. Not a perfect process at all, but one we are experienced at and have lifetimes of intuition for.
The fact that LLMs seem like people but aren't, and specifically have a lot of the signals of a reliable source in some ways, means I'm not sure how these processes will map. I'm skeptical of anyone who is confident about it either way, in fact.
>I think LLMs should not be seen as knowledge engines but as confidence engines.
This is a good line, and I think it tempers the "not just misinformed, but misinformed with conviction" observation quite a bit, because sometimes moving forward with an idea at less than 100% accuracy will still bring the best outcome.
Obviously that's a less than ideal thing to say, but imo (and in my experience as the former gifted student who struggles to ship) intelligent people tend to underestimate the importance of doing stuff with confidence.
Confidence has multiple benefits. But one of those benefits is social - appearing confident triggers others to trust you, even when they shouldn’t.
Seeing others get burned by that pattern over and over can encourage hesitation and humility, and discourage confident action. It’s essentially an academic attitude and can be very unfortunate and self-defeating.
I find the biggest crime with LLMs to be the size of the problems we feed them.
Every time I start getting lazy and asking ChatGPT things like "write me a singleton that tracks progression for XYZ in a unity project", I wind up with a big hole where some deeper understanding of my problem should be. A better approach is to prompt it like "Show me a few ways to persist progression-like data in a unity project. Compare and contrast them".
Having an LLM development policy where you ~blindly accept a solution simply because it works is like an HOV lane to hell. It is very tempting to do this when you are tired or in a rush. I do it all the time.
Speaking of uncertainty, I wish more people would accept their uncertainty with regards to the future of LLMs rather than dash off yet another cocksure article about how LLMs are {X}, and therefore {completely useless}|{world-changing}.
Quantity has a quality of its own. The first chess engine to beat Garry Kasparov wasn't fundamentally different from earlier ones--it just had a lot more compute power.
The original Google algorithm was trivial: rank web pages by incoming links--its superhuman power at giving us answers ("I'm feeling lucky") was/is entirely due to a massive trove of data.
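To make the point concrete, here's a rough sketch of that link-counting idea -- plain power iteration over a made-up toy link graph in Python. The graph and the 0.85 damping value are illustrative assumptions on my part, not anything from Google's actual system:

    # Toy PageRank-style sketch: rank pages by incoming links via power iteration.
    # The link graph and damping value (0.85) are made-up illustrations.
    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outgoing in links.items():
                targets = outgoing or pages  # dangling pages spread rank evenly
                for t in targets:
                    new_rank[t] += damping * rank[page] / len(targets)
            rank = new_rank
        return rank

    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))

The algorithm itself fits in a screenful; what made it feel superhuman was running something like it over billions of pages.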
And remember all the articles about how unreliable Wikipedia was? How can you trust something when anyone can edit a page? But again, the power of quantity--thousands or millions of eyeballs identifying errors--swamped any simple attacks.
Yes, LLMs are literally just matmul. How can anything useful, much less intelligent, emerge from multiplying numbers really fast? But then again, how can anything intelligent emerge from a wet mass of brain cells? After all, we're just meat. How can meat think?
> How can you trust something when anyone can edit a page? But again, the power of quantity--thousands or millions of eyeballs identifying errors--swamped any simple attacks.
Sure, but now the established power users are free to insert more subtle attacks. The https://xkcd.com/978/ problem never stopped and the "reliable sources" consideration process allows for considerable political bias.
Some of us used to think that meat spontaneously generated flies. Maybe someday we'll (re-)learn that meat doesn't spontaneously generate thought either?
I recall trying to use GPT-4 to plan a trip through the PNW in ~Spring of 2023.
It presented a reasonable agenda, however 80% of the rockhounding spots were completely made up!
Over time, and as LLMs have gotten less sycophantic, I’ve found myself trusting them a bit more (a dangerous and slippery slope).
With that said, GPT-4o in particular, seemed to rank user satisfaction above truth.
I’ve found that GPT-5 Pro is currently the best at pushing back against silly ideas, and does a decent job of informing me that my questions could be better (:
As always, trust, but verify! Google maps lists "made up" places or outdated info. AI isn't scouting these locations physically...
Of course, at that point, the real question is, what's the value difference (taking into account personal, external and social costs) between asking chatgpt and /r/rockhounding (or whatever message boards they frequent)? At least if you start a thread on reddit, you might meet other people in the area with the same hobby, find a spot no one's talked about yet, get expert context and leave a trail for others to find.
How is "don't use LLMs as a source of truth" still news today? The machine does work, it doesn't know anything. Let the sucker fetch websites and write code.
I partly share the author's point that ChatGPT users (myself included) can "walk away not just misinformed, but misinformed with conviction". Sometimes I want to criticise aloud, write a post blaming this technology for the colourful, sophisticated, yet empty bullshit I hear from a colleague or read in an online post.
But I always resist the urge. Because I think: aren't there always going to be some people like that? With or without this LLM thing.
If there is anything to hate about this technology, for the more and more bullshit we see/hear in daily life, it is:
(1) Its reach: More people of all ages, of different backgrounds, expertise, and intents are using it. Some are heavily misusing it.
(2) Its (ever increasing) capability: Yes, it has already become pretty easy for ChatGPT or any other LLMs to produce a sophisticated but wrong answer on a difficult topic. And I think the trend is that with later, more advanced versions, it would become harder and take more effort to spot a hidden failure lurking in a more information-dense LLM's answer.
Everyone thinks they're the intelligent ones, of course. Which reinforces the repetition ad nauseam of Dunning-Kruger. Which is in itself dumb AF, because the effect described by Dunning and Kruger has been repeatedly exaggerated and misinterpreted. Which in turn is even dumber, because the Dunning-Kruger effect is debatable and its reproducibility is weak at best.
LLMs basically act as defense attorneys for all your dumbest ideas. It is very easy to assume their confidence in you is justified, especially if you already lean narcissistic.
You now see threads on X of famous people using Grok to explain how smart their ideas are. But there’s a problem: You can literally get it to do that with every single dumb idea.
My opinion: if LLMs speed you up, you're doing it wrong. You have to carefully review and audit every line that comes out of an LLM. You have to spend a lot of time forcing the LLM to prove that the code it wrote is correct. You should be nit-picking everything.
Despite that, LLMs are useful. I could write the code faster without an LLM, but then I'd have code that wasn't carefully reviewed line-by-line because my coworkers trust me (the fools). It'd have far fewer tests because nobody forced me to prove everything. It'd have worse naming, because every once in a while the LLM does that better than me. It'd be missing a few edge cases the LLM thought of that I didn't. It'd have forest/trees problems, because if I were writing the code I'd be focused on the code instead of the big picture.
> You have to carefully review and audit every line that comes out of an LLM. You have to spend a lot of time forcing the LLM to prove that the code it wrote is correct. You should be nit-picking everything.
I'm not sure this statement is true most of the time. This kind of reasoning reminds me of the discussion around 'code correctness'. In my opinion there are very few instances where correctness is really important. Most of the time you just need something that works well enough.
Imagine you have a continuous numeric scale that goes from 'never works' to '100% formal proofs' to indicate the correctness of every piece of software. Pushing your code to the '100% formal proofs' side takes a lot of resources, that could be deployed on other places.
At least for us, every bug that makes it into a release that gets installed on a client computer costs us 100x - 1000x as much as a bug that gets caught earlier.
Sometimes getting the new capability around that bug to market faster is worth the tradeoff, because the revenue or market position from the capability with that bug is way more important to the business than the 1000x cost of the fix after distribution.
As long as you have some mechanism to catch the issues before it hits customers. Too many software companies are OK shoveling crap on customers because it's easy to fix it in the field. Yes, it's easy to fix in the field, after you've inconvenienced and wasted the time of thousands of customers.
I'm starting to feel that LLMs hallucinate less than people, no matter the field. I'm at the stage where I trust code written by an LLM more than code written by a person.
For the last 2 years or so, I don't feel that anyone can read anymore, or bothers to.
I very much agree. I've been telling folks in trainings that I do that the term "artificial intelligence" is a cognitohazard, in that it pre-consciously steers you to conceptualize a LLM as an entity.
LLMs are cool and useful technology, but if you approach them with the attitude you're talking with an other, you are leaving yourself vulnerable to all sorts of cognitive distortions.
It certainly isn't helped by the RLHF and chat interface encouraging this. LLM providers have every incentive to make their users engage it like an other. It was much harder to accidentally do when it was just a completion UI and not designed to roleplay as a person.
I don't think that is actually a problem. For decades people have believed that computers can't be wrong. Why, now, suddenly, would it be worse if they believed the computer wasn't a computer?
The larger problem is cognitive offloading. The people for whom this is a problem were already not doing the cognitive work of verifying facts and forming their own opinions. Maybe they watched the news, read a Wikipedia article, or listened to a TEDtalk, but the results are the same: an opinion they felt confident in without a verified basis.
To the extent this is on 'steroids', it is because they see it as an expert (in everything) computer and because it is so much faster than watching a TED talk or reading a long form article.
It can also dispense agreeable confirmation on tap, with very little friction and hardly any chance of accidentally encountering something unexpected or challenging. Even TED talks occasionally have a point of view that isn't perfectly crafted for each hearer.
This is a problem if 'everybody' is using it but I suspect there will be a few groups. It will be a 'tortoise and the hare' situation.
The LLM folks (the hare) will get the initial upper hand, as it appears as though they are moving far faster than others, but with limited or wrong actual results. This could change if we can solve the hallucination issue. Yes, they are in personal echo chambers, but that can only get you so far when you hit the real world. It will be painful and messy, but it will resolve long term. Worst case, we end up with a Dune-style "do not make machines that think like a person".
The slow group (the tortoise) are those that do not actively engage in these things. Yes, they're trying to keep up, but using much slower mental faculties. I suspect long term they will do better as the fast group fails to deliver. Again, if we do not solve the issues of LLMs, which is not certain.
So long as there is still the slow group, we probably would not go down the dark path of individual echo chambers. Long term, eventually if you trip over the same mental stumbling block, you learn to not do that any more.
I think this is true. It can super charge some bad takes.
But I've had the opposite experience. The average person is never going to read a scientific study, nor invest the time to find out the real details of any topic they are opinionated about other than simply typing a Youtube search and finding a video that is:
- Entertaining
- Made by someone who shares their biases
- Presenting the information in a short, consumable manner that doesn't require much investment.
In comparison to this dynamic LLMs are wonderful. They can reference scientific data. I have noticed that they do push back on bad takes (very gently) and steer people towards truth.
It's not that I think LLMs are perfect. They are not. But they are infinitely better than the average human at discovering truth.
Use an agent to create something with a non-negotiable outcome. Eg software that does something useful, or fails to, in a language you don’t program in. This is a helpful way to calibrate your own understanding of what LLMs are capable of.
>How often do you think a ChatGPT user walks away not just misinformed, but misinformed with conviction? I would bet this happens all the time. And I can’t help but wonder what the effects are in the big picture.
this is so wrong! i simply can't get ChatGPT to admit something clearly wrong. it can play both sides and gives nuance, which is exactly what i expect. but it is so un-sycophantic that it won't leave you feeling like you are right. any examples of it doing so are welcome! show me examples where it takes a clearly wrong or false idea and makes it look as if it is a good idea (unless you specifically ask it to do it).
I recently asked a leading GenAI chatbot to help me understand a certain physics concept. As I pressed it on the aspect I was confused about, the bot repeatedly explained, and in our discussion, consistently held firm that I was misunderstanding something, and made guesses about what I was misunderstanding. Eventually I realized and stated my mistake, and the chatbot confirmed and explained the difference between my wrong version and the truth. I looked at some sources and confirmed that the bot was right, and I had misremembered something.
I was quite impressed that it didn't "give in" and validate my wrong idea.
I've seen similar results in physics. I suspect LLMs are capable of redirecting the user accurately when there have been long discussions on the web about that topic. When an LLM can pattern-match on whole discussions, it becomes a next-level search engine.
Next, I hope we can somehow get LLMs to distinguish between reliable and less-reliable results.
I wonder if LLMs need to be this way owing to the role of pseudo-intelligent conversation partners they've been shoehorned into or if it's a deliberate choice of the vendors.
It's possible that the Dunning-Kruger effect is not real, only a measurement or statistical artefact [1]. So it probably needs more and better studies.
The author misses the science of emergence. Reductionist views can’t fully explain macro-level capabilities that arise in these systems. Something emerges at higher scales from the possibility space as model sizes grow; they stop being mere “stochastic parrots” or black boxes running simple regressions.
The weights develop their own inherent logic based on how they relate to each other, analogous to how brain waves encode memory at a level higher than individual neuron networks.
Ultimately, the value of AI lies in the imagination of its wielder. The Unknown Unknowns framework is a useful tool for navigating AI effectively (it's powerful for helping elaborate on Known Unknowns and identify Unknown Unknowns), along with a healthy dose of critical thinking and an understanding of how reinforcement learning and RLHF work post-pretraining.
8 months or so ago, my quip regarding LLMs was “stochastic parrot.”
The term I’ve been using of late is “authority simulator.” My formative experience with “authority figures” was of people who can speak with breadth and depth about a subject and who seem to have internalized it because they can answer quickly and thoroughly. Because LLMs do this so well, it’s really easy to feel like you’re talking to an authority on a subject. And even though my brain intellectually knows this isn’t true, emotionally, the simulation of authority is comforting.
I'm pretty well "on the spectrum" and people glazing me in real life produce suspicion and discomfort rather than any good feelings.
I don't have a problem just ignoring all the LLM glazing, although I'd really like the ability to turn it off.
The fact that they've all been trained to do it, because so many of the "normies" fall for it, is kind of an indictment in my eyes. Bit of a mirror held up to society.
You should probably be worried about how fake flattery works so well in society, and how this enables sociopaths and narcissists to flourish and control everything.
This seems to stretch the idea that you might not understand how skilled you are into some kind of law that ties humility to knowledge more strictly.
Maybe this is my misunderstanding but I don't think the common invocation really took it as a law that the unknowledgeable always think their skills are higher.
>> How often do you think a ChatGPT user walks away not just misinformed, but misinformed with conviction? I would bet this happens all the time.
Why is the "Dunning-Kruger" is not mentioned anywhere in the article body while is gloriously visible in the title? By the way, AI is not wrong "all the time".
It's remarkably easy to get ChatGPT to hallucinate. A go-to for me is to ask it about factual things as compared to other factual things. The description of the Bluey episode is wrong in nearly every possible way. Additionally, the response continues on analyzing in multiple ways, and again continues to get the fundamental facts and therefore the analysis wrong.
Prompt: Compare the Space episode of Bluey to Interstellar
Response: That’s a really fun and surprisingly fitting comparison! Let’s break down how Bluey’s “Space” episode parallels Christopher Nolan’s Interstellar — both emotionally and thematically — even though one is a 7-minute preschool cartoon and the other a 3-hour sci-fi epic.
1. Core Theme: Love Across Distance
Interstellar:
The film centers on a father (Cooper) who leaves his daughter (Murph) to save humanity, exploring how love transcends time and space — literally. The emotional heart of Interstellar is the bond between parent and child that endures despite impossible separation.
Bluey: “Space”
In this episode, Bluey, Bingo, and their dad (Bandit) play an imaginative game where they’re astronauts who have to separate to save their mission. The moment Bluey and Bandit must say goodbye echoes that same emotional tension — the pain of letting go but also the love that motivates sacrifice.
Parallel: Both stories use the setting of outer space as a metaphor for emotional distance — how people we love can feel far away, and how connection can bridge that gap.
Humans broadly have a tenuous grasp of “reality” and “truth.” Propagandists, spies and marketers know what philosophers of mind prove all too well: most humans do not perceive or interact with reality as it is, rather their perception of it as it contributes or contradicts their desired future.
Provide a person confidence in their opinion and they will not challenge it, as that would risk the reward of believing you live in a coherent universe.
The majority of people have never heard the term “epistemology,” despite the concept being central to how people derive coherence. Yet all these trite pieces written about AI and its intersection with knowledge claim some important technical distinction.
I’m hopeful that a crisis of epistemology is coming, though that’s probably too hopeful. I’m just enjoying the circus at this point
I hate to comment on just a headline—though I did read the article—but it's wrong enough to warrant correcting.
This is not what the Dunning-Kruger effect is. It's lacking metacognitive ability to understand one's own skill level. Overconfidence resulting from ignorance isn't the same thing. Joe Rogan propagated the version of this phenomenon that infiltrated public consciousness, and we've been stuck with it ever since.
Ironically, you can plug this story into your favorite LLM, and it will tell you the same thing. And, also ironically, the LLM will generally know more than you in most contexts, so anyone with a degree of epistemic humility is better served taking it at least as seriously as their own thoughts and intuitions, if not at face value.
Freely available online information is very often educationally incredibly shallow and commonly oversimplified to the point of being wrong. So of course an agent trained on it would be, too.
There are so many guardrails now that are being improved daily. This blog post is a year out of date. Not to mention that people know how to prompt better these days.
To make his point, you need specific examples from specific LLMs.
> I feel like LLMs are a fairly boring technology. They are stochastic black boxes. The training is essentially run-of-the-mill statistical inference. There are some more recent innovations on software/hardware-level, but these are not LLM-specific really.
This is pretty ironic, considering the subject matter of that blog post. It's a super-common misconception that's gained very wide popularity due to reactionary (and, imo, rather poor) popular science reporting.
The author parroting that with confidence in a post about Dunning-Krugering gives me a bit of a chuckle.
Yeah, but that's their interface. That informs surprisingly little about their inner workings.
ANNs are arbitrary function approximators. The training process uses statistical methods to identify a set of parameters that approximate the function as best as possible. That doesn't necessarily mean that the end result is equivalent to a very fancy multi-stage linear regression. It's a possible outcome of the process, but it's not the only possible outcome.
Looking at an LLM's I/O structure and training process is not enough to conclude much of anything. And that's the misconception.
> Yeah, but that's their interface. That informs surprisingly little about their inner workings.
I'm not sure I follow. LLMs are probabilistic next-token prediction based on current context, that is a factual, foundational statement about the technology that runs all LLMs today.
We can ascribe other things to that, such as reasoning or knowledge or agency, but that doesn't change how they work. Their fundamental architecture is well understood, even if we allow for the idea that maybe there are some emergent behaviors that we haven't described completely.
> It's a possible outcome of the process, but it's not the only possible outcome.
Again, you can ascribe these other things to it, but to say that these external descriptions of outputs call into question the architecture that runs these LLMs is a strange thing to say.
> Looking at an LLM's I/O structure and training process is not enough to conclude much of anything. And that's the misconception.
I don't see how that's a misconception. We evaluate pretty much everything by inputs and outputs. And we use those to infer internal state. Because that's all we're capable of in the real world.
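To make "probabilistic next-token prediction based on current context" concrete, here is a minimal toy sketch in Python. The hand-written bigram table is a stand-in for the model; a real LLM conditions on the whole context window and learns billions of weights, but the outer generation loop has the same shape.

    import random

    # Toy "model": probability of the next token given only the previous token.
    bigram = {
        "the": {"cat": 0.5, "dog": 0.5},
        "cat": {"sat": 0.7, "ran": 0.3},
        "dog": {"sat": 0.4, "ran": 0.6},
        "sat": {"down": 1.0},
        "ran": {"away": 1.0},
    }

    def generate(prompt, max_new_tokens=4):
        tokens = prompt.split()
        for _ in range(max_new_tokens):
            dist = bigram.get(tokens[-1])
            if not dist:
                break
            # Predict a distribution over next tokens, sample one, append, repeat.
            next_token = random.choices(list(dist), weights=list(dist.values()))[0]
            tokens.append(next_token)
        return " ".join(tokens)

    print(generate("the"))   # e.g. "the cat sat down"

Nothing about that loop rules out interesting internal structure in the learned weights, which is where the real disagreement in this thread lives.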
Then why not say "they are just computer programs"?
I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
> Then why not say "they are just computer programs"?
LLMs are probabilistic or non-deterministic computer programs, plenty of people say this. That is not much different than saying "LLMs are probabilistic next-token prediction based on current context".
> I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
But we already know the inner workings. It's transformers, embeddings, and math at a scale that we couldn't do before 2015. We already had multi-layer perceptrons with backpropagation and recurrent neural networks and markov chains before this, but the hardware to do this kind of contextual next-token prediction simply didn't exist at those times.
I understand that it feels like there's a lot going on with these chatbots, but half of the illusion of chatbots isn't even the LLM, it's the context management that is exceptionally mundane compared to the LLM itself. These things are combined with a carefully crafted UX to deliberately convey the impression that you're talking to a human. But in the end, it is just a program and it's just doing context management and token prediction that happens to align (most of the time) with human expectations because it was designed to do so.
The two of you seem to be implying there's something spooky or mysterious happening with LLMs that goes beyond our comprehension of them, but I'm not seeing the components of your argument for this.
No one understands how an LLM works. Some people just delude themselves into thinking that they do.
Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.
It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.
> Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.
Which is to say, not delusional at all.
Or else we have to accept that basically hardly anyone "understands" anything. You set an unrealistic standard.
Beginners play abstract board games terribly. We don't say that this means they "don't understand" the game until they become experts; nor do we say that the experts "haven't understood" the game because it isn't strongly solved. Knowing the rules, consistently making legal moves and perhaps having some basic tactical ideas is generally considered sufficient.
Similarly, people who took the SICP course and didn't emerge thoroughly confused can reasonably be said to "understand how to program". They don't have to create MLOC-sized systems to prove it.
> It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.
No; it's a dismissal of the relevance of doing more detailed analysis, specifically to the question of what "understanding" entails.
The fact that a large pile of "transformers" is capable of producing the results we see now, may be surprising; and we may lack the mental resources needed to trace through a given calculation and ascribe aspects of the result to specific outputs from specific parts of the computation. But that just means it's a massive computation. It doesn't fundamentally change how that computation works, and doesn't negate the "understanding" thereof.
Understanding a transistor is an incredibly small part of how Ryzen 9800X3D does what it does.
Is it a foundational part? Yes. But if you have it and nothing else, that adds up to knowing almost nothing about how the whole CPU works. And you could come to understand much more than that without ever learning what a "transistor" even is.
Understanding low level foundations does not automatically confer the understanding of high level behaviors! I wish I could make THAT into a nail, and drive it into people's skulls, because I keep seeing people who INSIST on making this mistake over and over and over and over and over again.
I'm not sure what claim you're disputing or making with this.
What more are LLMs than statistical inference machines? I don't know that I'd assert that's all they are with confidence, but all the configuration options I can play with during generation (Top K, Top P, Temperature, etc.) are all ways to _not_ select the most likely next token, which leads me to believe that they are, in fact, just statistical inference machines.
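For what it's worth, here's a rough pure-Python sketch of what those knobs do to a toy next-token distribution. Real implementations work on logits over a vocabulary of tens of thousands of tokens, and the token names here are made up, but the filtering logic is the same idea.

    import math, random

    # Toy next-token scores (token -> logit).
    logits = {"blue": 4.0, "green": 3.0, "red": 2.5, "cat": 0.5, "the": 0.1}

    def sample(logits, temperature=1.0, top_k=None, top_p=None):
        # Temperature rescales logits: <1 sharpens toward the argmax, >1 flattens.
        probs = {t: math.exp(l / temperature) for t, l in logits.items()}
        total = sum(probs.values())
        probs = {t: p / total for t, p in probs.items()}

        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        if top_k is not None:        # keep only the k most likely tokens
            ranked = ranked[:top_k]
        if top_p is not None:        # keep the smallest prefix whose mass reaches top_p
            kept, mass = [], 0.0
            for t, p in ranked:
                kept.append((t, p))
                mass += p
                if mass >= top_p:
                    break
            ranked = kept

        tokens, weights = zip(*ranked)
        return random.choices(tokens, weights=weights)[0]

    print(sample(logits, temperature=0.7, top_k=3, top_p=0.9))

So yes, all of these are ways to not always take the argmax; whether "statistical inference machine" exhausts what the learned weights are doing is the part people argue about.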
What more are human brains than piles of wet meat?
It's not an argument - it's a dismissal. It's boneheaded refusal to think on the matter in any depth, or consider any of the implications.
The main reason to say "LLMs are just next token predictions" is to stop thinking about all the inconvenient things. Things like "how the fuck does training on piles of text make machines that can write new short stories" or "why is a big fat pile of matrix multiplications better at solving unseen math problems than I am".
> What more are human brains than piles of wet meat?
Calculation isn't what makes us special; that's down to things like consciousness, self-awareness and volition.
> The main reason to say "LLMs are just next token predictions" is to stop thinking about all the inconvenient things. Things like...
They do it by iteratively predicting the next token.
Suppose the calculations to do a more detailed analysis were tractable. Why should we expect the result to be any more insightful? It would not make the computer conscious, self-aware or motivated. For the same reason that conventional programs do not.
Do you have, by chance, a set of benchmarks that could be administered to humans and LLMs both, and used to measure and compare the levels of "consciousness, self-awareness and volition" in them?
Because if not, it's worthless philosophical drivel. If it can't be defined, let alone measured, then it might as well not exist.
What is measurable and does exist: performance on specific tasks.
And the pool of tasks where humans confidently outperform LLMs is both finite and ever diminishing. That doesn't bode well for human intelligence being unique or exceptional in any way.
I quite regularly ask LLMs to take the other side of an argument, or to tell me where something is wrong.
Unfortunately, they don't seem very good at this process, and in some ways seem to defend the previous position.
Does anyone else take this approach and have success with it?
LLMs, kind of like Bill Bryson's books, are great at presenting "information" that seems completely plausible, authoritative, and convincing to the reader. But when you actually do know the truth about a subject, you realize how completely full of crap they too often are. And somehow after being given a patently counterfactual response to one query, we just blindly continue to take their responses to other queries as having value.
At the moment, I find them to be the perfect tool to get started with learning about something. I don't expect it to tell me everything I need to know or to even be right, but if I ask ChatGPT or another LLM a question about a subject I'm not familiar with then it will at least use a bunch of terminology that I didn't have in my vocabulary before starting.
For example, I just bought a 1990 Miata and I want to install a couple of rocker switches in the dash to individually control the pop-up headlights. I have enough circuits knowledge to safely change outlets and light switches, but I didn't know about relays. I asked ChatGPT how to add these switches and it immediately mentioned buying DPDT switches and tying in the OEM relay into a SPDT relay. It may have gotten the actual circuit diagram completely wrong, but now I know exactly what to read up on.
Now let me ask you the more fundamental question.. did this do you any better than if you had searched a youtube video or some other source? Would this be video from 2016 be relevant? This may not be the right video but my approach for DIY in the last 10-20 years was to hit youtube up. https://www.youtube.com/watch?v=77q9KtjnNTU
I'm trying to gauge whether LLMs are truly expanding our capabilities in a fundamental way or are really just another way to search for answers without going to google or a library.
For obscure things, it's often very hard to find videos like that, and the videos vary greatly in quality. ChatGPT helped me fix my washing machine and my dryer yesterday with perfect advice, walking me through every step. Those are both projects I would've made a half assed attempt at and then thrown my hands up and called someone to do in the past.
I wonder if that can be attributed to search engines and search fields on various websites being intentionally worsened in order to push specific content and ads.
Google search and Youtube search used to almost always get you what you were looking for. Now you have to fight with it to maybe get what you are looking for because of all the sponsored ads.
Search used to be a nearly solved problem.
Yeah, it's definitely been terrific for figuring out terminology or "the right word" to use for things.
Completely not related to any LLM usage, but welcome to the world of NA Miata ownership! I think you'll find that with just general maintenance it'll treat you very well -- My '91 is the most reliable car in the drive, and by far the most whimsical. (I just got back from a Miata errand trip in the pouring rain -- Why did I drive the Miata? Winter is very soon, and it gets put away for ~3 ish months -- so at this time of year, every possible trip is a miata trip!)
This weekend I stumbled upon a cars and coffee in Fremont. Was expecting a wide variety of cars, and was surprised to see instead all Miatas.
I don't quite disagree but this comparison is typically unfair, because when you really know about a subject you tend to ask way more difficult questions than about other subjects, so of course the LLMs are gonna struggle more. If you ask really basic questions they will regurgitate well known bachelor-level knowledge and look good. What do I know about biology anyway? about silos for grain storage? any passable answer is enough to wow me on those topics. But on the topics I really know about, I never ask the basics.
Perhaps a similar issue is at root in both scenarios. I know this is highly simplified but...
Bill writes a book with general information about the short history of everything. The book is meant to appeal to a broader audience so the material must sometimes sacrifice accuracy in the interest of moving the reader along to other new ideas.
That's fine for a general understanding of a subject, but as the OP stated, when an expert looks closer, they see holes in the book. They're upset that Bill hasn't updated his book.
An LLM is trained on data. We now realize early LLMs were trained on publicly available data from sites like Reddit, Stack Overflow, and probably Hacker News. Any data on these sites became boilerplate for current LLM responses.
With widespread adoption of LLMs, the quality of sites like Stack Overflow and Reddit has been reduced due to a lack of trust and engagement from real people, and everyone realizing where all those network traffic costs were going.
Companies shift LLM training from public data to feedback from LLM users. Any 'cutting edge' or poorly documented use cases are heavily hallucinated (or answered with data that mixes version-specific solutions).
Chaos becomes the norm and whoever has the best trained LLM service with the highest quality feedback will eventually swallow up all other services.
Sound about right?
> like Bill Bryson's books, are great at presenting "information" that seems completely plausible, authoritative, and convincing to the reader. But when you actually do know the truth about a subject, you realize how completely full of crap
Wow, I have a couple Bill Bryson books on my reading list, can you share some examples of that?
I read this good breakdown on 'The Mother Tongue' on everything2 sometime ago: https://everything2.com/title/The+Mother+Tongue%253A+English...
Honestly I wouldn't worry about it. He's a wonderful writer, the problem is that he doesn't let reality get in the way of a good story. Just classify them with the rest of the fiction-non-fiction books and enjoy the journey. If you ever find yourself asking "wow is that true?" then it probably isn't.
I think it is truly hilarious that you brought Bill Bryson into this discussion.
geLLMan amnesia
I've most frequently heard this referred to as “Gell-Mann Amnesia,” and yes, LLMs are fertile ground to find it.
> But when you actually do know the truth about a subject, you realize how completely full of crap they too often are
The Gell-Mann Amnesia Effect https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
Similar to (same as?) Gell-Mann amnesia effect.
I'm not sure this is something I really worry about. Whenever I use an LLM I feel dumber, not smarter; there's a sensation of relying on a crutch instead of having done the due diligence of learning something myself. I'm less confident in the knowledge and less likely to present it as such. Is anyone really cocksure on the basis of LLM received knowledge?
> As a ChatGPT user I notice that I’m often left with a sense of certainty.
They have almost the opposite effect on me.
Even with knowledge from books or articles I've learned to multi-source and question things, and my mind treats the LLMs as a less reliable averaging of sources.
> Is anyone really cocksure on the basis of LLM received knowledge?
I work for a company with an open source product, and the number of support requests we get from people who ask the chatbot to do their config and then end up with something nonfunctional is quite significant. It goes as far as users complaining that our API is down because the chatbot hallucinated the endpoint.
LLMs do love to make up endpoints and parameters, but I have found that ones with web access are pretty good at copy/pasting configs if they can find them, so it might be worth a few minutes of exploring what people are actually finding that's causing it to make up an endpoint. I have not (yet!) seen an instance where making something easier for LLMs to parse didn't also help human comprehension.
I remember back when I was in secondary school, something commonly heard was
"Don't just trust wikipedia, check it's resources, because it's crowdsourced and can be wrong".
Now, almost 2 decades later, I rarely hear this stance and I see people relying on wikipedia as an authoritative source of truth. i.e, linking to wikipedia instead of the underlying sources.
In the same sense, I can see that "Don't trust LLMs" will slowly fade away and people will blindly trust them.
> "Don't just trust wikipedia, check it's resources, because it's crowdsourced and can be wrong"
This comes from decades of teachers misremembering what the rule was, and eventually it morphed into the Wikipedia specific form we see today - the actual rule is that you cannot cite an encyclopaedia in an academic paper. full stop.
Wikipedia is an encyclopaedia and therefore should not be cited.
Wikipedia is the only encyclopaedia most people have used in the last 20 years, therefore Wikipedia = encyclopaedia in most people's minds.
There's nothing wrong with using an encyclopaedia for learning or introducing yourself to a topic (in fact this is what teachers told students to do). And there's nothing specifically wrong about Wikipedia either.
I remember all of our encyclopedias being decades out of date growing up. My parents bought a set of Encyclopaedia Britannica in 1976 or something like that, so by the time I was reading the encyclopedia for research on papers in the late 90s and early 00s, it was without a doubt less factual than even the earliest incarnation of Wikipedia was.
Either way, you are correct, we weren't allowed to cite any encyclopedia, but they were meant to be jumping off points for papers. After Wikipedia launched when I was in 9th grade, we weren't allowed to even look at it (blocked from school computers).
I definitely used it though.
The original rule when I was a lad (when wikipedia was a baby) was, "don't trust stuff on the internet, especially Wikipedia where people can change it at will."
Today they might have better trust for Wikipedia-- and I know I use it as a source of truth for a lot of things-- but back in my day teachers were of the opinion that it couldn't be trusted. This was for like middle and high school, not college or university, so we would cite encyclopedias and that sort of thing, since we weren't reading cutting edge papers back then (maybe today kids read them, who knows).
Edit: Also, I think the GP comment was proven correct by all of the replies claiming that Wikipedia was never controversial because it was very clear to everyone my age when Wikipedia was created/founded that teachers didn't trust the internet nor Wikipedia at the time.
There was a period of time when Wikipedia was more scrutinized than print encyclopedias, because people did not understand the power of having 1000s of experts and the occasional non-expert editing an entry for free instead of underpaying one pseudo-expert. They couldn't comprehend how an open source encyclopedia would even work, or trust that humans could effectively collaborate on the task. They imagined that 1000s of self-interested chaos monkeys would spend all of their energy destroying what 2-3 hard working people had spent hours creating, instead of the inverse. Humans are very pessimistic about other humans. In my experience, when humans are given the choice to cooperate or fight, most choose to cooperate.
All of that said, I trust Wikipedia more than I trust any LLMs but don't rely on either as a final source for understanding complex topics.
> the power of having 1000s of experts and the occasional non-experts editing an entry
When Wikipedia was founded, it was much easier to change articles without notice. There may not have been 1000s of experts at the time, like there are today. There's also other things that Wikipedia does to ensure articles are accurate today that they may not have done or been able to do decades ago.
I am not making a judgment of Wikipedia, I use it quite a bit, I am just stating that it wasn't trusted when it first came out specifically because it could be changed by anyone. No one understood it then, but today I think people understand that it's probably as trustworthy or moreso than a traditional encyclopedia is/was.
> In my experience when humans are given the choice to cooperate or fight, most choose to cooperate.
Personally, my opinion of human nature falls somewhere in the middle of those two extremes.
I think when humans are given the choice to cooperate or fight, most choose to order a pizza.
A content creator I used to follow was fond of saying "Chill out, America isn't headed towards another civil war. We're way too fat and lazy for that."
Even ordering a pizza requires the cooperation of a functioning telecom system, a pizza manufacturer, a delivery person, a hungry customer...
Sure but I hope you get my point. Fighting takes effort, cooperation takes effort. Most people have other things to worry about and don't care about whatever it is you're fighting or cooperating over. People aren't motivated enough to try and sabotage the wikipedia articles of others. Even if they could automate it. There's just nothing in it for them.
> "They imagined that 1000s of self-interested chaos monkeys would spend all of their energy destroying what 2-3 hard working people has spent hours creating instead of the inverse."
Isn't that exactly what happens on any controversial Wikipedia page?
There's not that many controversial topics at any given time. One of Wikipedia's solutions was to lock pages until a controversy subsided. Perma-controversy has been managed in other ways, like avoiding the statement of opinion as fact, the use of clear and uncontroversial language, using discussion pages to hash out acceptable and unacceptable content, competent moderators... Rage burns itself and people get bored with vandalism.
More colloquially, people would say that Wikipedia could not be trusted because "anyone can edit the pages or write whatever they want."
Of course that's demonstrative of the genetic fallacy. Anyone can write or publish a book, too. So it always comes down to "how can you trust information?" That's where individual responsibility to think critically comes in. There's not really anything you can do about the fact that a lot of people will choose not to think.
That's why you should cite Grokipedia instead /s
Both comments are missing the reason that an encyclopedia should not be cited:
An encyclopedia does not cite its sources and does not claim to be a primary source, so its potential mistakes cannot be checked.
(Wikipedia has the additional problem that, by default, the version cited is the ever-changing "latest" version, not a fixed and identified version.)
You gave some good context I missed - The (even) more technical (read: pretentious) explanation is that it's a tertiary source. As a general rule of thumb secondary sources are preferred over primary sources, but both are acceptable in the right academic context.
I do understand the "latest version" argument, and it is a weakness, but it's also a double-edged sword - it means Wikipedia can also be more up-to-date than (almost) any other source for the information. That's why I say there's "nothing specifically wrong about Wikipedia either"; it can be held in similar regard to other tertiary sources and encyclopaedias - with all the problems that come with those.
That's not at all the reason.
Encyclopedias are tertiary sources, compilations of information generated by others. They are neither sources of first hand information (primary sources) nor original analysis (secondary sources). You can't cite encyclopedias because there's nothing to cite. The encyclopedia was not the first place the claim was made, even if it was the first place you happened to read it. You don't attribute a Wayne Gretzky quote to Michael Scott no matter how clearly he told you Wayne Gretzky said it.
Maybe you haven't used Wikipedia? It very definitely cites its sources. Material that doesn't have a cited source is removed regularly.
Well, also years of Wikipedia proving to be more accurate than anything in print, and rarely (and not for very long) misrepresenting source materials. For LLMs to get that same respect they would have to pull off all of the same reassuring qualities.
> Now, almost 2 decades later, I rarely hear this stance and I see people relying on wikipedia as an authoritative source of truth. i.e, linking to wikipedia instead of the underlying sources.
That's a different scenario. You shouldn't _cite wikipedia in a paper_ (instead you should generally use its sources), but it's perfectly fine in most circumstances to link it in the course of an internet argument or whatever.
There’s also the fact that both Wikipedia and LLMs are non-stationary. The quality of wikipedia has grown immensely since its inception and LLMs will get more accurate (if not explicitly “smarter”)
Wikipedia probably wins here because you can link to a permalink version of an article.
I'm not entirely convinced that the quality of Wikipedia has improved substantially in the last decade.
I think you would need a complicated set of metrics to make a claim like "improved" that wasn't caveated to death. An immediate conflict: total number of articles vs. impressions of articles labeled with POV biases. If both go up, has the site improved?
I find I trust Wikipedia less these days, though still more than LLM output.
Care to provide any counter-examples? Please make it know if you end up using Wikipedia for your source of if Wikipedia's quality has changed
How in the world would you supply a counter-example for "the quality of Wikipedia has/hasn't improved substantially in the last decade"?
I also can't even read the second sentence. I think there are typos there, but there's no mental correction I can do to make it coherent for me.
I can't think of a better accidental metric than that!
I'll go ahead and speculate that the number of incoherent sentences per article has gone down substantially over the last decade, probably due to the relevant tooling getting better over the same period.
> I can see that "Don't trust LLMs" will slowly fade away and people will blindly trust them.
That's already happening. I don't even think we had a very long "Don't trust LLMs" phase, if we did it was very short.
The "normies" already trust whatever they spit out. At leadership meetings at my work, if I say anything that goes against the marketing hype for LLMs, such as talking about "Don't trust LLMs", it's met with eye rolls and I'm not forward thinking enough, blah blah.
Management-types have 100% bought into the hype and are increasingly more difficult to convince otherwise.
I can’t speak to your specific experience, but I do some of this kind of eye-rolling when people bring short term limitations on LLMs into long term strategy.
I’m reminded of when people at work assured me the internet was never going to impact media consumption because 28.8kbps is not nearly enough for video.
A big reason for this is that Wikipedia's source is often a book or a journal article that is either offline or behind an academic paywall. Checking the source is effectively impossible without visiting a college campus's library. The likelihood that the cited information is wrongly summarizing the contents is low enough and the cost is high enough that doing so regularly would be irrational.
A bigger problem in this respect with Wikipedia is that it often cites secondary sources hidden behind an academic fire/paywall. It very often cites review articles, and some of these aren't necessarily entirely accurate.
I don't think the cases are really the same. With Wikipedia, people have learned to trust that the probability of the information being at least reasonably good is pretty high, because there's an editing crucible around it and the ability to correct misinformation surgically. No one can hotpatch an LLM in 5 minutes.
The best LLM powered solutions are as little LLM and as much conventional search engine / semantic database lookups and handcrafted coaxing as possible. But even then, the conversational interface is nice and lets you do less handcrafting in the NLP department.
Using Perplexity or Claude in "please source your answer" mode is much more like a conventional search engine than looking up data embedded in 5 trillion (or whatever) parameters.
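Roughly, the shape of that kind of pipeline is: retrieve first, then have the model answer only from what was retrieved and cite it. A minimal sketch, where search_index() and ask_llm() are hypothetical stand-ins rather than any real API:

    # Sketch of a "please source your answer" flow: retrieval does the heavy
    # lifting, the LLM mostly summarizes what was retrieved and cites it.
    def answer_with_sources(question, search_index, ask_llm, k=5):
        hits = search_index(question)[:k]   # conventional / semantic search first
        context = "\n\n".join(
            f"[{i + 1}] {hit['url']}\n{hit['snippet']}" for i, hit in enumerate(hits)
        )
        prompt = (
            "Answer the question using ONLY the numbered sources below. "
            "Cite them like [1]. If they don't contain the answer, say so.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        return ask_llm(prompt), [hit["url"] for hit in hits]

The model can still misread a source, but at least there's something concrete to check.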
Wikipedia is usually close enough and most users don't require perfection for their "facts"
I've noticed that things like Gemini summaries on Google searches are also generally close enough.
Except when they glaringly get things wrong, like "character X on show Y said catchphrase Z", and two queries produce two different values of X, one right, one wrong. The more I use Gemini summaries for things I know a bit about, the worse my opinion of them gets.
https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
Thanks for the Wikipedia link, do you have a source? /s
Close enough only counts in horseshoes and hand grenades
And most human communication
It wasn't just Wikipedia, which was a relatively recent addition to the web, everything online was a 'load of rubbish'.
In turn-of-the-century boomer world, reality was what you saw on TV. If you saw something with your own eyes that contradicted the world view presented by the media, then one's eyes were to be disbelieved. The only reputable sources of news were the mainstream media outlets. The only credible history books would be those with reviews from the mainstream media, with anything else just being the 'ramblings of a nutter'.
In short, we built a beautiful post-truth world and now we are set on outsourcing our critical thinking to LLMs.
This captures my experience quite well. I can "get a lot more done," but it's not really me doing the things, and I feel like a bit of a fraud. And as the workday and the workweek roll on, I find myself needing to force myself to look things up and experiment rather than just asking the LLM. It's quite clear that for most people LLMs will make them more dependent. People with better discipline, I think, will really benefit in big ways, and you'll see this become a new luxury belief; the disciplined geniuses around us will genuinely be perplexed why people are saying that LLMs have made them less capable, much in the same way they wonder why people can't just limit their drug use recreationally.
>it's not really me doing the things, and I feel like a bit of a fraud
I've been thinking about this a bit. We don't really think this way in other areas, is it appropriate to think this way here?
My car has an automatic transmission, am I a fraud because the machine is shifting gears for me?
My tractor plows a field, am I a fraud because I'm not using draft horses or digging manually?
Spell check caught a word, am I a fraud because I didn't look it up in a dictionary?
It's appropriate to think this way with LLM output because LLMs are still terrible some significant portion of the time. If you don't actually know what you're doing, you have no way to distinguish between their output being correct or their output being able to pass the tests you can think of.
As a software developer, your job is to understand code and business constraints so you can solve problems the way most appropriate for the situation. If you aren't actually keeping up with those constraints as they change through time, you're not doing your job. And yeah, that's a kind of fraud. Maybe it's more on yourself than your employer most of the time, but... It's your job. If you don't want to do it, maybe it's more respectful of your own time, energy, and humanity to move on.
I've been thinking about that comparison as well. A common fantasy is that civilization will collapse and the guy who knows how to hunt and start a fire will really excel. In practice, this never happens and he's sort of left behind unless he also has other skills relevant to the modern world.
And, for instance, I have barely any knowledge of how my computer works, but it's a tool I use to do my job. (and to have fun at home.)
Why are these different than using LLMs? I think at least for me the distinction is whether or not something enables me to perform a task, or whether it's just doing the task for me. If I had to write my own OS and word processor just to write a letter, it'd never happen. The fact that the computer does this for me facilitates my task. I could write the letter by hand, but doing it in a word processor is way better. Especially if I want to print multiple copies of the letter.
But for LLMs, my task might be something like "setting up apache is easy, but I've never done it so just tell me how do it so I don't fumble through learning and make it take way longer." The task was setting up Apache. The task was assigned to me, but I didn't really do it. There wasn't necessarily some higher level task that I merely needed Apache for. Apache was the whole task! And I didn't do it!
Now, this will not be the case for all LLM-enabled tasks, but I think this distinction speaks to my experience. In the previous word processor example, the LLM would just write my document for me. It doesn't allow me to write my document more efficiently. It's efficient, only in the sense that I no longer need to actually do it myself, except for maybe to act as an editor. (and most people don't even do much of that work) My skill in writing either atrophies or never fully develops since I don't actually need to spend any time doing it or thinking about it.
In a perfect world, I use self-discipline to have the LLM show me how to set up Apache, then take notes, and then research, and then set it up manually in subsequent runs; I'd have benefited from learning the task much more quickly than if I'd done it alone, but also used my self-discipline to make sure I actually really learned something and developed expertise as well. My argument is that most people will not succeed in doing this, and will just let the LLM think for them.
> But for LLMs, my task might be something like "setting up apache is easy, but I've never done it so just tell me how do it so I don't fumble through learning and make it take way longer." The task was setting up Apache. The task was assigned to me, but I didn't really do it. There wasn't necessarily some higher level task that I merely needed Apache for. Apache was the whole task! And I didn't do it!
To play devil's advocate: Setting up Apache was your task. A) Either it was a one-off that you'll never have to do again, in which case it wasn't very important that you learn the process inside and out, or b) it is a task you'll have to do again (and again), and having the LLM walk you through the setup the first time acts as training wheels (unless you just lazily copy & paste and let it become a crutch).
I frequently have the LLM walk me through an unfamiliar task and, depending on several factors such as whether I expect to have to do it again soon, the urgency of the task, and my interest and/or energy at the moment, I will ask the LLM follow-up questions, challenge it on far-fetched claims, investigate alternative techniques, etc. Execute one command at a time, once you've understood what it's meant to do, what the program you're running does, how its parameters change what it does, and so on, and let the LLM help you get the picture.
The alternative is to try to piece together a complete picture of the process from official documentation like tutorials & user manuals, disparate bits of information in search results, possibly wrong and/or incomplete information from Q&A forums, and muddle through lots of trial and error. Time-consuming, labor-intensive, and much less efficient at giving you a broad-strokes idea of how the whole thing works.
I much prefer the back-and-forth with the LLM and think it gives me a better understanding of the big picture than the slow and frustrating muddling approach.
But what is Apache for?
You don't just set up Apache to have Apache running? You set it up to serve web content! It is middleware; it is not in and of itself useful?
Isn't setting up Apache robbing yourself of the opportunity to learn about writing your own HTTP server? In C? And what a bad idea that is?
The LLM helping you configure a web server is no different than the web server helping you serve HTTP instead of implementing a web server from scratch. You've just seemingly? arbitrarily decided your preferred abstraction layer is where "real work" happens.
Okay, maybe LLMs might disappear tomorrow and so for some reason the particular skill of configuring Apache will become useful again, maybe! But I'm already using brainpower to memorize phone numbers in case my smartphone contacts disappear, so maybe I won't have room for those Apache configs ;-)
I remember seeing a tweet awhile back that talked about how modernity separated work from physicality, and now you have to do exercise on purpose. I think the Internet plus car-driven societies had done something similar to being social, and LLMs are doing something to both thinking, as well as the kind of virtue that enables one to master a craft.
So, while it's an imperfect answer that I haven't really nailed down yet, maybe the answer is just to realize this and make sure we're doing hard things on purpose sometimes. This stuff has enabled free time, we just can't use it to doomscroll.
>Internet plus car-driven societies had done something similar to being social,
That's an interesting take on the loneliness crisis that I had not considered. I think you're really onto something. Thanks for sharing. I don't want to dive into this topic too much since it's political and really off-topic for the thread, but thank you for suggesting this.
Radio and especially TV also had large social effects. People used to play cards, play instruments, and do other social things before TV. Then household TV watching maxed out at 9 hours/day in 2010 (5 hr/day in 1950). (I would like to know the per-person viewing figures, and these are from Nielsen, who would want higher numbers.) [1]
Cars help people be social in my world. I would say that riding on a train in your own bubble with strangers is not a social activity, but others would disagree.
[1] https://www.bunkhistory.org/resources/when-did-tv-watching-p...
I don't have any great answer. But when I think about this for myself, I realize there are different kinds of abstraction that qualitatively change the nature of the work.
I don't want my software developer's experience to turn into a real estate developer's experience. I don't want to go from being a technical knowledge worker to a financier or contract negotiator. I've realized I was never in it for the outcome. I was in it for the exploration and puzzles.
Similarly, I don't want to become a "Hollywood producer" cliche. This caricature was a common joke earlier in my tech career in Southern California. We detested the idea of becoming a "tech" person acting like a Steve Martin parody of a Hollywood wheeler-dealer: someone sitting in a cafe, pitching ideas that were nothing more than a reference to an existing work with an added gimmick or casting change.
To me, that caricature combines two negative aspects. One is the heavily derivative and cynical nature. The other is the stratospheric abstraction level, where folks at this level see themselves as visionaries rather than just patrons of someone else doing all the creative work.
I don't want to be a patron of an LLM or other black box.
> Why are these different than using LLMs?
I would say that with a computer you're using a tool to take care of mundane details and speed up the mechanics of tasks in your life. Such as writing a document, or playing a game. I can't think of a way I would be seriously disadvantaged by not having the ability to hand-write an essay or have games I can readily play without a computer. Computers are more like tools in the way a hammer is a tool. I don't mind being totally dependent on a computer for those tasks in the same way I don't mind that I need a hammer anytime I want to drive a nail.
But for many people, LLMs replace critical thinking. They offer the allure of outsourcing planning, research, and generating ideas. These skills seem more fundamental to me, and I would say there's definitely a loss somehow of one's humanity if you let those things atrophy to the point you become utterly dependent on LLMs.
I mostly agree with this. LLMs are just another tool, and we've learned how to use and adapted to using many other tools throughout our history just fine.
With the caveat that our field in particular is one of the few that require continuous learning and adaptation, so tech workers are in a way better predisposed to this line of thinking and tool adoption without some of the potentially harmful side effects.
To pick on spell check, it has been shown that we can develop a dependency on it and thereby lose our own ability to spell and reason about language. But is that a bad thing? I don't know.
What I do know is humans have been outsourcing our thinking for a long time. LLMs are another evolution in that process, just another way to push off cognitive load onto a tool like we've done with stone tablets, books, paper notes, digital notes, google, etc.
Yeah, everything I get out of the AI stinks of wrongness, even when it's not materially wrong. There is a flimsiness to everything.
Well I think your experience is, if not in the minority, at least not the overwhelming majority.
Lots of folks think it's amazing and greatly empowers them.
> Is anyone really cocksure on the basis of LLM received knowledge?
Some people certainly seem to be. You see this a lot on webforums; someone spews a lot of confident superficially plausible-looking nonsense, then when someone points out that it is nonsense, they say they got it from a magic robot.
I think this is particularly common for non-tech people, who are more likely to believe that the magic robots are actually intelligent.
Most of the time it feels like a crutch to me. There have been a few moments where it unlocked deep motivation (by giving me a feel for the size of a solution based on ChatGPT output), and one time, on a research project, any crazy idea I threw at it, it would imagine what it would entail in terms of semantics, and then I was inspired even more.
The jury is still out on what value these things will bring.
Nah, I feel smart to use it in a smart way to get stuff done faster than before.
> Is anyone really cocksure on the basis of LLM received knowledge?
Yeah, the stupid.
Unfortunately I'm like you and we are in the minority. The manager class loves the LLM and doesn't seem to consider its flaws like that.
Nothing stops you from spending the paltry $60 to find out how ridiculously good coding agents are. It’s only a matter of time for other problems.
If you feel dumber, it’s because you’re using the LLM to do raw work instead of using it for research. It should be a google/stackoverflow replacement, not a really powerful intellisense. You should feel no dumber than using google to investigate questions.
I don't think this is entirely accurate. If you look at this: https://www.media.mit.edu/publications/your-brain-on-chatgpt..., it shows that search engines do engage your brain _more_ than LLM usage. So you'll remember more through search engine use (and crawling the web 'manually') than by just prompting a chatbot.
I find that it is terrible for research, and hallucinates 25% to 90% of its references.
If you tell it to find something and give it a detailed description of what you're looking for, it will pretend like it has verified that that thing exists, and give you a bulletpoint lecture about why it is such an effective and interesting thing that 1) you didn't ask for, and 2) is really it parroting your description back to you with embellishments.
I thought I was going to be able to use LLMs primarily for research, because I have read an enormous number of things (books, papers) in my life, and I can't necessarily find them again when they would be useful. Trying to track them down through LLMs is rarely successful and always agonizing, like pulling teeth that are constantly lying to you. A surprising outcome is that I often get so frustrated by the LLM and so detailed in how I'm complaining about its stupid responses that I remind myself of something that allows me to find the reference on my own.
I have to suspect that people who find it useful for research are researching things that are easily discoverable through many other means. Those are not the things that are interesting. I totally find it useful to find something in software docs that I'm too lazy to look up myself, but it's literally saving me 10 minutes.
I used to get this same feeling during lectures in uni. Often the information was presented well and, along with some clear examples, everything seemed to make perfect sense.
It wasn't until working through practice problems later, on my own, did it become clear how much detail I was missing.
Now consider what's happening to the learning process of the (rather large) subset of current college students choosing to replace that struggle for detailed understanding with LLM queries.
It’s the biggest crisis since math students started using graphing calculators.
> It wasn't until working through practice problems later, on my own, did it become clear how much detail I was missing.
This is a common problem in learning. Recognition is easier than recall and smoothness is confused for understanding.
You actually need to struggle with the concepts a bit to learn effectively. Without the struggle it feels more effective, but is not.
I've heard experts comment on this from the other side, that they'll give a quick layperson's soundbite about their subject of expertise that doesn't defensibly lay out all the possible exceptions and edge cases and weirdness for reasons of time and audience interest and then they'll be inundated with comments calling them a liar and accused of falsifying things or not actually understanding the subject.
Missing details and being confidently wrong are two different things though?
Edit: Claude told me the other day that my entire building might have to be demolished due to a slight bow in my newly poured stem wall. I uploaded a photo etc. and it was like, “yes, this is a serious structural issue blah blah blah”. The inspector came to look at it and literally laughed that I was worried about it.
> LLMs should not be seen as knowledge engines but as confidence engines.
The thing I like best about LLMs is when I ask a question about some technical problem, and it tells me that it is a KNOWN problem. That gives me confidence that I don't need to spend time looking for a solution where there is no good solution. Just go around it somehow. It lets me know I'm not the only person with this problem. And that way it gives me confidence that I'm not stupid, the problem is a real problem.
As an example I was working with WebStorm and tried to find a way to make the Threads-tab the default tab shown when debugger opens. AI told me there is no way it knows about. Good, problem solved, solved by finding out there is no solution.
This is the kind of stuff AI lies about all the time. I can get it to tell me "That is some good insight, and is a known issue..." with things I make up out of thin air.
Not to be too self-centred here, but since I have been using LLMs heavily I have always challenged the results given.
The post seems to propose the following vector:
Idea -> LLM validation -> confidence -> no further checks
My process is more:
Idea -> LLM response -> skeptical reflection -> adversarial prompting -> synthesis
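Roughly, as a sketch in Python (ask_llm here is a stand-in for whatever chat client you use, not any vendor's real API):

    from typing import Callable

    def vet_idea(idea: str, ask_llm: Callable[[str], str], rounds: int = 2) -> str:
        """Idea -> LLM response -> skeptical reflection -> adversarial prompting -> synthesis."""
        answer = ask_llm(f"Here is an idea: {idea}\nGive your honest assessment.")
        for _ in range(rounds):
            # Skeptical reflection: make the model argue against its own answer.
            critique = ask_llm(
                "Act as a hostile reviewer. List the strongest objections, missing "
                f"evidence, and failure modes in this assessment:\n{answer}"
            )
            # Adversarial prompting: force it to reconcile the answer with the critique.
            answer = ask_llm(
                f"Original idea: {idea}\nAssessment: {answer}\nObjections: {critique}\n"
                "Revise the assessment, conceding any objection you cannot refute."
            )
        # Synthesis still needs a human: treat the result as a lead, not a verdict.
        return answer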
There’s a gap that LLMs are trying to fill in such cases, which is that there’s more information than we can possibly hope to make sense of in a lifetime. Just as it’s possible to compute something incorrectly with a calculator, you can definitely be led astray by an LLM, which is why I am surprised that people think these models are good enough to replace humans at work. The only thing which makes sense is to both raise the bar for publishing, and to only take published works seriously. If something isn’t published, then authors should provide code to demonstrate the effect they’re describing.
> which is why I am surprised that people think these models are good enough to replace humans at work.
There are a lot of office jobs that I'd fit into the category of "bullshit jobs." They may serve some purpose in the huge bureaucracy of enterprises, but the day to day ultimately boils down to managing someone's calendar and sending emails.
Quite a few people at my work have now started using Copilot for their emails. It's obviously AI (at least to me), and yet, the content and formatting are an improvement over what they were sending before.
So much of the marketing hype on LLMs is about how they'll replace all the engineering work (the MBA's wet dream, to replace all the expensive labor). In reality, I think they're more capable of replacing non-tech labor and middle management.
An LLM can send out an email to the team and analyze a project check-in faster, and better, than some overpaid middle manager can. I have no doubts an LLM could probably serve the role of a project management office, or a business analyst.
Sure, there should still be a human in the loop for now, but you need far, far fewer humans in those roles than previously.
I go back and forth on the idea that some jobs are bullshit; maybe I haven’t been exposed to enough industries or workplaces. Every place I worked definitely didn’t have bullshit jobs to hand out as adult daycare, but I can see how some places become bloated because an overambitious middle manager wants to say they manage X number of people on their resume. So there are bullshit jobs in the sense that there are people who aren’t being utilized correctly, so in that case I’d say there are no bullshit jobs, just bullshit leadership or managers.
Yeah, I agree with that, and a more accurate description than mine.
The people in those roles are being mismanaged/misutilized rather than the job itself being bullshit.
I've seen the bloating first hand though, and you're right, that's usually what leads to those jobs. Some department over hires to pad the resume of some middle manager, and now you have a team that's severely over staffed to the point each individual contributor has maybe 2 to 3 hours of actual work to do in a day.
Spend half a day with me, and you'll understand why LLMs can replace most people in a company.
I've seen this! Following some Math and Physics subreddits it's a regular occurrence for a new submitter to come in and post some 40 pages of incomprehensible bullshit and claim that they developed a unifying theory of physics with ChatGPT and that ChatGPT has told them it's a breakthrough in the field. Of course that used to happen regularly before LLMs but not nearly as often.
Including the former CEO of Uber. I’m somewhat curious what these people even think they’ve discovered, what outstanding problem they think they’ve actually solved… but I’m not curious enough to actually dig through their slop.
https://gizmodo.com/billionaires-convince-themselves-ai-is-c...
I ascribe the effect of LLMs as similar to reading the newspaper: when I learn about something I have no knowledge base in, I come away feeling like I learned a lot. When I interact with a newspaper or LLM in an area where I have real domain expertise, I realize they don’t know what they are talking about - which makes me worry about the information I get from them on topics where I don’t have that level of domain expertise.
Also known as the "Gell-Mann amnesia effect" [1].
[1] https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
And why stop at newspapers, it's been a while since one could say books have any integrity, pretty much anyone can get anything into print these days. From political shenanigans to self help books designed to confirm people's biases to sell more units. Video's by far the hardest to fake but that's changing as well.
Regardless of what media you get your info from you have to be selective of what sources you trust. It's more true today than ever before, because the bar for creating content has never been lower.
The problem is that LLM output is so incredibly confident in tone. It really sounds like you're talking to an expert who has years of experience and has done the research for you - and tech companies push this angle quite hard.
That's bad when their output can be complete garbage at times.
I think eventually humans are going to need to disregard confidence as any kind of indicator of quality, which will be very difficult. Something about us is hardwired to believe a confident delivery of words.
It makes me really sad how Google pushes this technology that is simply flat out wrong sometimes. I forgot what exactly I searched for, but I searched for a color model that Krita supports, hoping to get the online documentation as the first result, and then under several Youtube thumbnails the AI overview was telling me that Krita doesn't support that color model and you need a plugin for that. Under the AI overview was the search result I was looking for about that color model in Krita.
And worst of all, it's not even consistent, because I tried the same searches again and I couldn't get the same answer, so it just randomly decides to assert complete nonsense sometimes while other times it gives the right answer or says something completely unrelated.
It's really been a major negative in my search experience. Every time I search for something I can't be sure that it's actually quoting anything verbatim, so I need to check the sources anyway. Except it's much harder to find the link to the source with these AI's than it is to just browse the verbatim snippets in a simple list of search results. So it's just occupying space with something that is simply less convenient.
The AI is also indiscriminate with what "sources" it chooses. Even deep research mode in gemini.
You can go through and look at the websites it checked, and it's 80% blogspam with no other sources cited on said blog.
When I'm manually doing a Google search, I'm not just randomly picking the first few links; I'm deliberately filtering for credible domains or articles, not just picking whatever random marketing blog SEO'd its way to the top.
Sorry Gemini, an Advertorial from Times of India is not a reliable source for what I'm looking for. Nor is this xyz affiliate marketing blog stuffed to the brim with ads and product placement.
Some of that is because that's probably 90% of the internet, but weren't these things trained on huge amounts of books and published peer-reviewed works? Where are those in the sources?
It's trained on them, yes. But is it trained to prefer them as sources when doing web search?
The distinction is rather important.
We have a lot of data that teaches LLMs useful knowledge, but data that teaches LLMs complex and useful behaviors? Far less represented in the natural datasets.
It's why we have to do SFT, RLHF and RLVR. It's why AI contamination in real world text datasets, counterintuitively, improves downstream AI performance.
The next time you're working on your car google bolt torque specs and cross reference the shit their "AI" says with the factory shop manual. Hilarity ensues.
I think it's ok. When wikipedia arrived, everyone was up in arms that people are learning from something that's open for anyone to edit.
But it rectified itself.
The same thing happened when Internet arrived. "Don't believe anything you read on the Internet."
I guess the reaction was the same when printed media arrived.
But the thing is, things get better over time.
> But it rectified itself.
Or did it?
Here's a thought - improving AI is a completely different ball game.
Ah so nothing bad happening anymore due to people believing what they read on the internet, huh? Interesting take.
I feel like when I talk to someone and they tell me a fact, that fact goes into a kind of holding space, where I apply a filter of 'who is this person telling me this thing?' to judge what the thing they're telling me is worth. There's how well I know them, there's the other beliefs I know they have, there's their professional experience and their personal experience. That fact then gets marked as 'probably a true fact' or 'Mark believes in aliens'.
When I use chatGPT I do the same before I've asked for the fact: how common is this problem? how well known is it? How likely is that chatgpt both knows it and can surface it? Afterwards I don't feel like I know something, I feel like I've got a faster broad idea of what facts might exist and where to look for them, a good set of things to investigate, etc.
> Afterwards I don't feel like I know something, I feel like I've got a faster broad idea of what facts might exist and where to look for them, a good set of things to investigate, etc.
Can you cite a specific example where this happened for you? I'm interested in how you think you went from "broad idea" to building actual knowledge.
Sure. I wanted to tile my bathroom; from ChatGPT I learned about laser levels, ledger boards, and levelling spacers (I'd only seen those cross-corner ones before).
FWIW that seems like low stakes compared to what I see other people using LLMs for (e.g medical advice).
I guess. I also used it to check the side effects of coming off prednisolone, and it gave me some areas to look at. I've used it a bunch to check out things around kidney transplants and everything I've verified has been correct.
Reminds me of "default to null":
> The mental motion of “I didn’t really parse that paragraph, but sure, whatever, I’ll take the author’s word for it” is, in my introspective experience, absolutely identical to “I didn’t really parse that paragraph because it was bot-generated and didn’t make any sense so I couldn’t possibly have parsed it”, except that in the first case, I assume that the error lies with me rather than the text. This is not a safe assumption in a post-GPT2 world. Instead of “default to humility” (assume that when you don’t understand a passage, the passage is true and you’re just missing something) the ideal mental action in a world full of bots is “default to null” (if you don’t understand a passage, assume you’re in the same epistemic state as if you’d never read it at all.)
https://www.greaterwrong.com/posts/4AHXDwcGab5PhKhHT/humans-...
The important part of this is the "I feel like" bit. There's a fair and growing body of research suggesting that the "fact" is more durable in your memory than the context, and over time, across a lot of information, you will lose some of the mappings and integrate things you "know" to be false into your model of the world.
This more closely fits our models of cognition anyway. There is nothing really very like a filter in the human mind, though there are things that feel like them.
Maybe, but then that's the same whether I talk to ChatGPT or a human, isn't it? Except with ChatGPT I instantly verify what I'm looking for, whereas with a human I can't do that.
I wouldn't assume that it's the same, no. For all we knock them, unconscious biases seem to get a lot of work done; we do all know real things that we learned from other unreliable humans, somehow. Not a perfect process at all, but one we are experienced at and have lifetimes of intuition for.
The fact that LLMs seem like people but aren't, specifically have a lot of the signals of a reliable source in some ways, I'm not sure how these processes will map. I'm skeptical of anyone who is confident about it in either way, in fact.
>I think LLMs should not be seen as knowledge engines but as confidence engines.
This is a good line, and I think it tempers the "not just misinformed, but misinformed with conviction" observation quite a bit, because sometimes moving forward with an idea at less than 100% accuracy will still bring the best outcome.
Obviously that's a less than ideal thing to say, but imo (and in my experience as the former gifted student who struggles to ship) intelligent people tend to underestimate the importance of doing stuff with confidence.
Confidence has multiple benefits. But one of those benefits is social - appearing confident triggers others to trust you, even when they shouldn’t.
Seeing others get burned by that pattern over and over can encourage hesitation and humility, and discourage confident action. It’s essentially an academic attitude and can be very unfortunate and self-defeating.
I find the biggest crime with LLMs to be the size of the problems we feed them.
Every time I start getting lazy and asking ChatGPT things like "write me a singleton that tracks progression for XYZ in a unity project", I wind up with a big hole where some deeper understanding of my problem should be. A better approach is to prompt it like "Show me a few ways to persist progression-like data in a unity project. Compare and contrast them".
Having an LLM development policy where you ~blindly accept a solution simply because it works is like an HOV lane to hell. It is very tempting to do this when you are tired or in a rush. I do it all the time.
Speaking of uncertainty, I wish more people would accept their uncertainty with regards to the future of LLMs rather than dash off yet another cocksure article about how LLMs are {X}, and therefore {completely useless}|{world-changing}.
Quantity has a quality of its own. The first chess engine to beat Garry Kasparov wasn't fundamentally different from earlier ones--it just had a lot more compute power.
The original Google algorithm was trivial: rank web pages by incoming links--its superhuman power at giving us answers ("I'm feeling lucky") was/is entirely due to a massive trove of data.
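For anyone who hasn't seen it spelled out, a toy version of that link-counting idea looks something like this (a simplified PageRank power iteration; the damping factor and three-page graph are illustrative, nothing like the production system):

    def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
        """Rank pages by incoming links, iterated until the scores settle."""
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iters):
            new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outgoing in links.items():
                if not outgoing:  # dangling page: spread its rank evenly
                    for p in pages:
                        new_rank[p] += damping * rank[page] / len(pages)
                else:
                    for target in outgoing:
                        new_rank[target] += damping * rank[page] / len(outgoing)
            rank = new_rank
        return rank

    # C is linked to by both A and B, so it comes out ranked highest.
    print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))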
And remember all the articles about how unreliable Wikipedia was? How can you trust something when anyone can edit a page? But again, the power of quantity--thousands or millions of eyeballs identifying errors--swamped any simple attacks.
Yes, LLMs are literally just matmul. How can anything useful, much less intelligent, emerge from multiplying numbers really fast? But then again, how can anything intelligent emerge from a wet mass of brain cells? After all, we're just meat. How can meat think?
> How can you trust something when anyone can edit a page? But again, the power of quantity--thousands or millions of eyeballs identifying errors--swamped any simple attacks.
Sure, but now the established power users are free to insert more subtle attacks. The https://xkcd.com/978/ problem never stopped and the "reliable sources" consideration process allows for considerable political bias.
Most of HN has probably seen this gem about "thinking meat", but in case you haven't: https://www.mit.edu/people/dpolicar/writing/prose/text/think...
> How can meat think?
Some of us used to think that meat spontaneously generated flies. Maybe someday we'll (re-)learn that meat doesn't spontaneously generate thought either?
My light-sensing meat read this as "spontaneously generated files" about 3 times before finally seeing maggots instead of tokens...
I don't give much merit to ideas that demand the existence of Magic Fairy Dust.
And especially not now. Not when LLMs can already do pretty much anything that a human can - and some of those things they can even do well.
I recall trying to use GPT-4 to plan a trip through the PNW in ~Spring of 2023.
It presented a reasonable agenda, however 80% of the rockhounding spots were completely made up!
Over time, and as LLMs have gotten less sycophantic, I’ve found myself trusting them a bit more (a dangerous and slippery slope).
With that said, GPT-4o in particular, seemed to rank user satisfaction above truth.
I’ve found that GPT-5 Pro is currently the best at pushing back against silly ideas, and does a decent job of informing me that my questions could be better (:
As always, trust, but verify! Google maps lists "made up" places or outdated info. AI isn't scouting these locations physically...
Of course, at that point, the real question is, what's the value difference (taking into account personal, external and social costs) between asking chatgpt and /r/rockhounding (or whatever message boards they frequent)? At least if you start a thread on reddit, you might meet other people in the area with the same hobby, find a spot no one's talked about yet, get expert context and leave a trail for others to find.
How is "don't use LLMs as a source of truth" still news today? The machine does work, it doesn't know anything. Let the sucker fetch websites and write code.
I partly share the author's point that ChatGPT users (myself included) can "walk away not just misinformed, but misinformed with conviction". Sometimes I want to criticise aloud, write a post blaming this technology for the colourful, sophisticated, yet empty bullshit I hear from a colleague or read in an online post.
But I always resist the urge. Because I think: isn't there always going to be some kind of people like that? With or without this LLM thing.
If there is anything to hate about this technology, for the more and more bullshit we see/hear in daily life, it is: (1) Its reach: more people of all ages, of different backgrounds, expertise, and intents are using it. Some are heavily misusing it. (2) Its (ever increasing) capability: yes, it has already become pretty easy for ChatGPT or any other LLM to produce a sophisticated but wrong answer on a difficult topic. And I think the trend is that with later, more advanced versions, it will become harder and take more effort to spot a hidden failure lurking in a more information-dense LLM answer.
>“the problem with the world is that the stupid are cocksure, while the intelligent are full of doubt.”
Is it me or does everyone find that dumb people seem to use this statement more than ever?
It appears to be a paraphrasing of William Butler Yeats https://en.wikipedia.org/wiki/The_Second_Coming_(poem)
Ugh. You can be cocksure of your doubts. It's still confidence, duh.
Everyone thinks they're the intelligent ones, of course. Which reinforces the repetition ad nauseam of Dunning-Kruger. Which is in itself dumb AF, because the effect described by Dunning and Kruger has been repeatedly exaggerated and misinterpreted. Which in turn is even dumber, because the Dunning-Kruger effect itself is debatable and its reproducibility is weak at best.
Yeah, nobody who ever mentions the DK effect (myself included) ever stops to consider they might be in the "dumb" cohort ;)
We are all geniuses!
LLMs basically act as defense attorneys for all your dumbest ideas. It is very easy to assume their confidence in you is justified, especially if you already lean narcissistic.
You now see threads on X of famous people using Grok to explain how smart their ideas are. But there’s a problem: You can literally get it to do that with every single dumb idea.
My opinion: if LLMs speed you up, you're doing it wrong. You have to carefully review and audit every line that comes out of an LLM. You have to spend a lot of time forcing LLMs to prove that the code they wrote is correct. You should be nit-picking everything.
Despite that, LLMs are useful. I could write the code faster without an LLM, but then I'd have code that wasn't carefully reviewed line-by-line, because my coworkers trust me (the fools). It'd have far fewer tests, because nobody forced me to prove everything. It'd have worse naming, because every once in a while the LLM does that better than me. It'd be missing a few edge cases the LLM thought of that I didn't. It'd have forest/trees problems, because if I was writing the code I'd be focused on the code instead of the big picture.
> You have to carefully review and audit every line that comes out of an LLM. You have to spend a lot of time forcing LLMs to prove that the code they wrote is correct. You should be nit-picking everything.
I'm not sure this statement is true most of the time. This kind of reasoning reminds me of the discussion around 'code correctness'. In my opinion there are very few instances where correctness is really important. Most of the time you just need something that works well enough.
Imagine you have a continuous numeric scale that goes from 'never works' to '100% formal proofs' to indicate the correctness of every piece of software. Pushing your code to the '100% formal proofs' side takes a lot of resources, that could be deployed on other places.
At least for us, every bug that makes it into a release that gets installed on a client computer costs us 100x - 1000x as much as a bug that gets caught earlier.
Cost to fix, yes.
Sometimes getting the new capability around that bug to market faster is worth the tradeoff, because the revenue or market position from the capability with that bug is way more important to the business than the 1000x cost of the fix after distribution.
Most code is not critical like that. A lot of the stuff I write has very little impact if things go wrong and it's easy to tell if it's incorrect.
As long as you have some mechanism to catch the issues before it hits customers. Too many software companies are OK shoveling crap on customers because it's easy to fix it in the field. Yes, it's easy to fix in the field, after you've inconvenienced and wasted the time of thousands of customers.
I'm starting to feel that LLMs hallucinate less than people, no matter the field. I'm at the stage where I trust code written by an LLM more than code written by a person.
Typically, for the last 2 years, I don't feel that anyone can, or bothers to, read anymore.
I very much agree. I've been telling folks in trainings that I do that the term "artificial intelligence" is a cognitohazard, in that it pre-consciously steers you to conceptualize an LLM as an entity.
LLMs are cool and useful technology, but if you approach them with the attitude you're talking with an other, you are leaving yourself vulnerable to all sorts of cognitive distortions.
It certainly isn't helped by the RLHF and chat interface encouraging this. LLM providers have every incentive to make their users engage it like an other. It was much harder to accidentally do when it was just a completion UI and not designed to roleplay as a person.
I don't think that is actually a problem. For decades people have believed that computers can't be wrong. Why, now, suddenly, would it be worse if they believed the computer wasn't a computer?
The larger problem is cognitive offloading. The people for whom this is a problem were already not doing the cognitive work of verifying facts and forming their own opinions. Maybe they watched the news, read a Wikipedia article, or listened to a TEDtalk, but the results are the same: an opinion they felt confident in without a verified basis.
To the extent this is on 'steroids', it is because they see it as an expert (in everything) computer and because it is so much faster than watching a TED talk or reading a long form article.
It can also dispense agreeable confirmation on tap, with very little friction and hardly any chance of accidentally encountering something unexpected or challenging. Even TED talks occasionally have a point of view that isn't perfectly crafted for each hearer.
Been thinking about this for a while - how will society progress when everyone has their own personal "yes man" confirming everything they think of?
This is a problem if 'everybody' is using it but I suspect there will be a few groups. It will be a 'tortoise and the hare' situation.
The LLM folks (the hare) will get the initial upper hand, as it appears as though they are moving far faster than others, but with limited or wrong actual results. This could change if we can solve the hallucination issue. Yes, they are in personal echo chambers, but that can only get you so far when you hit the real world. It will be painful and messy, but it will resolve long term. Worst case, we end up with a Dune "do not make machines that think like a person".
The slow group (the tortoise) are those that do not actively engage in these things. Yes, they're trying to keep up but using much slower mental faculties. I suspect long term they will do better as the fast group fails to deliver. Again, if we do not solve the issues with LLMs, which is not certain.
So long as there is still the slow group, we probably would not go down the dark path of individual echo chambers. Long term, eventually if you trip over the same mental stumbling block, you learn to not do that any more.
I think this is true. It can super charge some bad takes.
But I've had the opposite experience. The average person is never going to read a scientific study, nor invest the time to find out the real details of any topic they are opinionated about other than simply typing a Youtube search and finding a video that is:
- Entertaining
- Made by a person who shares their biases
- Presenting the information in a short, consumable manner that doesn't require much investment.
In comparison to this dynamic LLMs are wonderful. They can reference scientific data. I have noticed that they do push back on bad takes (very gently) and steer people towards truth.
It's not that I think LLMs are perfect. They are not. But they are infinitely better than the average human at discovering truth.
Use an agent to create something with a non-negotiable outcome. Eg software that does something useful, or fails to, in a language you don’t program in. This is a helpful way to calibrate your own understanding of what LLMs are capable of.
>How often do you think a ChatGPT user walks away not just misinformed, but misinformed with conviction? I would bet this happens all the time. And I can’t help but wonder what the effects are in the big picture.
This is so wrong! I simply can't get ChatGPT to admit something clearly wrong. It can play both sides and gives nuance, which is exactly what I expect. But it is so un-sycophantic that it won't leave you feeling like you are right. Any examples of it doing so are welcome! Show me examples where it takes a clearly wrong or false idea and makes it look as if it is a good idea (unless you specifically ask it to).
they do not have to be. People who seek an idea bubble end up finding one.
I recently asked a leading GenAI chatbot to help me understand a certain physics concept. As I pressed it on the aspect I was confused about, the bot repeatedly explained, and in our discussion, consistently held firm that I was misunderstanding something, and made guesses about what I was misunderstanding. Eventually I realized and stated my mistake, and the chatbot confirmed and explained the difference between my wrong version and the truth. I looked at some sources and confirmed that the bot was right, and I had misremembered something.
I was quite impressed that it didn't "give in" and validate my wrong idea.
I've seen similar results in physics. I suspect LLMs are capable of redirecting the user accurately when there have been long discussions on the web about that topic. When an LLM can pattern-match on whole discussions, it becomes a next-level search engine.
Next, I hope we can somehow get LLMs to distinguish between reliable and less-reliable results.
I wonder if LLMs need to be this way owing to the role of pseudo-intelligent conversation partners they've been shoehorned into or if it's a deliberate choice of the vendors.
It's possible that the Dunning-Kruger effect is not real, only a measurement or statistical artefact [1]. So it probably needs more and better studies.
[1] https://www.mcgill.ca/oss/article/critical-thinking/dunning-...
The author misses the science of emergence. Reductionist views can’t fully explain macro-level capabilities that arise in these systems. Something emerges at higher scales from the possibility space as model sizes grow; they stop being mere “stochastic parrots” or black boxes running simple regressions.
The weights develop their own inherent logic based on how they relate to each other, analogous to how brain waves encode memory at a level higher than individual neuron networks.
Ultimately, the value of AI lies in the imagination of its wielder. The Unknown Unknowns framework is a useful tool for navigating AI effectively (it is powerful for elaborating on Known Unknowns and identifying Unknown Unknowns), along with a healthy dose of critical thinking and an understanding of how reinforcement learning and RLHF work post-pretraining.
8 months or so ago, my quip regarding LLMs was “stochastic parrot.”
The term I’ve been using of late is “authority simulator.” My formative experiences with “authority figures” was a person who can speak with breadth and depth about a subject and who seems to have internalized it because they can answer quickly and thoroughly. Because LLMs do this so well, it’s really easy to feel like you’re talking to an authority in a subject. And even though my brain intellectually knows this isn’t true, emotionally, the simulation of authority is comforting.
And for some of us, it may be an anti-authoritarianism stimulator.
> 8 months or so ago, my quip regarding LLMs was “stochastic parrot.” The term I’ve been using of late is “authority simulator.”
I guess soon we'll hear them called weapons of mass epistemic destruction.
The title makes this incomprehensible. The author seemingly defines Dunning-Kruger as the... opposite of the Dunning-Kruger effect.
The "Dunning-Kruger Effect" Effect: A reference to Dunning-Kruger Effect is almost certainly incorrect.
I'll take big - enterprise AI written code, over offshore any day of the week. I for one welcome AI for this exact use case.
I'm pretty well "on the spectrum" and people glazing me in real life produce suspicion and discomfort rather than any good feelings.
I don't have a problem just ignoring all the LLM glazing, although I'd really like the ability to turn it off.
The fact that they've all been trained to do it, because so many of the "normies" fall for it, is kind of an indictment in my eyes. Bit of a mirror held up to society.
You should probably be worried about how fake flattery works so well in society, and how this enables sociopaths and narcissists to flourish and control everything.
This LLM problem is just a symptom.
From the title I thought this was a repost of 'AI is Dunning-Kruger as a service ' https://news.ycombinator.com/item?id=45851483
It is not.
Just a reminder that the "Dunning-Kruger effect" is probably not real [0].
It makes sense to refer to it as a concept but it's probably not an appropriate assumption to make about people.
[0] https://www.mcgill.ca/oss/article/critical-thinking/dunning-...
This seems to move the idea that you might not understand how skilled you are to some kind of law that ties humility to knowledge more strictly.
Maybe this is my misunderstanding but I don't think the common invocation really took it as a law that the unknowledgeable always think their skills are higher.
>> How often do you think a ChatGPT user walks away not just misinformed, but misinformed with conviction? I would bet this happens all the time.
Why is the "Dunning-Kruger" is not mentioned anywhere in the article body while is gloriously visible in the title? By the way, AI is not wrong "all the time".
"all the time" in this context means frequently, not constantly, and is usually understood as a global rate.
for example: people fall in love all the time.
I don't think you get "frequently" misinformed by ChatGPT. What kind of prompts are you using that make ChatGPT give false info so often?
It's remarkably easy to get ChatGPT to hallucinate. A go-to for me is to ask it about factual things as compared to other factual things. The description of the Bluey episode is wrong in nearly every possible way. Additionally, the response continues on analyzing in multiple ways, and again continues to get the fundamental facts and therefore the analysis wrong.
Prompt: Compare the Space episode of Bluey to Interstellar
Response: That’s a really fun and surprisingly fitting comparison! Let’s break down how Bluey’s “Space” episode parallels Christopher Nolan’s Interstellar — both emotionally and thematically — even though one is a 7-minute preschool cartoon and the other a 3-hour sci-fi epic.
1. Core Theme: Love Across Distance
Interstellar: The film centers on a father (Cooper) who leaves his daughter (Murph) to save humanity, exploring how love transcends time and space — literally. The emotional heart of Interstellar is the bond between parent and child that endures despite impossible separation.
Bluey: “Space” In this episode, Bluey, Bingo, and their dad (Bandit) play an imaginative game where they’re astronauts who have to separate to save their mission. The moment Bluey and Bandit must say goodbye echoes that same emotional tension — the pain of letting go but also the love that motivates sacrifice.
Parallel: Both stories use the setting of outer space as a metaphor for emotional distance — how people we love can feel far away, and how connection can bridge that gap.
Humans broadly have a tenuous grasp of “reality” and “truth.” Propagandists, spies and marketers know what philosophers of mind prove all too well: most humans do not perceive or interact with reality as it is, rather their perception of it as it contributes or contradicts their desired future.
Provide a person confidence in their opinion and they will not challenge it, as that would risk the reward of believing they live in a coherent universe.
Most people have never heard the term “epistemology” despite the concept being central to how people derive coherence. Yet all these trite pieces written about AI and its intersection with knowledge claim some important technical distinction.
I’m hopeful that a crisis of epistemology is coming, though that’s probably too hopeful. I’m just enjoying the circus at this point
I hate to comment on just a headline (though I did read the article), but it's wrong enough to warrant correcting.
This is not what the Dunning-Kruger effect is. It's lacking metacognitive ability to understand one's own skill level. Overconfidence resulting from ignorance isn't the same thing. Joe Rogan propagated the version of this phenomenon that infiltrated public consciousness, and we've been stuck with it ever since.
Ironically, you can plug this story into your favorite LLM, and it will tell you the same thing. And, also ironically, the LLM will generally know more than you in most contexts, so anyone with a degree of epistemic humility is better served taking it at least as seriously as their own thoughts and intuitions, if not at face value.
Funnily enough, DK is also not real -- just a statistical artifact of a poorly chosen analysis.
Freely available online information is very often educationally incredibly shallow and commonly oversimplified to the point of being wrong. So of course an agent trained on it would be, too.
Found somewhere on the internet a few days ago: LLMs are Dunning-Kruger as a service.
Edit: it was https://christianheilmann.com/2025/10/30/ai-is-dunning-kruge...
There are so many guardrails now that are being improved daily. This blog post is a year out of date. Not to mention that people know how to prompt better these days.
To make his point, you need specific examples from specific LLMs.
> I feel like LLMs are a fairly boring technology. They are stochastic black boxes. The training is essentially run-of-the-mill statistical inference. There are some more recent innovations on software/hardware-level, but these are not LLM-specific really.
This is pretty ironic, considering the subject matter of that blog post. It's a super-common misconception that's gained very wide popularity due to reactionary (and, imo, rather poor) popular science reporting.
The author parroting that with confidence in a post about Dunning-Krugering gives me a bit of a chuckle.
I also find it hard to get excited about black boxes - imo there's no real meat to the insights they give, only the shell of a "correct" answer
What's the misconception? LLMs are probabilistic next-token prediction based on current context, right?
Yeah, but that's their interface. That informs surprisingly little about their inner workings.
ANNs are arbitrary function approximators. The training process uses statistical methods to identify a set of parameters that approximate the function as best as possible. That doesn't necessarily mean that the end result is equivalent to a very fancy multi-stage linear regression. It's a possible outcome of the process, but it's not the only possible outcome.
Looking at an LLM's I/O structure and training process is not enough to conclude much of anything. And that's the misconception.
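To make the "function approximator" point concrete, here's a toy example (using scikit-learn's MLPRegressor; the target function, net size, and hyperparameters are arbitrary choices for illustration):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Fit a small neural net to noisy samples of sin(x). The training procedure is
    # plain statistical optimization, but the learned weights end up encoding the
    # shape of the function rather than any hand-written rule.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(2000, 1))
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=2000)

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    net.fit(X, y)

    X_test = np.linspace(-3, 3, 7).reshape(-1, 1)
    print(np.round(net.predict(X_test), 2))     # roughly tracks...
    print(np.round(np.sin(X_test).ravel(), 2))  # ...the true sin(x) values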
> Yeah, but that's their interface. That informs surprisingly little about their inner workings.
I'm not sure I follow. LLMs are probabilistic next-token prediction based on current context, that is a factual, foundational statement about the technology that runs all LLMs today.
We can ascribe other things to that, such as reasoning or knowledge or agency, but that doesn't change how they work. Their fundamental architecture is well understood, even if we allow for the idea that maybe there are some emergent behaviors that we haven't described completely.
> It's a possible outcome of the process, but it's not the only possible outcome.
Again, you can ascribe these other things to it, but to say that these external descriptions of outputs call into question the architecture that runs these LLMs is a strange thing to say.
> Looking at a LLMs I/O structure and training process is not enough to conclude much of anything. And that's the misconception.
I don't see how that's a misconception. We evaluate pretty much everything by inputs and outputs. And we use those to infer internal state. Because that's all we're capable of in the real world.
Then why not say "they are just computer programs"?
I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
> Then why not say "they are just computer programs"?
LLMs are probabilistic or non-deterministic computer programs, plenty of people say this. That is not much different than saying "LLMs are probabilistic next-token prediction based on current context".
> I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
But we already know the inner workings. It's transformers, embeddings, and math at a scale that we couldn't do before 2015. We already had multi-layer perceptrons with backpropagation and recurrent neural networks and markov chains before this, but the hardware to do this kind of contextual next-token prediction simply didn't exist at those times.
I understand that it feels like there's a lot going on with these chatbots, but half of the illusion of chatbots isn't even the LLM, it's the context management that is exceptionally mundane compared to the LLM itself. These things are combined with a carefully crafted UX to deliberately convey the impression that you're talking to a human. But in the end, it is just a program and it's just doing context management and token prediction that happens to align (most of the time) with human expectations because it was designed to do so.
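A minimal sketch of that mundane loop, with generate_next_token standing in for the actual model call (the message format and end-of-sequence marker here are illustrative assumptions):

    from typing import Callable

    def chat_turn(history: list[dict], user_message: str,
                  generate_next_token: Callable[[str], str],
                  max_tokens: int = 256) -> list[dict]:
        """One chatbot turn: context management plus repeated next-token prediction."""
        history = history + [{"role": "user", "content": user_message}]
        # The "conversation" is just a flat prompt rebuilt from the stored messages.
        prompt = "".join(f"{m['role']}: {m['content']}\n" for m in history) + "assistant: "
        reply = ""
        for _ in range(max_tokens):
            token = generate_next_token(prompt + reply)  # predict one token at a time
            if token == "<eos>":                         # model signals end of reply
                break
            reply += token
        return history + [{"role": "assistant", "content": reply}]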
The two of you seem to be implying there's something spooky or mysterious happening with LLMs that goes beyond our comprehension of them, but I'm not seeing the components of your argument for this.
> But we already know the inner workings.
Overconfident and wrong.
No one understands how an LLM works. Some people just delude themselves into thinking that they do.
Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.
It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.
> Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.
Which is to say, not delusional at all.
Or else we have to accept that basically hardly anyone "understands" anything. You set an unrealistic standard.
Beginners play abstract board games terribly. We don't say that this means they "don't understand" the game until they become experts; nor do we say that the experts "haven't understood" the game because it isn't strongly solved. Knowing the rules, consistently making legal moves and perhaps having some basic tactical ideas is generally considered sufficient.
Similarly, people who took the SICP course and didn't emerge thoroughly confused can reasonably be said to "understand how to program". They don't have to create MLOC-sized systems to prove it.
> It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.
No; it's a dismissal of the relevance of doing more detailed analysis, specifically to the question of what "understanding" entails.
The fact that a large pile of "transformers" is capable of producing the results we see now, may be surprising; and we may lack the mental resources needed to trace through a given calculation and ascribe aspects of the result to specific outputs from specific parts of the computation. But that just means it's a massive computation. It doesn't fundamentally change how that computation works, and doesn't negate the "understanding" thereof.
Understanding a transistor is an incredibly small part of how Ryzen 9800X3D does what it does.
Is it a foundational part? Yes. But if you have it and nothing else, that adds up to knowing almost nothing about how the whole CPU works. And you could come to understand much more than that without ever learning what a "transistor" even is.
Understanding low level foundations does not automatically confer the understanding of high level behaviors! I wish I could make THAT into a nail, and drive it into people's skulls, because I keep seeing people who INSIST on making this mistake over and over and over and over and over again.
I'm not sure what claim you're disputing or making with this.
What more are LLMs than statistical inference machines? I don't know that I'd assert that's all they are with confidence, but all the configuration options I can play with during generation (Top K, Top P, Temperature, etc.) are ways to _not_ select the most likely next token, which leads me to believe that they are, in fact, just statistical inference machines.
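For what it's worth, those knobs typically enter the picture roughly like this (a generic sampling sketch over a {token: logit} map; real implementations differ in detail):

    import math, random

    def sample_next_token(logits: dict[str, float], temperature: float = 1.0,
                          top_k: int = 0, top_p: float = 1.0) -> str:
        """Pick the next token the way common samplers do."""
        # Temperature rescales logits: <1 sharpens toward the argmax, >1 flattens.
        items = sorted(((t, l / temperature) for t, l in logits.items()),
                       key=lambda kv: kv[1], reverse=True)
        if top_k > 0:
            items = items[:top_k]  # keep only the k highest-scoring tokens
        # Softmax over whatever survived.
        m = max(v for _, v in items)
        exps = [(t, math.exp(v - m)) for t, v in items]
        total = sum(e for _, e in exps)
        probs = [(t, e / total) for t, e in exps]
        if top_p < 1.0:  # nucleus sampling: smallest prefix with cumulative mass >= top_p
            kept, mass = [], 0.0
            for t, p in probs:
                kept.append((t, p))
                mass += p
                if mass >= top_p:
                    break
            total = sum(p for _, p in kept)
            probs = [(t, p / total) for t, p in kept]
        return random.choices([t for t, _ in probs], weights=[p for _, p in probs])[0]

    print(sample_next_token({"the": 3.0, "a": 2.5, "zebra": -1.0}, temperature=0.7, top_k=2))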
What more are human brains than piles of wet meat?
It's not an argument - it's a dismissal. It's boneheaded refusal to think on the matter in any depth, or consider any of the implications.
The main reason to say "LLMs are just next token predictions" is to stop thinking about all the inconvenient things. Things like "how the fuck does training on piles of text make machines that can write new short stories" or "why is a big fat pile of matrix multiplications better at solving unseen math problems than I am".
> What more are human brains than piles of wet meat?
Calculation isn't what makes us special; that's down to things like consciousness, self-awareness and volition.
> The main reason to say "LLMs are just next token predictions" is to stop thinking about all the inconvenient things. Things like...
They do it by iteratively predicting the next token.
Suppose the calculations to do a more detailed analysis were tractable. Why should we expect the result to be any more insightful? It would not make the computer conscious, self-aware or motivated. For the same reason that conventional programs do not.
Do you have, by chance, a set of benchmarks that could be administered to humans and LLMs both, and used to measure and compare the levels of "consciousness, self-awareness and volition" in them?
Because if not, it's worthless philosophical drivel. If it can't be defined, let alone measured, then it might as well not exist.
What is measurable and does exist: performance on specific tasks.
And the pool of tasks where humans confidently outperform LLMs is both finite and ever diminishing. That doesn't bode well for human intelligence being unique or exceptional in any way.