I think AI can be a tool to understand a codebase, but it needs human insight to turn into real docs.
I have used AI to ask specific questions about a codebase, and it has been helpful in narrowing the search space. Think of AI as probable cause, not evidence. It speeds up getting to the truth but can't be trusted as the truth.
This is terrific writing, and it shows what we lose when we pretend AI can do terrific writing.
The biggest problem we face right now is that the vast majority of people are terrible writers and can't recognize why this is awful. It really felt like, in the moment before ChatGPT arrived, we were coming into a new world where the craft of writing was surging in popularity and making a difference. That all feels lost.
This kind of post makes me have hope.
Thank you. I'm glad it makes you feel this way.
TIL: ersatz!
If you're upset about these things, the one thing not to do is to empower them by yes-anding the people involved as they debase the meaning of words, in the way the author of this article does:
> I’ve tried it on one of my pet projects and it produced an entire wiki full of dev docs
Did it? No, it didn't. "Wiki" is not a synonym for "project documentation". (You could set up a wiki to manage the documentation for your project. But that's not what any of these things are about.)
These aren't wikis.
You're right. I wrapped "wiki" in quotation marks in the post. Thank you for reminding me. I also added a callout.
Even without introducing LLMs into the equation, I've been brought on as the technical writer for many projects where the team says "oh, we already have a readme, you just need to clean it up" and then all of the readme definitions for parameters or settings or whatever are like:
brickLock: The lock of the brick.
brickDrink: The drink of the brick.
brickWink: The wink of the brick.
...which is to say, definitions that just restate whatever's evident from the code or variable names themselves, and that make sense if you're already familiar with the thing being defined, but don't actually explain their purpose or provide context for how to use them (in other words, the main reasons to have documentation).
My role as a writer is then to (1) extract net-new information out of the team, (2) figure out how all of that new info fits together, (3) figure out the implications of that info for readers/users, and then (4) assemble it in an attractive manner.
An autogenerated code wiki (or a lazy human) can presumably do the fourth step, but it can't do the first three steps preceding it, and without those preceding steps you're just rearranging known data. There are times when that can be helpful, but it's more often just gloss.
This is what I wanted to focus on, so thanks for starting the convo. This all feels like "100% coverage = perfectly tested, no bugs possible." Nooooo, there needs to be more than that. I recently had a really good readme for a project in a heavy development phase. Basically everything I'd done, every command, every concept, got documented; cleanup was a worry for later. I did not put in every line of code, I put in concepts. So when a new person got brought on and asked stuff like "well, but how do I change the config?" Bam, it's in the readme. Over and over, every task I had to do, they had to at least consider or understand, so it's in the readme. Of course I did start with a quick-start "how do I use this repo" and only later did "how do I develop this repo", but still, it was all useful because it's what I needed.
It doesn't seem impossible for an LLM to go "hmmm, the way this repo passes configurations around isn't standard. I should focus more on that." But that's a level of understanding I don't think they currently have.
> But that's a level of understanding I don't think they currently have
I think they do, at least in some cases, especially if it's something well represented in the dataset. I've sometimes been surprised by the insights it provides, and other times it's completely useless. That's one of the problems: it's unreliable, so you have to treat all info it gives you with doubt. But, anyways, at times it makes very surprising and seemingly intelligent observations. It's worth at least considering it and thinking it through.
I guess I should try it before dismissing it, but I would be curious to see if it can accurately detect which things we've found workarounds for that need special attention and whatnot.
Sorry for the tangent but is there a story behind the choice of "brick lock/drink/wink" for your example?
It's so odd and random it seems like there must be more to it.
Previously discussed at https://news.ycombinator.com/item?id=45002092