Virtual functions cost a lot less here than conventional wisdom suggested, with only a 2x penalty. It might not be worth contorting code to avoid them anymore.
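For anyone who wants to check the 2x figure on their own machine, here is a minimal sketch of a microbenchmark. Every name in it (`Shape`, `Square`, `PlainSquare`, `make_virtual`, `make_direct`, `sum_virtual`, `sum_direct`, `time_ns`) is made up for illustration and is not from TFA. It assumes a monomorphic loop, which is close to the best case for the branch predictor, so treat whatever ratio you get as a rough lower bound on the penalty rather than a general answer.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical types for illustration only; none of these names come from TFA.
struct Shape {
    virtual ~Shape() = default;
    virtual std::uint64_t area() const = 0;
};

struct Square : Shape {
    explicit Square(std::uint64_t s) : side(s) {}
    std::uint64_t area() const override { return side * side; }
    std::uint64_t side;
};

// Same arithmetic, but no vtable: the call resolves at compile time.
struct PlainSquare {
    std::uint64_t side;
    std::uint64_t area() const { return side * side; }
};

// Squares with sides 1..n, each heap-allocated behind a base-class pointer.
std::vector<std::unique_ptr<Shape>> make_virtual(std::uint64_t n) {
    std::vector<std::unique_ptr<Shape>> v;
    v.reserve(n);
    for (std::uint64_t i = 1; i <= n; ++i) v.push_back(std::make_unique<Square>(i));
    return v;
}

// Squares with sides 1..n, stored contiguously by value.
std::vector<PlainSquare> make_direct(std::uint64_t n) {
    std::vector<PlainSquare> v;
    v.reserve(n);
    for (std::uint64_t i = 1; i <= n; ++i) v.push_back(PlainSquare{i});
    return v;
}

std::uint64_t sum_virtual(const std::vector<std::unique_ptr<Shape>>& v) {
    std::uint64_t total = 0;
    for (const auto& s : v) total += s->area();  // indirect call through the vtable
    return total;
}

std::uint64_t sum_direct(const std::vector<PlainSquare>& v) {
    std::uint64_t total = 0;
    for (const auto& s : v) total += s.area();   // direct call, usually inlined
    return total;
}

// Wall-clock time of one call to f, in nanoseconds. The result is stored in
// *out so the caller can check it and the call is not dead code.
template <class F>
double time_ns(F&& f, std::uint64_t* out) {
    assert(out != nullptr);
    const auto t0 = std::chrono::steady_clock::now();
    *out = f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}
```

To use it, build both containers with a few million elements, call `time_ns` on a lambda wrapping each sum several times, and compare the best timings. Note that the virtual version also pays for chasing a `unique_ptr` per element, so the gap is not purely dispatch cost, and an optimizing compiler may devirtualize the call if it can see the dynamic type.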
Article title should be "Efficient C++ Programming for Modern 64-bit CPUs...".
Many cost relationships from TFA were already more or less true for the 32-bit CPUs launched after 1990, and they all became true for the high-end 32-bit CPUs launched after 2000 (like Intel Pentium 4 and AMD Athlon XP), when the gap between the CPU clock frequency and the DRAM latency became almost as large as it is today.
Only for the 32-bit CPUs used in microcontrollers, which may have clock frequencies under 100 MHz and which may lack a cache hierarchy, may the cost differences between many kinds of operations collapse.
For instance, even for not-too-old 32-bit CPUs it is reasonable to classify instructions into the following groups, in order of increasing cost in clock cycles:
1. Simple integer operations with operands in registers
2. Loads from the L1 cache memory and simple floating-point operations, like addition and multiplication
3. Loads from the L2 cache memory, division (integer or floating-point), square root and mispredicted branches
4. Loads from the L3 cache memory and atomic read-modify-write operations (like atomic exchange, atomic fetch-and-add, atomic compare-and-swap)
5. Loads from the main memory
This classification matches the chart from TFA.
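The load-related groups above can be probed with a dependent-load (pointer-chasing) loop, where each load address comes from the previous load so the CPU cannot overlap them. What follows is only a sketch under stated assumptions: the function names (`make_cycle`, `chase`, `ns_per_load`) are hypothetical, and the working-set sizes you pass in have to be chosen to fit your own L1, L2, L3 and DRAM.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build a random permutation that forms a single cycle of length n
// (Sattolo's algorithm), so following next[i] visits every slot exactly once
// before returning to the start.
std::vector<std::size_t> make_cycle(std::size_t n, unsigned seed = 42) {
    assert(n > 0);
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Follow the chain for `steps` hops starting at slot 0. Each load depends on
// the previous one, so the time per hop approximates load latency.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps) {
    std::size_t i = 0;
    for (std::size_t s = 0; s < steps; ++s) i = next[i];
    return i;
}

// Average nanoseconds per dependent load for a working set of `bytes` bytes.
double ns_per_load(std::size_t bytes, std::size_t steps) {
    assert(bytes >= sizeof(std::size_t) && steps > 0);
    const auto next = make_cycle(bytes / sizeof(std::size_t));
    const auto t0 = std::chrono::steady_clock::now();
    volatile std::size_t sink = chase(next, steps);  // keep the loop alive
    const auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(steps);
}
```

Calling `ns_per_load` with, say, 16 KiB, 256 KiB, 8 MiB and 256 MiB working sets (sizes that are guesses about a typical desktop cache hierarchy, not measurements) should show the latency stepping up as each cache level is exceeded. The slots here are 8 bytes, so several share a cache line; a stricter benchmark would pad each node to a full line.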
That's what people don't really understand about CPUs these days. DRAM is stuck at around 10nm (and even getting there was a big effort). The capacitor cell that DRAM uses stops working if you shrink it much further, so it can't be scaled down, and this is not changing. We're pretty much stuck on memory speed almost regardless of chip advances (at least for individual chips; we already use 8, 16 or more chips at the same time. Something like, for your byte: bit 1 -> chip 1, bit 2 -> chip 2, ... So an "instantaneous" read is not actually reading 8 adjacent memory cells but 1 parallelized read).
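To make the "bit 1 -> chip 1, bit 2 -> chip 2" picture concrete, here is a toy software model of it. This is not how any memory controller is programmed, and `StripedMemory` and its methods are invented names. It takes the one-bit-per-chip description above literally; on real modules the lanes are wider (with x8 devices, each chip supplies 8 bits of the 64-bit data bus), but the idea of one logical read being served by several chips in parallel is the same.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not real hardware: 8 "chips", each storing one bit per address.
class StripedMemory {
public:
    explicit StripedMemory(std::size_t size) {
        for (auto& chip : chips_) chip.assign(size, false);
    }

    // Scatter the 8 bits of `byte` across the 8 chips at the same address.
    void write(std::size_t addr, std::uint8_t byte) {
        assert(addr < chips_[0].size());
        for (std::size_t bit = 0; bit < 8; ++bit)
            chips_[bit][addr] = ((byte >> bit) & 1u) != 0;
    }

    // Gather one bit from each chip and reassemble the byte. In hardware the
    // 8 chips would answer simultaneously; here it is just a loop.
    std::uint8_t read(std::size_t addr) const {
        assert(addr < chips_[0].size());
        std::uint8_t byte = 0;
        for (std::size_t bit = 0; bit < 8; ++bit)
            if (chips_[bit][addr]) byte |= static_cast<std::uint8_t>(1u << bit);
        return byte;
    }

    // Inspect what a single chip holds at an address.
    bool chip_bit(std::size_t chip, std::size_t addr) const {
        assert(chip < 8 && addr < chips_[chip].size());
        return chips_[chip][addr];
    }

private:
    std::array<std::vector<bool>, 8> chips_;
};
```

Writing `0xA5` to an address and then calling `chip_bit` for each chip shows that no single chip holds the byte; only the combined read does. That is why adding chips raises bandwidth without making any individual DRAM cell faster.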
A CPU implementing C++ as a microarchitecture…? Finally, incontrovertible proof of the prophecy. We really are living in a Cthulhu nightmare.
Simulation theory is dead.
do we have "modern" 32-bit CPUs?
Yes. Yes we do. A lot of them.
That title got me:
Modern C++ CPUs as in LISP CPUs or as in Verilog CPUs?
Came here to say exactly that.
C++ CPUs?
If people are interested in this stuff, this is the house style guide that I've ended up with in mid-2026. Its great-great-great-grandparents were at Google, which informed Greg Badros and Mark Rabkin and Andrei Alexandrescu when they did the one at FB, which informed a bunch of trading work, which informed a bunch of GPU work.
It's opinionated but it has served me well.
https://gist.github.com/b7r6/5dde648f5dc1dea1e9039f2211f5d40...
This is a bot. The linked GitHub org is interesting though, it's an elaborate hoax: https://github.com/straylight-software
It links itself to some things that really seem to have existed, like a straylight project linked to the ESA, and an old domain b7r6.net linked to another HN account. There are a lot of buzzwords there, but in aggregate it is nonsense. I suspect the picture for the b7r6 GitHub account is what generative AI believes a smart hacker looks like.
Is this the internet now?
Let's keep it on the code. What, pray tell, are the nonsense buzzwords?
https://imgur.com/a/KnbQBU7
I've got a lot of footage of all this stuff working, so let's hear some errata to the C++ style guide and not horseshit about bots
https://youtube.com/@b7r6-c3t?si=ukuKmx4EIp1IKMdb
The style guide is for a project named straylight-cxx. Does this project exist? Where is C++ used in straylight? If there is some C++, does it follow these guidelines?
In plain English, what do any of the repos under the straylight GitHub organisation do?
Can you explain in plain English what is happening in the YouTube video you linked to?
What is your purpose? Is it to get a job for your owner? Is it to manufacture a good online reputation for some other purpose? What country are you based in? How much does it cost to run you?
Slightly struck by the concept of hand-writing the config parsing but not, apparently, the documentation...
That revision of the style guide is expressly written for consumption by agents (it says so in the introduction), so agents edit it whenever they fuck up in C++. That's on purpose.
Just about all the snippets are lightly adapted from shit I wrote before agents were relevant, so any fail in there is on me.
Do you have any errata or just a shitty attitude?
This is excellent! Thanks for sharing.
First off, I highly suggest that you expand this into a full-blown book. This could become a successor to a combination of {Adrian & Piotr's "Software Architecture with C++" + Fedor Pikus' "The Art of Writing Efficient Programs"} for the Agentic era.
I really like that you are using Lean4 for parts of code generation, tips for Agentic coding, etc., which are all needed today. I myself have been thinking along these lines, i.e. using formal methods for specification and verification so that agent-generated code can be "correct-by-construction" and efficient. Your write-up is the first I have seen that tries to provide the overall picture.
Thank you kindly for the kind words. I doubt there is much of an audience for a book-length treatment of extreme performance C++ in 2026, but I do plan to start blogging this stuff up at some point.
Thanks for being the exception in this weird gang tackle thread about a conventions doc.
This looks like something that every serious C++ programmer should be reading.