If you enjoy the science of injecting slowness to determine which component has the largest impact on performance, you would enjoy this work by Emery Berger.
Coz: Finding Code that Counts with Causal Profiling https://arxiv.org/abs/1608.03676
"Performance (Really) Matters" with Emery Berger https://www.youtube.com/watch?v=7g1Acy5eGbE
With all the hardware "security" issues discovered in the last few years, CPU designers should make it possible to turn off many hardware features, leaving a brutally basic in-order CPU.
Performance will be destroyed for somewhat more confidence in their "security".
Yeah, have a small security processor. I think it makes a lot of sense.
Who would the target audience be? No modern software can run with a 1000x performance loss.
A 1000x performance loss is what you'd get from turning off the CPU's entire cache hierarchy, not what you'd get from disabling out-of-order execution. Executing instructions in order wouldn't make every instruction a cache miss.
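To put rough numbers on that distinction, here is a back-of-envelope sketch using a textbook-style stall model (cycles per instruction = base CPI + memory references per instruction × miss rate × miss penalty). Every figure in it (the 300-cycle DRAM latency, the miss rate, the base CPIs, the function name) is an illustrative assumption rather than a measurement of any real core, and the model ignores how out-of-order execution hides part of the miss latency.

```python
# Back-of-envelope CPI model. Every number below is an illustrative
# assumption, not a measurement of any real CPU.

def cycles_per_instruction(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty):
    """Base CPI plus average stall cycles from memory accesses that miss."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty

DRAM_LATENCY = 300   # assumed cycles for a trip to main memory
MEM_REFS = 1.3       # assumed: 1 instruction fetch + 0.3 data accesses per instruction
MISS_RATE = 0.001    # assumed fraction of accesses that go all the way to DRAM

# Out-of-order core with caches: assumed 4 instructions per cycle when nothing stalls.
ooo_cached = cycles_per_instruction(0.25, MEM_REFS, MISS_RATE, DRAM_LATENCY)
# In-order core with the same caches: assumed 1 instruction per cycle at best.
inorder_cached = cycles_per_instruction(1.0, MEM_REFS, MISS_RATE, DRAM_LATENCY)
# No caches at all: every instruction fetch and data access goes to DRAM.
ooo_uncached = cycles_per_instruction(0.25, MEM_REFS, 1.0, DRAM_LATENCY)

inorder_slowdown = inorder_cached / ooo_cached
uncached_slowdown = ooo_uncached / ooo_cached

print(f"in-order, caches on:      {inorder_slowdown:.1f}x slower")
print(f"out-of-order, caches off: {uncached_slowdown:.0f}x slower")
```

Under these assumptions the in-order core comes out a small single-digit factor slower, while removing the caches costs hundreds of times. The exact ratios move a lot with the assumed miss rate and latency, but the gap between the two cases is the point.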
Sounds like modern software is broken.
don't worry, AI will fix this
So it's a challenge, then!
OpenBSD
>if we step back a few months to Hot Chips 2024, AMD, Intel, and Qualcomm all gave presentations on high performance cores there. All three were eight-wide, meaning their pipelines could handle up to eight micro-ops per cycle in a sustained fashion.
>Zen 5 is the only core out of the three that couldn’t give eight decode slots to a single thread.
If you add Apple and ARM, it is the only core out of the five. I wonder whether Zen 6 will be something different. Right now Intel is iterating like crazy, and Zen 6 is still quite far off.
Notes to myself:
Will be interesting to see ARM Cortex X5 / X730 with Mediatek Dimensity 9500 on N3 vs Qualcomm Oryon 2 on N3 and also Apple's A19 / M5 on N3 all in 2025.
How is ChipsAndCheese work funded? Their analysis is consistently informative and well-done.
(author here) By free time and curiosity. I mean, I have a day job, so I'm able to do this as my hobby.
That's crazy. Not even a Patreon to fund business expenses and pay for your coffee?!