Nice results! SIMD can be a pain, good to know Zig makes it easy.
However, note that the plot under "Native SIMD Throughput Comparison" is extremely misleading: for the bars in a bar chart to be proportionally comparable, the y-axis should start at zero. The way the data are presented makes it look like a 10-100x gain, rather than the actual 2x improvement.
I was going to make the same comment. I saw the huge difference and went "wow", then read that it was a 2x improvement and had to check the axes properly, thinking "slightly less wow". It reminds me of that bar chart of women's average heights in different countries that starts at 5 feet https://preview.redd.it/dohqa8l94kb41.png?auto=webp&s=865180...
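To put rough numbers on the truncated-axis effect, here is a minimal sketch. The throughput values and the axis start of 95 are made up for illustration and are not taken from the article's plot; `apparent_ratio` is a hypothetical helper, not part of any plotting library.

```python
def apparent_ratio(baseline, improved, axis_start=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_start."""
    if axis_start >= min(baseline, improved):
        raise ValueError("axis_start must be below both values")
    return (improved - axis_start) / (baseline - axis_start)


# Hypothetical throughput numbers, chosen only so that the real gain is 2x.
baseline, improved = 100.0, 200.0

true_ratio = apparent_ratio(baseline, improved)             # y-axis starts at 0
truncated_ratio = apparent_ratio(baseline, improved, 95.0)  # y-axis starts at 95

print(f"bar height ratio, axis from 0:  {true_ratio:.0f}x")
print(f"bar height ratio, axis from 95: {truncated_ratio:.0f}x")
```

With these assumed values the bars differ by 2x when the axis starts at zero, but by 21x once the axis is cut off just below the smaller bar, which is the kind of visual exaggeration described above.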
It is funny how we often assume we need a graphics card for these kinds of calculations when a standard processor is actually plenty fast. The specific changes to the memory layout seemed to make the biggest difference here by allowing the hardware to actually use its vector capabilities.
I've never seen SIMD code before, and this is quite a nice little intro to both that and Zig.