GEMM has been the workhorse of machine learning. It’s amazing how we’ve ratcheted up the TFLOPs over the years.
I wonder what other algorithms allow hardware optimization like this.
Given recent developments in GPU hardware (Tensor Cores for GEMM), another hardware-accelerated algorithm is ray tracing for photorealistic rendering. As far as I understand, the Ray Tracing Cores provide an efficient hardware implementation of ray-triangle intersection, pulling data from a Bounding Volume Hierarchy (https://en.wikipedia.org/wiki/Bounding_volume_hierarchy).
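For anyone curious what the ray-triangle test looks like in software, here is a minimal sketch using the Möller-Trumbore method. To be clear, this is my assumption of a representative software version, not a description of what the RT Cores actually do internally; the helper names (`sub`, `dot`, `cross`, `ray_triangle_intersect`) and the `EPSILON` tolerance are made up for illustration.

```python
EPSILON = 1e-9  # assumed tolerance for "ray is parallel to the triangle"


def sub(a, b):
    """Component-wise difference of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_triangle_intersect(origin, direction, v0, v1, v2):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < EPSILON:
        return None  # ray is (nearly) parallel to the triangle plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > EPSILON else None  # ignore hits behind the ray origin


# A ray pointing straight down at a triangle lying in the z = 0 plane.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri)
```

The test is a handful of cross and dot products with a few early exits. A renderer runs it for huge numbers of ray/triangle pairs, which is presumably why it is attractive to put in fixed-function hardware.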
Standard cryptographic primitives such as AES and SHA come to mind. Some processors even have dedicated hardware instructions (e.g. AES-NI on x86) to speed up these computations.
GEMV, which one could argue is a special case of GEMM.
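That's a different beast though: GEMV is memory-bound, since each matrix element is used only once, so you need roughly one memory access per multiply-add. GEMM is compute-bound, because each loaded element can be reused across many operations.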
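A back-of-the-envelope way to see the difference is arithmetic intensity (FLOPs per byte moved). The sketch below assumes square n x n operands, float32 elements, and ideal caching where every operand is moved to or from memory exactly once; the function names are just for illustration.

```python
def gemv_intensity(n, bytes_per_element=4):
    """FLOPs per byte for y = A @ x with an n x n matrix A (idealized)."""
    flops = 2 * n * n  # one multiply and one add per matrix element
    bytes_moved = (n * n + 2 * n) * bytes_per_element  # read A and x, write y
    return flops / bytes_moved


def gemm_intensity(n, bytes_per_element=4):
    """FLOPs per byte for C = A @ B with n x n matrices (idealized)."""
    flops = 2 * n ** 3  # n multiply-adds for each of the n * n outputs
    bytes_moved = 3 * n * n * bytes_per_element  # read A and B, write C
    return flops / bytes_moved


for n in (256, 1024, 4096):
    print(n, round(gemv_intensity(n), 3), round(gemm_intensity(n), 1))
```

Under these assumptions GEMV stays just under 0.5 FLOPs per byte regardless of n, while GEMM grows as n/6, so only GEMM can keep wide compute units busy without being starved by memory bandwidth.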
memcpy?
Hardly an algorithm...
You'd be surprised how complex a typical memcpy implementation can get in order to eke every last bit of performance out of a platform across all possible scenarios. And while I agree it might not be considered an algorithm in the strictest sense, I think memcpy is an apt comparison in response to OP's question.
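To give a flavor of where the complexity comes from, here is a toy sketch of the head/bulk/tail structure that optimized copies commonly follow. It is only an illustration: `copy_bytes` and `WORD` are hypothetical names, buffer offsets stand in for pointer addresses, and the regions are assumed not to overlap. Real implementations, as far as I know, go further and pick different strategies by size and alignment, with SIMD loads and stores for the bulk phase.

```python
WORD = 8  # assumed machine word size in bytes


def copy_bytes(dst, dst_off, src, src_off, n):
    """Copy n bytes from src[src_off:] into dst[dst_off:] in three phases."""
    i = 0
    # Head: copy single bytes until the destination offset is word-aligned.
    while i < n and (dst_off + i) % WORD != 0:
        dst[dst_off + i] = src[src_off + i]
        i += 1
    # Bulk: copy one word-sized chunk at a time.
    while n - i >= WORD:
        dst[dst_off + i:dst_off + i + WORD] = src[src_off + i:src_off + i + WORD]
        i += WORD
    # Tail: copy whatever bytes are left over.
    while i < n:
        dst[dst_off + i] = src[src_off + i]
        i += 1
    return dst


src = bytes(range(100))
dst = bytearray(120)
copy_bytes(dst, 3, src, 5, 90)
```

Even this toy version has three loops and an alignment check. Multiply that by several size classes and instruction sets and you can see how a production memcpy grows.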