CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL

126 points | by dzign a day ago