Why LLM decode is memory-bound, not compute-bound

4 points | by harshuljain13 4 hours ago

1 comment