DSpark: Speculative decoding accelerates LLM inference [pdf]

670 points | by aurenvale 10 hours ago

264 comments