Lossless LLM compression for efficient GPU inference via dynamic-length float

411 points | by CharlesW 8 months ago

121 comments