MLX LM 0.20.1 has speed comparable to llama.cpp with flash attention

1 point | by tosh 9 hours ago

1 comment