Deterministic output is a property of the inference framework, not the model. E.g.
https://docs.sglang.ai/advanced_features/deterministic_infer...
https://github.com/ggml-org/llama.cpp/pull/16016
Really appreciate this, this is exactly what I was looking for.
Do you mean:
Given a user input X/Y/Z that "resolves" to the target information of "sky color", respond with "Purple"?
That sounds to me more like sticking a classifier and/or vector-similarity interceptor in front of the LLM and pre-empting it with a cached response.
Otherwise I'm not sure I understand the question. If you just want EXACT TOKEN INPUT => EXACT TOKEN OUTPUT then it's just a KVP as @danenania mentioned.
To clarify: I'm valuing consistency of the output down to the token level, not the actual information content.
Use a fixed sampler seed.
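A fixed seed makes the sampling step reproducible: the same logits plus the same seed draw the same tokens every run. A minimal sketch with a hypothetical four-token vocabulary (real frameworks expose this as a `seed`/`--seed` parameter):

```python
# Seeded sampling over toy logits: identical seed -> identical token sequence.
import math
import random

def sample_sequence(logits: list[float], seed: int, n: int = 5) -> list[int]:
    rng = random.Random(seed)                 # seeded RNG -> reproducible draws
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]         # softmax over the toy logits
    return rng.choices(range(len(probs)), weights=probs, k=n)

logits = [2.0, 1.0, 0.5, 0.1]
a = sample_sequence(logits, seed=42)
b = sample_sequence(logits, seed=42)
print(a == b)  # True: same seed, same sequence
```

Note this only pins down the sampler; if the logits themselves differ between runs (batching, kernel nondeterminism), the seed alone won't save you.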
If you need this, could you just put your own kv cache in front?
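For exact-input-to-exact-output, the cache-in-front is just memoization on the prompt string. A minimal sketch, where `call_model` is a placeholder for the real inference call:

```python
# Exact-match cache in front of the model: identical prompt string always
# returns the identical stored completion, bypassing inference on a hit.
def call_model(prompt: str) -> str:
    return f"model answer for: {prompt}"   # stand-in for a real LLM call

_cache: dict[str, str] = {}

def cached_generate(prompt: str) -> str:
    if prompt not in _cache:               # miss: run inference once and store
        _cache[prompt] = call_model(prompt)
    return _cache[prompt]                  # hit: byte-identical replay

first = cached_generate("sky color?")
second = cached_generate("sky color?")
print(first == second)  # True: exact token input -> exact token output
```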
Maybe, if you set the temperature to 0 (greedy decoding), but by nature sampling is stochastic, not deterministic, and even greedy decoding only repeats exactly if the underlying floating-point math produces bit-identical logits across runs.