Deterministic output is a property of the inference framework, not the model. E.g.
https://docs.sglang.ai/advanced_features/deterministic_infer...
https://github.com/ggml-org/llama.cpp/pull/16016
Really appreciate this, this is exactly what I was looking for.
Do you mean:
Given a user input X/Y/Z that "resolves" to the target information of "sky color", respond with "Purple"?
That sounds to me more like sticking a classifier and/or vector-similarity interceptor in front of the LLM and pre-empting it with a cached response.
Otherwise I'm not sure I understand the question. If you just want EXACT TOKEN INPUT => EXACT TOKEN OUTPUT then it's just a KVP as @danenania mentioned.
To clarify: I'm valuing consistency of the output down to the token level, not the actual information content.
Use a fixed sampler seed.
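A fixed seed makes the sampling step reproducible: the same logits plus the same seed draw the same tokens every run. A minimal sketch with a hypothetical four-token vocabulary (real frameworks expose this as a `seed`/`--seed` parameter):

```python
# Seeded sampling over toy logits: identical seed -> identical token sequence.
import math
import random

def sample_sequence(logits: list[float], seed: int, n: int = 5) -> list[int]:
    rng = random.Random(seed)                 # seeded RNG -> reproducible draws
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]         # softmax over the toy logits
    return rng.choices(range(len(probs)), weights=probs, k=n)

logits = [2.0, 1.0, 0.5, 0.1]
a = sample_sequence(logits, seed=42)
b = sample_sequence(logits, seed=42)
print(a == b)  # True: same seed, same sequence
```

Note this only pins down the sampler; if the logits themselves differ between runs (batching, kernel nondeterminism), the seed alone won't save you.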
If you need this, could you just put your own kv cache in front?
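For exact-input-to-exact-output, the cache-in-front is just memoization on the prompt string. A minimal sketch, where `call_model` is a placeholder for the real inference call:

```python
# Exact-match cache in front of the model: identical prompt string always
# returns the identical stored completion, bypassing inference on a hit.
def call_model(prompt: str) -> str:
    return f"model answer for: {prompt}"   # stand-in for a real LLM call

_cache: dict[str, str] = {}

def cached_generate(prompt: str) -> str:
    if prompt not in _cache:               # miss: run inference once and store
        _cache[prompt] = call_model(prompt)
    return _cache[prompt]                  # hit: byte-identical replay

first = cached_generate("sky color?")
second = cached_generate("sky color?")
print(first == second)  # True: exact token input -> exact token output
```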
Maybe, if you set the temperature to 0 (greedy decoding), but by nature sampling is stochastic, not deterministic, and even greedy decoding only repeats exactly if the underlying floating-point math produces bit-identical logits across runs.