That’s a really fascinating approach! The idea of encoding secret messages into seemingly ordinary text using arithmetic coding and an LLM-based probability model is both clever and innovative. It’s impressive how this method allows for covert communication in public spaces while remaining indistinguishable from typical AI-generated content.
At the same time, it raises interesting discussions about both the ethical and security implications of such technology. Looking forward to seeing how this project evolves!
neat! Copying the sample messages from the link (they don't have word-wrap, so they're pretty hard to read); these encode "hello world":
> Goodbye 2024! Can't wait to start fresh with a brand new year and a new chance to slay the game in 2025 #NewYearNewMe #ConfidenceIsKey" - @SlayMyGameLife7770 (280 character limit)
> I just ordered my fave coffee from Dunkin' yesterday but I almost spilled it on my shirt, oh no! #DunkinCoffeePlease #FashionBlunders
> Life is just trying to keep up with its favorite gamers rn. Wish I could say I'm coding instead of gaming, but when i have to put down my controller for a sec
Clicked this to tell you that you need to use a bijective arithmetic coder only to find you already were. Good work!
So the next obvious step would be to run the LLM with all exact integer arithmetic so there is no breakage from rounding errors.
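Short of full integer inference, a cheaper partial step in that direction is to quantize each next-token distribution to integer frequencies with a fixed total before handing it to the arithmetic coder, so both ends work from identical integer counts. A minimal sketch (the precision constant and the 1-D tensor shape are illustrative assumptions, not taken from the repo):

```python
import torch

PRECISION = 1 << 24  # total integer frequency mass handed to the arithmetic coder


def quantize_distribution(logits: torch.Tensor) -> torch.Tensor:
    """Map a 1-D tensor of float logits to integer token frequencies summing to PRECISION."""
    probs = torch.softmax(logits.double(), dim=-1)
    # Floor to integers but keep at least 1 count per token so every token stays encodable.
    freqs = (probs * (PRECISION - probs.numel())).floor().long() + 1
    # Hand any leftover mass to the most likely token so the total is exact.
    freqs[probs.argmax()] += PRECISION - int(freqs.sum())
    return freqs
```

This only removes one source of float nondeterminism (the coder's view of the distribution); the logits themselves can still differ across hardware and libraries, so it reduces rather than replaces the need for exact arithmetic.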
The obvious feature gap I see is that it should be possible to provide both the encoder and decoder with a common "context" or prompt to preload the LLM with. The context helps get both models onto the same theme so their output makes sense in the venue where it's shared. These contexts ought to be treated as key material by the users.
So for example, if the users are using an obscure RC boat forum to exchange their hidden messages, they'd add information to their context to get the LLM to produce RC boating content. The context for messages authored by each user can also set out details of the persona for the account they're posting as. And when two parties are carrying on a hidden conversation, they can update the context message by message (adding the prior covertext to it) so that the LLM will carry on a plausible conversation in the covertext.
The extra context material may also serve to help frustrate any statistical analysis that hopes to distinguish human text from LLM text, or an ordinarily sampled LLM from a specially sampled one. It would be superior for that purpose for the users to use a private fine-tune, but that is computationally annoying, and the context functionality is needed anyway to allow for coherence in the covertext.
If there is no context provided, it may be useful to hash the user-provided password and use that to sample some tokens from the model, then throw away the initial samples and use the remainder as context. The reason to do this is again so the LLM's distribution is not predictable to an attacker. Imagine: the attacker suspects in advance that your system is in use with a particular model. He can run it himself and observe that the output is highly consistent with the distribution the model predicts, and much less consistent with a bigger, more accurate model of human text. If the model is pre-primed with a secret context, its distribution will be less predictable, at least for a while.
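A sketch of that password-derived context, assuming a transformers causal LM; the model name and token counts are arbitrary placeholders, and the deterministic sampling only matches if both sides run the same software stack:

```python
import hashlib

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"    # placeholder model
DISCARD_TOKENS = 32    # burn-in samples to throw away
CONTEXT_TOKENS = 96    # samples kept as the shared secret context


def secret_context_from_password(password: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    # Derive a deterministic sampling seed from the password so the
    # encoder and the decoder regenerate the same hidden context.
    seed = int.from_bytes(hashlib.sha256(password.encode()).digest()[:8], "big")
    torch.manual_seed(seed)
    input_ids = tokenizer(tokenizer.bos_token, return_tensors="pt").input_ids
    output = model.generate(
        input_ids,
        do_sample=True,
        max_new_tokens=DISCARD_TOKENS + CONTEXT_TOKENS,
        pad_token_id=tokenizer.eos_token_id,
    )
    sampled = output[0, input_ids.shape[1]:]
    # Drop the burn-in samples and keep the rest as context.
    return tokenizer.decode(sampled[DISCARD_TOKENS:])
```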
You may want to encode the user's data backwards so that it's easier for users to make arbitrary changes to their message to escape bad covertext output. E.g. if a user encodes a message and the LLM output is highly inappropriate and might blow their cover (the obvious example from instruct models is idiotic LLM refusals), they should just vary their message until they find a version where the output is acceptable. But if the encoding is in order, they have to vary the beginning of their message whenever the LLM's screwup is at the beginning. Because you use authenticated encryption, there's no ability to accept partial messages that you'd lose; I think the only downside is some memory overhead.
Adding the covertext thread to the context should also help with undesirable LLM output: if you go to encode something and the LLM wants to misbehave, just author a message by hand (it will safely fail to decrypt), and the next LLM encode, with that new message as part of the updated context, may be better behaved.
It might also mitigate some replay attacks, e.g. where an attacker grabs an earlier message in a discussion and replays it, and the user decodes it and thinks it's a legitimate new message...
You should also probably not use instruct models at all: their probability distributions have MUCH lower entropy than base models on account of the fine-tuning, and you can pretty easily detect LLM output vs. human output by checking the cross entropy of a text under an arbitrary instruct model and under a base model (even ones unrelated to the models used by the users). Instruct models also have obvious tells that are apparent even to humans (delving, refusals, etc.). You might have a harder time avoiding NSFW output with a typical base model, especially a small one, but context should help.
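The cross-entropy check is easy to prototype with transformers. A sketch, where the model names are just one example of a base/instruct pair and `suspect_post.txt` stands in for whatever text is being scored:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def per_token_nll(model_name: str, text: str) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids the returned loss is the mean per-token
        # negative log-likelihood of the text under the model.
        loss = model(ids, labels=ids).loss
    return loss.item()


text = open("suspect_post.txt").read()
base = per_token_nll("HuggingFaceTB/SmolLM2-135M", text)
instruct = per_token_nll("HuggingFaceTB/SmolLM2-135M-Instruct", text)
# Text that scores much better under the instruct model than under the base
# model is a hint it came from an instruct-tuned LLM rather than a human.
print(f"base: {base:.2f} nats/token, instruct: {instruct:.2f} nats/token")
```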
You might want to look into using RWKV-LM which has surprisingly good performance at smaller sizes.
Might also be fun to offer to use an LLM as a compressor for the text before encrypting.
Thank you for the feedback! Some notes:
> The obvious feature gap I see is that it should be possible to provide both the encoder and decoder with a common "context" or prompt to preload the LLM with.
There's no option for this in the CLI interface, but you can modify the context in `src/textcoder/coding.py` by changing the `_INITIAL_CONVERSATION` variable. Right now it's configured to ask the LLM to output text in the style of a tweet.
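For the RC-boat persona idea upthread, the edit might look roughly like this (a sketch only: it assumes `_INITIAL_CONVERSATION` holds a chat-message list in the standard chat-template format, and the persona text is just an example; both parties would need the identical context):

```python
# Hypothetical replacement for _INITIAL_CONVERSATION in src/textcoder/coding.py,
# assuming a chat-message list in the standard chat-template format.
# Sender and receiver must use the identical context.
_INITIAL_CONVERSATION = [
    {
        "role": "system",
        "content": (
            "You are RCBoatFan42, a hobbyist who writes short, casual forum posts "
            "about RC boat builds, brushless motors, and weekend lake runs."
        ),
    },
]
```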
> You may want to encode the user's data backwards so that it's easier for users to make arbitrary changes to their message to escape bad covertext output.
A random 16-byte value is prepended to each message to serve as the KDF salt and the AES-GCM-SIV nonce. This has the additional benefit that every output is different, even when the same message is encoded twice. So if the covertext is bad, you can just re-run the encoder to get something different.
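Roughly like this in code (a sketch rather than the repo's actual implementation: the scrypt parameters and the truncation of the 16-byte value to the 12-byte nonce are assumptions, and `AESGCMSIV` needs a recent `cryptography`/OpenSSL):

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCMSIV
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt


def encrypt(password: bytes, plaintext: bytes) -> bytes:
    # Random 16-byte value reused as the KDF salt and (truncated) as the nonce.
    salt = os.urandom(16)
    key = Scrypt(salt=salt, length=32, n=2**14, r=8, p=1).derive(password)
    ciphertext = AESGCMSIV(key).encrypt(salt[:12], plaintext, None)
    # Prepending the random value means encoding the same message twice
    # still produces completely different output.
    return salt + ciphertext
```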
> Might also be fun to offer to use an LLM as a compressor for the text before encrypting.
A similar project, https://github.com/harvardnlp/NeuralSteganography, does exactly that and achieves very compact results. However, I was a bit wary of compressing before encrypting, given the possible security risks associated with that pattern.