Why are neural networks and cryptographic ciphers so similar? (2025)

136 points | by jxmorris12 3 days ago

47 comments

ajb a day ago
I would argue that they are not the same, but there is a symmetry between them.
The central problem of cryptology is to prevent inference about either the key or the plaintext, despite the requirement to be able to reconstruct the plaintext from the ciphertext+key. So ciphers have to almost perfectly mix information.
Machine learning is possible because in the absence of perfect mixing, inference is possible (given many input output pairs), even if the information is many decibels down below the noise. So the information about what parameters need changing is present in the output despite many subsequent layers of processing. This means that a lot of mixing can be tolerated, and it's needed because you don't know in advance what the data flow should look like in detail, so the NN has to provide as many options as possible.
[-]
- dathinab 14 hours ago
  > So ciphers have to almost perfectly mix information.
  yesn't
  most modern stream ciphers basically use XOR for encryption with one time use keys per chunk (like. AES-CTR, AES-GCM, AEGIS, ChaCha20, etc.)
  no mixing of bites is needed there just high entropy uniformly distributed one time use keys being generated per block, i.e. you need a "good enough" PRNG
  practically the easiest way to get them is by doing something similar to a hash on the state(key, nonce, index) in some form. Which is likely done by mixing up information, hence the yes in yesn't.
  but any PRNG with sufficient properties would do, and there probably are some which use some clever math which you probably wouldn't describe as "mix information".
  It's just "shuffling bits" + "bad one way function" is often "sufficient" secure and faster then alternatives.
  And historical many ciphers (e.g. AES block cipher) come from a time where we didn't yet had grate frameworks/know-how about how to assess security properties and write cryptography. Hence why they did all kinds of ways of mixing information and chaining which sometimes is quite.. arbitrary.
  It might be easy to assume AES stuck around as it's "just grate" but that is plain wrong. It stuck around because it spread everywhere (including standards/requirements) before we knew how to best do things and due to that then ended up with hardware acceleration support on most chips. But no one would create it that way anymore (it is prone to side channel attacks if you don't have HW accl. xor use bitslicing trickery which makes it slow). But due to everything having AES hw acceleration it became a very fast building block. Hence why most modern cipher still use (part of) it and even some hashes and other algorithms use it... It's another example of how a "good enough" and wide spread technology often wins, not the best.
  [-]
  - ajb 13 hours ago
    Mmm. It's true that stream cyphers do not need to mix information (of the plaintext) and block cyphers do. I'm not sure I fully agree with your comment, but I'm also not quite sure what you intend to say and it's late at night here. I'd suggest that anyone reading the above make sure they fully understand the different security properties of stream cyphers Vs block cyphers, before dismissing the latter.
- ogogmad a day ago
  ChaCha20 got discovered using a computer search testing out resistance to certain attacks. Hence, the architecture came first and then the parameters came next. Any link with NN gradient descent? It would likely be an abstract one.
  [-]
  - tptacek 21 hours ago
    I don't know how true this is? Salsa20 seems like pretty standard ARX design that builds a hash function in counter mode; there's a detailed paper explaining Bernstein's decisions.
bux93 a day ago
Because both of them are optimized for hardware. Neural networks, despite the name, have very little similarity to biologics.
There's a lot of multiplication of numbers in parallel, so it makes sense to try to fit that to matrices.
Cryptography is built bottom-up, but likewise it makes sense to exploit data structures that already exist in silicon.
[-]
- PxldLtd a day ago
  While modern LLMs are a far cry from biological synapses, I do find it fascinating that if you take the highly reciprocal data of a biological connectome and unroll it into a DAG, you suddenly see motifs popping up that look similar to what we find in AI. I found this both looking at temporal unrolling of RNNs or mapping layer activation weights of a Transformer. Totally agree though, the current LLM architecture itself is driven by the need to shove all of this nicely into parallelized compute hardware.
  [-]
  - krisoft 17 hours ago
    > if you take the highly reciprocal data of a biological connectome and unroll it into a DAG, you suddenly see motifs popping up that look similar to what we find in AI
    That sounds interesting. Where have you heard about that? Or is this your own research?
- amoss a day ago
  In addition both have a property similar to dispersion. In crypto each change to an input bit should cascade through as many output bits as possible. In ML each output bit should depend on as much of the input bits (and hidden layers) as possible. So they both feature a similar maximization of entropy.
  [-]
  - winfieldchen 19 hours ago
    > dispersion [...] maximization of entropy
    This is exactly the point. I was disappointed that I had to scroll so far down the page until I saw the word "entropy." There is a deep connection between machine learning and encryption and compression in information theory. As Shannon demonstrated, the one-time pad's encrypted output is maximum entropy, and so would data compressed to the Shannon limit. Such an optimal compressor learns the underlying probability distribution of the data to represent it with the fewest bits possible, which is exactly the goal of machine learning. A trained ML model can be seen as a lossy compression of the training data. Autoencoding models make the link between ML and compression (and thus encryption) explicit.
- gblargg 9 hours ago
  More simply, most interesting operations will involve "mixing up" of data. There's only so much you can do by applying a bunch of operations in series with a single input value.
tptacek a day ago
I called this out to Thomas Pornin a few months ago (that the forward pass of a neural network and a block cipher rhymed in the sense of being an iterated complex linear function punctuated by a nonlinear function that keeps the whole system from collapsing into linearity) and he intimated that it was not the profound or useful observation that I thought it to be. I feel somewhat vindicated.
tryauuum a day ago
Can anyone recommend any good content to learn cryptography? Like, even if I read the algorithm for AES I have zero understanding about why it works this way
I've finished the Cryptography I on Coursera already. Can't recommend it enough
[-]
- coldstartops a day ago
  I've been through Introduction to Modern Cryptography by Katz and Lindell. Can recommend, as it starts with Caesars cipher, one time pads, and builds towards modern cryptography.
- pet_the_bird a day ago
  "Cryptography Made Simple" By Nigel Smart and "A Graduate Course in Applied Cryptography" by Dan Boneh and Victor Shoup are excellent resources for people that have affinity with Math and CS. The second resource can be a tough read, and I would strongly recommend not skipping the first few chapters.
- yason a day ago
  Back in the day, I read Applied Cryptography (by Schneier) and clarity rained upon many things.
  [-]
  - tptacek a day ago
    More damage has been done by that book than by any Herbert Schildt C language book.
    [-]
    - krupan 12 hours ago
      It's honestly been a long time since I've encountered this attitude around cryptography, where not even Schneier can be trusted and you have no hope of actually understanding cryptography so don't even try (yes, I'm referring to your math comments too). Welcome back!
      For those feeling like this guy is being a dick, that's a normal reaction, but try to understand that this attitude was cultivated and used by enlightened individuals back when many people thought they were making secure software without even salting and hashing their users passwords. People thought md5 was good enough, https was barely being used, people had to be convinced to use ssh instead of telnet and ftp, and so forth. Drilling the idea into people that cryptography and security is difficult and that you should listen to experts had to be done. Don't take it personally.
      So yes, as you study cryptography, do keep in mind that it's extremely unlikely that you'll learn enough to come up with something better on your own, that you will very likely make mistakes if you try to write your own implementation of any cryptographic algorithms, and that you should still just use existing libraries and the recommendations of experts on best practices.
      [-]
      - tptacek 12 hours ago
        I understand this sentiment but don't know what to do with it. "Not even Schneier can be trusted" is "not even wrong". Schneier has very little to do with modern cryptography! But a long time ago, someone created a "Bruce Schneier facts" meme site, and now it's like an article of faith that he's a cryptography engineering expert. No, not so much, and I don't think he'd disagree.
        He's a perfectly nice guy with a lot to say about information security and its intersection with public policy. But I think it's been plural decades since he basically declared himself outside of modern cryptography (you could call it at the point where he said he didn't "trust the math" of elliptic curves, which he left out of Practical Cryptography, over 26 years ago).
        It's not so much that you should or shouldn't take "Applied Cryptography is bad" personally; rather: if you think Applied Cryptography is a useful reference or learning tool, it's pretty important to know that it is not.
    - LPisGood a day ago
      Can you elaborate?
      [-]
      - tptacek 21 hours ago
        It's a book that is much more interested in presenting an almanac-esque survey of everything that was happening in cryptography at the time it was written (also unhelpful: it was written at a particularly un-rigorous point in the evolution of cryptography) than it is in teaching readers how to accomplish anything safely.
    - esafak a day ago
      This is news to me. Is it him in general or just that book?
      [-]
      - tptacek a day ago
        Just that book. The followup (Practical Cryptography, now called Cryptography Engineering, though it's the same book) is much, much better --- though it's also totally out of date at this point, and would also get you in trouble.
- philangist 11 hours ago
  https://mit6875.github.io/ - MIT's Foundations of Cryptography is publicly available with full lectures, lecture note pdfs, and 5 problem sets. It's very rigorous and proof-driven which can be hard at times, but the professor's enthusiasm for the subject is infectious and makes the lectures a pleasure to watch.
- zOneLetter a day ago
  I looked at the recommendations under your comment, but I don't think I'm capable of these either lol
  Any recommendations for a technically competent person, but for someone with math knowledge trailing off at Calc 2?
  [-]
  - krupan a day ago
    The math isn't that difficult once you grok mod math. It's like time, like doing addition and subtraction on a clock. What's 10 + 4 on a clock? 4 hours past 10 is 2.
    [-]
    - tptacek a day ago
      The math stays difficult after basic discrete concepts and gets more difficult as you go. :)
      It's straightforward to get yourself to a place where you can do cryptographic things and feel somewhat comfortable with what's happening. Truly understanding it to the point where you can reason safely about it is deceptively harder.
      [-]
      - mswphd 15 hours ago
        yeah I generally would say that learning about the actual schemes (tends to be) doable by a casual enthusiast, but learning about how the SOTA attacks work (which motivate scheme design for sure) is much more difficult.
    - gucci-on-fleek 15 hours ago
      Hmm, I've studied a lot of math, and I disagree. Cryptography is mostly number theory, which always looks simple on the surface, and often only needs "elementary" tools, but I still find it much harder than other areas of math.
      For example, the proof that there are infinitely many primes looks simple [0], but it's still pretty hard to understand, let alone derive yourself independently. And the other important cryptography/number theory theorems like Euler's totient theorem [1] are even trickier.
      [0]: https://en.wikipedia.org/wiki/Euclid%27s_theorem
      [1]: https://en.wikipedia.org/wiki/Euler%27s_theorem
      [-]
      - mswphd 14 hours ago
        yes, deriving all of the math cryptography depends on independently would not be easy. Fortunately, that's not really how anybody learns.
        Along those lines, you do not need to understand the proof of Euler's totient theorem to understand cryptography. It is a distraction. All you need (at most) is to know that the result is true, and even then it's only fundamentally important for RSA, which you likely shouldn't bother learning about these days. RSA simultaneously
        1. looks very simple (though the simple version is horrendously insecure), and 2. does not have particularly good performance, and 3. does not have particularly good security (either post or pre quantum), and 4. has been in the process of being phased out for quite some time now.
        this is not a good combination of properties. The fact that cryptography textbooks cover it is mostly due to historical tradition. I would personally argue it is time to omit it from instruction materials.
        [-]
        gucci-on-fleek 9 hours ago
        > Along those lines, you do not need to understand the proof of Euler's totient theorem to understand cryptography.
        Well, I had to when I learned cryptography, but I learned it from a class offered by the math department, so I guess that's rather unsurprising :).
        > even then it's only fundamentally important for RSA […] this is not a good combination of properties
        Strong agree here.
  - mswphd 14 hours ago
    in general the math is not actually that hard. It will be things you don't know beforehand, but a general undergraduate cryptography class will not assume the undergraduates have that much of a better math background than you. Typically just
    1. comfort with logical operations/arithmetic over F2 2. discrete probability over finite sets 3. some basic complexity theory (mostly to reason about running time, though being familiar with proofs by reduction can help as well if you actually want to do security proofs).
    a decent idea might be to take some "good" undergraduate cryptography class's course resource and use that. For example, Mihir Bellare is an extremely accomplished cryptographer. The course materials for his undergrad course F2018 are
    https://cseweb.ucsd.edu/~mihir/cse107/slides.html
    He's also written a longer series of lecture notes on cryptography that's freely available. I don't know where it is on his webpage these days, but you can find it below
    https://www.cs.tufts.edu/comp/165/papers/Goldwasser-Bellare-...
    the difficult part with this approach is not being able to ask questions that easily. To "fix" this, you can either
    * use AI, though that has its own issues, or * use some community forum, such as crypto.stackexchange.com
    if you want a full book, the typical (undergradute) one that roughly matches the above syllabus is "An Introduction to Modern Cryptograph" by Katz and Lindell.
    I've also heard good things about Mike Roseluk's the joy of cryptography
    https://joyofcryptography.com/
    Boneh and Shoup have a decent (freely available, and very comprehensive) textbook at the graduate level
    https://toc.cryptobook.us/
    but it is following (roughly) the standard undergraduate curriculum, so if the slides I linked too are too sparse at some point, you could look up that topic in Boneh and Shoup (or use Boneh and Shoup as context to ask an LLM more targeted questions).
    That all being said, the main difficulty for someone in your position is likely determining "what to learn" in cryptography. The easy thing would be to follow the standard undergraduate track, but if you're interested in any particular topic there are likely better routes to take.
- PaulStatezny a day ago
  I would highly recommend the free book Crypto 101.
  https://www.crypto101.io
- rramadass a day ago
  1) Understanding Cryptography by Christof Paar et al. I learnt cryptography from the 1st edition. Its very practical and highly recommended - https://www.cryptography-textbook.com/
  2) Cryptography: Theory and Practice by Douglas Stinson et al. This is a more mathematical treatment and hence a nice complement to the Paar book above - https://www.routledge.com/Cryptography-Theory-and-Practice/S...
  3) For understanding how cryptography is used in Networks see the classic Network Security: Private Communication in a Public World by Radia Perlman et al. The 2nd edition is where i started my journey into network security/cryptography needed for my then job. Highly recommended - https://www.amazon.com/Network-Security-Charlie-Kaufman/dp/0...
  The first two books give you the "mechanisms" (and theory) of cryptography i.e. the building blocks. The last book puts everything together to implement "policies" via practical applications (eg. IPSec/SSL etc.) for the real world. They are complementary and hence should be studied together to get the full picture.
- cumshitpiss a day ago
  https://cs.ru.nl/~joan/papers/JDA_VRI_Rijndael_2002.pdf
  A large part of this book is aimed at the readers who want to know why we designed Rijndael in the way we did. For them, we explain the ideas and principles underlying the design of Rijndael, culminating in our wide trail design strategy.
jdw64 a day ago
Ecologically speaking, there is a term called “carcinization” the evolutionary tendency for different organisms to independently evolve crablike forms.
The condition for carcinization is usually described as a kind of “shared condition.” After reading this article, that is what I felt.
In other words, from the perspective of shared conditions, isn’t it possible that systems receive similar pressures when they need to mix information?
1. There is a state space. 2. Each part of the input affects many parts of the output. 3. A simple rule is not enough, so nonlinearity becomes necessary. 4. But the hardware cannot be allowed to stall, so the system evolves toward a structure where simple transformations are repeated many times.
Ultimately, even across different fields, the core question is how to decompose complexity into atomic units. The choice of those units tends to converge under the pressures imposed by the underlying substrate. This seems to be the central thesis of the article.
This feels similar to how humans solve nonlinear differential equations.
If so, perhaps the structure of human cognition itself works in a similar way: when facing nonlinearity, we break it into smaller structures and design around those smaller parts.
Because my academic background is limited, I find it difficult to express this properly in language. But I think this kind of pressure can also be applied to programming and software theory.
When I think about software engineering, it also often starts from the smallest element that does not change easily, and then builds larger systems by composing those elements. In OOP, that unit is the conceptual object. In FP, it is the function. In DOP, it is data.
FP is mathematical. DOP is aligned with the data that computers store and transmit. OOP is connected to our abstract model of the world. That may be why different people are good at different paradigms.
OOP compresses the world into objects and responsibilities. FP compresses the world into functions and composition. DOP compresses the world into data and transformable structures. Utlimately, it is a question of how we cut complexity, what we choose as the minimal unit of decomposition.
Then what should this idea be called? And if we apply this to AI coding, what would it imply?
I have thoughts, but because I did not study enough, I feel frustrated that I cannot express them more fully. I wish I had learned more.
[-]
- mftb 20 hours ago
  This is a good and useful breakdown. There are lots of ways to get an education and continue learning.
  [-]
  - jdw64 19 hours ago
    Thank you. My goal is also to work hard, earn money, and eventually go to graduate school.
- adampunk 17 hours ago
  There are parts of this that I consider a reach but the whole thing—despite that—feels sensible and looks a useful way to hang some things together which are normally separated.
  These are well-expressed thoughts on a really hard subject.
  [-]
  - jdw64 15 hours ago
    thanks, friend!
gobdovan a day ago
I think the underlying explanation is that both fields deal with very large state spaces, so the forms converge somewhat.
I think the contrast is more interesting: exact discrete trajectories in cryptography versus approximate continuous function approximation in neural networks.
In cryptography, you usually want a state space so large that nobody can accidentally find, reconstruct, or predict the same path you took.
In neural networks, you want an immense initial search space because NNs need to model the real world, which is highly complex and contains patterns that appear unpredictably. One aspect I think is often overlooked is that NNs are mostly deletive: they start with a very broad representational space and become progressively more specific by discarding what the NN perceives as irrelevant distinctions.
I think this puts the article's point about complexity and mixing in a clearer light. The same class of procedures achieves almost opposite effects. In neural networks, you want mixing so the model can approximate many possible paths at once. In cryptography, you want mixing so the path taken is unpredictable and hard to trace. The key difference is that, for NNs, an approximate path can be good enough. In cryptography, an approximate path is as useless as a very distant one.
bit1993 a day ago
Ironic that both Shannon and Turing layed the foundation for both cryptography and AI. I think it boils down to information which is related to language and text.
ghstinda a day ago
are they really? seems not accurate to me, the devil is in the details
[-]
- unnouinceput 21 hours ago
  Of course the devil is in details. This entire article read like an alien visiting Earth and concluding "Humans and Dogs have 84% DNA in common, no wonder they make such a good pair."
postalrat a day ago
Maybe its because they are practical solutions for a branch of mathematics we haven't been able to solve.
krackers 19 hours ago
I'm reminded of this jane street puzzle https://news.ycombinator.com/item?id=47146487
dathinab 14 hours ago
Another big reason is that we use familiar terms, models to describe things. Often leading to very different things, being described in much less different ways. Also the simplified models often used for high level/very abstract explanations have the tendency that extrapolations you do based on them often end up wrong, in a subtle but fundamental way(* 1).
E.g. graphs, matrices, linear algebra systems, digital photos (and much more) can be all perfectly transformed into each other, or in other ways this are all different ways to look at the same data. (This is also not just hypothetical * 2)
As a side effect saying things are similar "because they are just matrix algorithms" is meaningless, because most things are "just a matrix algorithm". (It's also meaningful for the same reason, as it means you can transform many problems into such with well understood solutions.)
And the high level abstraction of "encoder -> state -> decoder" structure is another of such "too generic/meaningless" things. As state can be anything and encoding is just "process input to generate state" and decoder is just "generate output from state" wen can model most algorithms with that. Like the identity function is now `encode(input): state=input; decode(state): output=state` (and indeed as long as state is large enough wrt. the input (or unbound) you can train encode decoder network to do exactly that, as meaningless as that seems.
Similar you can treat everything as everything you have in cryptography itself: All of hash, PRNG, stream cipher can be easily(kinda, * 3) build from another and like most algorithms in existence they can be formulated to "consume data and then produce output", i.e. as encoder decoder pattern ;)
So IMHO it's mostly an combination of observer bias due to how we like to model things (in a very high level POV). And a construction bias from that and what computer can compute well (in a slightly less high level POV when looking at somewhat "arbitrary choices").
I know some of the examples here might sound a bit ridiculous, but it's some of the most important insights in CS:
- a lot of algorithms can be transformed (or modeled as) a lot of other algorithms, take advantage of it
- just because something looks alike, it neither means it is alike nor that it being alike has any deeper meaning (e.g. two graphs might look alike, but only for the subset of sample data you happen to use to plot them)
---
(* 1): Anything related to quantum physics seems be especially bad affected by it.
(* 2): you navigation system might navigate by mapping a navigation graph to a matrix and then compute on the matrix, where steps involved might treat it as a linear algebra system solved using some fast approximation.
(* 3): At least some of the conversion directions are easy. Some are a bit less intuitive. Also this assumes you either have perfect properties on the used building block or don't have to estimate how the imperfect properties map to properties of the created construct... Oh and naturally you can use some simple operations in the transformation (add, xor) as long as they are used in a straight forward way. E.g. PRNG(seed, offset) = HASH(encode(seed, offset)), with encode being a bijektive pairing function (e.g. `bytes_128bit_le(seed)+bytes_64bit_le(offset)`). E.g. for encryption you chunk, xor every chunk with a single time used key and generate the key using HASH(encode(key, nonce, chunk_idx)), or PRNG(encode(key, nonce), chunk_idx). That is how AES-CTR/AES-GCM does encryption. It (roughly) uses XOR(AES(key, encode(nonce, offset)), chunk). (Yes AES is more used like a hash then a block cipher in most modern AES ciphers...)