> While playing around with this setup, I tried re-training the network with the activation function for the first layer replaced with sin(x) and it ends up working pretty much the same way.
There is some evidence that the activation functions and weights can be arbitrarily selected assuming you have a way to evolve the topology of the network.
https://arxiv.org/abs/1906.04358
This seems interesting, but I got stuck fairly early on when I read "all 32,385 possible input combinations". There are two 8-bit numbers, 16 totally independent bits. That's 65_536 combinations. 32_385 is close to half that, but not quite. Looking at it in binary it's 01111110_10000001, i.e. two 8-bit words that are the inverse of each other. How was this number arrived at, and why?
Looking further on, there's also a strange DAC that gives the lowest resistance to the least significant bit, thus making it the biggest contributor to the output. Very confusing.
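For intuition on why that looks backwards: in a simple summing-junction DAC each bit pushes a current of roughly V/R into the output node, so whichever bit is wired through the lowest resistance contributes the most. A rough Python sketch, with made-up resistor values rather than the article's:

    # Rough sketch: relative bit contributions in a summing DAC.
    # Resistor values are illustrative only; a conventional binary-weighted
    # DAC would give the MSB the lowest resistance, not the LSB.
    V = 1.0
    resistances = [1e3, 2e3, 4e3, 8e3]       # bit 0 (LSB) .. bit 3, lowest R on the LSB
    currents = [V / r for r in resistances]  # each bit contributes roughly V/R
    total = sum(currents)
    for bit, i in enumerate(currents):
        print(f"bit {bit}: {100 * i / total:.0f}% of the output")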
Is that the number of adds that don’t overflow an 8-bit result?
On that hunch, I just checked and I get 32896.
Edit: if I exclude either input being zero, I get 32385.
You also get the same number when including input zeros but excluding results above 253. But I’d bet on the author’s reason being filtering out input zeros. Maybe the NN does something bad with zeros, or can’t learn them for some reason.
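For anyone who wants to reproduce those counts, a brute-force check along these lines (my own script, not the author's) gives the same numbers:

    # Brute-force check of the pair counts discussed above.
    all_pairs      = sum(1 for a in range(256) for b in range(256) if a + b <= 255)
    no_zero_inputs = sum(1 for a in range(1, 256) for b in range(1, 256) if a + b <= 255)
    small_results  = sum(1 for a in range(256) for b in range(256) if a + b <= 253)
    print(all_pairs)       # 32896: non-overflowing adds, zeros allowed
    print(no_zero_inputs)  # 32385: non-overflowing adds, neither input zero
    print(small_results)   # 32385: zeros allowed, but sums above 253 excluded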
Interesting puzzle. 32385 is 255 pick 2. My guess would be, to hopefully make interpretation easier, they always had the larger number on one side. So (1,2) but not (2,1). And also 0 wasn’t included. So perhaps their generation loop looks like [(i, j) for i (255 -> 1) for j (i-1 -> 1)].
You are potentially conflating combinations with permutations.
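For what it's worth, a quick itertools check (my addition, not the author's) backs that up: the number matches combinations, not permutations.

    # Quick check: 32385 is "255 choose 2"; "255 pick 2" (ordered pairs) would be 64770.
    from itertools import combinations, permutations
    values = range(1, 256)  # 1..255, zero excluded
    print(len(list(combinations(values, 2))))  # 32385
    print(len(list(permutations(values, 2))))  # 64770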
Original submission: https://news.ycombinator.com/item?id=34399142
> As I mentioned before, I had imagined the network learning some fancy combination of logic gates to perform the whole addition process digitally, similarly to how a binary adder operates. This trick is yet another example of neural networks finding unexpected ways to solve problems.
My intuition is that this kind of solution admits a gradual, gradient-style path toward it, which is why it feels unintuitive: we tend to think of solutions as all-or-nothing and look for complete ones.
Right, binary gates are discrete elements but neural networks operate on a continuous domain.
I'm reminded of the Feynman anecdote when he went to work for Thinking Machines and they gave him some task related to figuring out routing in the CPU network of the machine, which is a discrete problem. He came back with a solution that used partial differential equations, which surprised everyone.
The more interesting question is: is it even possible to learn the logic-gate solution through gradient descent?
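Not a full answer, but a single sigmoid unit can already be biased into behaving like a "soft" AND gate, and every point along the way is differentiable, so there is at least a continuous path to gate-like behaviour. A toy sketch with hand-picked (not learned) weights:

    from math import exp

    # Toy "soft" AND gate: one sigmoid neuron with hand-picked weights.
    # Because the output is smooth in the weights, gradient descent could in
    # principle drift toward (or away from) gate-like behaviour.
    def soft_and(a, b, w1=10.0, w2=10.0, bias=-15.0):
        z = w1 * a + w2 * b + bias
        return 1.0 / (1.0 + exp(-z))

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(soft_and(a, b), 3))  # ~0, ~0.007, ~0.007, ~0.993

Whether gradient descent actually finds such weights for a full adder is a different question, of course.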