Communicating Sequential Processes (Hoare), The Next 700 Programming Languages (Landin), As We May Think (Bush), Can Programming Be Liberated from the von Neumann Style (Backus)
And this seems to be a cool course: https://canvas.harvard.edu/courses/34992/assignments/syllabu...
> This course examines papers every computer scientist should have read, from the 1930s to the present. It is meant to be a synthesizing experience for advanced students in computer science: a way for them to see the field as a whole, not through a survey, but by reliving the experience of its creation. The idea is to create a unified view of the field of computer science, for students who already know something about it, by replaying its entire evolution at an accelerated frame rate.
Seems like you need to have a Harvard account to see the lectures(?)
I actually found this to be an odd mix. Are we selecting papers that had an influence on computer science (as in, the theory of computation), or that had an impact on technology? Or are we just using CS as a catch-all for "all things computer"?
The Turing paper is foundational for CS, but without it, would the technology have evolved differently? Probably not. Most software engineers have not read it. Conversely, the IP standard is a technological cornerstone, but there's hardly any science in it. It's just a specification of a fairly simple protocol that you need to know when doing almost anything network-adjacent.
Something doesn't feel quite right to me seeing the PageRank paper in a short list alongside Turing and Shannon's foundational work on computability and information theory. “On Computable Numbers, with an Application to the Entscheidungsproblem” is almost 90 years old at this point and just as true and relevant now as it was then. Is PageRank even as relevant today as it was in 1998, let alone another 50 years from now?
We can’t estimate what will be relevant 50 years from now.
If quantum or biological computing finds some success, none of these are going to be relevant.
That said, PageRank is important for understanding the modern tech world. Search is something we take for granted, and it's great that there's a deeply mathematical (and somewhat counterintuitive) theory behind it.
The foundations of computation won't change with quantum/biological computers. Turing machines stay as relevant as ever.
Indeed, this was bugging me, too.
I wonder how PageRank was influential on CS as a field?
Even MapReduce is more a really clever technique than a boundary-pushing or boundary-identifying extension of the field. Unlike, say, CSP --- which is missing from the list.
Still, I like the conciseness and structure of the list. It could evolve into a nice book :-D
If you haven’t read the pagerank paper, you should. It’s not an obvious thing.
Agreed about map reduce though.
thanks for the nudge :-)
My understanding of the term "history of CS" would be "how CS evolved", as a field. How do we think about processing "data" with computers? What can we compute? What are the limitations on how fast we can compute? How do we talk about and present algorithms, the prescriptions for these computations? How do we talk about data and how we structure it (leading to SQL, and SGML/XML/JSON)?
PageRank, in contrast, is a very specific type of breakthrough, a breakthrough for its application domain. But it's not a breakthrough for how we compute or how we think about computation.
does that make sense?
Hoare's paper is in the list, no?
surprisingly it's not, indeed.
This is why I never read the links, and always go comment first.
What a joke of a list :-)
An important part of historiography is considering documents and artifacts in the context of their time.
We don't use cuneiform these days, but back in its day, it was as close to a standard writing system as it was possible to get.
Will it be directly influential 50 years from now? Maybe not.
But indirectly, it influenced the establishment of Google, which influenced many thousands of innovations in this field.
So yes, PageRank is hugely influential in my opinion.
The big insight of the PageRank paper is that you can use Monte Carlo integration to approximate PageRank rather than computing it exactly, which makes the problem much easier to distribute.
Great list of papers.
I've read five of the seven papers on the list. The two I haven't read are Cerf and Kahn's, and Berners-Lee's.
Turing's paper on computability was particularly hard to follow, for me, because he used these gothic-font upper-case characters to name all sorts of objects, and all those characters looked kinda the same to me! I had to use auxiliary materials to be able to make my way through the paper. Today, I would recommend reading it with Charles Petzold's easy-to-follow book on the paper: https://www.amazon.com/Annotated-Turing-Through-Historic-Com...
Cook's paper on NP-completeness was also hard to follow (again, for me). As with Turing's paper, I had to use auxiliary materials to make my way. Today, I would recommend reading instead an introductory book on computational complexity that works through Cook's proof.
Shannon's paper is a work of art, clearly articulated and beautifully written. It's just not casual reading, to put it mildly.
Brin and Page's paper, and Codd's paper, are not hard to follow, at least as I remember them, but understanding Brin and Page's work requires some knowledge of Linear Algebra.
Thank you for sharing this on HN.
Where does the Brin-and-Page paper require linear algebra? It mentions "eigenvector" once, in a tangential remark. The "simple iterative algorithm" is how you find the fixed point of any contraction mapping, linear or not. Knowing that it is also an eigenvector is just a distraction -- you aren't going to use Gaussian elimination, not if you know what is good for you.
It doesn't require linear algebra to understand the paper or how the algorithm works, but it does require linear algebra to understand why the algorithm works. In general, since the induced 1-norm of a stochastic matrix S is exactly 1 (not smaller), the mapping x↦Sx is NOT a contraction. Neither convergence of the power method nor uniqueness of the fixed point is guaranteed. (If there are multiple fixed points, there are multiple inconsistent rankings.)
In the paper, the significance of the so-called "damping factor" is not clear. However, with linear algebra, we know that the damping factor makes the stochastic matrix positive rather than merely nonnegative. Hence the Perron eigenvalue is "simple" (i.e. of multiplicity one), the Perron vector is unique and the power iteration must converge to it.
That's easier explained using fixed-point theory: The damping factor makes the mapping x↦Sx into an actual contraction (on the space of probability distributions). Not to mention that it has a simple common-sense justification (you don't want to get stuck in a subnetwork that only links to itself, or drown out a subnetwork that has more outlinks than inlinks).
There is probably some gain from understanding the algorithm specifically as a Markov chain iteration (if nothing else, it provides a great example for Markov chain iteration), but I think it's perfectly possible -- and easier -- to understand it as a fixed-point iteration on a compact space. And I am someone who does algebra for a living and normally explains everything algebraically if ever possible...
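(To make the two views above concrete, here's a minimal sketch in Go of the damped iteration x <- d*S*x + (1-d)/n, on a made-up three-page link matrix. It illustrates the idea only; it is not code from the paper.)

    package main

    import (
        "fmt"
        "math"
    )

    // pageRank iterates x <- d*S*x + (1-d)/n until the 1-norm of the update
    // falls below tol. S[i][j] is the probability of moving from page j to
    // page i, so every column of S sums to 1 (column-stochastic).
    func pageRank(S [][]float64, d, tol float64) []float64 {
        n := len(S)
        x := make([]float64, n)
        for i := range x {
            x[i] = 1.0 / float64(n) // start from the uniform distribution
        }
        for {
            next := make([]float64, n)
            diff := 0.0
            for i := 0; i < n; i++ {
                sum := 0.0
                for j := 0; j < n; j++ {
                    sum += S[i][j] * x[j]
                }
                next[i] = d*sum + (1-d)/float64(n)
                diff += math.Abs(next[i] - x[i])
            }
            x = next
            if diff < tol {
                return x
            }
        }
    }

    func main() {
        // Toy web: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
        S := [][]float64{
            {0, 0, 1},
            {0.5, 0, 0},
            {0.5, 1, 0},
        }
        fmt.Println(pageRank(S, 0.85, 1e-12))
    }

With d = 1 this loop may never settle on a reducible graph; with d < 1 the map is a contraction on the probability simplex, so it converges to the unique fixed point from any start, which is exactly the Perron/contraction point made above.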
Yes, but the paper blathers on and on in prose about history and related work and goals and future work. It might only mention eigenvector once in a tangential remark, but that remark is like 80% of the algorithm content of the paper.
If you can get .tex source files, ask GPT to rename the variables to something nicer, including more than one letter long names. Helps.
LaTeX didn't exist when Turing wrote that paper in the 1930s.
I don't know if anyone has gone through the trouble of re-typesetting the original paper in LaTeX.
Oh man, if you think Shannon's A Mathematical Theory of Communication is his most foundational contribution to computer science, you haven't seen his master's thesis from a decade prior.
https://en.wikipedia.org/wiki/A_Symbolic_Analysis_of_Relay_a...
He outlined how you could use switching elements in circuits (read: transistors) to define Boolean logic.
(That's not to downplay the importance of, well, establishing the foundations of the entire field of information theory.)
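(A toy illustration of the correspondence, in Go; my own example, not anything from the thesis. Series-connected switches conduct only when both are closed, which is AND; parallel-connected switches conduct when either is closed, which is OR.)

    package main

    import "fmt"

    // Shannon's mapping from relay networks to Boolean algebra:
    // series connection = AND, parallel connection = OR.
    func series(a, b bool) bool   { return a && b }
    func parallel(a, b bool) bool { return a || b }

    func main() {
        // XOR as a switching network: (a AND NOT b) in parallel with (NOT a AND b).
        xor := func(a, b bool) bool {
            return parallel(series(a, !b), series(!a, b))
        }
        for _, a := range []bool{false, true} {
            for _, b := range []bool{false, true} {
                fmt.Printf("%v XOR %v = %v\n", a, b, xor(a, b))
            }
        }
    }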
> He sketches out a hypothetical “Turing Machine,” proving that, if something is computable at all, a machine (in principle) can handle it.
That's not what Turing proved. Instead, what he proved in his paper was that there are some problems which aren't solvable by Turing Machines (and therefore presumably by any machine). That's the Entscheidungsproblem (decision problem) referenced in the title.
What TFA references is the so-called Church–Turing thesis, which is exactly that: a thesis. It can't really be proven, although we have very strong reason to believe it, given that in almost 100 years nobody has found a system of computation more powerful than Turing Machines.
And in 100 years we have gone in completely the opposite direction in terms of where we think artificial intelligence lies. Rather than constructing a machine that holds all known truths, the modern search for artificial intelligence mostly sifts through human commentary to create an organized feuilleton machine, as opposed to Leibniz's axiomatic approach.
No, Turing had precisely that approach of feeding the machine with learning material in mind[1], but you have to build a machine capable of consuming a generic body of knowledge before you can throw anything at it.
https://philsci-archive.pitt.edu/9085/1/SterrettBringingUpTu...
There have also been attempts to canonicalize knowledge on the web (semantic web) and lots of things inside of computers are in fact deterministic.
But that is not the direction that AI seems to be taking, it is "smart" because it does a good job of parroting humans that sound "smart", not because it "knows" truths.
I believe you are wrong that "nobody has found a system of computation more powerful than Turing Machines". A Turing machine cannot perform indeterminacy; however, the actor model can.
This is a misconception.
The actor model is a way to describe that a program exists, not if it’s possible to ‘compute’ that program.
You’re right that it can describe more things than a Turing machine, but doesn’t provide a constructive way to compute them.
Non-deterministic Turing machines [1] are the standard way to define non-deterministic complexity classes like NP or NEXP, so there are definitely Turing machines with indeterminacy.
[1] https://en.wikipedia.org/wiki/Nondeterministic_Turing_machin...
I read that sentiment here a few years ago but couldn't get anything more out of it than that actors can race, while a Turing machine is deterministic. I could very well have it wrong.
If you were computing with actors, and you also had a sufficiently-detailed spec about the actor model, is there some particular algorithm you could not compute by just executing a TLA+ spec of your actor algorithm using Turing-ish software?
Where's the infamous "The Evolution of the Unix Time-sharing System" by Dennis Ritchie?
https://www.bell-labs.com/usr/dmr/www/cacm.pdf
How about "SEQUEL: A Structured English Query Language" Don Chamberlin and Raymond Boyce?
https://web.archive.org/web/20070926212100/http://www.almade...
Nice work!
I was actually doing something similar on my own, so I might recommend some papers
- RSA: A Method for Obtaining Digital Signatures and Public-Key Cryptosystems (1978)
- PageRank: The PageRank Citation Ranking: Bringing Order to the Web (1999)
- MapReduce: MapReduce: simplified data processing on large clusters (2008)
- Bitcoin: Bitcoin: A Peer-to-Peer Electronic Cash System (2008)
- BackProp: Learning representations by back-propagating errors (1986)
- Hoare Logic: An Axiomatic Basis for Computer Programming (1969)
Oh, never mind about the PageRank paper, it was already in the list
Since everyone likes chiming in with their own additions to the list, here's mine:
While Cook was the first to introduce NP-completeness, Karp's paper presenting 21 problems to which SAT can be polynomially reduced was also an enormous cornerstone that helped kick off a more general interest in Cook's theory.
https://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_proble...
They're not papers, but I would give special mention to Why Software Is Eating the World by Marc Andreessen and Amazon's original 1997 letter to shareholders.
"Companies in every industry need to assume that a software revolution is coming. This includes even industries that are software-based today."
https://a16z.com/why-software-is-eating-the-world/
"But this is Day 1 for the Internet and, if we execute well, for Amazon.com."
https://www.aboutamazon.com/news/company-news/amazons-origin...
Diffie, Whitfield; Hellman, Martin E. (November 1976). "New Directions in Cryptography" ???
Just wrote a blog post about the explosion of papers titled after Attention Is All You Need [1]. Also figured out that the name probably didn't originate from one of the authors.
[1]https://open.substack.com/pub/0x7f1/p/is-all-you-need-is-all...
Here's another foundational paper:
Hewitt, Carl; Bishop, Peter; Steiger, Richard (1973). "A Universal Modular Actor Formalism for Artificial Intelligence". IJCAI.
https://www.ijcai.org/Proceedings/73/Papers/027B.pdf
>Go To Statement Considered Harmful” (1968) – Edsger Dijkstra
This is outdated and does not apply to modern goto.
It is often misunderstood, which causes people to avoid goto even when it is perfectly valid, or even better than the alternative solutions.
To my understanding, the underlying issue is how to structure code in a maintenance-friendly way. It's just very easy to go awry with unrestricted, wild goto. More often than not there are alternative control flows which are easier to follow mentally. And things like labels in Java[1] already capture most of the relevant cases in which a generic goto statement might feel like a valid approach (see the sketch after the link below). This doesn't mean that there is absolutely no case where a goto might be the most elegant and easiest way to implement something, but such cases are exceptional.
I mean, no one feels that a laser is the proper way to cut butter, but sometimes a laser is the most accurate cutting option.
[1] https://www.geeksforgeeks.org/g-fact-64/
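(To make the label mechanism concrete: here it is in Go, whose labeled break behaves like Java's. A hypothetical example, not taken from the linked article.)

    package main

    import "fmt"

    // findPair scans a grid for target, using a labeled break to exit both
    // loops at once -- no flag variable, no goto.
    func findPair(grid [][]int, target int) (row, col int, ok bool) {
    search:
        for i, r := range grid {
            for j, v := range r {
                if v == target {
                    row, col, ok = i, j, true
                    break search // leaves both loops
                }
            }
        }
        return
    }

    func main() {
        grid := [][]int{{1, 2, 3}, {4, 5, 6}}
        fmt.Println(findPair(grid, 5)) // 1 1 true
    }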
How does a modern goto differ from a traditional goto?
"Modern" goto (well, there aren't many modern languages with goto, but anyway) is semi-structured. Most importantly, it's local: you cannot jump into or out of subroutines, which was Dijkstra's major peeve. Using goto for backwards jumps is also usually discouraged post-Dijkstra.
Solid list. Two that influenced me personally were:
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1(1), 67-82.
And the corresponding search paper. Got me started in search and optimization (and Prolog).
Licklider, J. C. R. (1960). Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, (1), 4-11.
More of a philosophical outlook but the thought of man-computer symbiosis instead of "computer solves it" has stuck with me (and is quite relevant in this day and age).
Definitely missing from this list:
J. Cooley and J. Tukey, “An Algorithm for the Machine Calculation of Complex Fourier Series,” 1965
I would also add J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", 1977 [1]. LZ (Lempel-Ziv) is the foundation of many data compression algorithms that are still in use today.
[1] https://courses.cs.duke.edu/spring03/cps296.5/papers/ziv_lem...
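(For a feel of the idea, here's a greedy toy LZ77 in Go; my own sketch of the scheme, not the exact algorithm from the paper. Each (offset, length, next) triple means "copy length bytes starting offset bytes back, then append next". Matches are allowed to overlap the current position, which is what makes long runs compress well.)

    package main

    import "fmt"

    type token struct {
        offset, length int
        next           byte
    }

    // compress greedily emits (offset, length, next) triples, searching the
    // last `window` bytes for the longest match at each position.
    func compress(data []byte, window int) []token {
        var out []token
        for i := 0; i < len(data); {
            best, bestLen := 0, 0
            start := i - window
            if start < 0 {
                start = 0
            }
            for j := start; j < i; j++ {
                l := 0
                // i+l < len(data)-1 keeps one byte in reserve for `next`.
                for i+l < len(data)-1 && data[j+l] == data[i+l] {
                    l++
                }
                if l > bestLen {
                    best, bestLen = i-j, l
                }
            }
            out = append(out, token{best, bestLen, data[i+bestLen]})
            i += bestLen + 1
        }
        return out
    }

    func main() {
        for _, t := range compress([]byte("abcabcabcabd"), 16) {
            fmt.Printf("(%d,%d,%q)\n", t.offset, t.length, t.next)
        }
    }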
Love this, agree w/ all.
Probably just needs to be a bigger list.
Unix paper.
Hinton on deep learning (pick one).
Map Reduce + GFS from Google.
Paxos from dist systems.
PGP paper; RSA paper
+1 on MapReduce, that's a classic systems paper.
Surprised no one has mentioned The Unreasonable Effectiveness of Data. Data is more important than complex domain-specific algorithms.
Does anyone actually use Paxos in real life? And let me ask the same question for all other academic distributed algorithms right away. I recall seeing a study some 10 years ago checking the major cloud providers for Byzantine fault tolerance and finding none of them to exhibit the qualities that the known algorithms would guarantee; apparently they would just rely on timing and hoping things don't get too twisted.
> Does anyone actually use Paxos in real life?
Yes, it's very widely used at Google through Chubby, which underpins many core pieces of infrastructure, including name resolution. (It used to be common practice to depend more directly on Chubby for synchronization of state via shared global files, but that fell out of favor about 6 years ago due to reliability risks associated with just blasting out changes globally without any canarying.)
https://en.wikipedia.org/wiki/Paxos_(computer_science)#Produ... lists a bunch of other use cases (notably, Google's Spanner and Amazon's DynamoDB).
> And let me ask the same question for all other academic distributed algorithms right away.
Raft (designed as a more understandable alternative to Paxos) is more commonly used, as I understand it.
https://en.wikipedia.org/wiki/Raft_(algorithm)#Production_us...
> I recall seeing a study some 10 years ago checking the major cloud providers for Byzantine fault tolerance and finding none of them to exhibit the qualities that the known algorithms would guarantee.
I'm curious which study you're referring to. I could believe that while Paxos might be used as a building block, it might not be used in a consistent manner throughout.
Also, note that not all of the variations of Paxos handle Byzantine faults.
Ah, so Raft is what everyone uses, and Paxos is used by the very big providers nowadays. Good to know!
I can't find the study any more, though I'm pretty sure I saw it on HN...
Yes, if you're using any sort of multi-master DB, you're using a variant of Paxos, Raft, or something which you shouldn't really trust until aphyr blasts it into low Earth orbit.
The paper itself is very approachable and worth spending an hour or so on.
I remember trying to read the paper and giving up somewhere early in the proof. I certainly don't think it gives a great intuition why the algorithm holds, so "approachability" is a matter of definition (does it tell a fun story? sure yeah).
Lisp from McCarthy.
Plan9+CSP.
Both, maybe, polar opposites, but complementary.
It would be interesting to see the opposite of this; which papers are really interesting and look useful, but did not end up having a significant impact?
I could mail you my dissertation if you're interested
Probably the papers about VLIW design
Plan9
I guess if we all add our favourite papers, we’ll end up with a very long list indeed. My own additions: As we may think by Vannevar Bush, and then something on Project Xanadu. There doesn’t seem to be any foundational paper on the latter, but there is Ted Nelson’s book Literary Machines.
I read Shannon’s paper every year and it gives me new insights every single time. It transcends computer science. And Claude was one of the coolest “nerds” I’ve met.
http://www-formal.stanford.edu/jmc/recursive.html
https://paulgraham.com/rootsoflisp.html
This is the arch-rival of Unix+C, and also one of its best CS friends (GNU Emacs is still there, and it was widely used on Unixen, alongside the GNU (Unix clone) tools).
I think AlexNet was more influential than Attention is All You Need
I'd argue both, as well as McCulloch & Pitts. Maybe Boltzmann or Rumelhart (backprop) as well. Honestly, I wouldn't know where to stop, there are so many cool papers.
Yeah. But before AlexNet, GPUs were only for graphics and esoteric papers in scientific computing. The realization that conv layers map well to CUDA cores has led to GPU production being a national security issue.
If this is up your alley, Ideas That Created the Future[1] has like 40 more, decent introductions to their ideas, and provides a sort of thematic throughline from Aristotle to modern computer science.
[1]: https://mitpress.mit.edu/9780262045308/ideas-that-created-th...
Ivan Sutherland's 1963 PhD thesis really was seminal for the graphical user interface.
"Sketchpad: A Man-machine Graphical Communication System."
The Part-Time Parliament by Leslie Lamport, written in such a style it is its own complementary material. (This is the Paxos paper.)
I think any list like this has to include: "Time, clocks, and the ordering of events in a distributed system" by Lamport. Also, in almost any networked system, having strict ordering is fantastically useful. And of course, how could we forget "The Byzantine Generals Problem".
For practical purposes, perhaps this:
Andrew Tridgell and Paul Mackerras, The rsync algorithm, June 1996, https://www.andrew.cmu.edu/course/15-749/READINGS/required/c...
I wonder how someone could make such a list and completely ignore cryptography. No, it is not enough to mention NP-completeness.
Is there a particular paper to point to?
We have Shannon's "Communication Theory of Secrecy Systems" as arguably the beginning of modern cryptography and then Diffie & Hellman's "New Directions in Cryptography" which first introduced public-key cryptography.
And FHE, MPC, and ZK, among other breakthroughs. Easy to check on the Wikipedia Turing Award page [1]. Use the Gödel Prize as a "helper" [2].
[1] https://en.wikipedia.org/wiki/Turing_Award
[2] https://en.wikipedia.org/wiki/G%C3%B6del_Prize
Diffie-Hellman should be on that list... maybe instead of the PageRank one, IMHO.
Surprised the Bitcoin paper isn't on here.
BTC just combines all the past research into an application, which has its own place, but sadly not here. You might wanna read this [0] for all the past ideas that Satoshi took.
[0] - https://queue.acm.org/detail.cfm?id=3136559
It doesn't really add anything to computer science, but then again the Brin and Page paper probably doesn't meet that standard either.
I'm not so sure. In my (under-read) mental model, blockchain takes you from fail-stop fault-tolerance (a la Paxos) to Byzantine fault-tolerance, i.e. how do you compute in a massively distributed system when no node has any reason to trust any other node.
Merkle's work is probably more important. And the progenitor papers on representing branching streams to implement reversible editors.
Ken Thompson's talk (later a paper), "Reflections on Trusting Trust".
https://dl.acm.org/doi/pdf/10.1145/358198.358210
Especially the Codd paper (what if Codd had died in World War 2?) makes me wonder: What are the non-obvious (kind of arbitrary even) yet extremely helpful concepts that no one has thought of or that didn't find the resonance that they should have?
Binary search and efficient sorting algorithms are pretty important.
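(And famously easy to get subtly wrong: the classic pitfall is computing the midpoint as (lo+hi)/2, which can overflow on huge arrays. A minimal overflow-safe sketch in Go:)

    package main

    import "fmt"

    // search returns the index of target in the sorted slice a, or -1.
    // mid is computed as lo+(hi-lo)/2 rather than (lo+hi)/2 to avoid overflow.
    func search(a []int, target int) int {
        lo, hi := 0, len(a)-1
        for lo <= hi {
            mid := lo + (hi-lo)/2
            switch {
            case a[mid] == target:
                return mid
            case a[mid] < target:
                lo = mid + 1
            default:
                hi = mid - 1
            }
        }
        return -1
    }

    func main() {
        fmt.Println(search([]int{1, 3, 5, 7, 9}, 7)) // 3
        fmt.Println(search([]int{1, 3, 5, 7, 9}, 4)) // -1
    }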
I mean widely useful things, maybe not very obvious things, that no one has thought of yet or that are rotting in obscurity. If Codd hadn't basically invented relational databases, would we even know what we're missing?
For example: Rust took a bunch of ideas from research languages that one guy worked on one to three decades earlier. These ideas have probably always been good and useful, but they weren't put to use, not all of them at least.
I'm pretty fond of "Ending the anomaly": https://www.usenix.org/system/files/conference/atc17/atc17-h...
"The UNIX TimeSharing System", by Dennis M. Ritchie and Ken Thompson.
Not a single woman? No Barbara Liskov - Programming with abstract data types? No Grace Hopper - The education of a computer? No Frances Allen - Program optimization?
Should the work of women be on that list for the sole reason that they are women? There are many more men who have written papers far more influential than the ones you've mentioned, yet they didn't make the list. If you believe in equality, then you have to believe that the work of people who happen to be women can compete on its own merit. The absence of women in that list isn't necessarily evidence of bias, as implied in your remark.
> papers far influential than the ones you've mentioned
Citation needed
Don't act in bad faith, the entirety of this thread is filled with examples.
I'd put Liskov's Programming with abstract data types up against any of them. Fran Allen's work was so fundamental it's hard to find compiler stuff that doesn't build on her work.
> Don't act in bad faith
This sounds like projection to me
You asked for "citations", the thread is literally filled with references to them. How is it not bad faith to have to prove to you things that you can easily check for yourself?
You misunderstood the request. Your original comment was claiming that there were many papers far more influential than any of the papers named that were by women. I was requesting evidence of this influence. In response you say that what, all of the references filling this thread are more influential than say Liskov or Allen? If not all, which ones?
The original comment you were responding to was pointing out that none of the papers listed were by women, and suggested several that were that are undeniably influential. Perhaps you think they aren't because you haven't read them, or presumably even heard of them?
I don't think representation needs to be a thing for a personal list on a blog. Government? Absolutely. Corporate? Maybe. That said, of course there have been many critical female contributions in the field. However, it's also a numbers game, since CS academia has been very gender/sex lopsided to this day. So the output would reflect that (sad) reality.
Did you just assume their gender?
Nice list!
I would add "On the Criteria to Be Used in Decomposing Systems into Modules" (1972, ~7800 citations) by Parnas, though more software engineering than computer science in a strict sense.
Nothing earlier than 1936? For example George Boole's boolean arithmetic or Charles Babbage's computers?
I went in with some cynicism, but was completely wrong. These are it. Almost every one is recognisable as the root of many things we do.
Ha, me too! When the first title included Entscheidungsproblem I thought this would be intellectual edgelording. But, this is a legit list.
Of course, you can't do justice to the entire field with such a short list. Two papers on web but none on graphics, e.g. But it's a fine reading list for sure.
See also the book or the course by Harry R. Lewis at Harvard:
- Ideas That Created the Future https://www.amazon.com/Ideas-That-Created-Future-Computer/dp...
- "Classics of CS" https://canvas.harvard.edu/courses/34992/assignments/syllabu...
Louis Pouzin's work underpins/predates much of Cerf/Kahn.
I think the also-ran list should be in the main list.
I suggest adding at least Hoare's CSP and something about quicksort.
9front/Plan 9 sites have that paper; it's magical, and even a high schooler would understand it.
Please add Satoshi Nakamoto, for Bitcoin and the money revolution
Idk why it is downvoted tbf
The Bitcoin paper is really interesting from a CS perspective.
And also had HUGE influence on the world
No, sorry, you are deluded. Compared to the ones from Shannon, Knuth, Ritchie, John McCarthy... BTC barely made a dent.
It literally created a trillion-dollar "industry"; calling it barely a dent is weird as hell.
And no, I don't own BTC.
Money backed by nothing. Remember 1929.
J.C.R. Licklider, "Man-Computer Symbiosis" from 1960.
Lambda calculus and Lisp are missing.
I thought we were going to get through without any LLM bullshit, but no, they snuck it in at the end.
Come on, we now have systems that can believably answer arbitrary questions in human language. This is literally what I dreamed of when I got into computing like 25 years ago, and would be considered science fiction only 5 years ago. As a side effect, entire tasks as important as machine translation and summarization have pretty much been solved for major languages.
Regardless of whether you buy the full hype or you think they're just stochastic parrots, I think it more than qualifies to make the second list (and probably the first, but I get that there's no perspective to be so sure about that).
The paper itself (as a paper, i.e. an explanation of the underlying results) is quite bad, by the way. It's better to learn about Transformers from one of the many good blog posts. But that doesn't detract from its influence.