I learned about DFA (Data Flow Analysis) optimizations back in the early 1980s. I eagerly implemented them for my C compiler, which was released as "Optimum C". Then came the C compiler roundup benchmarks in the programming magazines. I breathlessly opened the issue, and was faced with the reviewer's verdict that Optimum C was a bad compiler because it deleted the code in the benchmarks. (The reviewer wrote that Optimum C was cheating by recognizing the specific benchmark code and deleting it.)
I was really, really angry that the reviewer had not attempted to contact me about this.
But the other compiler vendors knew what I'd done, and the competition implemented DFA as well by the next year, and the benchmarks were updated.
The benchmarks were things like:
void foo() { int i,x = 1; for (i = 0; i < 1000; ++i) x += 1; }
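For context, the effect of the optimization (my sketch, not Optimum C's actual output): DFA proves x is never read, so the stores to it are dead; with the body gone, the empty loop is deleted too, and the function compiles as if it were written:

void foo() { }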
If you claim an IR makes things harder, just skip it.
Compilers do have an essential complexity that makes them "hard" [...waffle waffle waffle...]
The primary data [...waffle...] represents the computation that the compiler needs to preserve all the way to the output program. This data structure is usually called an IR (intermediate representation). The primary way that compilers work is by taking an IR that represents the input program, and applying a series of small transformations all of which have been individually verified to not change the meaning of the program (i.e. not miscompile). In doing so, we decompose one large translation problem into many smaller ones, making it manageable.
There we go. The section header should be updated to:
Why compilers are manageable – the IR data structure
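To make the quoted pipeline idea concrete, here is a minimal sketch of my own (nothing here is from the article): a toy IR as a flat instruction list, plus two small meaning-preserving passes (constant folding, then dead-code elimination) composed as the quote describes.

#include <stdio.h>

/* Toy three-address IR: each instruction writes one virtual register.
   For LOADC, a holds the constant; for RET, a is the returned register. */
typedef enum { LOADC, ADD, MUL, RET } Op;
typedef struct { Op op; int dst, a, b; } Inst;

/* Pass 1: constant folding. If both operands of an ADD/MUL are known
   constants, replace the instruction with a LOADC of the folded value. */
static void fold(Inst ir[], int n)
{
    int known[16] = {0}, value[16] = {0};
    for (int i = 0; i < n; i++) {
        Inst *p = &ir[i];
        if (p->op == LOADC) {
            known[p->dst] = 1; value[p->dst] = p->a;
        } else if ((p->op == ADD || p->op == MUL) && known[p->a] && known[p->b]) {
            int v = (p->op == ADD) ? value[p->a] + value[p->b]
                                   : value[p->a] * value[p->b];
            *p = (Inst){ LOADC, p->dst, v, 0 };
            known[p->dst] = 1; value[p->dst] = v;
        } else if (p->op != RET) {
            known[p->dst] = 0;  /* result is no longer a known constant */
        }
    }
}

/* Pass 2: dead-code elimination. Walk backwards; drop any instruction
   whose destination is never subsequently read. */
static int dce(Inst ir[], int n)
{
    int live[16] = {0}, m = 0;
    Inst kept[16];
    for (int i = n - 1; i >= 0; i--) {
        Inst *p = &ir[i];
        if (p->op != RET && !live[p->dst]) continue;   /* unused: drop */
        if (p->op != RET) live[p->dst] = 0;            /* def kills liveness */
        if (p->op == RET) live[p->a] = 1;              /* uses generate it */
        else if (p->op == ADD || p->op == MUL) live[p->a] = live[p->b] = 1;
        kept[m++] = *p;
    }
    for (int i = 0; i < m; i++) ir[i] = kept[m - 1 - i];  /* un-reverse */
    return m;
}

int main(void)
{
    Inst ir[] = {          /* r2 = 2 * 3; r3 = 99 (never read); return r2 */
        { LOADC, 0, 2, 0 }, { LOADC, 1, 3, 0 },
        { MUL, 2, 0, 1 },   { LOADC, 3, 99, 0 }, { RET, 0, 2, 0 },
    };
    int n = sizeof ir / sizeof ir[0];
    fold(ir, n);           /* the MUL becomes LOADC 6 */
    n = dce(ir, n);        /* the now-dead LOADCs vanish */
    for (int i = 0; i < n; i++)
        printf("op=%d dst=r%d a=%d b=%d\n", (int)ir[i].op, ir[i].dst, ir[i].a, ir[i].b);
    return 0;
}

Each pass is easy to verify in isolation; run in sequence they reduce the five instructions to two (load 6, return it), which is the whole point of the decomposition.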
In the D compiler, I realized that while loops could be rewritten as for loops, and so implemented that. The for loops are then rewritten using gotos. This makes the IR a list of expression trees connected by gotos. This data structure makes Data Flow Analysis fairly simple.
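Roughly, the lowering he describes looks like this (my reconstruction, not DMD's actual code; cond and body are placeholder names):

#include <stdio.h>

static int n = 3;
static int cond(void)  { return n > 0; }
static void body(void) { printf("%d\n", n); --n; }

/* source form */
void f_while(void) { while (cond()) body(); }

/* step 1: rewritten as a for loop */
void f_for(void) { for (; cond(); ) body(); }

/* step 2: the for loop rewritten with gotos; what remains is
   plain expression trees connected by labels and gotos */
void f_goto(void)
{
Ltop:
    if (!cond()) goto Ldone;
    body();
    goto Ltop;
Ldone:
    return;
}

int main(void) { f_goto(); return 0; }

All three forms behave identically; the last one is what the data flow analysis actually sees.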
I implemented an early function inliner by inlining at the IR level. When I wrote the D front end, I attempted to do the inlining in the front end instead. This turned out to be a significantly more complicated problem, and in the end not worth it.
The difficulty with the IR version is error messages: it is impractical to issue them in the context of the original parse trees. I.e., it's the ancient "turn the hamburger into a cow" problem.
This resonates with how compiler work looks outside textbooks. Most of the hard problems aren’t about inventing new optimizations, but about making existing ones interact safely, predictably, and debuggably. Engineering effort often goes into tooling, invariants, and diagnostics rather than the optimizations themselves.
I don't feel his overflow miscompilation example is a good one. A 64-bit multiplication truncated back to 32 bits has the same overflow behavior as if the computation had been done in 32 bits (assuming nobody depends on the overflow indication, which is rare). And in high-level programming languages you typically can't tell the difference.
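A quick way to check that claim (my example, using unsigned arithmetic so the wrap is well-defined in C): the low 32 bits of a 64-bit product are identical to the 32-bit product of the same operands, because multiplication mod 2^32 depends only on the low 32 bits.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 0xFFFFFFFFu, b = 7u;   /* product overflows 32 bits */
    uint32_t narrow  = a * b;                        /* wraps mod 2^32 */
    uint32_t widened = (uint32_t)((uint64_t)a * b);  /* 64-bit, then truncated */
    printf("%u %u\n", narrow, widened);              /* prints the same value twice */
    return 0;
}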
“Compiler Engineering in Practice” is a blog series intended to pass on wisdom that seemingly every seasoned compiler developer knows, but is not systematically written down in any textbook or online resource. Some (but not much) prior experience with compilers is needed.
> What is a compiler?
Might be worth skipping to the interesting parts that aren’t in textbooks
Always interested in compiler testing, so I look forward to what he has to say on that.
skimmed through the article and the structure just hints at being not written by a human
have to disagree. maybe read a paragraph, it's dense with context imo. i find slop to be light on context and wordy, this is not.
The compiler part of a language is actually a piece of cake compared to designing a concurrent garbage collector.
A good enough one isn't so bad. Here's an example concurrent collector that was used with the Inferno OS:
https://github.com/inferno-os/inferno-os/blob/master/libinte...
There's lots of room to improve it, but it worked well enough to run on telephony equipment in prod.
This has gotta be one of the most Dunning-Kruger comments on HN