Errors before terrors

Why I'm focusing on getting compiler diagnostics right while my parser is still young

Dec 28, 2020

When you make a mistake on a piece of writing, an English teacher may mark the mistake in red and offer an explanation on what’s wrong and how to fix it. Similarly, when you make a mistake coding, the compiler will flag it, typically by outputting colored messages in a terminal window.

I went on an exploration the past few weeks striving for the best possible error reporting in the Poly programming language. I ended up trying out one approach, designing a user interface to test it out, realizing the error messages were not ideal, and rewriting a new approach entirely.

Though error reporting may seem like an afterthought in the programming language creation process, I firmly believe anything that has the potential to improve a future developer’s experience significantly should be prioritized early on. So, this is a story about a topic not too often reflected on: compiler diagnostics.

Good compiler diagnostics

Good error reporting provides a clear view of what mistakes were made and how to fix them. I thought about my own goals with compiler messages and made a list of what I look for in good error reporting:

concise: only output errors as needed; don’t overwhelm the user with excessive messaging
informative: identify exactly where an error occurred and what the problem is
helpful: if possible, provide a potential fix to make the error go away

If you survey existing programming languages, you can see how error messaging varies by language. I used the online tool repl.it to try compiling a simple statement with an error in it — “int x = 2 + ;” — in six different programming languages:

There’s an interesting dynamic at play here. The interpreted languages I sampled (Python, Ruby, and JavaScript) simply say there’s a syntax error. The compiled languages (C, Java, and Swift) are more informative, describing that the error occurred while processing an expression. But overall they are remarkably consistent, reporting a single error in the same position.

My initial, flawed approach to error reporting

The parser, which I talked about in the last post, iterates through source code tokens and builds up an abstract syntax tree (AST) representing the program. For instance, the AST of “let x = 2 + 3;” might look like this:

My original idea for error reporting was:

report errors as they occur
continue parsing from the same place when an error occurs
when the AST is incomplete as a result of an error, store a placeholder error node in the AST to represent the missing information
use the collected information to inform intelligent compiler error messages

To see this approach in action, let’s say you were trying to parse the following statement:

let x = 2 + , + , + 5;

In this case, there are commas in two places where there should be numbers. These are both errors, as the comma does not work in the middle of an expression, so to represent them the AST will have error nodes:

I implemented this error handling technique, thinking it would work quite well. Then I designed a visual interface where I could type in code and see resulting compiler messages update in real-time. The error reporting worked just as planned — but I soon realized there were shortcomings.

The main problem: small typos could balloon into myriad error messages on the same line. Some of the errors were reported twice, since they were handled separately by expression parsing functions and statement parsing functions. Some of the errors were wrong since the parser made incorrect assumptions of how the AST should look.

My key takeaway was that a developer may not actually be interested in hearing about every possible error, for the following reasons:

a single typo can propagate multiple errors
there can be multiple ways to interpret a given mistake
an error can result in an incorrect understanding of what follows

In my mind, I likened the compiler diagnostics to an overzealous English teacher. Red pen in hand, the teacher notices a missing word in a sentence and marks the mistake three times, each for a reason that is technically correct:

A better approach to error handling

I began researching error handling and came across an established approach that seemed to address the shortcomings of my previous attempt. The programming language creation handbook Crafting Interpreters, by veteran language designer Bob Nystrom, lays it out in detail:

Of all the recovery techniques devised in yesteryear, the one that best stood the test of time is called—somewhat alarmingly—panic mode. As soon as the parser detects an error, it enters panic mode.

The gist of panic mode is that once a parser encounters an error, it gives up trying to finish parsing the current statement. Instead, the parser reports the error and skips tokens until it identifies what is likely the start of the next statement.

Although skipping tokens may seem like a waste of potentially useful information, Nystrom offers reassurance:

Any additional real syntax errors hiding in those discarded tokens aren’t reported, but it also means that any mistaken cascaded errors that are side effects of the initial error aren’t falsely reported either, which is a decent trade-off.

Essentially, this method does an admirable job fixing the problems we identified above:

a single typo will only result in a single error
only a single interpretation of a given mistake will be given (it may be the wrong way to interpret the given mistake, but it is impossible to perfectly hypothesize a user’s intention in every error situation)
an error will not result in an incorrect understanding of what follows, since the parser synchronizes itself to the next statement

Other perks are that the error handling routine is efficient, and the AST no longer needs to store error nodes.

I implemented this approach using ReasonML’s exception mechanism, which provides a shortcut to pop out of the recursive descent parsing stack when an error occurs. In the exception handler, I call a synchronization method that skips the parser forward until it finds a token that is likely the start of the next statement (e.g. “let” or “if” or the token after a semicolon).

I could test how coding with this new approach felt immediately after revising my architecture, using the visual system I had designed previously.

The same example code that previously resulted in five errors — “let x = 2 + , + , + 5 ,” — this time only produced a single error. Much better!

Multiple statements containing syntax errors highlight that the synchronization routine works as well (only one error per statement is produced):

Overall, this error handling approach seems robust — it is simple, fast, and can be iteratively improved with new error messages and additional diagnostics like suggested fixes. I would not be surprised if all the languages surveyed above more-or-less use this same method.

Summary

I realized Poly’s initial error reporting mechanism was flawed, but only with the help of a visual error-reporting system I created. Now, I am using a more established error reporting technique that involves stopping after the first error and synchronizing — and I have better intuition as to why this approach makes sense.

Optimizing for developer ergonomics — how a language feels to code in — led to an error reporting system that I believe will scale with the language.