- August, an assembler and linker, which is yet to be started
- October, a symbolic mathematics environment, which is even more yet to be started, but which I’m really excited about
Autumn months are great, especially because august is the best song on folklore which is a lyrical and aural masterpiece by the queen of pop Taylor Swift herself, but that’s for me to go on about on a different website. Back to the point:
    a := 2
    b := 3
    log(a + b)
    log('Hello, World!')

    let a = 2;
    let b = 3;
    log(__as_ink_string(a + b));
    log(__Ink_String(`Hello, World!`))
See September on GitHub →
It's a good question! Someone asked this yesterday:
Big things are
- lack of native GC support
- JS is semantically closer, compilation is less work
- I'll probably want to access DOM APIs directly from Ink
https://t.co/OHjWP8OTkZ
Linus (@thesephist), August 10, 2020
A few projects in particular inspired my work on September.
How it works
I’ve written an in-depth guide to how September works in the readme to the project. Here, I’ll just give you a taste with a small but illustrative example.
Let’s take this simple Ink program.
    a := 3
    b := 17
    log(a + b)

    a = 3;
    b = 17;
    log(__as_ink_string(a + b))
The first step in the September compiler is scanning, handled by the tokenizer. The tokenizer scans through the program text as a string, and produces a list of tokens, or symbols. In the Ink tokenizer, these tokens are also sometimes tagged with their type, like “number literal” or “string literal” or “addition operator”. For our small program, September yields the following tokens. I’ve added some blank lines and comments to make the output easier to read.
    # a := 3
    Ident(a) @ 1:1
    DefineOp(()) @ 1:3
    NumberLiteral(3) @ 1:5
    Separator(()) @ 1:6

    # b := 17
    Ident(b) @ 2:1
    DefineOp(()) @ 2:3
    NumberLiteral(17) @ 2:5
    Separator(()) @ 2:6

    # log(a + b)
    Ident(log) @ 3:1
    LParen(()) @ 3:4
    Ident(a) @ 3:5
    AddOp(()) @ 3:7
    Ident(b) @ 3:9
    Separator(()) @ 3:10
    RParen(()) @ 3:10
    Separator(()) @ 3:11
We can see that the token stream is a straightforward list of symbols that we see in the program. These tokens are also annotated with the line and column numbers in the source code where they occur, like @ 3:9 to mean line 3, column 9. This is useful for debugging and for emitting helpful syntax error messages.
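To make the scanning step concrete, here is a minimal tokenizer sketch in JavaScript. It is not September's tokenizer (which is written in Ink). The token names are borrowed from the listing above, but the scanning logic, the `tokenize` and `show` helpers, and the tiny set of supported symbols are all assumptions made for illustration. Columns here are the 1-based position of each token's first character, so they may not match September's own output in every case.

```javascript
// Minimal tokenizer sketch. Token names mirror the listing above; the
// scanning logic is an illustrative assumption, not September's code.
function tokenize(source) {
    const tokens = [];
    let line = 1;
    let col = 1;
    let i = 0;
    // Record a token at the current position, then advance past it.
    const push = (type, value, width) => {
        tokens.push({ type, value, line, col });
        i += width;
        col += width;
    };
    while (i < source.length) {
        const rest = source.slice(i);
        let m;
        if (rest[0] === '\n') {
            i += 1;
            line += 1;
            col = 1;
        } else if ((m = /^[ \t]+/.exec(rest))) {
            i += m[0].length;
            col += m[0].length;
        } else if ((m = /^[0-9]+(\.[0-9]+)?/.exec(rest))) {
            push('NumberLiteral', Number(m[0]), m[0].length);
        } else if ((m = /^[A-Za-z_][A-Za-z0-9_]*/.exec(rest))) {
            push('Ident', m[0], m[0].length);
        } else if (rest.startsWith(':=')) {
            push('DefineOp', null, 2);
        } else if (rest[0] === '+') {
            push('AddOp', null, 1);
        } else if (rest[0] === '(') {
            push('LParen', null, 1);
        } else if (rest[0] === ')') {
            push('RParen', null, 1);
        } else {
            throw new Error(`unexpected character ${rest[0]} at ${line}:${col}`);
        }
    }
    return tokens;
}

// Format a token in the style of the listing, e.g. "Ident(a) @ 1:1".
function show(tok) {
    const inner = tok.value === null ? '()' : String(tok.value);
    return `${tok.type}(${inner}) @ ${tok.line}:${tok.col}`;
}

const program = 'a := 3\nb := 17\nlog(a + b)';
console.log(tokenize(program).map(show).join('\n'));
```

Carrying `line` and `col` on every token is what lets later stages point at an exact source position when they report an error.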
You might be wondering where the Separator token came from. This is an implicit detail of the Ink language syntax: it functions like the semicolon in most C-style languages, as an expression terminator. It’s usually not written out in the source, and is instead inferred by the interpreter or compiler. Here, our tokenizer has inferred where the implicit Separator tokens should be and added them for us. This makes the next step easier. If this bit about the Separator token doesn’t make sense, don’t worry – it’s not important to the compilation process.
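As a rough illustration of separator inference, the sketch below adds a Separator at the end of each line and before each closing parenthesis, unless the expression is clearly unfinished. The rule, the `CONTINUES` set, and the `insertSeparators` name are simplifying assumptions chosen to reproduce the token order in the listing above. Ink's real rules cover more cases, and September performs this work inside its tokenizer rather than as a separate pass.

```javascript
// Token types after which the expression cannot be finished yet
// (an assumed, simplified list).
const CONTINUES = new Set(['DefineOp', 'AddOp', 'LParen', 'Separator']);

// Takes tokens of the shape { type, line } and returns a new list with
// implicit Separator tokens added.
function insertSeparators(tokens) {
    const out = [];
    const endExpression = () => {
        const last = out[out.length - 1];
        if (last && !CONTINUES.has(last.type)) {
            out.push({ type: 'Separator', line: last.line });
        }
    };
    for (const tok of tokens) {
        const last = out[out.length - 1];
        // A new line or a closing paren ends the expression before it.
        if ((last && tok.line !== last.line) || tok.type === 'RParen') {
            endExpression();
        }
        out.push(tok);
    }
    endExpression(); // end of input also terminates the last expression
    return out;
}

const t = (type, line) => ({ type, line });
const raw = [
    t('Ident', 1), t('DefineOp', 1), t('NumberLiteral', 1),
    t('Ident', 2), t('DefineOp', 2), t('NumberLiteral', 2),
    t('Ident', 3), t('LParen', 3), t('Ident', 3), t('AddOp', 3),
    t('Ident', 3), t('RParen', 3),
];
console.log(insertSeparators(raw).map((tok) => tok.type).join(' '));
```

With explicit separators in the stream, the parser can treat "end of expression" as an ordinary token instead of reasoning about line breaks.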
Next up, we need to group these tokens into meaningful hierarchies. We want to know, for example, that a + b is a single expression, while log(a is not. This work is done by the parser, which builds up a recursive data structure called the abstract syntax tree. The AST for our program looks like this.
    BinExpr(Ident(a) DefineOp Lit(3))
    BinExpr(Ident(b) DefineOp Lit(17))
    Call(Ident(log) (BinExpr(Ident(a) AddOp Ident(b))))
We can see our parser has grouped tokens into meaningful hierarchies. This representation of our program is meaningful enough for the rest of the compiler to draw good conclusions about what the program does.
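The grouping step can be sketched as a small recursive-descent parser in JavaScript. This is a toy under stated assumptions: it understands only identifiers, number literals, `:=`, `+`, and function calls, it treats every binary operator as right-associative with no precedence rules, and the `parse` and `show` names and node shapes are made up for this example rather than taken from September.

```javascript
// Toy recursive-descent parser over a token list that already contains
// Separator tokens. Node shapes and helper names are hypothetical.
function parse(tokens) {
    let pos = 0;
    const peek = () => tokens[pos];
    // Consume the next token, optionally requiring a specific type.
    const next = (type) => {
        const tok = tokens[pos++];
        if (!tok || (type && tok.type !== type)) {
            throw new Error(`expected ${type || 'a token'}, got ${tok ? tok.type : 'end of input'}`);
        }
        return tok;
    };

    function parseAtom() {
        const tok = next();
        if (tok.type === 'NumberLiteral') return { type: 'Lit', value: tok.value };
        if (tok.type !== 'Ident') throw new Error(`unexpected ${tok.type}`);
        const ident = { type: 'Ident', name: tok.value };
        if (!peek() || peek().type !== 'LParen') return ident;
        // An identifier followed by "(" is a function call.
        next('LParen');
        const args = [];
        while (peek() && peek().type !== 'RParen') {
            args.push(parseExpr());
            next('Separator');
        }
        next('RParen');
        return { type: 'Call', fn: ident, args };
    }

    function parseExpr() {
        const left = parseAtom();
        const tok = peek();
        if (tok && (tok.type === 'DefineOp' || tok.type === 'AddOp')) {
            next();
            return { type: 'BinExpr', op: tok.type, left, right: parseExpr() };
        }
        return left;
    }

    const nodes = [];
    while (pos < tokens.length) {
        nodes.push(parseExpr());
        next('Separator');
    }
    return nodes;
}

// Print a node in the same style as the AST listing above.
function show(node) {
    switch (node.type) {
        case 'Lit': return `Lit(${node.value})`;
        case 'Ident': return `Ident(${node.name})`;
        case 'BinExpr': return `BinExpr(${show(node.left)} ${node.op} ${show(node.right)})`;
        case 'Call': return `Call(${show(node.fn)} (${node.args.map(show).join(' ')}))`;
        default: throw new Error(`unknown node ${node.type}`);
    }
}

const t = (type, value) => ({ type, value });
const tokens = [
    t('Ident', 'a'), t('DefineOp'), t('NumberLiteral', 3), t('Separator'),
    t('Ident', 'b'), t('DefineOp'), t('NumberLiteral', 17), t('Separator'),
    t('Ident', 'log'), t('LParen'), t('Ident', 'a'), t('AddOp'), t('Ident', 'b'),
    t('Separator'), t('RParen'), t('Separator'),
];
console.log(parse(tokens).map(show).join('\n'));
```

Each parsing function returns a node that may contain other nodes, which is where the recursive shape of the tree comes from.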
The analyzer traverses the syntax tree top-down, and makes small annotations or transformations on the nodes of the tree that help us generate better code.
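One annotation of this kind is deciding which assignments introduce a new variable. The sketch below is a deliberately simplified guess at such a pass: it handles a single flat scope, and it uses a plain `decl` property as a stand-in for the `node.decl?` flag this post mentions. The node shapes and the `annotateDeclarations` name are hypothetical.

```javascript
// Flag the first definition of each name in a scope as a declaration.
// Single flat scope only; nested scopes are left out of this sketch.
function annotateDeclarations(nodes, declared = new Set()) {
    for (const node of nodes) {
        if (node.type !== 'BinExpr' || node.op !== 'DefineOp') continue;
        if (node.left.type !== 'Ident') continue;
        // Only the first definition of a name needs a declaration.
        node.decl = !declared.has(node.left.name);
        declared.add(node.left.name);
    }
    return nodes;
}

const define = (name, value) => ({
    type: 'BinExpr',
    op: 'DefineOp',
    left: { type: 'Ident', name },
    right: { type: 'Lit', value },
});

// a := 2, then a := 5 (a reassignment), then b := 3
const scope = annotateDeclarations([define('a', 2), define('a', 5), define('b', 3)]);
console.log(scope.map((node) => `${node.left.name}: decl = ${node.decl}`).join('\n'));
```

A code generator can then read the flag to decide between emitting a declaration and emitting a plain assignment.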
const. The analyzer combs through variables declared in each scope, and sets a flag called node.decl? if the expression should be a
BinExpr(Ident(a) MulOp Ident(b)), which is a * b, gets translated into
    a = 3;
    b = 17;
    log(__as_ink_string(a + b))
… nearly. You might be wondering what the __as_ink_string() function is doing in our code. This is an example of a runtime library function.
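Before moving on to the runtime, here is a rough sketch of how a code generator could turn the syntax tree shown earlier into the JavaScript output above. It walks each node and returns a string of JavaScript. Wrapping every addition in `__as_ink_string()` is inferred only from the sample output in this post, and the `generate` function and node shapes are assumptions rather than September's implementation.

```javascript
// Recursively translate a (hypothetical) AST node into JavaScript source.
function generate(node) {
    switch (node.type) {
        case 'Lit':
            return String(node.value);
        case 'Ident':
            return node.name;
        case 'Call':
            return `${generate(node.fn)}(${node.args.map(generate).join(', ')})`;
        case 'BinExpr': {
            const left = generate(node.left);
            const right = generate(node.right);
            if (node.op === 'DefineOp') return `${left} = ${right}`;
            // Assumption: additions are wrapped in a runtime helper,
            // as in the sample output above.
            if (node.op === 'AddOp') return `__as_ink_string(${left} + ${right})`;
            throw new Error(`unsupported operator ${node.op}`);
        }
        default:
            throw new Error(`unsupported node ${node.type}`);
    }
}

const ident = (name) => ({ type: 'Ident', name });
const lit = (value) => ({ type: 'Lit', value });
const bin = (left, op, right) => ({ type: 'BinExpr', op, left, right });

// AST for: a := 3, b := 17, log(a + b)
const ast = [
    bin(ident('a'), 'DefineOp', lit(3)),
    bin(ident('b'), 'DefineOp', lit(17)),
    { type: 'Call', fn: ident('log'), args: [bin(ident('a'), 'AddOp', ident('b'))] },
];
console.log(ast.map(generate).join('; '));
```

Because each node only needs to know how to print itself and its children, supporting a new kind of expression means adding one more case to the switch.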
The runtime library
~ in Ink, which negates a number or negates a boolean, depending on the type of the operand.
    # Ink
    ~true `` -> false
    ~1 `` -> negative 1
    `` same operator!

    # JS
    !true // false
    -1 // negative 1
    // different operators!
__ink_negate(x), which is a runtime library function that just does the right thing. Ink’s runtime also implements various Ink built-in functions like len() for sizing an object or string, and the Ink
__as_ink_string() runtime function we see in our toy example ensures that a string value is represented in a way that’s consistent with Ink’s string type throughout the generated program.
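To show what "does the right thing" could look like for the `~` operator, here is one plausible shape for such a runtime helper. The name `__ink_negate` comes from this post, but the body below is a guess and not September's actual runtime code. In particular, throwing on other operand types is an assumption made for this sketch.

```javascript
// A guess at a negation helper that dispatches on the operand's type.
function __ink_negate(x) {
    if (typeof x === 'boolean') return !x; // ~true  -> false
    if (typeof x === 'number') return -x;  // ~1     -> negative 1
    // Assumption: other operand types are treated as errors here.
    throw new TypeError(`cannot negate a value of type ${typeof x}`);
}

console.log(__ink_negate(true));
console.log(__ink_negate(1));
```

The compiler cannot always know an operand's type ahead of time, so deferring the choice to a small function at runtime keeps the generated code simple.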
At time of writing, I’ve been hacking on September for two days and change. Today, September is something between a proof-of-concept and an alpha. It can compile moderately large Ink programs (including itself and the Ink standard library) correctly, but it doesn’t yet implement enough of the Ink language for the resulting programs to work correctly all of the time.
One way I’ve been tracking the progress of September is by testing the compiler against the test suite I wrote for the original Ink interpreter. This can give us better confidence that an Ink program compiled with September behaves identically to one that runs on the original interpreter.
So far, September passes something like 283 of the 370 tests in the test suite. It would likely pass far more, but I haven’t had time to translate the tests’ dependencies outside of the standard libraries yet.
As I’ve been careful to mention before, September is a work in progress. Besides tail call elimination and better general optimizations, there are a few other ideas I want to explore with September going forward.
- Self-hosting. While September can currently compile itself, the runtime isn’t sufficiently complete for the compiled code to compile itself again – i.e. September isn’t strictly self-hosting. This is because things like the filesystem APIs aren’t implemented yet. I hope to get to a point where September can produce code that can compile itself again, and be truly self-hosting, independently of the original Go-based Ink interpreter.
Although I’ve been reading deeply into the design of compilers lately, September is young (as am I) and building September is a continual learning process. There are some design decisions I really like about the compiler, some that I regret, and some that are just carried over from the design of the original interpreter.
September is the first time I’ve written a semantic analysis algorithm of any sort, and the first time I’ve written a compiler with a code generation backend of any kind. So it seems like there’s a lot of space for me to improve there and dig deeper into literature with some preliminary knowledge of what kinds of problems I want to solve. And I’m excited to go do exactly that.
If you enjoyed this piece, you might also enjoy my next post, Syntax highlighting Ink programs with the September toolchain.