Writing interpreters and compilers - An introduction to folds and algebras

So lets say we are building our own programming language because we're cool like that.

For now our language will support simple arithmatic.

So lets define an abstract syntax tree for it.

data Expr = Val Int -- an expression is a value
          | Expr :+: Expr -- Or two expressions added together
          | Expr :-: Expr -- Or two expressions subtracted

We can now formulate abstract syntax trees:

expr = Val 3 :+: Val 4
expr2 = expr :-: Val 2
expr3 = Val 3 :+: (Val 2 :-: val 4)

Now we can create an interpreter that evaluates our abstract syntax trees
using direct recursion:

interpret :: Expr -> Int

-- interpreting a value is simply interpreting the value itself

interpret (Val x) = x

-- interpreting addition is interpreting the left and righthand side and
-- adding them together
interpret (x :+: y) = interpret x + interpret y

-- similarary for subtraction
interpret (x :-: y) = interpet x - interpret y

ghci:

> interpret expr
7
> interpret expr2
5
> interpret expr3
1

Interpreters are cool and all. but I heard compilers are way cooler
Say we have a simple stack machine that supports the following operators:

data Instr   = PUSH Int -- pushes an integer to the stack
            | ADD      -- pops two integers from the stack, adds them and pushes
                        -- the result
            | SUB      -- similrary but subtracts

Then a compiler is simply:

compile :: Expr -> [Instr]

compile (Val x) = PUSH x
compile (x :+: y) = compile x ++ compile y ++ [ADD]
compile (x :-: y) = compile x ++ compile y ++ [SUB]

ghci:

> compile expr
[PUSH 3, PUSH 4, ADD]
> compile expr2
[PUSH 3, PUSH 4, ADD, PUSH 2, SUB]
> compile expr3
[PUSH 3, PUSH 2, PUSH 4, SUB, ADD]

Someone who has worked with lists before knows that we tend to avoid explicit
recursion and favor maps and folds because we can easily reason about resuable blocks

The question that arrises is, if we can abstract recursion on lists, can we abstract recursion on our custom abstract syntax tree?

The answer is yes.

We begin by defining a corresponding Algebra for our AST

data ExprAlgebra  e= ExprAlgebra
  { val :: Int -> e
  , add :: e -> e -> e
  , sub :: e -> e -> e
  }

What an algebra does is encapsulate the evaluation strategies of each component of our AST in a datatype.

For example. An interpreter would look like this:

interpreter :: ExprAlgebra Int
interpreter = ExprAlegbra
  { val = id
  , add = (+)
  , sub = (-)
  }

An algebra by itself isn't very useful. But once you recurse over
your Abstract syntax tree, you can apply the algebra on it. You're recursing over your abstract syntax tree with a provided evaluation strategy.

foldExpr :: ExprAlgebra a -> Expr -> a
foldExpr alg (Val i)     = (val alg) i
foldExpr alg (e1 :+: e2) = (add alg) (foldExpr e1) (foldExpr e2)
foldExpr alg (e1 :-: e2) = (sub alg) (foldExpr e1) (foldExpr e2)

The result is that if you feed our algebra/ evaluation strategy to this recursion scheme, you get back
an interpreter!

interpret' :: Expr -> Int
interpret' = foldExpr interpreter

ghci:

> interpret' expr1
7
> interpret' expr2
5
> interpret' expr3
1

So what did we gain by splitting up our recurive function into
an evaluation strategy and a recursive part?

Well! we can now write a compiler without writing another recursive function! Just define a new evaluation strategy!

compiler :: ExprAlgebra [Instr]
compiler = ExprAlgebra
  { val = [PUSH]
  , add = \x y -> x++y++[ADD]
  , sub = \x y -> x++y++[ADD]
  }

See how we can define the function compile by reusing foldExpr?
We have eliminated quite some code duplication!

compile' :: Expr -> [Instr]
compile' = foldExpr compiler

ghci:

> compile' expr
[PUSH 3, PUSH 4, ADD]
> compile' expr2
[PUSH 3, PUSH 4, ADD, PUSH 2, SUB]
> compile' expr3
[PUSH 3, PUSH 2, PUSH 4, SUB, ADD]

Of course we've made a tradeoff. Now we don't have duplicate recursion for operations on the same datatype. But say we now want to build a new language with different features and thus a different abstract syntax tree. Do we have to write a new fold for that? Can we automatically deduce folds for abstract syntax trees?

The answer is: for simple languages, Yes! For less-toy-examply-languages (lets say C#) the answer is (as far as I know), no.

But that's something for next time. I will then introduce F-algebras and catamorphisms to make our live even easier. They basically formalize the steps we have taken to transform explicit recursion into a fold. Allowing us to easily deduce folds on any datatype.
They're really interesting but as stated have their limitations, which I will also cover.