domingo, 4 de octubre de 2015

Macros in Lux

Hello, and welcome to the first entry in the Lux Devlog, wherein the posts will touch the design and implementation of the Lux Programming Language, an upcoming functional language in the Lisp tradition.

To begin this series of posts, I'll talk about a subject I've been asked before by curious programmers: why is it that Lux macros are monadic?

I'll expand on that subject a bit and talk about all of the machinery that Lux offers in terms of macros.

First, I've got a disclaimer to make: this is not a monads tutorial. There are many tutorials out there and this blog post isn't the right place to introduce the subject. With that in mind, let's move on...

The reason why Lux macros are monadic has to do with the fact that they operate inside a special context in the compiler in which they get to have access to its state. That is one of the privileges of running during compile-time, and it gives them access to a lot of really useful information regarding modules, types, global definitions and much more.

To understand how everything is wired together, it's a good idea to check out the types of the things we are going to be dealing with:

 (deftype (Lux a)  
   (-> Compiler (Either Text (, Compiler a))))  

The type above is the type of functions that work with access to the state of the compiler.

For those of you unfamiliar with the syntax (pretty much everybody, as Lux is fairly new), the -> sign is a macro for specifying functions; the , sign serves to specify tuples; and (Lux a) specifies a polymorphic type with 1 parameter (named a).

 (deftype Macro  
   (-> (List AST) (Lux (List AST))))  

Macros are functions that, given a list of syntactic tokens (as instances of the AST type), they must return a function that can have access to the state of the compiler and must return an updated list of tokens.

But... what does Lux ASTs look like?
Well... they're data-structures like any other:

## This isn't the way it's defined, but it's good for explaining it...
(deftype #rec AST  
   (Meta Cursor  
      (| (#BoolS Bool)  
         (#IntS Int)  
         (#RealS Real)  
         (#CharS Char)  
         (#TextS Text)  
         (#SymbolS Text Text)  
         (#TagS Text Text)  
         (#FormS (List AST))  
         (#TupleS (List AST))  
         (#RecordS (List (, AST AST))))))  

Again, for those of you unfamiliar with the syntax, the hash (#) sign is used to specify the tags for variants (a.k.a. sum types). The pipe (|) sign is what declares that the following cases belong to a variant type. Also, the #rec tag that is added before AST declares that it's a recursive type, so that you can refer to it by name inside it's own definition.

Also, you might wonder what Meta & Cursor are for. Well, the answer is very simple: AST nodes don't come alone, as they carry with them some meta-data (that's what Meta is for), which, in this case, just happens to be a cursor that points to the place in the source code where said AST has it's text representation.

What does that cursor look like?
Let's take a look:

 (deftype Cursor (, Text Int Int))  

It's just a 3-tuple where the first element corresponds to the name of the file, the second element corresponds to the line, and the third element corresponds to the column.

Finally, let's take a look at what the compiler offers:

 (deftype #rec Compiler  
   (& #source    Source  
      #cursor    Cursor  
      #modules   (List (, Text (Module Compiler)))  
      #envs      (List (Env Text (Meta (, Type Cursor) Analysis)))  
      #type-vars (Bindings Int Type)  
      #expected  Type  
      #seed      Int  
      #eval?     Bool  
      #host      Void  

So... it's a fairly complex data-structure with yet more new type-syntax to deal with.

We can already recognize the #rec tag, but what's up with the ampersand?
Well... that ampersand (&) is what signifies that this type correspond to records. The tags being matched to their respective types do not require to be put inside parenthesis, and are instead matched as pairs.

While there are many new types being mentioned, they will be left for future blog-posts, as right now we're going to focus more on macros per se. Suffice it to say that the Lux compiler accumulates a lot information and that's what the Compiler type embodies.


Some of you might have noticed that macros are not expected to return one AST token, but can return an arbitrary amount, as their return type involves (List AST).

This is one of the places where Lux differs from more traditional lisps, where macros always return one piece of syntax.

While in most cases that is all you'll need to return from a macro, sometimes you do want to return more than 1 elements (or, as is the case for the comment macro, no elements).

So... what does a macro definition in Lux look like?
Let's check out how the @list macro is defined inside the lux/data/list module:

 (defmacro #export (@list xs state)  
  (#;Right state (#;Cons (foldL (: (-> AST AST AST)  
                                   (lambda [tail head] (` (#;Cons (~ head) (~ tail)))))  
                                (` #;Nil)  
                                (reverse xs))  

There's a lot of new stuff in that snippet.

Besides seeing that there's already an easy way to define macros and an even easier way to export definitions, we can also see that macros receive the state of the compiler as an argument (we already expected that from the type definition).

We can also notice what's the syntax for creating variants in Lux (just put the tag at the beginning of a form and the data afterwards), the syntax for adding type annotations to code (:), and the syntax for quasi-quotation (`) and unquoting (~).

Some lispers might be wondering why quotation code is written in this style:
(` (#;Cons (~ head) (~ tail)))
instead of in this style:
`(#;Cons ~head ~tail)

The reason for that is this: quasi-quotation in Lux is done via a regular macro, instead of via a reader-macro, as is done on most lisps.

Since quotation was going to have that kind of syntax in Lux, there wasn't any reason for unquoting to be different, and that's why it all looks like that.

It's different to what most lispers are accustomed to, but it doesn't take long for someone to get used to it.

Ok... so, the @list example is nice, but there's no analysis of the incoming syntax tokens. How do I write complex macros that actually do something with the arguments?

Well... let's check out another example:

 (defmacro #export (do-template tokens)  
   (case tokens  
     (#Cons [_ (#TupleS bindings)]  
            (#Cons [_ (#TupleS templates)]  
     (case [(map% Maybe/Monad get-name bindings)  
            (map% Maybe/Monad tuple->list data)]  
       [(#Some bindings') (#Some data')]  
       (let [apply (: (-> RepEnv (List AST))  
                      (lambda [env]  
                        (map (apply-template env)  
         (|> data'  
             (join-map (. apply (make-env bindings')))  
       (fail "Wrong syntax for do-template"))  
     (fail "Wrong syntax for do-template")))  

The example above comes from the lux module (the prelude), and is the definition of the (very useful) do-template macro (which I'll touch on briefly before the end of this blog post).

As you can see, the syntax tokens are being decomposed through pattern-matching, extracting, in this case, a pair of tuples and some "data" that is processed later.

Also, some of you might notice that do-template is not receiving the state of the compiler, as was the case for @list. What's happening here is that do-template is making use of 2 functions (return & fail), that generate functions that do take the state of the compiler (return takes the state and then returns whatever data you gave to it, while fail takes the state but ignores it to just send back an error message).

Again, there are many details here that will be left for future blog-posts, but you can already see that extracting data from the AST tokens is not the nicest thing in the world.

Just consider for a moment the hassle that it would be to extract many different arguments, some of them being optional or repetitive, or embedded inside larger syntactic structures.

How could someone write complex macros in a simple way using Lux?

The answer turns out to be very easy, but it's something that many lispers might not have experience with: monadic parsing.

What!? More monads!?

Calm down. There's no need to panic. Monadic parsing is a very simple technique for writing parsers as functions that take tokens of some type and return tokens of another, higher-level type.

By using monadic parsers inside of macros, it's possible to specify very complex parsing rules in very simple ways that can even be reused across macros.

Another really cool benefit, is that you can return custom data-structures from these parsers, further elevating your macros into a more domain-specific level and taking into consideration optional and repetitive elements of the AST tokens being parsed.

An example of how that's supposed to look like is this macro, taken from the lux/host/jvm module:

 (defsyntax #export (defclass [name local-symbol^] [super local-symbol^] [interfaces (tuple^ (*^ local-symbol^))]  
                              [annotations annotations^]  
                              [fields (*^ field-decl^)]  
                              [methods (*^ method-def^)])  
   (emit (@list (` (;_jvm_class (~ (text$ name)) (~ (text$ super))  
                     [(~@ (map text$ interfaces))]  
                     [(~@ (map gen-annotation annotations))]  
                     [(~@ (map gen-field-decl fields))]  
                     [(~@ (map gen-method-def methods))])))))  

You'll notice that now I'm using defsyntax instead of defmacro. You'll also notice that the arguments that the macro takes are specified in a very different manner, with the names of the arguments being paired with parsers (the name of each being suffixed with a caret ^) and some being prefixed by a strange symbol (*^), which turns out to be a parser combinator that means 0 or more.

Also, you'll see that I'm using unquote-splice (~@) to handle the output of the map function being invoked several times. Finally, the emit function you see is the same as return in the example of do-template. (I won't go into detail here as to why the names are different).

This macro is still complex, but only because what it tries to do is complex in and out of itself. This macro provides a more pleasant syntax to class declaration than what Lux natively exposes through the _jvm_class special form.

However, by abstracting out how annotations, fields and methods are parsed, not only have I simplified the code of this macro, but I can even reuse those parsing mechanisms in other macros, such as the macro for making anonymous classes or the macro for making interfaces.

Not only that, but each of those parsers returns a different data-type, custom made for the information that it represents. That way, my macros are not confined to only work with the raw data that the AST tokens provide, but can instead work with more refined and domain-specific representations of that data.

While monadic parsers on macros might be a new thing for many lispers out there, after using them for a while for writing some macros, of both low & medium complexity, I must say the they are probably the best technique I've found for managing complexity when writing macros. They're a pleasure to use and it doesn't take much effort to become proficient in their use.


Finally, I would like to touch on something I promised a while ago...

Sometimes, you want to generate code in a very limited manner and creating a macro to do it seems a bit like overkill. Maybe you're just looking for a way to repeat a particular pattern in code that doesn't require any serious processing of the arguments. For those kinds of situations, you'll want to use the do-template macro.

For those of you who are familiar with it's Clojure counterpart, the do-template macro works the same way, with a small difference being the way it takes its arguments. What do I mean? Let's take a look at an example:

 (do-template [<name> <diff>]  
   [(def #export <name>  
      (-> Int Int)  
      (i+ <diff>))]  

   [inc 1]  
   [dec -1])  

Here we see code for generating the inc & dec functions (both available in the lux module). As you can see, the form for the pattern is inside a tuple. The reason for that is that there can be more than 1 form in there, so the tuple delimits all the forms that must be generated for every group of parameters.

You'll also see that each group of parameters is also delimited by tuple syntax. The goal of that is to make it better ordered when you're passing arguments to the template.

By the way, the fact that the arguments to the template are surrounded by <angle> <brackets> is just a stylistic choice of mine and not a requirement of do-template.


That's all that I wanted to say for now concerning macros in Lux.

There are more things that could be explained about this subject, but I'll leave those for future blog posts.