Macros

Nemerle type-safe macros

What exactly is a macro?

Basically every macro is a function, which takes a fragment of code as parameter(s) and returns some other code. On the highest level of abstraction it doesn't matter if parameters are function calls, type definitions or just a sequence of assignments. The most important fact is that they are not common objects (e.g. instances of some types, like integer numbers), but their internal representation in the compiler (i.e. syntax trees).

A macro is defined in the program just like any other function, using common Nemerle syntax. The only difference is the structure of the data it operates on and the way in which it is used (executed at compile-time).

A macro, once created, can be used to process some parts of the code. This is done by calling it with block(s) of code as parameter(s). In most cases this operation is indistinguishable from a common function call (like f(1)), so a programmer using a macro is not confused by unfamiliar syntax. The main concept of our design is to make the usage of macros as transparent as possible. From the user's point of view, it is not important whether particular parameters are passed to a macro (which processes them at compile-time and inserts some new code in their place) or to an ordinary function.

Defining a new macro

Writing a macro is as simple as writing a common function. It looks the same, except that it is preceded by the keyword macro and it lives at the top level (not inside any class). The keyword tells the compiler how to use the defined method (i.e. to run it at compile-time in every place where it is used).

Macros can take zero (if we just want to generate new code) or more parameters. They are all elements of the language grammar, so their type is limited to the set of defined syntax objects. The same holds for a return value of a macro.

Example:

macro generate_expression ()
{
  MyModule.compute_some_expression ();
}

This example macro does not take any parameters and is used in the code by simply writing generate_expression ();. The most important point is the difference between generate_expression and compute_some_expression: the first is a function executed by the compiler during compilation, while the latter is just a common function that must return the syntax tree of an expression (which generate_expression returns and thereby inserts into the program code).

Compiling the simplest macro

In order to create and use a macro you have to write a library, which will contain its executable form. You simply create a new file mymacro.n, which can contain for example

macro m () {
  Nemerle.IO.printf ("compile-time\n");
  <[ Nemerle.IO.printf ("run-time\n") ]>;
}

and compile it with the command

 ncc -r Nemerle.Compiler.dll -t:dll mymacro.n -o mymacro.dll

Now you can use m() in any program, like here

module M {
  public Main () : void {
    m ();
  }
}

You must add a reference to mymacro.dll during compilation of this program. It might look like

 ncc -r mymacro.dll myprog.n -o myprog.exe

Exercise

Write a macro which, when used, slows down the compilation by 5 seconds (use the System.Timers namespace) and prints the version of the operating system used to compile the program (use the System.Environment class).

Operating on syntax trees

The definition of the function compute_some_expression might look like:

using Nemerle.Compiler.Parsetree;

module MyModule 
{
  public mutable debug_on : bool;

  public compute_some_expression () : PExpr 
  {
    if (debug_on) 
      <[ System.Console.WriteLine ("Hello, I'm debug message") ]>
    else
      <[ () ]>
  }
}

The examples above show a macro which conditionally inlines an expression printing a message. It is not very useful yet, but it introduces the idea of compile-time computation and also some new syntax used only in writing macros and functions operating on syntax trees. Here we have used the <[ ... ]> constructor to build the syntax tree of an expression (e.g. '()').

Quotation operator

<[ ... ]> is used for both construction and decomposition of syntax trees. These operations are similar to quotation of code. Simply put, everything written inside <[ ... ]> corresponds to its own syntax tree. It can be any valid Nemerle code, so a programmer does not have to learn the internal representation of syntax trees in the compiler.

macro print_date (at_compile_time)
{                   
  match (at_compile_time) {
    | <[ true ]> => MyModule.print_compilation_time ()
    | _ => <[ WriteLine (DateTime.Now.ToString ()) ]>
  }
}

Quotation alone allows using only constant expressions, which is insufficient for most tasks. For example, to write the function print_compilation_time we must be able to create an expression based on a value known at compile-time. In the next sections we introduce the rest of the macro syntax for operating on general syntax trees.

Matching subexpressions

When we want to decompose some large code (or more precisely, its syntax tree), we must bind its smaller parts to variables. Then we can process them recursively or just use them in an arbitrary way to construct the result.

We can operate on entire subexpressions by writing $( ... ) or $ID inside the quotation operator <[ ... ]>. This binds the value of ID, or of the parenthesized expression, to the part of the syntax tree found at the corresponding position in the quotation.

macro for (init, cond, change, body)
{
  <[ 
    $init;
    def loop () : void {
      if ($cond) { $body; $change; loop() } 
      else ()
    };
    loop ()
  ]>
}

The above macro defines the construct for, which is similar to the loop known from C. It can be used like this

for (mutable i = 0, i < 10, i++, printf ("%d", i))

Later we show how to extend the language syntax to make the syntax of for exactly as in C.
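
The for macro only composes code. Decomposition uses the same $ID notation inside a match pattern. The following is a minimal sketch; the macro name swap_args is made up for illustration and is not part of Nemerle.

macro swap_args (e)
{
  // swap_args is a hypothetical name used only in this sketch
  match (e) {
    // bind the called function to f and its two arguments to a and b
    | <[ $f ($a, $b) ]> => <[ $f ($b, $a) ]>
    // anything that is not a two-argument call is returned unchanged
    | _ => e
  }
}

With this definition, swap_args (g (1, 2)) would be rewritten at compile-time to g (2, 1), while swap_args (x) would stay x.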

Base elements of grammar

Sometimes quoted expressions have literals inside of them (like strings, integers, etc.) and we want to operate on their value, not on their syntax trees. It is possible, because they are constant expressions and their runtime value is known at the compile-time.

Let's consider the previously used function print_compilation_time.

using System;
using Nemerle.Compiler.Parsetree;

module MyModule {
  public print_compilation_time () : PExpr
  {                   
    <[ System.Console.WriteLine ($(DateTime.Now.ToString () : string)) ]>
  }
}

Here we see a new extension of the splicing syntax, where we create the syntax tree of a string literal from a known value. It is done by adding : string inside the $(...) construct. One can think of it as enforcing the type of the spliced expression to be a literal (similar to common Nemerle type enforcement), but as a matter of fact something more is happening here: a real value is lifted to its representation as the syntax tree of a literal.

Other types of literals (int, bool, float, char) are treated the same. This notation can be used also in pattern matching. We can match constant values in expressions this way.
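
As a sketch of matching on a literal, the macro below (the name twice is hypothetical) doubles an integer constant during compilation and falls back to generating a multiplication otherwise.

macro twice (e)
{
  // twice is a hypothetical name used only in this sketch
  match (e) {
    // e is an integer literal: n holds its value, so the result is computed now
    | <[ $(n : int) ]> =>
      def doubled = n * 2;
      <[ $(doubled : int) ]>
    // otherwise generate code that multiplies at runtime
    | _ => <[ $e * 2 ]>
  }
}

Under these assumptions twice (21) would expand to the literal 42, whereas twice (x) would expand to x * 2.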

There is also a similar scheme for splicing and matching variables of a given name. $(v : name) denotes a variable whose name is contained in the object v (of the special type Name). There are some good reasons for encapsulating a real identifier within this object; they are discussed in the section "Problem with names capture".

Constructs with variable amount of elements

You might have noticed that Nemerle has a few grammar elements which are composed of a list of subexpressions. For example, a sequence of expressions enclosed in { .. } braces may contain zero or more elements.

When splicing values of some expressions, we would like to decompose or compose such constructs in a general way, i.e. obtain all expressions in a given sequence. It is natural to think of them as a list of expressions and to bind this list to some variable in the meta-language. This is done with the special syntax ..:

mutable exps = [ <[ printf ("%d ", x) ]>, <[ printf ("%d ", y) ]> ];
exps = <[ def x = 1 ]> :: <[ def y = 2 ]> :: exps;
<[ {.. $exps } ]>

We have used { .. $exps } here to create the sequence of expressions from list exps : list[Expr]. A similar syntax is used to splice the content of tuples (( .. $elist )) and other constructs, like array []:

using Nemerle.Collections;

macro castedarray (e) {
 match (e) {
  | <[ array [.. $elements ] ]> =>
     def casted = List.Map (elements, fun (x) { <[ ($x : object) ]> });
     <[ array [.. $casted] ]>
  | _ => e
 }
}

If the exact number of expressions in a tuple or sequence is known when writing the quotation, then it can be expressed with

<[ $e_1; $e_2; $e_3; x = 2; f () ]>

The .. syntax is used when the number of expressions is not fixed in advance, i.e. when there are e_i : Expr for 1 <= i <= n.

Exercise

Write a macro rotate, which takes two parameters: a pair of floating point numbers (describing a point in 2D space) and an angle (in radians). The macro should return a new pair -- a point rotated by the given angle. The macro should use as much information as is available at the compile-time, e.g. if all numbers supplied are constant, then only the final result should be inlined, otherwise the result must be computed at runtime.

Adding new syntax to the compiler

After we have written the for macro, we would like the compiler to understand some changes to its syntax. In particular, we would like to use the C-like notation

for (mutable i = 0; i < n; ++i) {
  sum += i;
  Nemerle.IO.printf ("%d\n", sum);
}

In order to achieve that, we have to define which tokens and grammar elements may form a call of the for macro. We do that by changing its header to

macro for (init, cond, change, body)
syntax ("for", "(", init, ";", cond, ";", change, ")", body)

The syntax keyword is used here to define a list of elements forming the syntax of the macro call. The first token must always be a unique identifier (from now on it is treated as a special keyword triggering parsing of the defined sequence). It is followed by tokens composed of operators or identifiers, passed as string literals, or by names of parameters of the macro. Each parameter must occur exactly once.

Parsing of a syntax rule is straightforward: tokens from the input program must match those from the definition, and parameters are parsed according to their type. The default type of a parameter is Expr, which is just an ordinary expression (consult the Nemerle grammar in the Reference). All allowed parameter types will be described in the extended version of the reference manual corresponding to macros.
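
Putting the pieces together, the complete definition would combine the new header with the body of the for macro given earlier. Nothing here is new; it is only the two fragments from this text joined into one sketch.

macro for (init, cond, change, body)
syntax ("for", "(", init, ";", cond, ";", change, ")", body)
{
  <[
    $init;
    def loop () : void {
      // run the body, then the change step, then iterate again
      if ($cond) { $body; $change; loop () }
      else ()
    };
    loop ()
  ]>
}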

Exercise

Add a new syntactic construct forpermutation to your program. It should be defined as the macro

macro forp (i, n : int, m : int, body)

and introduce syntax, which allows writing the following program

mutable i = 0;
forpermutation (i in 3 to 10) Nemerle.IO.printf ("%d\n", i)

It should create a random permutation p of numbers x_j, m <= x_j <= n at the compile-time. Then generate the code executing body of the loop n - m + 1 times, preceding each of them with assignment of permutation element to i.

Macros in custom attributes

Executing macros on type declarations

Nemerle macros are simply plugins to the compiler. We decided not to restrict them only to operations on expressions, but allow them to transform almost any part of program. Macros can be used within custom attributes written near methods, type declarations, method parameters, fields, etc. They are executed with those entities passed as their parameters.

As an example, let us take a look at Serializable macro. Its usage looks like this:

[Serializable]
class S {
  public this (v : int, m : S) { a = v; my = m; }
  my : S;
  a : int;
}

From now on, S has an additional method Serialize and it implements the interface ISerializable. We can use it in our code like this

def s = S (4, S (5, null));
s.Serialize ();

And the output is

<a>4</a>
<my>
  <a>5</a>
  <my>
    <null/>
  </my>
</my>

The macro modifies the type S at compile-time and adds some code to it. The inheritance relation of the class is also changed, by making it implement the interface ISerializable:

public interface ISerializable {
  Serialize () : void;
}

Manipulating type declarations

In general, macros placed in attributes can do many transformations and analysis of program objects passed to them. To see Serializable macro's internals and discuss some design issues, let's go into its code.

[Nemerle.MacroUsage (Nemerle.MacroPhase.BeforeInheritance, Nemerle.MacroTargets.Class,
                     Inherited = true)]
macro Serializable (t : TypeBuilder)
{
  t.AddImplementedInterface (<[ ISerializable ]>)
}

First we have to add the interface which the given type is about to implement. More important, however, is the phase modifier BeforeInheritance in the macro's custom attribute. In general, we distinguish three execution stages for attribute macros (described in the section "Execution stages"). BeforeInheritance specifies that the macro will be able to change the subtyping information of the class it operates on.

Having added the interface to our type, we now have to create the Serialize () method.

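using System.Reflection;  // for BindingFlags
using Nemerle.Compiler;   // for TypeBuilder, IField, Message

[Nemerle.MacroUsage (Nemerle.MacroPhase.WithTypedMembers, Nemerle.MacroTargets.Class,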
                     Inherited = true)]
macro Serializable (t : TypeBuilder)
{
  /// here we list its fields and choose only those, which are not derived
  /// or static
  def fields = t.GetFields (BindingFlags.Instance %| BindingFlags.Public %|
                            BindingFlags.NonPublic %| BindingFlags.DeclaredOnly);

  /// now create list of expressions which will print object's data  
  mutable serializers = [];

  /// traverse through fields, taking their type constructors  
  foreach (x : IField in fields) {
    def tc = x.GetMemType ().TypeInfo;
    def nm = Macros.UseSiteSymbol (x.Name);
    if (tc != null)
      if (tc.IsValueType)
        /// we can safely print value types as strings        
        serializers = <[
                         printf ("<%s>", $(x.Name : string));
                         System.Console.Write ($(nm : name));
                         printf ("</%s>\n", $(x.Name : string));
                       ]>
                       :: serializers
      else
        /// we can try to check, if type of given field also implements ISerializable
        if (x.GetMemType ().Require (<[ ttype: ISerializable ]>))
          serializers = <[
                           printf ("<%s>\n", $(x.Name : string));      
                           if ($(nm : name) != null)
                             $(nm : name).Serialize ()
                           else
                             printf ("<null/>\n");
                           printf ("</%s>\n", $(x.Name : string));
                         ]>
                         :: serializers
        else
          /// and finally, we encounter case when there is no easy way to serialize 
          /// given field
          Message.FatalError ("field `" + x.Name + "' cannot be serialized")
    else
      Message.FatalError ("field `" + x.Name + "' cannot be serialized")
  };
  // after analyzing fields, we create method in our type, to execute created
  // expressions
  t.Define (<[ decl: public Serialize () : void
                     implements ISerializable.Serialize {
                       .. $serializers
                     }
            ]>);
}

Execution stages

Analysing the object-oriented hierarchy and class members is a separate pass of the compilation. First it creates the inheritance relation between classes, so that we know exactly all base types of a given type. After that, every member inside them (methods, fields, etc.) is analysed and added to the hierarchy, and its type annotations are resolved. Finally, the rules regarding implemented interface methods are checked.

For the needs of macros we have decided to distinguish three moments in this pass at which they can operate on elements of class hierarchy. Every macro can be annotated with a stage, at which it should be executed.

BeforeInheritance stage is performed after parsing the whole program and scanning declared types, but before building the subtyping relation between them. It gives a macro the freedom to change the inheritance hierarchy and to operate on the parse-trees of classes and members.
BeforeTypedMembers stage is when the inheritance of types is already set. Macros can still operate on bare parse-trees, but can also utilize information about subtyping.
WithTypedMembers stage is after the headers of methods and fields have been analysed and are in a bound state. Macros can easily traverse the entire class space by reflecting type constructors of fields, method parameters, etc. Original parse-trees are no longer available and signatures of class members cannot be changed.

Parameters of attribute macros

Every executed attribute macro operates on some element of class hierarchy, so it must be supplied with an additional parameter describing the object, on which macro was placed. This way it can easily query for properties of that element and use compiler's API to reflect or change the context in which it was defined.

For example a method macro declaration would be

[Nemerle.MacroUsage (Nemerle.MacroPhase.WithTypedMembers,
                     Nemerle.MacroTargets.Method)]
macro MethodMacro (t : TypeBuilder, f : MethodBuilder, expr)
{
  // use 't' and 'f' to query or change class-level elements
  // of program
}

The macro is annotated with additional attributes specifying, respectively, the stage in which the macro will be executed and the macro target.

The available parameters contain references to the class hierarchy elements that the given macro operates on. They are supplied automatically by the compiler and they vary with the target and stage of the macro. The table below specifies the valid parameters for each stage and target of an attribute macro.

Attribute macro targets and parameters
MacroTarget	MacroPhase.BeforeInheritance	MacroPhase.BeforeTypedMembers	MacroPhase.WithTypedMembers
Class	TypeBuilder	TypeBuilder	TypeBuilder
Method	TypeBuilder, ParsedMethod	TypeBuilder, ParsedMethod	TypeBuilder, MethodBuilder
Field	TypeBuilder, ParsedField	TypeBuilder, ParsedField	TypeBuilder, FieldBuilder
Property	TypeBuilder, ParsedProperty	TypeBuilder, ParsedProperty	TypeBuilder, PropertyBuilder
Event	TypeBuilder, ParsedEvent	TypeBuilder, ParsedEvent	TypeBuilder, EventBuilder
Parameter	TypeBuilder, ParsedMethod, ParsedParameter	TypeBuilder, ParsedMethod, ParsedParameter	TypeBuilder, MethodBuilder, ParameterBuilder
Assembly	(none)	(none)	(none)

The intuition is that every macro has a parameter holding its target and, additionally, parameters holding the objects that contain it (for example, TypeBuilder is available in most attribute macros).

After those implicitly available parameters there come standard parameters explicitly supplied by user. They are the same as for expression level macros.

Reference to more advanced aspects

Hygiene and alpha-renaming of identifiers

Problem with names capture

Identifiers in quoted code (object code) must be treated in a special way, because we usually do not know in which scope they will appear. In particular, they should not mix with variables of the same name from the macro-use site.

Consider the following macro defining a local function f

macro identity (e) { <[ def f (x) { x }; f($e) ]> }

Calling it with identity (f(1)) might generate confusing code like

def f (x) { x }; f (f (1))

To prevent name capture, all macro-generated variables should be renamed to unique counterparts, like in

def f_42 (x_43) { x_43 }; f_42 (f (1))

Hygiene of macros

The idea of separating variables introduced by a macro from those defined in the plain code (or other macros) is called `hygiene' after Lisp and Scheme languages. In Nemerle we define it as putting identifiers created during a single macro execution into a unique namespace. Variables from different namespaces cannot bind to each other.

In other words, a macro cannot create identifiers that capture any external variables or that are visible outside of its own generated code. This means that there is no need to care about locally used names.

Hygiene is obtained by encapsulating identifiers in the special Name class. The compiler uses it to distinguish names from different macro executions and scopes (for details of the implementation consult the paper about macros). Variables with the appropriate information are created automatically by quotation.
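
A small sketch of why this matters, using a hypothetical swap macro that needs a temporary variable:

macro swap (a, b)
{
  // swap is a hypothetical name; tmp is introduced by the macro itself
  <[
    def tmp = $a;
    $a = $b;
    $b = tmp
  ]>
}

Even if the caller writes swap (tmp, y) with its own mutable variable named tmp, the macro's tmp lives in a separate namespace, so the two do not clash and the values are exchanged as expected.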

def definition = <[ def y = 4 ]>;
<[ def x = 5; $definition; x + y ]>

When a macro creates the above code, the identifiers y and x are tagged with the same unique mark. Now they cannot be captured by any external variables (with a different mark). We operate on the Name class when the quoted code is composed or decomposed and we use the <[ $(x : name) ]> construct. Here x is bound to an object of type Name, which we can use elsewhere to create exactly the same identifier.

An identifier can also be created by calling the method Macros.NewSymbol (), which returns a Name with a unique identifier, tagged with the current mark.

def x = Macros.NewSymbol ();
<[ def $(x : name) = 5; $(x : name) + 4 ]>

Controlled breaking hygiene

Sometimes it is useful to generate identifiers, which bind to variables visible in place where a macro is used. For example one of macro's parameters is a string with some identifiers inside. If we want to use these as real identifiers, then we need to break automatic hygiene. It is especially useful in embedding domain-specific languages, which reference symbols from the original program.

As an example consider the Nemerle.IO.sprint (string literal) macro (which has the syntax shortcut $"some text $id "). It searches the given string literal for $var and creates code concatenating the text before and after $var with the value of var.ToString ().

def x = 3;
System.Console.WriteLine ($"My value of x is $x and I'm happy");

expands to

def x = 3;
System.Console.WriteLine ({ 
  def sb = System.Text.StringBuilder ("My value of x is "); 
  sb.Append (x.ToString ()); 
  sb.Append (" and I'm happy"); 
  sb.ToString () 
});

Breaking hygiene is necessary here, because we generate code (a reference to x) which needs to have the same context as the variables at the place where the macro is invoked.

To make a given name bind to symbols from the macro use site, we use the Nemerle.Macros.UseSiteSymbol (name : string) : Name function, or the special splicing target usesite in quotations. Their use is shown in this simplified implementation of the macro

macro sprint (lit : string) 
{
  def (prefix, symbol, suffix) = Helper.ExtractDollars (lit);
  def varname = Nemerle.Macros.UseSiteSymbol (symbol);
  <[ 
    def sb = System.Text.StringBuilder ($(prefix : string)); 
    sb.Append ($(varname : name).ToString ()); 
    // or alternatively  $(symbol : usesite)
    sb.Append ($(suffix : string)); 
    sb.ToString () 
  ]>
}

Note that this operation is 'safe', that is, it changes the context of the variable to the place where the macro invocation was created (see the paper for more details).

Unhygienic variables

Sometimes it is useful to break hygiene completely, for instance when a programmer only wants to experiment with new ideas. From our experience, it is often hard to reason about the correct contexts for variables, especially when writing class-level macros. In such cases it is useful to be able to break hygiene easily.

Nemerle provides this with the <[ $("id" : dyn) ]> construct. It makes the produced variable break hygiene rules and always bind to the nearest definition with the same name.
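
A minimal sketch, assuming the use site has a variable called counter in scope (both that name and the macro name show_counter are hypothetical):

macro show_counter ()
{
  // show_counter and "counter" are hypothetical names for this sketch;
  // the identifier is bound to the nearest definition of counter at the use site
  <[ System.Console.WriteLine ($("counter" : dyn)) ]>
}

A program containing def counter = 5; show_counter (); would then print the value of its own counter variable. If no such variable is visible where the macro is used, the generated code would presumably fail to compile, which is the price of giving up hygiene.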