Nemerle type-safe macros
Basically, every macro is a function that takes fragments
of code as parameters and returns some other code. At the highest
level of abstraction it doesn't matter whether the parameters are function calls,
type definitions or just a sequence of assignments. The most important fact is
that they are not ordinary objects (e.g. instances of some type, like
integer numbers), but their internal representation in the compiler (i.e.
syntax trees).
A macro is defined in the program just like any other function, using
common Nemerle syntax. The only difference is the structure of the data
it operates on and the way in which it is used (executed at compile-time).
A macro, once created, can be used to process some parts of the code.
It's done by calling it with block(s) of code as parameter(s).
This operation is in most cases indistinguishable from a common function
call (like f(1)), so a programmer using a macro would not be confused
by unknown syntax. The main concept of our design is to make the usage of macros
as transparent as possible. From the user's point of view, it is not
important whether particular parameters are passed to a macro
(which would process them at compile-time and insert some new
code in their place) or to an ordinary function.
Writing a macro is as simple as writing an ordinary function. It looks
the same, except that it is preceded by the keyword macro
and it lives at the top level (not inside any class).
The keyword tells the compiler how to use the defined method
(i.e. run it at compile-time in every place where it is used).
Macros can take zero (if we just want to generate new code)
or more parameters. They are all elements of the language
grammar, so their type is limited to the set of defined
syntax objects. The same holds for a return value of a macro.
Example:
macro generate_expression ()
{
MyModule.compute_some_expression ();
}
This example macro does not take any parameters. It is used in the
code by simply writing generate_expression ().
The most important thing is the difference between generate_expression
and compute_some_expression: the first one is a function
executed by the compiler during compilation, while the latter is just
an ordinary function that must return the syntax tree of an expression
(which is here returned and inserted into the program code by
generate_expression).
In order to create and use a macro you have to write a
library, which will contain its executable form. You simply
create a new file mymacro.n, which can contain for example
macro m () {
Nemerle.IO.printf ("compile-time\n");
<[ Nemerle.IO.printf ("run-time\n") ]>;
}
and compile it with the command
ncc -r Nemerle.Compiler.dll -t:dll mymacro.n -o mymacro.dll
Now you can use m() in any program, like here
module M {
public Main () : void {
m ();
}
}
You must add a reference to mymacro.dll during
compilation of this program. It might look like
ncc -r mymacro.dll myprog.n -o myprog.exe
Write a macro, which, when used, should slow down the compilation by 5 seconds
(use the System.Timers namespace) and print the version of the operating
system used to compile the program (use the System.Environment namespace).
The definition of the function compute_some_expression might look like:
using Nemerle.Compiler.Parsetree;
module MyModule
{
public mutable debug_on : bool;
public compute_some_expression () : PExpr
{
if (debug_on)
<[ System.Console.WriteLine ("Hello, I'm debug message") ]>
else
<[ () ]>
}
}
The examples above show a macro that conditionally inlines an expression
printing a message. It is not very useful yet, but it introduces the
idea of compile-time computation and also some new syntax used only
when writing macros and functions operating on syntax trees.
We have used the <[ ... ]> constructor here to
build the syntax tree of an expression (e.g. '()').
<[ ... ]> is used for both construction and
decomposition of syntax trees. These operations are similar to
quotation of code. Simply put, everything written inside
<[ ... ]> corresponds to its own syntax tree.
It can be any valid Nemerle code, so a programmer does not have to
learn the internal representation of syntax trees in the compiler.
macro print_date (at_compile_time)
{
match (at_compile_time) {
| <[ true ]> => MyModule.print_compilation_time ()
| _ => <[ WriteLine (DateTime.Now.ToString ()) ]>
}
}
The quotation alone allows using only constant expressions, which
is insufficient for most tasks. For example, to write the function
print_compilation_time we must be able to create an expression
based on a value known at compile-time. In the next sections we introduce
the rest of the macro syntax for operating on general syntax trees.
When we want to decompose some large code (or more precisely,
its syntax tree), we must bind its smaller parts to variables.
Then we can process them recursively or just use them in an
arbitrary way to construct the result.
We can operate on entire subexpressions by writing
$( ... ) or $ID inside the quotation operator <[ ... ]>.
This means binding the value of ID, or of the parenthesized
expression, to the part of the syntax tree described by the
corresponding place in the quotation.
macro for (init, cond, change, body)
{
<[
$init;
def loop () : void {
if ($cond) { $body; $change; loop() }
else ()
};
loop ()
]>
}
The above macro defines for, which is
similar to the loop known from C. It can be used like this
for (mutable i = 0, i < 10, i++, printf ("%d", i))
Later we show how to extend the language syntax to make the syntax
of for exactly as in C.
Sometimes quoted expressions have literals inside of them
(like strings, integers, etc.) and we want to operate on
their value, not on their syntax trees. It is possible,
because they are constant expressions and their runtime
value is known at the compile-time.
Let's consider the previously used function print_compilation_time.
using System;
using Nemerle.Compiler.Parsetree;
module MyModule {
public print_compilation_time () : PExpr
{
<[ System.Console.WriteLine ($(DateTime.Now.ToString () : string)) ]>
}
}
Here we see a new extension of the splicing syntax, where we
create the syntax tree of a string literal from a known value.
It is done by adding : string inside the $(...) construct.
One can think of it as enforcing the type of the spliced
expression to a literal (similar to common Nemerle type
enforcement), but as a matter of fact something more is
happening here: a real value is lifted to its representation
as the syntax tree of a literal.
Other types of literals (int, bool, float, char) are treated the same.
This notation can be used also in pattern matching. We can
match constant values in expressions this way.
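As a minimal sketch of matching a literal this way, the hypothetical macro below (the name double_literal is an assumption made for illustration, not part of the Nemerle library) doubles an integer constant at compile-time and falls back to a run-time multiplication for any other expression:

```nemerle
// Hypothetical example macro; compile it into a macro library as shown earlier.
macro double_literal (e)
{
  match (e) {
    // the argument is an integer literal: n is bound to its value (an int)
    | <[ $(n : int) ]> => <[ $(n * 2 : int) ]>
    // anything else: leave the multiplication to run-time
    | _ => <[ $e * 2 ]>
  }
}
```

Under these assumptions, double_literal (21) would inline a single literal, while double_literal (x) would expand to x * 2.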
There is also a similar scheme for splicing and matching
variables of a given name. $(v : name) denotes a
variable whose name is contained in the object v
(of the special type Name). There are some good
reasons (see the problem with names capture) for encapsulating
a real identifier within this object.
You might have noticed that Nemerle has a few grammar elements
which are composed of a list of subexpressions. For example, a sequence
of expressions enclosed in { .. } braces may
contain zero or more elements.
When splicing values of some expressions, we would like to decompose
or compose such constructs in a general way, i.e. obtain all expressions
in a given sequence. It is natural to think about them as a list of
expressions and to bind this list to some variable in the meta-language.
It is done with the special syntax ..:
mutable exps = [ <[ printf ("%d ", x) ]>, <[ printf ("%d ", y) ]> ];
exps = <[ def x = 1 ]> :: <[ def y = 2 ]> :: exps;
<[ {.. $exps } ]>
We have used { .. $exps } here to create the sequence of
expressions from the list exps : list[Expr].
A similar syntax is used to splice the content of tuples (( .. $elist ))
and other constructs, like array []:
using Nemerle.Collections;
macro castedarray (e) {
match (e) {
| <[ array [.. $elements ] ]> =>
def casted = List.Map (elements, fun (x) { <[ ($x : object) ]> });
<[ array [.. $casted] ]>
| _ => e
}
}
If the exact number of expressions in a tuple or sequence is known when
writing the quotation, then it can be expressed with
<[ $e_1; $e_2; $e_3; x = 2; f () ]>
The .. syntax is used when there are e_i : Expr for 1 <= i <= n.
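The same .. syntax can decompose a tuple and build a new one from the resulting list. The sketch below is a hypothetical macro (reverse_tuple is a name invented here), and it assumes that List.Rev is available in Nemerle.Collections next to the List.Map used above:

```nemerle
using Nemerle.Collections;

// Hypothetical example macro: reverses the order of the elements of a tuple.
macro reverse_tuple (e)
{
  match (e) {
    | <[ ( .. $elems ) ]> =>
      // elems is the list of the tuple's element expressions
      def reversed = List.Rev (elems);  // assumed helper, see the note above
      <[ ( .. $reversed ) ]>
    // not a tuple: return the expression unchanged
    | _ => e
  }
}
```

With this definition, reverse_tuple ((a, b, c)) would be expected to expand to (c, b, a).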
Write a macro rotate, which takes two parameters: a pair of
floating point numbers (describing a point in 2D space) and an angle (in
radians). The macro should return a new pair -- a point rotated by the given
angle. The macro should use as much information as is available at the
compile-time, e.g. if all numbers supplied are constant, then only the final
result should be inlined, otherwise the result must be computed at runtime.
After we have written the for macro, we would like the compiler
to understand some changes to its syntax, in particular the C-like notation
for (mutable i = 0; i < n; ++i) {
sum += i;
Nemerle.IO.printf ("%d\n", sum);
}
In order to achieve that, we have to define which tokens and grammar
elements may form a call of the for macro. We do that by changing
its header to
macro for (init, cond, change, body)
syntax ("for", "(", init, ";", cond, ";", change, ")", body)
The syntax keyword is used here to define a list of elements forming
the syntax of the macro call. The first token must always be a unique identifier
(from now on it is treated as a special keyword triggering parsing of
the defined sequence). It is followed by tokens composed of operators or
identifiers passed as string literals, or by names of the macro's parameters.
Each parameter must occur exactly once.
Parsing of a syntax rule is straightforward: tokens from the input
program must match those from the definition, and parameters are parsed
according to their type. The default type of a parameter is
Expr, which is just an ordinary expression (consult the Nemerle
grammar in the Reference). All allowed parameter types
will be described in the extended version of the reference manual corresponding
to macros.
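The rule can be applied to other macros in the same way. The following sketch defines a hypothetical construct repeat_times (the name and the macro are invented for illustration), reusing the local-function technique of the for macro:

```nemerle
// Hypothetical example macro with its own syntax rule.
macro repeat_times (count, body)
syntax ("repeat_times", "(", count, ")", body)
{
  <[
    // loop and n are introduced by the macro itself
    def loop (n : int) : void {
      if (n > 0) { $body; loop (n - 1) }
      else ()
    };
    loop ($count)
  ]>
}
```

A program referencing the macro library could then write repeat_times (3) Nemerle.IO.printf ("hello\n"), with both parameters parsed as ordinary expressions.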
Add a new syntactic construct forpermutation to your program.
It should be defined as the macro
macro forp (i, n : int, m : int, body)
and introduce syntax, which allows writing the following program
mutable i = 0;
forpermutation (i in 3 to 10) Nemerle.IO.printf ("%d\n", i)
It should create a random permutation p of numbers
x_j, m <= x_j <= n at compile-time.
Then generate the code executing the body of the loop
n - m + 1 times, preceding each of them with an assignment of
a permutation element to i.
Nemerle macros are simply plugins to the compiler. We decided
not to restrict them only to operations on expressions, but
allow them to transform almost any part of program.
Macros can be used within custom attributes written near methods,
type declarations, method parameters, fields, etc. They are
executed with those entities passed as their parameters.
As an example, let us take a look at the Serializable macro.
Its usage looks like this:
[Serializable]
class S {
public this (v : int, m : S) { a = v; my = m; }
my : S;
a : int;
}
From now on, S has an additional method Serialize
and it implements the interface ISerializable. We can use
it in our code like this
def s = S (4, S (5, null));
s.Serialize ();
And the output is
<a>4</a>
<my>
<a>5</a>
<my>
<null/>
</my>
</my>
The macro modifies type S at compile-time and adds some code to it.
The inheritance relation of the given class is also changed, by making it
implement the interface ISerializable:
public interface ISerializable {
Serialize () : void;
}
In general, macros placed in attributes can perform many transformations
and analyses of the program objects passed to them. To see the internals
of the Serializable macro and discuss some design
issues, let's go into its code.
[Nemerle.MacroUsage (Nemerle.MacroPhase.BeforeInheritance, Nemerle.MacroTargets.Class,
Inherited = true)]
macro Serializable (t : TypeBuilder)
{
t.AddImplementedInterface (<[ ISerializable ]>)
}
First we have to add the interface which the given type is about to
implement. More important, though, is the phase modifier
BeforeInheritance in the macro's custom attribute. In general,
we distinguish three execution stages for attribute macros.
BeforeInheritance specifies that the macro will be able to change
the subtyping information of the class it operates on.
Having added the interface to our type, we now have to create
the Serialize () method.
[Nemerle.MacroUsage (Nemerle.MacroPhase.WithTypedMembers, Nemerle.MacroTargets.Class,
Inherited = true)]
macro Serializable (t : TypeBuilder)
{
/// here we list its fields and choose only those, which are not derived
/// or static
def fields = t.GetFields (BindingFlags.Instance %| BindingFlags.Public %|
BindingFlags.NonPublic %| BindingFlags.DeclaredOnly);
/// now create list of expressions which will print object's data
mutable serializers = [];
/// traverse through fields, taking their type constructors
foreach (x : IField in fields) {
def tc = x.GetMemType ().TypeInfo;
def nm = Macros.UseSiteSymbol (x.Name);
if (tc != null)
if (tc.IsValueType)
/// we can safely print value types as strings
serializers = <[
printf ("<%s>", $(x.Name : string));
System.Console.Write ($(nm : name));
printf ("</%s>\n", $(x.Name : string));
]>
:: serializers
else
/// we can try to check, if type of given field also implements ISerializable
if (x.GetMemType ().Require (<[ ttype: ISerializable ]>))
serializers = <[
printf ("<%s>\n", $(x.Name : string));
if ($(nm : name) != null)
$(nm : name).Serialize ()
else
printf ("<null/>\n");
printf ("</%s>\n", $(x.Name : string));
]>
:: serializers
else
/// and finally, we encounter case when there is no easy way to serialize
/// given field
Message.FatalError ("field `" + x.Name + "' cannot be serialized")
else
Message.FatalError ("field `" + x.Name + "' cannot be serialized")
};
// after analyzing fields, we create method in our type, to execute created
// expressions
t.Define (<[ decl: public Serialize () : void
implements ISerializable.Serialize {
.. $serializers
}
]>);
}
Analysing the object-oriented hierarchy and class members is a separate pass of the compilation.
First it creates the inheritance relation between classes, so we know exactly all base types of
a given type. After that every member inside them (methods, fields, etc.)
is analysed and added to the hierarchy, and its type annotations are resolved.
Finally, the rules regarding implemented interface methods are checked.
For the needs of macros we have decided to distinguish three moments in
this pass at which they can operate on elements of the class hierarchy.
Every macro can be annotated with the stage at which it should be executed.
- BeforeInheritance stage is performed after parsing the whole program and scanning declared types, but before building the subtyping relation between them. It gives a macro the freedom to change the inheritance hierarchy and operate on the parse-trees of classes and members.
- BeforeTypedMembers is when the inheritance of types is already set. Macros can still operate on bare parse-trees, but can also use information about subtyping.
- WithTypedMembers stage is after the headers of methods and fields have been analysed and are in a bound state. Macros can easily traverse the entire class space by reflecting type constructors of fields, method parameters, etc. The original parse-trees are no longer available and signatures of class members cannot be changed.
Every executed attribute macro operates on some element of the class hierarchy,
so it must be supplied with an additional parameter describing the object on which the macro was placed.
This way it can easily query the properties of that element and use the compiler's API to reflect on or change the
context in which it was defined.
For example a method macro declaration would be
[Nemerle.MacroUsage (Nemerle.MacroPhase.WithTypedMembers,
Nemerle.MacroTargets.Method)]
macro MethodMacro (t : TypeBuilder, f : MethodBuilder, expr)
{
// use 't' and 'f' to query or change class-level elements
// of program
}
The macro is annotated with additional attributes specifying, respectively, the
stage in which the macro will be executed and the macro target.
The available parameters contain references to the class hierarchy elements that the given macro operates on.
They are automatically supplied by the compiler and they vary depending on the target and stage of the given macro.
Here is a small table specifying valid parameters for each stage and target of an attribute macro.
Attribute macro targets and parameters

MacroTarget | MacroPhase.BeforeInheritance | MacroPhase.BeforeTypedMembers | MacroPhase.WithTypedMembers |
---|---|---|---|
Class | TypeBuilder | TypeBuilder | TypeBuilder |
Method | TypeBuilder, ParsedMethod | TypeBuilder, ParsedMethod | TypeBuilder, MethodBuilder |
Field | TypeBuilder, ParsedField | TypeBuilder, ParsedField | TypeBuilder, FieldBuilder |
Property | TypeBuilder, ParsedProperty | TypeBuilder, ParsedProperty | TypeBuilder, PropertyBuilder |
Event | TypeBuilder, ParsedEvent | TypeBuilder, ParsedEvent | TypeBuilder, EventBuilder |
Parameter | TypeBuilder, ParsedMethod, ParsedParameter | TypeBuilder, ParsedMethod, ParsedParameter | TypeBuilder, MethodBuilder, ParameterBuilder |
Assembly | (none) | (none) | (none) |
The intuition is that every macro has a parameter holding its target and, additionally, the objects containing it (for instance, TypeBuilder is available in most attribute macros).
After those implicitly available parameters come the standard parameters explicitly supplied by the user. They are the same as for expression-level
macros.
Identifiers in quoted code (object code) must be treated
in a special way, because we usually do not know in which
scope they would appear. Especially they should not mix
with variables with the same names from the macro-use site.
Consider the following macro defining a local function f
macro identity (e) { <[ def f (x) { x }; f($e) ]> }
Calling it with identity (f(1)) might generate
confusing code like
def f (x) { x }; f (f (1))
To prevent name capture, all macro-generated variables
should be renamed to unique counterparts, like in
def f_42 (x_43) { x_43 }; f_42 (f (1))
The idea of separating variables introduced by a macro from
those defined in the plain code (or other macros) is called
`hygiene' after Lisp and Scheme languages. In Nemerle
we define it as putting identifiers created during a single
macro execution into a unique namespace. Variables from
different namespaces cannot bind to each other.
In other words, a macro cannot create identifiers that capture
any external variables or that are visible outside of its own
generated code. This means that there is no need to care
about locally used names.
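A classic illustration is a swap macro that needs a temporary variable. The macro below is a hypothetical sketch written for this explanation, not a macro taken from the Nemerle library:

```nemerle
// Hypothetical example macro: exchanges the values of two mutable variables.
macro swap (a, b)
{
  <[
    def tmp = $a;   // tmp belongs to this macro expansion only
    $a = $b;
    $b = tmp
  ]>
}
```

It might be used in a program that happens to have its own variable called tmp:

```nemerle
mutable tmp = 1;
mutable y = 2;
swap (tmp, y);
```

With hygienic renaming the macro's own tmp becomes a distinct identifier (something like tmp_42), so the user's tmp and y are exchanged as intended. Without it, every tmp in the expansion would refer to the macro's new definition and the user's variable would never be updated.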
Hygiene is obtained by encapsulating identifiers in the special
Name class. The compiler uses it to distinguish names
from different macro executions and scopes (for details of
the implementation consult the paper about macros).
Variables with the appropriate information are created
automatically by quotation.
def definition = <[ def y = 4 ]>;
<[ def x = 5; $definition; x + y ]>
When a macro creates the above code, the identifiers
y and x are tagged with the same unique mark. Now they
cannot be captured by any external variables (with a
different mark). We operate on the Name class when the
quoted code is composed or decomposed and we use the
<[ $(x : name) ]> construct. Here x is bound to an object of type
Name, which we can use
in another place to create exactly the same identifier.
An identifier can also be created by calling the method
Macros.NewSymbol(), which returns a Name
with a unique identifier, tagged with the current mark.
def x = Macros.NewSymbol ();
<[ def $(x : name) = 5; $(x : name) + 4 ]>
Sometimes it is useful to generate identifiers which
bind to variables visible in the place where a macro is used. For example, one of the macro's parameters may be a string with some identifiers inside. If we want to use these as real identifiers, then we need to break automatic hygiene. It is especially useful
in embedding domain-specific languages, which reference symbols from
the original program.
As an example consider the Nemerle.IO.sprint (string literal)
macro (which has the syntax shortcut $"some text $id ").
It searches the given string literal for $var and creates code concatenating the text before and after $var with the value of var.ToString ().
def x = 3;
System.Console.WriteLine ($"My value of x is $x and I'm happy");
expands to
def x = 3;
System.Console.WriteLine ({
def sb = System.Text.StringBuilder ("My value of x is ");
sb.Append (x.ToString ());
sb.Append (" and I'm happy");
sb.ToString ()
});
Breaking of hygiene is necessary here, because we generate code (a reference to x) which needs to have the
same context as the variables at the place where the macro is invoked.
To make a given name bind to symbols from the macro use site, we use the Nemerle.Macros.UseSiteSymbol (name : string) : Name
function, or the
special splicing target usesite in quotations. Their use would look like this simplified implementation of the macro
macro sprint (lit : string)
{
def (prefix, symbol, suffix) = Helper.ExtractDollars (lit);
def varname = Nemerle.Macros.UseSiteSymbol (symbol);
<[
def sb = System.Text.StringBuilder ($(prefix : string));
sb.Append ($(varname : name).ToString ());
// or alternatively $(symbol : usesite)
sb.Append ($(suffix : string));
sb.ToString ()
]>
}
Note that this operation is 'safe', that is, it changes the context of the variable to the place where the macro invocation was created
(see the paper for more details).
Sometimes it is useful to completely break hygiene, for instance when the programmer
only wants to experiment with new ideas. From our experience, it is often
hard to reason about correct contexts for variables, especially when
writing class-level macros. In this
case it is useful to be able to easily break hygiene.
Nemerle provides this with the <[ $("id" : dyn) ]>
construct. It makes the produced variable break the hygiene rules and always
bind to the nearest definition with the same name.
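A minimal sketch of the dyn target, assuming a hypothetical macro print_counter and a variable named counter that the user is expected to have in scope:

```nemerle
// Hypothetical example macro: prints whatever variable named "counter"
// is nearest to the place where the macro is used.
macro print_counter ()
{
  <[ System.Console.WriteLine ($("counter" : dyn)) ]>
}
```

Used after def counter = 7; the call print_counter (); would refer to that definition. If no such variable is in scope, the expansion would presumably fail to compile, which is the price of giving up hygiene.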