Nim Metaprogramming - Macro Tutorial
This tutorial aims to be a step-by-step introduction to the metaprogramming features of the Nim Language and to provide as much detail as possible to kickstart your craziest projects. There are already many resources on the Web, but I strive to provide more thorough details on the development process and to gather them all in one place.
⚠️ This tutorial is still under heavy development.
Introduction
Four levels of abstraction
There are four levels of abstraction in metaprogramming that are each a special kind of procedure:
- Ordinary procedures/iterators (No metaprogramming)
- Generic procedures/iterators and typedescs (Type level)
- Template (Copy-paste mechanism)
- Macro (AST substitution)
It is recommended to write a procedure with the lowest level of metaprogramming that does the job. As more metaprogramming features are used, compilation takes longer and errors get harder to debug.
Generics
We often program to perform repetitive tasks easily. Programs must adapt themselves to many cases and might be redundant in a first approach. To limit the scope for debugging, we like to avoid redundancy and let the compiler do code duplication for us. Code duplication means that the generated assembly code has very similar or identical block instructions.
One common example is linear algebra. Imagine you want to perform an addition. Your input data is very general and may be integers or floating-point numbers. You do not want to write your addition function twice.
# What not to do!
proc add(x, y: int): int =
  return x + y
proc add(x, y: float): float =
  return x + y
echo add(2, 3)
echo add(3.7, 4.5)
Indeed, what if you want to add a function for other types like int32 or float32?
You will have to copy-paste your function and change the type. Not a problem?
Nothing in the code tells you how many add functions there are in total.
Whenever a bug slips into one of your functions, you will have to track down all the add functions and fix the bug in each of them.
Generics bring a solution to this:
proc add[T](x, y: T): T =
  return x + y
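As a minimal sketch of how a single generic definition covers several types, the version below constrains T with SomeNumber, a type class from Nim's system module. The constraint is an illustrative addition, not part of the snippet above.

```nim
# Sketch: one generic definition serves every numeric type.
# The SomeNumber constraint is an assumption added for illustration.
proc add[T: SomeNumber](x, y: T): T =
  x + y

echo add(2, 3)           # instantiated for int
echo add(1.5, 2.25)      # instantiated for float
echo add(2'i32, 3'i32)   # instantiated for int32
```

With the constraint, a call such as add("a", "b") is rejected at compile time instead of failing inside the procedure body.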
Let us start with templates and untyped parameters.
To run each snippet of code in this tutorial, you will need to import the std/macros package.
import std/macros
Templates
We can see templates as procedures that modify code through a copy-paste mechanism. Pieces of code are given to (and returned by) the template with a special type: untyped.
For those familiar with preprocessing in the C family of languages (C, C++, C#), templates do the same as the #define or #if, #endif macros and much more.
Nim defines boolean operators like != with templates. You can even look at Nim's source code, which is almost the same code. See the documentation.
## Example from std/manual
template `!=` (a, b: untyped): untyped =
  not (a == b)
doAssert(4 != 5)
We can easily repeat code under a custom block. Here, duplicate simply repeats the code of its block twice, and repeat generalises it by taking an additional int parameter for the number of repetitions.
Notice that duplicate is not smart. It will repeat any assignment in the block twice.
template duplicate(statements: untyped) =
  statements
  statements
duplicate:
  echo 5
5
5
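To make the "not smart" remark concrete, here is a small sketch where the duplicated block has a side effect. The counter variable is a hypothetical name introduced for this illustration.

```nim
template duplicate(statements: untyped) =
  statements
  statements

var counter = 0   # hypothetical variable, only for this illustration
duplicate:
  counter += 1    # pasted twice, so it runs twice

echo counter      # 2
```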
## Example from Nim In Action
from std/os import sleep
template repeat(count: int, statements: untyped) =
  for i in 0 ..< count:
    statements
repeat 5:
  echo("Hello Templates!")
  sleep(100)
Hello Templates!
Hello Templates!
Hello Templates!
Hello Templates!
Hello Templates!
Do-While keyword
In Nim, there are few reserved keywords and special control-flow mechanisms, which encourages us to create our own constructs (and keeps the language simple). Nothing prevents us from defining a doWhile construct similar to the one found in languages like C or JavaScript.
For those who only know Nim, this construct runs the loop body once before testing the condition.
This C code always prints Hello World at least once, independently of the initial value of the variable i.
int i = 10;
do {
    printf("Hello World\n");
    i += 1;
} while (i < 10);
template doWhile(conditional, loop: untyped) =
  loop
  while conditional:
    loop
var i = 10
doWhile i < 10:
  echo "Hello World"
  i.inc
Hello World
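The doWhile template above pastes the loop body twice. As an alternative sketch (not the author's version), the body can be pasted only once by looping forever and leaving when the condition fails. The name doWhileSingleBody is hypothetical.

```nim
template doWhileSingleBody(conditional, loop: untyped) =
  ## Hypothetical variant: the body appears once in the generated code.
  while true:
    loop
    if not (conditional):
      break

var i = 10
doWhileSingleBody i < 10:
  echo "Hello World"   # printed once although i < 10 is false from the start
  i.inc
```

One trade-off of this variant: a continue statement inside the body would jump over the condition test, so the first version remains the safer default.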
Notice though that, syntactically, the resulting source code is fairly different from the C code.
In the C source code, the elements appear in this order:
- the do keyword
- the block of instructions
- the while keyword
- the conditional (boolean expression)
In Nim, we have in this order:
- the doWhile identifier
- the conditional
- the block of instructions
There is no way to modify Nim's syntax to match C's syntax.
Benchmark example
Another example is benchmarking code in Nim. It suffices to put the code to benchmark inside a special block.
import std/[times, monotimes]
template benchmark(benchmarkName: string, code: untyped) =
  block:
    let t0 = getMonoTime()
    code
    let elapsed = getMonoTime() - t0
    echo "CPU Time [", benchmarkName, "] ", elapsed
benchmark "test1":
  sleep(100)
CPU Time [test1] 100 milliseconds, 146 microseconds, and 850 nanoseconds
The code inside the benchmark block will be enclosed by our template code.
Since the code replacement is done at compile time, this transformation does not add runtime overhead to our benchmarked code. By contrast, a function or procedure used for benchmarking would add runtime overhead because of the nested function calls.
Macros
Templates use untyped parameters as Lego bricks: they cannot break them down into smaller pieces.
We cannot check untyped parameters in a template. If our template works when given an object as argument, nothing prevents a user from giving a function as argument.
Macros can be seen as empowered templates. While templates substitute code, macros do introspection. The main difference is that a template cannot look inside an untyped parameter. This means that, in a template, we cannot inspect the input to verify that the user did not give a function when we expect a type.
One can parse untyped parameters with macros. We can even act conditionally on the information given in these parameters. We can also inject variables into scopes.
macro throwAway(statements: untyped): untyped =
  result = newStmtList()
throwAway:
  while true:
    echo "If you do not throw me, I'll spam you indefinitely!"
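To illustrate the kind of input check that a template cannot do, here is a minimal sketch of a macro that inspects its untyped argument and rejects anything that is not a bare identifier. The macro name nameOf is hypothetical. The error procedure and the nnkIdent kind come from std/macros.

```nim
import std/macros

macro nameOf(arg: untyped): untyped =
  ## Hypothetical example: accepts only a bare identifier and
  ## expands to its name as a string literal.
  if arg.kind != nnkIdent:
    error("nameOf expects an identifier, got " & $arg.kind, arg)
  result = newLit(arg.strVal)

echo nameOf(someName)   # someName
# nameOf(1 + 2) would be rejected at compile time by the error() call.
```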
AST Manipulation
In Nim, the code is read and transformed into an internal intermediate representation called an Abstract Syntax Tree (AST). To get a representation of the AST corresponding to a piece of code, we can use the macro dumpTree.
# Don't forget to import std/macros!
# You can use --hints:off to display only the AST tree
dumpTree:
  type
    myObject {.packed.} = ref object of RootObj
      left: seq[myObject]
      right: seq[myObject]
This code outputs the following AST tree (it should not change among Nim versions).
StmtList
  TypeSection
    TypeDef
      PragmaExpr
        Ident "myObject"
        Pragma
          Ident "packed"
      Empty
      RefTy
        ObjectTy
          Empty
          OfInherit
            Ident "RootObj"
          RecList
            IdentDefs
              Ident "left"
              BracketExpr
                Ident "seq"
                Ident "myObject"
              Empty
            IdentDefs
              Ident "right"
              BracketExpr
                Ident "seq"
                Ident "myObject"
              Empty
We can better visualize the tree structure of the AST with the following picture.
Multiply by two macro
This macro example is taken from a YouTube video made by Fireship.
macro timesTwo(statements: untyped): untyped =
  result = statements # start from the input AST and modify it in place
  for s in result:
    for node in s:
      if node.kind == nnkIntLit:
        node.intVal = node.intVal*2
timesTwo:
  echo 1 # 2
  echo 2 # 4
  echo 3 # 6
This macro multiplies each integer value by two before printing! Let us break down this macro, shall we? To understand how a macro works, we may first look at the AST given as input.
dumpTree:
  echo 1
By compiling this code, you will get the corresponding AST. This simple AST is made of four nodes:
StmtList
  Command
    Ident "echo"
    IntLit 1
StmtList stands for statement list. It groups together all the instructions in your block.
The Command node indicates that you call a function whose name is given by its child Ident node. An Ident can be any variable, object, or procedure name.
Our integer literal whose value is 1 has the node kind IntLit.
Notice that the order of the nodes in the AST is crucial. If we swapped the last two nodes, we would get the AST of the code 1 echo, which does not compile.
StmtList
  Command
    IntLit 1
    Ident "echo"
StmtList, Command, IntLit and Ident are the NodeKind of the code's AST.
Inside your macro, they are denoted with the extra prefix nnk, e.g. nnkIdent.
You can get the full list of node kinds in the std/macros source code.
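As a small sketch of how these kinds look from inside a macro, the hypothetical describe macro below returns the kind of its argument followed by the kinds of its direct children. Note the nnk prefix in the result.

```nim
import std/[macros, strutils]

macro describe(arg: untyped): untyped =
  ## Hypothetical helper: the kind of `arg`, then the kinds of its
  ## direct children, joined into one space-separated string.
  var kinds = @[$arg.kind]
  for child in arg:
    kinds.add $child.kind
  result = newLit(kinds.join(" "))

echo describe(foo(1))   # nnkCall nnkIdent nnkIntLit
echo describe(a + b)    # nnkInfix nnkIdent nnkIdent nnkIdent
```

Because the parameter is untyped, foo, a and b do not need to exist: the macro only looks at the syntax.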
macro timesTwoAndEcho(statements: untyped): untyped =
  result = statements # start from the input AST and modify it in place
  for s in result:
    for node in s:
      if node.kind == nnkIntLit:
        node.intVal = node.intVal*2
  echo repr result
timesTwoAndEcho:
  echo 1
  echo 2
  echo 3
The output of a macro is an AST, and we can try to write it out for this example:
StmtList
  Command
    Ident "echo"
    IntLit 2
  Command
    Ident "echo"
    IntLit 4
  Command
    Ident "echo"
    IntLit 6
Please note that line breaks are not part of Nim's AST!
Here, the output AST is almost the same as the input. We only change the integer literal values.
Our root node in the input AST is a statement list.
To fetch the Command child nodes, we may use the list syntax.
A node contains the list of its children. To get the first child, it suffices to write statements[0].
To loop over all the child nodes, one can use a for statement in statements loop.
We need to fetch the nodes under a Command instruction that are integer literals.
So for each node in the statement, we test if the node kind is equal to nnkIntLit. We read and write its value through node.intVal.
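The timesTwo macro only visits the direct children of each statement, so a literal nested deeper (for example inside 1 + 2) is left untouched. The following sketch, which is an extension and not part of the original example, walks the whole tree recursively. The names doubleIntLits and timesTwoDeep are hypothetical.

```nim
import std/macros

proc doubleIntLits(node: NimNode) =
  ## Walks the tree in place and doubles every integer literal found.
  for i in 0 ..< node.len:
    let child = node[i]
    if child.kind == nnkIntLit:
      child.intVal = child.intVal * 2
    else:
      doubleIntLits(child)

macro timesTwoDeep(statements: untyped): untyped =
  result = statements
  doubleIntLits(result)

var total = 0
timesTwoDeep:
  total = 1 + 2   # rewritten to: total = 2 + 4
echo total        # 6
```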
Below I present my first macro as an example. I want to print the memory layout of a given type. My goal is to find misaligned fields that create useless unoccupied memory in an object type definition. This happens when the fields have types of different sizes. The order of the fields then changes the memory used by an object. To make memory accesses efficient, an object and its fields are laid out following alignment rules.
Each field is usually placed at an address that is a multiple of its alignment, typically a power of two. When the next free address does not satisfy this, padding (unoccupied memory) is inserted between the two fields.
We can pack a structure with the pragma {.packed.}, which removes this extra space. This has the disadvantage of slowing down memory accesses.
We would like to detect the presence of holes in an object.
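As a quick sketch of what such a hole looks like, the two hypothetical types below have the same fields, with and without {.packed.}. sizeof and offsetOf are provided by Nim's system module. The exact numbers for the unpacked type depend on the target platform, so none are given here.

```nim
type
  Loose = object             # hypothetical type, default layout
    a: uint8
    b: uint32
  Tight {.packed.} = object  # same fields, no padding
    a: uint8
    b: uint32

# In Tight, b starts right after a, so the size is 1 + 4 bytes.
echo "Tight: size ", sizeof(Tight), ", offset of b ", offsetOf(Tight, b)
# In Loose, b is typically pushed to an aligned offset, leaving a hole after a.
echo "Loose: size ", sizeof(Loose), ", offset of b ", offsetOf(Loose, b)
```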
The first step is to look at the AST of the input code we want to parse.
One can look first at the most basic type definition possible, before making the AST more complex to get a feel for all the edge cases.
dumpTree:
  type
    Thing = object
      a: float32
StmtList
  TypeSection
    TypeDef
      Ident "Thing"
      Empty
      ObjectTy
        Empty
        Empty
        RecList
          IdentDefs
            Ident "a"
            Ident "float32"
            Empty
We want inputs that are complex enough to reveal edge cases, while keeping them minimal so that the AST stays easy to read and errors easy to locate. Here are some sample type definitions on which I will run my macro.
typeMemoryRepr:
  type
    Thing2 = object
      oneChar: char
      myStr: string
  type
    Thing = object of RootObj
      a: float32
      b: uint64
      c: char
Types with pragmas aren't supported yet.
when false: # erroneous code
  typeMemoryRepr:
    type
      Thing {.packed.} = object
        oneChar: char
        myStr: string
It is not easy (if even possible) to list all possible types. Yet by adding some more information we can get a better picture of the general AST of a type.
dumpTree:
  type
    Thing {.packed.} = object of RootObj
      a: float32
      b: string
StmtList
  TypeSection
    TypeDef
      PragmaExpr
        Ident "Thing"
        Pragma
          Ident "packed"
      Empty
      ObjectTy
        Empty
        OfInherit
          Ident "RootObj"
        RecList
          IdentDefs
            Ident "a"
            Ident "float32"
            Empty
          IdentDefs
            Ident "b"
            Ident "string"
            Empty
Notice how the name of the type went under the PragmaExpr section. We have to be careful about this when trying to parse the type.
A macro always follows the same steps:
- Search for a node of a specific kind inside the input AST, or check that the given node is of the expected kind.
- Fetch properties of the selected node.
- Build the output AST from the properties of these input nodes.
- Continue exploring the AST.
Your macros will require a long docstring and many comments, both with thorough details.
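The steps above can be sketched on the PragmaExpr case seen earlier. The hypothetical helper typeNameOf checks the kind of the node, unwraps an optional pragma expression or export marker, and returns the type name. The macro typeNames, also hypothetical, explores the type sections and collects the names. This is a minimal sketch that does not cover every form a type definition can take.

```nim
import std/[macros, strutils]

proc typeNameOf(typeDef: NimNode): string =
  ## Hypothetical helper: name of the type declared by a TypeDef node.
  typeDef.expectKind nnkTypeDef    # check the kind of the given node
  var head = typeDef[0]
  if head.kind == nnkPragmaExpr:   # Thing {.packed.}: the name sits one level deeper
    head = head[0]
  if head.kind == nnkPostfix:      # Thing*: the name is the second child
    head = head[1]
  head.expectKind nnkIdent
  result = head.strVal             # fetch a property of the selected node

macro typeNames(body: untyped): untyped =
  ## Expands to a string listing the names of the types declared in `body`.
  var names: seq[string]
  for section in body:             # explore the AST
    if section.kind == nnkTypeSection:
      for typeDef in section:
        names.add typeNameOf(typeDef)
  result = newLit(names.join(", "))  # form the output AST

let names = typeNames:
  type
    Plain = object
      a: float32
    Tagged {.packed.} = object
      a: float32
echo names   # Plain, Tagged
```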
I now present my macro typeMemoryRepr, inspired by the Nim memory guide on memory representation.
In this guide, we manually print the addresses of a type's fields to get an idea of the memory layout and the space taken by each variable and its fields.
type Thing = object
  a: uint32
  b: uint8
  c: uint16
var t: Thing
echo "size t.a ", t.a.sizeof
echo "size t.b ", t.b.sizeof
echo "size t.c ", t.c.sizeof
echo "size t ", t.sizeof
echo "addr t.a ", t.a.addr.repr
echo "addr t.b ", t.b.addr.repr
echo "addr t.c ", t.c.addr.repr
echo "addr t ", t.addr.repr
All these echo statements are redundant and have to be changed each time we change a field of the type. For types with more than four or five fields, this becomes unmanageable.
I have split this macro into different procedures.
The echoSizeVarFieldStmt procedure takes the name of a variable, let us say a, and of its field, field, and returns the code:
echo a.field.sizeof
We create a NimNode of kind StmtList (a statement list) that contains a Command node built from Ident nodes.
The first Ident node is the command echo.
We do not represent spaces in the AST. Each term separated by a dot is an Ident and part of an nnkDotExpr.
It suffices to put the above code under a dumpTree block to understand the AST we have to generate.
dumpTree:
  echo a.field.sizeof
proc echoSizeVarFieldStmt(variable: string, nameOfField: string): NimNode =
  ## quote do:
  ##   echo `variable`.`nameOfField`.sizeof
  newStmtList(nnkCommand.newTree(
    newIdentNode("echo"),
    nnkDotExpr.newTree(
      nnkDotExpr.newTree(
        newIdentNode(variable),
        newIdentNode(nameOfField) # the field name is the second ident of the inner DotExpr
      ),
      newIdentNode("sizeof")
    )
  ))
The echoAddressVarFieldStmt procedure takes the name of a variable, let us say a, and of its field, field, and returns the code that prints the field's address:
echo a.field.addr.repr
proc echoAddressVarFieldStmt(variable: string, nameOfField: string): NimNode =
  ## quote do:
  ##   echo `variable`.`nameOfField`.addr.repr
  newStmtList(nnkCommand.newTree(
    newIdentNode("echo"),
    nnkDotExpr.newTree(
      nnkDotExpr.newTree(
        nnkDotExpr.newTree(
          newIdentNode(variable),
          newIdentNode(nameOfField)
        ),
        newIdentNode("addr")
      ),
      newIdentNode("repr")
    )
  ))
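Both procedures build the same kind of left-nested DotExpr chain by hand. As a possible simplification (a sketch, not the author's code), a small helper can fold a list of names into that chain with newDotExpr from std/macros. The names dotChain and sizeOfField are hypothetical.

```nim
import std/macros

proc dotChain(parts: varargs[string]): NimNode =
  ## Hypothetical helper: dotChain("a", "field", "sizeof") builds the
  ## nested DotExpr tree corresponding to `a.field.sizeof`.
  result = ident(parts[0])
  for i in 1 ..< parts.len:
    result = newDotExpr(result, ident(parts[i]))

macro sizeOfField(variable, field: untyped): untyped =
  ## Expands to `variable.field.sizeof`.
  dotChain(variable.strVal, field.strVal, "sizeof")

type Thing = object
  a: uint32
  b: uint8

var t: Thing
echo sizeOfField(t, a)   # 4
echo sizeOfField(t, b)   # 1
```

The explicit newTree calls remain useful while learning, since they mirror the dumpTree output node for node.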
import std/strutils # needed for capitalizeAscii
macro typeMemoryRepr(typedef: untyped): untyped =
  ## This macro takes a type definition as an argument and:
  ## * defines the type (outputs typedef as is)
  ## * initializes a variable of this type
  ## * echoes the size and address of the variable
  ## Then, for each field:
  ## * echoes the size and address of the variable field
  # We begin by running the type definition.
  result = quote do:
    `typedef`
  # Parse the type definition to find the TypeDef section's node
  # We create the output's AST along parsing.
  # We will receive a statement list as the root of the AST
  for statement in typedef:
    # We select only the type section in the StmtList
    if statement.kind == nnkTypeSection:
      let typeSection = statement
      for i in 0 ..< typeSection.len:
        if typeSection[i].kind == nnkTypeDef:
          var tnode = typeSection[i]
          # The name of the type is the first Ident child. We can get the ident's string with strVal or repr
          let nameOfType = typeSection[i].findChild(it.kind == nnkIdent)
          ## Generation of AST:
          # We create a variable of the given type, named "my<TypeName>Var" (hopefully not already defined)
          let nameOfTestVariable = "my" & nameOfType.strVal.capitalizeAscii() & "Var"
          let testVariable = newIdentNode(nameOfTestVariable)
          result = result.add(
            quote do:
              var `testVariable`:`nameOfType` # instantiate a variable with the type defined in typedef
              echo `testVariable`.sizeof # echo the total size
              echo `testVariable`.addr.repr # gives the address in memory
          )
          # myTypeVar.field[i] memory size and address in memory
          tnode = tnode[2][2] # The third child of the third child is the fields' AST
          assert tnode.kind == nnkRecList
          for i in 0 ..< tnode.len:
            # myTypeVar.field[i].sizeof
            result = result.add(echoSizeVarFieldStmt(nameOfTestVariable, tnode[i][0].strVal))
            # myTypeVar.field[i].addr.repr
            result = result.add(echoAddressVarFieldStmt(nameOfTestVariable, tnode[i][0].strVal))
  echo result.repr
typeMemoryRepr:
  type
    Thing = object of RootObj
      a: float32
      b: string
32
ptr Thing(a: 0.0, b: "")
4
ptr 0.0
16
ptr ""
Trying to parse a type ourselves is risky, since there are numerous easily forgotten possibilities (pragma expressions, cyclic types, many kinds of types such as objects, enums and type aliases, case fields, branching and conditionals inside the object, …).
There is actually already a function to do so, and this will be the subject of a future release of this tutorial.
The following macro creates enums with power-of-two values.
import std/[enumerate, math]
# jmgomez on Discord
macro power2Enum(body: untyped): untyped =
  let srcFields = body[^1][1..^1]
  var dstFields = nnkEnumTy.newTree(newEmptyNode())
  for idx, field in enumerate(srcFields):
    dstFields.add nnkEnumFieldDef.newTree(field, newIntLitNode(pow(2.0, idx.float).int))
  body[^1] = dstFields
  echo repr body
  body
type Test {.power2Enum.} = enum
  a, b, c, d
A macro is not always the best alternative. A simple set and a cast give the same result.
# Rika
type
  Setting = enum
    a, b, c
  Settings = set[Setting]
let settings: Settings = {a, c}
echo cast[uint8](settings)
5
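To show why the set approach yields power-of-two flags, here is a short sketch that extends the snippet above: each enum member owns one bit of the underlying byte, so a is 1, b is 2 and c is 4. The variable names are illustrative.

```nim
type
  Setting = enum
    a, b, c
  Settings = set[Setting]

var settings: Settings = {a, c}
echo cast[uint8](settings)   # 5, i.e. 0b101: bits of a and c are set

settings.incl b              # set the bit of b
echo cast[uint8](settings)   # 7, i.e. 0b111

# The cast also works in the other direction.
let restored = cast[Settings](5'u8)
echo restored                # {a, c}
```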
References and Bibliography
Press Ctrl + Click to open the following links in a new tab.
First, there are four official resources on the Nim website:
- Nim by Example
- Nim Tutorial (Part III)
- Manual section about macros
- The Standard Documentation of the std/macros library
The second and third resources are complementary learning resources, while the last one will be your up-to-date exhaustive reference. It provides dumped ASTs (explained later) for all the nodes.
Many developers have written their own macro tutorials:
- Nim in Y minutes
- Jason Beetham a.k.a ElegantBeef's dev.to tutorial. This tutorial contains a lot of good first examples.
- Pattern matching (sadly outdated) in macros by DevOnDuty
- Tomohiro's FAQ section about macros
- The Making of NimYAML, an article by flyx
There are plenty of forum posts that are good references:
- What is "Metaprogramming" paradigm used for ?
- Custom macro inserts macro help
- See generated code after template processing
- Fast array assignment
- Variable injection
- Proc inspection
- etc.
Please use the forum search bar with specific keywords like macro, metaprogramming, generics, template, …
Last but not least, there are three Nim books:
- Nim In Action, ed. Manning and github repo
- Mastering Nim, self-published by A. Rumpf/Araq, Nim's creator.
- Nim Programming Book, by S.Salewski
We can also count many projects that are macro- or template-based:
- genny and benchy. Benchy is a template-based library that benchmarks your code snippets under bench blocks. Genny is used to export a Nim library to other languages (C, C++, Node, Python, Zig). In general, the source code of treeform's projects is a good Nim reference.
- My favorite DSL: the neural network domain-specific language (DSL) of the tensor library Arraymancer. mratsim develops this library, and made a list of all his DSLs in the forum.
- The Jester library provides a routing DSL, where each block defines a route in your web application.
- nimib, with which this blog post has been written.
- Nim4UE. You can develop Nim code for the Unreal Engine 5 game engine. The macro system parses your procs and outputs a DLL for UE.