Nimrod: A New Systems Programming Language

Image

A language with extensive metaprogramming support, generics and exception tracking built in, optional garbage collection, and rivals C in performance. And it can compile to C, C++, Objective-C, or JavaScript.

Nimrod is a statically typed, imperative programming language that tries to give the programmer ultimate power without compromising runtime efficiency. This means it focuses on compile-time mechanisms in all their various forms.

It uses a syntax that is a reminiscent of both Python and Pascal, has an AST-based clean macro system that is ideal for metaprogramming. It supports soft realtime GC on thread-local heaps and uses asynchronous message passing between threads, so no “stop the world” mechanism is necessary. An unsafe shared memory heap is also provided for the increased efficiency that results from that model.

It compiles to commented C code, which attains consistently outstanding results on benchmarks. It can also be made to output JavaScript. The entire Nimrod toolchain (compiler, library, build tool, and so on) is written in Nimrod.

This article gives a quick overview of Nimrod’s many features. After an introduction to Nimrod’s syntax, I show how the language allows for common procedural, functional, OO, and metaprogramming techniques while remaining simple and efficient.

Introduction to Nimrod’s Syntax

Nimrod uses a conventional infix-based syntax. Like Python or Haskell, it uses indentation rather than braces to group statements. The usual control flow statements such as if, case, while, and for are provided.

Slightly unusual are the many ways in which function applications can be written: There is the traditional prefix notation f(x, y). If the call is a statement, the parentheses may be omitted: f x, y. This is called the command notation and this version of “hello world” makes use of it:

echo "hello world!"

echo "abc" is an alias for write stdout, "abc" with the notable difference that it abstracts away the output stream, which makes it easier to emulate for the JavaScript target or to make the compiler evaluate it at compile time.

Another notation for invoking functions is the so-called method invocation syntax: x.f(y, z). If there is only one argument, the () can be left out: x.f. So you can write x.len instead of x.len() or len(x). This way, there is no need for special getters or read-only properties.

The language clearly distinguishes between f and f() because functions are first-class citizens and can be passed around like in functional programming languages.

Finally, there are “generalized string literals” that introduce yet another piece of syntactical sugar: Instead of f("abc"), you can write f"abc" and then backslashed escape sequences like \n are not interpreted. This is designed for easy embedding of mini-languages like regular expressions: re"\w+" is much easier to write and read than re("\\w+").

Functions are called procedures in Nimrod and are declared with the proc keyword. Unlike in C and C++, parameters are read-only unless they are declared as var, in which case “pass by reference” is used (pass by reference is implemented with a hidden pointer).

Similar to Haskell, operators in Nimrod are simply sugar for functions. The following example declares a procedure named ++ that can take 1, 2, or 3 arguments. The value of y defaults to 1, and z defaults to 0. ++ modifies x and adds y and z to it.

1
2
3
4
5
6
7
8
proc `++`(x: var int; y: int = 1; z: int = 0) =
  x = x + y + z
 
var g = 70
# ++ can then be used like this:
++g
g ++ 7
g.`++`(10, 20)

Nimrod features the concept of a routine abstraction. A routine in Nimrod can be a procedure, method, template, macro, iterator, or a converter. All routines are invoked with the same syntax; thus, you cannot tell from the invocation which kind of routine it is. Similar to Lisp, Nimrod consciously decouples the syntax from the semantics to allow for powerful metaprogramming:

1
2
3
4
template `!=`(x, y: expr): expr = not (x == y)
 
# invocation is as if '!=' were a proc:
echo 34 != 33

A template is a simple form of a macro; the example shows how the unequals operator is defined in Nimrod. For metaprogramming, the type system is weakened and very general types like expr (expression), stmt (statement), or typedesc (type descriptor) are available. Note how the template is invoked like an operator.

Functional Programming with Nimrod

As I mentioned earlier, parameters that are not var are read-only, so Nimrod has a notion of immutability. Immutability is not deep, however: As soon as any kind of pointer is involved, the location that the pointer points to can be modified:

1
2
proc modify(n: ref Node) =
  n.value = 45

There are two kinds of pointers in Nimrod: ref and ptr. A ref is a pointer that is considered by the garbage collector  (traced), while a ptr is not (untraced). In general, ptr is used for interfacing with C/C++ or to implement weak references or simply for manual memory management. Nimrod is, after all, a systems programming language.

Similar to parameters are let variables. A let can be assigned only once. Of course, Nimrod also has variables, which use the var keyword:

1
2
3
4
let lv = stdin.readline
var vv = stdin.readline
vv = "abc" # valid, reassignment allowed
lv = "abc" # fails at compile time

let has been designed to emulate parameter passing semantics so that proc square(x: int): int = x*x can be emulated with:

1
2
3
4
template square(x: int): int =
  # ensure 'x' is only evaluated once:
  let y = x
  y * y

Finally, there is also const, which declares true constants. Constants can’t be assigned at all, not even once. Instead, their value has to be known at compile time. Nimrod has a sophisticated compile-time evaluation engine, so the following works:

1
2
3
4
5
6
7
8
9
proc mostSignificantBit(n: int): int =
  # naive algorithm:
  var m = n
  while m != 0:
    m = m shr 1  # 'shr' means "shift right"
    result += 1
  result -= 1
 
const msb3999 = mostSignificantBit(3999)

 

Procs that return a value have an implicitly declared result variable that represents the return value, so there is no need to write return result. result is Nimrod’s way to guarantee what is called return value optimization in C++.

Most variables are initialized implicitly in Nimrod and the initial value is a binary 0. Hence, result starts with 0, which is the natural start for counting. shr is Nimrod’s shift-right operator, a keyword has been chosen to avoid confusions as >> has the same precedence as >. The reason for this is that Nimrod supports user-defined operators and thus needs a simple rule of how operator precedence should be handled. The (simplified) rule is that the first character of the operator determines the precedence.

Now let’s get back to functional programming. Since procs are first-class citizens, defining map and filter is straightforward:

1
2
3
4
5
6
7
8
proc filter[T](a: openarray[T]; predicate: proc (x: T): bool): seq[T] =
  result = @[] # @[] constructs the empty seq
  for x in a:
    if predicate(x): result.add(x)
 
proc map[T, S](a: openarray[T]; fn: proc (x: T): S): seq[S] =
  newSeq(result, a.len)
  for i in 0 .. >a.len: result[i] = fn(a[i])

openarray is a special type that is only valid for parameters, it is compatible with arrays and sequences. A sequence (seq) is a growable array. openarray is implemented as a pointer to the first element and a length. Lists instead of arrays are, of course, possible, too, but relatively uncommon. (In this, Nimrod shows its imperative roots.)

Pattern Matching and Metaprogramming

Nimrod supports product and sum types with some twists: A sum type (also called an algebraic data type) is supported by Nimrod with a classical enum plus a so-called object variant. Let’s say we want to create a library for working with mathematical expressions such as x^2 + 5*x. It’s natural to define our data types like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
type
  FormulaKind = enum
    fkVar,        ## element is a variable like 'X'
    fkLit,        ## element is a literal like 0.1
    fkAdd,        ## element is an addition operation
    fkMul,        ## element is a multiplication operation
    fkExp         ## element is an exponentiation operation
 
type
  Formula = ref object
    case kind: FormulaKind
    of fkVar: name: string
    of fkLit: value: float
    of fkAdd, fkMul, fkExp: left, right: Formula

Nimrod’s enum is an old-school typesafe enum, as in Ada, without any fields. To avoid name clashes, it’s common to prefix the enum values with a two-letter abbreviation. The case part in the object declaration introduces a checked union. So the access of f.name will raise an exception if f.kind != fkVar. Everything in Nimrod, including object, is a value type, but I prefer reference semantics here for easier manipulation of formulas.

 

The obvious thing to do with a formula is to evaluate it. The following proc does just that. It requires a mapping varToVal from the variable name to its value:

1
2
3
4
5
6
7
8
9
from math import pow
 
proc evaluate(n: Formula; varToVal: proc (name: string): float): float =
  case n.kind
  of fkVar: varToVal(n.name)
  of fkLit: n.value
  of fkAdd: evaluate(n.left, varToVal) + evaluate(n.right, varToVal)
  of fkMul: evaluate(n.left, varToVal) * evaluate(n.right, varToVal)
  of fkExp: pow(evaluate(n.left, varToVal), evaluate(n.right, varToVal))

Now, to check whether a formula is a polynomial (to see if we can easily differentiate it, for instance), we can use the following code:

1
2
3
4
5
6
7
proc isPolyTerm(n: Formula): bool =
  n.kind == fkMul and n.left.kind == fkLit and (let e = n.right;
    e.kind == fkExp and e.left.kind == fkVar and e.right.kind == fkLit)
 
proc isPolynomial(n: Formula): bool =
  isPolyTerm(n) or
    (n.kind == fkAdd and isPolynomial(n.left) and isPolynomial(n.right))

isPolyTerm is quite ugly. Pattern matching would be much nicer. While Nimrod does not support elaborate pattern matching beyond case out-of-the-box, it’s quite easy to implement it thanks to the sophisticated macro system: For pattern matching, we define a macro =~ that constructs the and expression at compile time. Then the code can look like this:

1
proc isPolyTerm(n: Formula): bool = n =~ fkMul(fkLit, fkExp(fkVar, fkLit))

But this is still not as nice as it could be: The point of Nimrod’s macros is that they enable DSLs that make use of Nimrod’s lovely infix syntax. In fact, that’s a conscious design decision: Macros do not affect Nimrod’s syntax — they can only affect the semantics. This helps readability.

So here is what we really want to support:

1
proc isPolyTerm(n: Formula): bool = n =~ c * x^c

Where c matches any literal , x matches any variable, and the operators their corresponding formula kinds:

1
2
3
4
5
6
7
8
proc pat2kind(pattern: string): FormulaKind =
  case pattern
  of "^": fkExp
  of "*": fkMul
  of "+": fkAdd
  of "x": fkVar
  of "c": fkLit
  else:   fkVar # no error reporting for reasons of simplicity

Note that for reasons of simplicity, we don’t implement any kind of variable binding, so 1 * x^2 matches c * x^c as c is not bound to the literal 1 in any way. This form of variable binding is called unification. Without unification, the pattern matching support is still quite primitive. However, unification requires a notion of equality, and since many useful but different equality relations exist, pattern matching is not baked into the language.

So here’s the implementation of the =~ macro in all its glory:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import macros
 
proc matchAgainst(n, pattern: PNimrodNode): PNimrodNode {.compileTime.} =
  template `@`(current, field: expr): expr =
   newDotExpr(current, newIdentNode(astToStr(field)))
 
 template `==@`(n, pattern: expr): expr =
   newCall("==", n@kind, newIdentNode($pat2kind($pattern.ident)))
 
 case pattern.kind
 of CallNodes:
   result = newCall("and",
     n ==@ pattern[0],
     matchAgainst(n@left, pattern[1]))
   if pattern.len == 3:
     result = newCall("and", result.copy,
       matchAgainst(n@right, pattern[2]))
 of nnkIdent:
   result = n ==@ pattern
 of nnkPar:
   result = matchAgainst(n, pattern[0])
 else:
   error "invalid pattern"
 
macro `=~` (n: Formula; pattern: expr): bool =
 result = matchAgainst(n, pattern)

In Nimrod, a template is a declarative form of a macro, while a macro is imperative. It constructs the AST with the help of an API that can be found in the macros module, so that’s what line 1 imports. The final macro definition is in line 25 and it follows a fairly common approach: It delegates all of its work to a helper proc called matchAgainst, which constructs the resulting AST recursively. PNimrodNode is the type the Nimrod AST consists of. The Nimrod AST is structured quite similar to how we implemented Formula, except that every node can have a variable number of children. n[i] is the ith child.

The various function application syntaxes (prefix, infix, command) all map to the same AST structure kind(callee, arg1, arg2, ...), where kind describes the particular syntax. In matchAgainst, we treat every call syntax the same with the help of macros.CallNodes. We allow for a(b) and a(b, c) (line 15) call syntaxes and construct the AST representing an and expression with the help of two @ and ==@ templates.

n@field constructs the AST that corresponds to n.field and a ==@ b constructs a.kind == pat2kind(b). Line 18 deals with the case when the pattern only consists of a single identifier (nnkIdent), and line 20 supports () (nnkPar) so that grouping in a pattern is allowed.

As this example shows, metaprogramming is a good way to transform two lines of long ugly code to a short beautiful one-liner at the cost of 30 lines of ugly code dealing with AST transformations. However, the DSL we created here pays off as soon as there are more patterns to match against. It’s also reasonably easy to abstract the =~ pattern-match operator so that it operates on more than just the Formula data type. In fact, a library solution that also supports unification is in development.

Conclusion

Nimrod is open source software that runs on Windows, Linux, Mac OS, and BSD. In addition to generating C and JavaScript, it can generate C++ or Objective-C. The compiler can optionally enforce all kinds of error checking (bounds checking, overflow, etc.) and it can perform extensive optimizations.

It has an extensive standard library and many ported libraries. In addition, it has wrappers for most of the premier C libraries (including OLE, X, Glib/GTK, SQLite, etc.) and C-based languages (Lua and Tcl). If you’re searching for a systems programming language that provides higher-level constructs and extensive metaprogramming, but boils down to C, Nimrod might well be what you’re looking for.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s