First-class Function(Closure)

Naive Implementation

Create a closure(a.k.a. first-class function) requires a place to store captured variables. In LLVM, the best way is create a structure for such case:

m := ir.NewModule()

zero := constant.NewInt(types.I32, 0)
one := constant.NewInt(types.I32, 1)

captureStruct := m.NewTypeDef("id_capture", types.NewStruct(
	types.I32,
))
captureTyp := types.NewPointer(captureStruct)
idFn := m.NewFunc("id", types.I32, ir.NewParam("capture", captureTyp))
idB := idFn.NewBlock("")
v := idB.NewGetElementPtr(captureStruct, idFn.Params[0], zero, zero)
idB.NewRet(idB.NewLoad(types.I32, v))
idClosureTyp := m.NewTypeDef("id_closure", types.NewStruct(
	captureTyp,
	idFn.Type(),
))

mainFn := m.NewFunc("main", types.I32)
b := mainFn.NewBlock("")
// define a local variable `i`
i := b.NewAlloca(types.I32)
b.NewStore(constant.NewInt(types.I32, 10), i)
// use alloca at here to simplify code, in real case should be `malloc` or `gc_malloc`
captureInstance := b.NewAlloca(captureStruct)
ptrToCapture := b.NewGetElementPtr(captureStruct, captureInstance, zero, zero)
// capture variable
b.NewStore(b.NewLoad(types.I32, i), ptrToCapture)
// prepare closure
idClosure := b.NewAlloca(idClosureTyp)
ptrToCapturePtr := b.NewGetElementPtr(idClosureTyp, idClosure, zero, zero)
b.NewStore(captureInstance, ptrToCapturePtr)
ptrToFuncPtr := b.NewGetElementPtr(idClosureTyp, idClosure, zero, one)
b.NewStore(idFn, ptrToFuncPtr)
// assuming we transfer closure into another context
accessCapture := b.NewGetElementPtr(idClosureTyp, idClosure, zero, zero)
accessFunc := b.NewGetElementPtr(idClosureTyp, idClosure, zero, one)
result := b.NewCall(b.NewLoad(idFn.Type(), accessFunc), b.NewLoad(captureTyp, accessCapture))

printIntegerFormat := m.NewGlobalDef("tmp", irutil.NewCString("%d\n"))
pointerToString := b.NewGetElementPtr(types.NewArray(3, types.I8), printIntegerFormat, zero, zero)
// ignore printf
b.NewCall(printf, pointerToString, result)

b.NewRet(constant.NewInt(types.I32, 0))

This is a huge example, I understand it's hard to read, but concept is clean. It would generate below LLVM IR:

%id_capture = type { i32 }
%id_closure = type { %id_capture*, i32 (%id_capture*)* }

@tmp = global [3 x i8] c"%d\0A"

declare i32 @printf(i8* %format, ...)

define i32 @id(%id_capture* %capture) {
; <label>:0
	%1 = getelementptr %id_capture, %id_capture* %capture, i32 0, i32 0
	%2 = load i32, i32* %1
	ret i32 %2
}

define i32 @main() {
; <label>:0
	%1 = alloca i32
	store i32 10, i32* %1
	%2 = alloca %id_capture
	%3 = getelementptr %id_capture, %id_capture* %2, i32 0, i32 0
	%4 = load i32, i32* %1
	store i32 %4, i32* %3
	%5 = alloca %id_closure
	%6 = getelementptr %id_closure, %id_closure* %5, i32 0, i32 0
	store %id_capture* %2, %id_capture** %6
	%7 = getelementptr %id_closure, %id_closure* %5, i32 0, i32 1
	store i32 (%id_capture*)* @id, i32 (%id_capture*)** %7
	%8 = getelementptr %id_closure, %id_closure* %5, i32 0, i32 0
	%9 = getelementptr %id_closure, %id_closure* %5, i32 0, i32 1
	%10 = load i32 (%id_capture*)*, i32 (%id_capture*)** %9
	%11 = load %id_capture*, %id_capture** %8
	%12 = call i32 %10(%id_capture* %11)
	%13 = getelementptr [3 x i8], [3 x i8]* @tmp, i32 0, i32 0
	%14 = call i32 (i8*, ...) @printf(i8* %13, i32 %12)
	ret i32 0
}

Our id function captures an Integer and return it. To reach that id_capture was introduced for storing captured value. For passing whole closure in convenience, id_closure was introduced and stored capture structure and function pointer. When invoke a closure, get captured structure and function pointer from id_closure structure, then apply function with captured structure and additional arguments(if there's any). In this example omit the part about memory management, all structures allocated in the stack, this won't work in most real world case. Must notice this problem.

Improvements

The naive implementation is not good enough, we have several ways can improve it, but instead of implementing them I'm going to list what can we do:

  • Laziness function: Arity would be a thing in case
  • Access cross asynchronous model
  • If language has copy capture and reference capture, e.g. C++?
  • What if working with a GC?

Return Structure

When meet program that return structure by value, compiler has chance to remove such cloning. That's storing return structure into a reference passed by the caller. Which means, if we get:

struct Foo {
    // ...
};

Foo foo() {
    Foo f;
    // ...
    return f;
}

should compile to:

define void @foo(%Foo* noalias sret f) {
    // ...
}
  • sret hints this is a return value.
  • noalias hints other arguments won't point to the same place, LLVM optimizer might rely on such fact, so don't add it everywhere.

Add parameter attributes

Here is example shows how to add parameter attributes:

m := ir.NewModule()

fooTyp := m.NewTypeDef("Foo", types.NewStruct(
	types.I32,
))
retS := ir.NewParam("result", fooTyp)
retS.Attrs = append(retS.Attrs, enum.ParamAttrNoAlias)
retS.Attrs = append(retS.Attrs, enum.ParamAttrSRet)
m.NewFunc("foo", types.Void, retS)