High level types

Structure

Structure is quite common and basic type in programming language. Here focus on how to create an equal LLVM mapping for structure.

LLVM has the concept about structure type, but structure type in LLVM didn't have field name, how to get/set fields of structure would be a thing. Let's assume a structure: foo has a field named x with type i32. Below code shows how to access x.

zero := constant.NewInt(types.I32, 0)
m := ir.NewModule()

foo := m.NewTypeDef("foo", types.NewStruct(types.I32))

main := m.NewFunc("main", types.I32)
b := main.NewBlock("")
fooInstance := b.NewAlloca(foo)
fieldX := b.NewGetElementPtr(foo, fooInstance, zero, zero)
// now `fieldX` is a pointer to field `x` of `foo`.
b.NewStore(constant.NewInt(types.I32, 10), fieldX)
b.NewLoad(types.I32, fieldX)
b.NewRet(zero)

Get element pointer(a.k.a. GEP) is the point here. It computes a pointer to any structure member with no overhead. Then load and store can work on it.

To get more information about GEP can goto Lang Ref.

Array

Arrays are a sequence of elements of the same types in LLVM. And in LLVM, array size is a constant that never changes. In the following text, I will present how to define a global array, and how to operate it.

As usual, we need a module and some helping code

mod := ir.NewModule()
printf := PrintfPlugin(mod)
formatString := "array_def[%d]: %d\n"
fmtStr := mod.NewGlobalDef("x", irutil.NewCString(formatString))
main := mod.NewFunc("main", types.I32)
mainB := main.NewBlock("")
ptrToStr := mainB.NewGetElementPtr(types.NewArray(uint64(len(formatString)), types.I8), fmtStr, CI32(0), CI32(0))

Now we create a global array

arrTy := types.NewArray(5, types.I8)
arrayDef := mod.NewGlobalDef("array_def", constant.NewArray(arrTy, CI8(1), CI8(2), CI8(3), CI8(4), CI8(5)))

Then we can load it into stack, in the following code, we

  1. extract value from loaded array
  2. insert value into loaded array
arr := mainB.NewLoad(arrTy, arrayDef)
for i := 0; i < 5; i++ {
	mainB.NewCall(printf, ptrToStr, CI32(int64(i)), mainB.NewExtractValue(arr, uint64(i)))
	mainB.NewInsertValue(arr, CI8(0), uint64(i))
	mainB.NewCall(printf, ptrToStr, CI32(int64(i)), mainB.NewExtractValue(arr, uint64(i)))
}

Another way to get element and update it is using get element pointer(a.k.a GEP)

for i := 0; i < 5; i++ {
	pToElem := mainB.NewGetElementPtr(arrTy, arrayDef, CI32(0), CI32(int64(i)))
	mainB.NewCall(printf, ptrToStr, CI32(int64(i)), mainB.NewLoad(types.I8, pToElem))
	mainB.NewStore(CI8(0), pToElem)
	mainB.NewCall(printf, ptrToStr, CI32(int64(i)), mainB.NewLoad(types.I8, pToElem))
}

compare with previous result, you will find previous insert didn't work as expected. So this is the key point about insert value. It don't update your aggregate value, it do thing like immutable programming: take and return new one. Thus, we should write

for i := 0; i < 5; i++ {
	mainB.NewCall(printf, ptrToStr, CI32(int64(i)), mainB.NewExtractValue(arr, uint64(i)))
	newArr := mainB.NewInsertValue(arr, CI8(0), uint64(i))
	mainB.NewCall(printf, ptrToStr, CI32(int64(i)), mainB.NewExtractValue(newArr, uint64(i)))
}

Now you should understand array enough to compile your language into this aggregate type!

Algebra Data Type

Algebra data type is a common concept in functional programming language. For example, Haskell can write:

data Expr =
  EInt Int
  | EBool Bool
  | EString String

How to express such concept in LLVM? The idea was selecting the biggest size in all variants and use it as the size of this type. Do bitcast when need to access the actual value. Here is the code:

mod := ir.NewModule()

typeExpr := mod.NewTypeDef("Expr", types.NewStruct(
	types.I8,
	types.NewArray(8, types.I8),
))
// variant tag 0
typeExprInt := mod.NewTypeDef("EInt", types.NewStruct(
	types.I8,
	types.I32,
))
// variant tag 1
mod.NewTypeDef("EBool", types.NewStruct(
	types.I8,
	types.I1,
))
// variant tag 2
mod.NewTypeDef("EString", types.NewStruct(
	types.I8,
	types.NewPointer(types.I8),
))

main := mod.NewFunc(
	"main",
	types.I32,
)
b := main.NewBlock("")
exprInstance := b.NewAlloca(typeExpr)
exprTag := b.NewGetElementPtr(typeExpr, exprInstance, constI32(0), constI32(0))
// set tag to 0
b.NewStore(constI8(0), exprTag)
exprIntInstance := b.NewBitCast(exprInstance, types.NewPointer(typeExprInt))
exprIntValue := b.NewGetElementPtr(typeExprInt, exprIntInstance, constI32(0), constI32(1))
b.NewStore(constI32(2), exprIntValue)
b.NewRet(constI32(0))

These code produce:

%Expr = type { i8, [8 x i8] }
%EInt = type { i8, i32 }
%EBool = type { i8, i1 }
%EString = type { i8, i8* }

define i32 @main() {
; <label>:0
	%1 = alloca %Expr
	%2 = getelementptr %Expr, %Expr* %1, i32 0, i32 0
	store i8 0, i8* %2
	%3 = bitcast %Expr* %1 to %EInt*
	%4 = getelementptr %EInt, %EInt* %3, i32 0, i32 1
	store i32 2, i32* %4
	ret i32 0
}

tag in each variant is important, for example, pattern matching in Haskell looks like:

case expr of
  EInt i -> "int"
  EBool b -> "bool"
  EString s -> "string"

case expression would need tag to distinguish variant.