A simpler scheme than SSA

The SSA complexities

Let’s suppose that you’re working on some small compiler-like project. At some point, you may start thinking about adding optimizations to the code generated by your compiler.

Then you realize that it’s not enough to just have some IR that is suitable for modifications. It’s important to apply only those optimizations that keep the code correct (or at least don’t make it more broken than it was before). Hopefully, we’re also making it faster or smaller along the way.

Most likely, you’re better off choosing something like the SSA form. The SSA form comes with a few complexities that you’ll have to deal with:

  1. SSA introduces a lot of “unique” slots*. You need to perform good dead store eliminations and register allocation later on to keep the number of slots minimal
  2. You either need to insert phi nodes or make basic blocks parametrized (so they get outer values as arguments)
  3. SSA alone is not enough. You need some extra metadata, like the number of SSA value usages (most often you want to check whether v.Uses == 1)

(*) A slot is an abstract term for a place where we store some value. It could be someplace inside a stack, or a register, or a virtual register if we’re talking about a VM with a potentially infinite amount of registers.

Basically, you need to insert some merge points for the SSA values. It can be done with phi nodes or with parametrized basic blocks. It’s not a big deal, but it can be messy for a small project that only wants to perform a few local optimizations. Maintaining the SSA invariant can end up being too messy.

In this article, I’ll try to describe a simpler approach that:

  • Keeps the allocated slots after the early compilation phase
  • Encodes both SSA unique value constraint with a single-use invariant (Uses=1)
  • This form is easy to build and maintain

Unique slots form

We divide all slots into 2 categories:

  • Normal slots
  • Unique slots

Unique slots have these properties:

  • They don’t escape their basic block
  • They’re only used once

That being said, both unique and normal slots have assigned ID that tells which memory location they occupy. The same memory location can be unique in one part of the block and non-unique in another.

Our abstract slot can look like this:

type Slot struct {
    ID     int  // allocated by the compiler
    Unique bool // inferred by the optimizer
}

Instructions operate on slots. Their arguments can have unique or non-unique slots:

type Instruction struct {
    Op   byte
    Args []Slot
}

In the code below, slot0 can be marked as unique:

return 130

# =>

load_int_const slot0 = 130
return slot0

slot0 is assigned exactly once, it’s read only once as well. It doesn’t leave its basic block too.

Here is an example of a block where we have a slot with the same ID marked as unique in several places:

println(130)
println(200)

# =>

load_int_const slot0 = 130
push_arg slot0
call println
load_int_const slot0 = 200
push_arg slot0
call println

slot0 is assigned twice, but both versions are unique: there is only one use after every assignment.

Marking the slots as unique

In reality, we need some extra information to infer that some slot is unique. Namely, we need to know where its lifetime ends. This can be done with pseudo varkill instructions.

The name “varkill” is borrowed from the Go compiler source code. It uses this pseudo node to record that the variable lifetime has ended.

When the compiler allocates the slots for intermediate results, it knows when their lifetime ends. This lifetime tomb can end up in the same basic block or somewhere else.

load_int_const slot0 = 130
push_arg slot0
call println
varkill slot0 # <- slot0 is free after this point
load_int_const slot0 = 200
push_arg slot0
call println
varkill slot0 # <- slot0 is free after this point

Here is another example, when temporary value outlives its block:

return x || y

# =>

  move slot0 = x
  jump_nz L0 slot0
  move slot0 = y
L0:
  return slot0
  varkill slot0

It’s important to include the trailing varkill pseudo ops after the basic block exit instruction. So, the basic blocks for the code above can look like this:

b0:
  move slot0 = x
  jump_nz L0 slot0
  move slot0 = y

b1: # L0
  return slot0
  varkill slot0

Note that varkill is a part of the b1 block.

To compute the unique slots within a block, we need to traverse it only once.

  • Go from the end of a basic block
  • Put all varkill IDs into a map
  • For every recorded ID, collect the number of reads
  • When reached the recorded ID write, check the number of reads
    • If the number of reads is 0, this is a dead store
    • If the number of reads is 1, mark this slot and its usage as unique
    • Otherwise it’s not a unique slot, remove ID from the map
  • When removing a var or marking it unique, an associated varkill should be removed

You don’t really need a real map here. It’s possible to write a zero alloc uniqs marking.

After the first round of optimizations, we need to re-compute the unique slots.

To avoid doing redundant re-calculations, we can skip blocks that don’t have any varkills.

This means we need to store this number of varkills counter somewhere along the basic block. It’s also possible to have a “dirty” flag inside a block that tells whether it may have changed after the last scanning.

type Block struct {
    Body     []Instruction
    Varkills int
    Dirty    bool
}

Strictly speaking, explicit block objects are not required. All metadata can be stored separately, outside of the blocks. It is, however, more convenient to work with explicit block objects.

Where exactly to insert a varkill

For temporary values that are results of expression computation, it’s simple. These are allocated along with the computations. The compiler knows then the expression boundary is over, so it can insert the tombstones right there.

For local variables, these life scopes can be computed using their lexical scoping.

// x slot is assigned when if statement <init> clause is being compiled
if x := f(); x != nil {
    return x
}
// After the if statement is compiled, x variable is no longer alive,
// a varkill for the allocated slot can be inserted.

{
    // This x variable is different from the previous one.
    // In this case, the slot for x can be marked unique.
    x := 10
    println(x)
    // When this lexical block ends, x is no longer alive.
}

For some simple cases, we can insert a varkill at the point of the variable reassignment. This is a more tricky case though: it’s better to be conservative here and insert fewer markers than feeding the incorrect information to the optimizer.

{
    x := 10
    println(x)
    x = 20 // re-assigned: a suitable place for a varkill
    println(x)
}

// =>

// load_int_const slot0 = 10
// push_arg slot0
// call println
// varkill slot0
// load_int_const slot0 = 20
// push_arg slot0
// call println

We can’t analyze all variables, but we can still get some benefits and perform safe optimizations without compromising the generated code correctness.