Twobit's code generator requires a stack frame whenever:
When should these stack frames be allocated and deallocated?
Some compilers allocate a stack frame upon entry to every procedure. This is inefficient for procedures that may contain non-tail-recursive calls but seldom perform them.
Some compilers allocate a separate stack frame for every non-tail-recursive call. This is inefficient for procedures that perform many such calls.
Twobit performs a simple optimization to reduce the overhead of creating and initializing stack frames. The goals of this optimization are:
It is not always possible to achieve both of these goals. Twobit always achieves the first goal, but it sometimes creates an unnecessary stack frame. Chez Scheme always achieves the second goal but does not always achieve the first.
In the subset of Scheme that Twobit uses as its intermediate form, an expression represents straight-line code unless it contains a conditional expression or a non-tail-recursive procedure call. If non-tail calls are regarded as basic blocks with unknown effects, then the body of a lambda expression becomes an acyclic control flow graph with multiple exit points. The number of exit points is equal to 1 plus the number of conditional expressions that occur in a tail-recursive position. Fork points arise from conditional expressions, and come right after the test.
For example, the doubly recursive Fibonacci procedure
(define fib (lambda (n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2))))))
contains one fork point and two exit points. Its control flow graph is
---------- | entry | ---------- | | * * * * * * * / * \ / \ / \ / \ ********* ***************** * n * * (fib (- n 1)) * ********* ***************** | ***************** * (fib (- n 2)) * ***************** | ***************** * + * *****************
There are only two kinds of positions within a control flow graph at which Twobit might allocate a stack frame:
Let's call these positions the save positions. If a stack frame is needed anywhere in the control flow graph, then at least one save position will dominate all places where a stack frame is needed. These dominating save positions are totally ordered by the control flow graph. Twobit will allocate a stack frame at the last save position that dominates all places where a stack frame is needed.
If no stack frame is needed, then Twobit will not allocate one.
For the Fibonacci example, Twobit will not allocate a stack frame for the base case. In the recursive case, Twobit will allocate a stack frame immediately after the test has determined that the recursive branch will be taken.
Twobit generates a save
instruction,
which allocates a stack frame, at entry.
The internal representation of the operand for this
save
instruction is a mutable data structure
that the code generator uses to keep track of the maximum
number of data slots within the frame that are live at any
time.
This maximum is initialized to -1.
Twobit generates code as though a stack frame has been
allocated.
Any use of the stack frame will ensure a non-negative value
for the operand of the nearest save
instruction that
dominates the use.
When the code generator reaches a new save position, it
checks the mutable data structure to see whether
a stack frame has been required to that point.
If so, then it does not generate a save
instruction
at that save position, but continues to use the data structure
for the previously generated save
instruction.
If no stack frame has been required, however,
then it generates another save
instruction,
with a new mutable data structure as its operand.
A pop
instruction is generated for returns
and tail-recursive calls. Its operand matches the operand
of the nearest save
instruction that dominates it.
The local optimizer
and assembler
ignore save
and pop
instructions
whose operand is -1.
Twobit's frame optimization is conservative in the sense that no procedure activation ever allocates more than one frame. Sometimes a stack frame will be allocated when none is actually needed, however. Consider
(define f (lambda (x y z) (if (> x 0) (show x)) (if (> y 0) (show y)) (if (> z 0) (show z)) (list x y z)))
Twobit will allocate a stack frame for f
even if all of its arguments are zero.
Chez Scheme would not.
On the other hand, Chez Scheme would allocate three
separate stack frames for f
if
all of its arguments are positive.
Twobit would allocate only one.
The algorithm used in Chez Scheme is a form of total redundacy elimination for register saves. For more information see
Robert G Burger, Oscar Waddell, and R. Kent Dybvig. Register allocation using lazy saves, eager restores, and greedy shuffling. In 1995 ACM Conference on Programming Language Design and Implementation, June 1995, pages 130-138.
Frame optimization in Twobit is similar to partial redundancy elimination, but is not quite the same. For example, Twobit will generate code to allocate a stack frame within each arm of the conditional expression in
(define f (lambda (n) (if (< n 2) (begin (show n) n) (+ (f (- n 1)) (f (- n 2))))))
Partial redundancy elimination would reduce code space by hoisting this code to a common dominator.