Most of the optimizations in Pass 2 are independent of the target architecture. Those that are sensitive to the code generator or target are controlled by parameters, customizable policies, and compiler switches:
The number of
The front end of Twobit currently has no knowledge of specialized
representations or registers. In particular, Twobit boxes all
numbers and uses subroutine calls for all
The IEEE standard for Scheme allows any procedure to be redefined. Since most operations are procedures in Scheme, this prevents the compiler from generating inline code for most operations.
A compiler switch allows the programmer to promise that none of the usual primitive procedures will be redefined, which allows Twobit to compile them specially. When this switch is true:
+
and performs some constant folding.
A table of primitive operations is supplied as a parameter. For each primitive operation, this table specifies:
Except for the assembler, which is target-dependent, Twobit has no further knowledge of primitive data types and operations.
It is always possible to lift all known local procedures to the outermost lambda expression. This is not always desirable, however, because the formal parameters that must be added represent extra registers that must be saved across a non-tail-recursive call. This is a target-dependent tradeoff, which should be evaluated by the interprocedural register allocator.
At each stage of incremental lambda lifting,
after the flow equations have been
solved but before the source code is transformed, Twobit calls
POLICY:LIFT?
, a predicate that determines whether
lifting will be performed.
POLICY:LIFT?
is given three arguments to examine:
Many factors could be taken into account when deciding whether to lift a group of known local procedures. Here are just a few:
For example, Twobit currently transforms
(define (sieve n) (define (new-generator next-prime prime) (let ((psquare (* prime prime))) (lambda () (define (loop t) (if (or (< t psquare) (not (zero? (remainder t prime)))) t (loop (next-prime)))) (loop (next-prime))))) ...)into
((lambda () (define .sieve_3 (lambda (.n_5) ...)) (define .new-generator_6 (lambda (.next-prime_10 .prime_10) ((lambda (.psquare_11) (define .loop_13 (lambda (.t_15) (if ((lambda (.T93_16) (if .T93_16 .T93_16 (not (zero? (remainder .t_15 .prime_10))))) (< .t_15 .psquare_11)) .t_15 (.loop_13 (.next-prime_10))))) (lambda () (.loop_13 (.next-prime_10)))) (* .prime_10 .prime_10)))) ...))Here the known local procedure
.loop_13
has been
lifted out of only one lambda expression. Lifting to the enclosing
lambda expression would add .psquare_11
as another
argument, and there would be no benefit at all: A closure must be
created anyway for the body of the lambda expression that defines
.loop_13
, and .loop_13
can share that
closure at no cost.
The policy used for incremental lambda lifting does not appear to be critical in practice. It is usually advantageous to lift all known local procedures to the outermost level, as is done by conventional lambda lifting.
Since the compiler can by definition locate all calls to a known local procedure, it can reorder the procedure's formal parameters and rewrite each call to match the new order. In Twobit, the formal parameters are just names for the registers that are live at entry to the procedure, so reordering the parameters amounts to interprocedural register targeting.
Register targeting is easier than interprocedural register allocation, because register allocation involves decisions concerning which values should be kept in registers and how long they should be kept there. Twobit simply assumes that the arguments to known procedures will be kept in registers for the duration of each call, so all there is to do is to select a particular register for each argument. The goal of register targeting is to minimize register shuffling when one procedure calls another.
Twobit performs interprocedural register targeting by reordering
the formal parameters that it adds to known local procedures during
lambda lifting.
The ordering is chosen by by a
If the set A of formal parameters to be added to f
is equal to
the set of formal parameters for the lambda expression L
that defines f
, then Twobit adds the new parameters to the
beginning of the parameter list for f
, after sorting them
so that, for a call to f
from the body of L, these parameters
will already reside in the correct registers.
If A is a proper subset of the formal parameters of L, and this is
the first lifting of f
, then the
parameters in A are allocated to the positions they occupy in
the formal parameter list of L, and the original formal
parameters of f
are
x
of L is not an element of A but
occurs as an actual parameter in some call to f
, then the
corresponding formal parameter of f
should be placed in the
position left vacant by x
.
These heuristics reduce register shuffling for calls
between mutually recursive procedures, including loop nests.
Several
(integrate-usual-procedures #t)
tells Twobit to
assume that Scheme's usual procedures will not be redefined,
which allows Twobit to generate inline code for them.
(benchmark-mode #t)
tells Twobit to compile the
(define (f ...) ...)
syntax for a top-level
definition as if it were
(define f (let () (define (f ...) ...) f))This compiler switch has no effect on the
(define f ...)
syntax for top-level definitions,
nor does it affect the interpretation of internal definitions.
(local-optimizations #t)
activates Twobit's Pass 5.
The first two switches are taken from MacScheme. They allow a programmer to assert that preserving the precise semantics of Scheme when certain top-level procedures are dynamically redefined is less important that generating efficient code.
The benefits of
benchmark-mode
are explained by
Andrew W Appel. Loop headers in lambda-calculus or CPS. Princeton University CS-TR-460-94, June 1994.