In block-structured languages, a procedure is more than its code. The procedure must also incorporate the environment in which its code was declared. Without this environment, references to variables that occur free in its code could not be resolved.
A closure is a run-time data structure that represents a procedure by representing both its code and the environment in which that code was declared, or closed. Closure analysis is an interprocedural static analysis that seeks to avoid the creation of closures where possible, and to choose the most efficient representation for those closures that must be created.
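In Scheme, for example, a lambda expression with a free variable cannot be represented by its code alone (the example is illustrative):

    (let ((x 10))
      (let ((f (lambda (y) (+ x y))))   ; x occurs free in the lambda expression
        (f 32)))                        ; => 42

The value of f must pair the code for (lambda (y) (+ x y)) with the environment that binds x; that pair is the closure.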
Closure analysis is utterly trivial for C. Since all procedures are closed in the global environment, there is never any reason to create a closure.
In most block-structured languages, no procedure outlives the activation of the block in which it is declared. Most non-global procedures are called only from sites within their scope that name the procedure explicitly. There is no need to create a closure for such procedures, since the environment in which they were declared can be computed at each call site.
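For example, in the following Scheme sketch the local procedure loop is named explicitly at its only call sites and no reference to it escapes, so its declaring environment is known at each call and no closure is needed:

    (define (sum-list lst)
      (define (loop lst acc)          ; called only by name, never passed around
        (if (null? lst)
            acc
            (loop (cdr lst) (+ acc (car lst)))))
      (loop lst 0))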
A closure does have to be created for a procedure that is passed as an argument. Most Pascal compilers create this closure when the procedure is passed. The environment part of the closure can be represented as a pointer into the stack.
In higher order languages that allow a procedure to outlive the block in which it is declared, closure analysis becomes a very important and nontrivial optimization. Without closure analysis, a closure must be allocated for every procedure. Since a procedure's free variables must live as long as the procedure, every variable that occurs free within the code for some procedure must be allocated on the heap.
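The classic case is a procedure returned as a value, as in this illustrative sketch:

    (define (make-adder n)
      (lambda (x) (+ x n)))        ; n occurs free in the returned procedure

    (define add5 (make-adder 5))
    (add5 3)                       ; => 8, long after make-adder has returned

The returned procedure outlives the activation of make-adder, so its free variable n must outlive that activation as well; without closure analysis, both the closure and n are allocated on the heap.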
Higher order closure analysis usually eliminates most heap allocation of closures and variables but requires cubic time. In practice, most of the benefit is achieved by optimizing only named procedures that are never referenced outside the operator position of a call. This is known as first order closure analysis; it suffices to eliminate heap allocation of closures and variables for all first order programs.
Passive closure analysis is content to analyze the program as it was written by the programmer. With passive analysis, and especially with first order passive analysis, a single lambda expression in an argument position can wreak havoc by closing over variables that name procedures for which no closure would otherwise have to be created. If the procedure in argument position has an unknown lifetime, all of the procedures to which it refers will have an unknown lifetime. All of the variables and procedures to which they refer will have an unknown lifetime, and so on. It is quite possible that all variables and procedures whose scope contains the escaping lambda expression will have to be allocated on the heap.
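The following sketch, with hypothetical names, shows the effect. The lambda expression passed to for-each is in argument position, so a passive first order analysis must assume its lifetime is unknown:

    (define (report threshold items)
      (define (label x)                  ; would need no closure on its own
        (if (< x threshold) 'small 'large))
      (for-each (lambda (x)              ; escapes: it is passed as an argument
                  (display (label x))
                  (newline))
                items))

Because the escaping lambda expression refers to label, a closure may have to be created for label, and label's free variable threshold may have to be heap-allocated, even though neither would be necessary if the lambda expression were absent.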
Lambda lifting is an optimization that actively rewrites a program in an attempt to reduce the heap allocation required when a procedure's lifetime cannot be statically determined.
When the compiler can find all of the call sites for a local procedure, and the compiler can determine that each of those sites can call only that one procedure, then that procedure is referred to as a known local procedure. In practice, most known local procedures can be found by a first order closure analysis.
The compiler can use nonstandard calling conventions to call a known local procedure, since it can rewrite each call site.
In particular, the compiler can change each call site to pass some or all of that procedure's free variables as extra arguments. This may allow the compiler to hoist its declaration to an outer scope. This hoisting of known local procedures is called lambda lifting.
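The following before-and-after sketch uses invented names; Twobit performs the rewrite on its intermediate representation rather than on source text. The known local procedure loop has one free variable, k:

    (define (scale-all k lst)
      (define (loop lst)
        (if (null? lst)
            '()
            (cons (* k (car lst)) (loop (cdr lst)))))
      (loop lst))

Passing k as an extra argument at every call site removes the free reference, so loop can be hoisted out of scale-all:

    (define (loop k lst)
      (if (null? lst)
          '()
          (cons (* k (car lst)) (loop k (cdr lst)))))

    (define (scale-all k lst)
      (loop k lst))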
Lambda lifting has several benefits: the lifted procedure no longer refers freely to variables of its enclosing scopes, so those variables need not be kept alive or heap-allocated on its behalf, and the extra arguments that replace them can often be kept in registers.
The main disadvantage of lambda lifting is that it adds extra arguments to known local procedures. If too many extra arguments are added, the cost of passing the extra arguments and of saving them in stack frames may be greater than the benefits. When arguments are viewed as registers, as in Twobit, then this tradeoff becomes a matter of interprocedural register allocation.
Lambda lifting also incurs the compile-time cost of solving a flow equation to determine which arguments must be added. Most compilers that perform lambda lifting solve this equation only once, to find the arguments that must be added in order to lift known local procedures to the global environment. If the number of arguments appears excessive, a procedure will not be lifted at all.
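One standard formulation of this flow equation (not necessarily the exact one Twobit solves) is: for each known local procedure P, the set E(P) of extra arguments must satisfy

    E(P) = (FV(P) ∪ ⋃ { E(Q) : Q is a known procedure called by P }) \ B

where FV(P) is the set of variables that occur free in P and B is the set of variables still in scope at the level to which P is lifted. Because known procedures may call one another recursively, E is computed as a least fixed point. Solving the equation once, with the lift target fixed at the global environment, yields the all-or-nothing policy just described.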
This all-or-nothing approach can be suboptimal. Sometimes it is best to lift a known local procedure only part of the way to the outermost level of a program.
Olivier Danvy has considered lambda dropping, which follows lambda lifting by dropping some of the lifted procedures down into scopes that allow some of the added arguments to be converted back into free variables. Lambda dropping can be done without solving any more flow equations, but can result in passing arguments to procedures that don't use them.
Twobit uses incremental lambda lifting, which lifts a group of known local procedures only one scope at a time. The advantage of incremental lifting is that it can be stopped at any intermediate scope.
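A small hypothetical example makes the incremental step concrete. Lifting inner out of middle and into outer requires adding only x as an argument, because a is still in scope in outer; lifting it all the way out would require adding a as well.

    (define (outer a b)
      (define (middle x)
        (define (inner y)
          (+ a x y))                 ; a and x occur free in inner
        (+ (inner 1) (inner 2)))
      (* b (middle a)))

After one incremental step:

    (define (outer a b)
      (define (inner x y)            ; lifted one scope; x is now an argument
        (+ a x y))
      (define (middle x)
        (+ (inner x 1) (inner x 2)))
      (* b (middle a)))

The compiler can stop here if its heuristics predict that lifting further would not pay off.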
The output of Pass 1 always contains a single outermost lambda expression. By the end of Pass 2, it is usually the case that all known local procedures have been hoisted to that outermost lambda expression. When this is not the case, it is because the compiler's heuristics have predicted that better code will result from hoisting some known local procedures only part of the way, or not at all.
One disadvantage of incremental lambda lifting is that a flow equation must be solved for each scope. The compile-time cost of incremental lambda lifting is not a major issue, however.
Another problem with lambda lifting in general is that a known procedure generally cannot be lifted past the binding of a variable that appears on the left hand side of an assignment. Lambda lifting in Twobit is therefore preceded by optimizations that remove all assignments to local variables. These optimizations begin with local source transformations.
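For illustration only, the standard boxing technique (not necessarily the transformation Twobit applies) replaces an assigned local variable with a never-assigned variable bound to a one-slot mutable cell:

    ;; before: n is assigned, so bump! cannot be lifted past n's binding
    (let ((n 0))
      (define (bump!) (set! n (+ n 1)))
      (bump!)
      n)

    ;; after: n names a cell and is never assigned, so during lambda lifting
    ;; it can be passed to bump! as an ordinary extra argument
    (let ((n (vector 0)))
      (define (bump!) (vector-set! n 0 (+ (vector-ref n 0) 1)))
      (bump!)
      (vector-ref n 0))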