Typing the Stack, and Why Tag-Free Garbage Collection Doesn’t Quite Work

28 11 2007

One of the most significant barriers to implementing a completely safe garbage collector lies with collecting the stack. Heap datastructures are relatively easy to walk — that is to say, that one can generate a walk function for each variant of heap datastructures, etc., in the language being collected. The stack is more difficult because it is not an explicit structure in the language. The other work I’ve read towards this (described in the post linked above) made rather awkward contortions to get around this fact. Wang and Appel CPS’ed and converted closures for their entire programs. Wang and Fluet side-step the issue by writing a safe collector for the internals of an interpreter – which isn’t really quite garbage collection in the normal sense, since they just have a Scheme stack structure defined in Cyclone which they then traverse. The harder problem is to actually type the machine-native stack.

  • The Essence of Compiling with Continuations: Towards this, my partners and I read the work defining A-Normal Form, or ANF. The authors examined a number of CPS based compilers to see what work they actually performed (many compilers use CPS as an intermediate representation to support some transformations). They found that to mitigate the code expansion of CPS transformations, and to get rid of some of the execution overhead from relatively trivial continuations, the compilers performed a number of additional reductions, pulling the continuation out of argument lists and leaving it implicit in a register, and a few other tricks. Adding these transformations together, it turns out that the sum total of these actions is essentially CPSing the source program, performing the normal reductions, and then un-CPSing the program. The authors then present an equivalent transformation directly on the source language, which is simpler to implement. This essentially accomplishes the same thing as using SSA – giving a name to every intermediate result of the computation.

This may help with typing the stack – we might be able to generate each frame’s type from the ANF of the program. Obviously some more work needs to be done here; how this might be done isn’t immediately obvious. I complained that we would have to mark the stack frames in some way, and to suggest this was not as terrible as I thought, our advisor suggested reading a couple papers by Benjamin Goldberg which pretty much wrapped up work on garbage collection without tags. In a nutshell, it really only works for languages lacking polymorphic operators.

  • Tag-Free Garbage Collection for Strongly Typed Programming Languages: This paper describes a way to sidestep tagging pointers on the stack into the heap (the notion was mentioned earlier by Appel, but implementation details were omitted). It does this basically by embedding frame-specific garbage collection routine pointers right after function calls. It then changes compilation so that returning from a routine goes an extra instruction later. Goldberg says that this works fine because the retl instruction on SPARC is only a pseudo-instruction which compiles to a jump with a hard-coded offset – I wonder how portable this is. This of course only works in a straightforward way for monomorphically typed languages. To extend the scheme to support polymorphic functions, they begin passing type-specific traversal functions to the frame-specific collection functions. This is then extended yet again to support higher-order functions – the overall strategy is to stash location-specific pointers in select locations to provide ready access to correctly-typed collector procedures in an efficient manner. There is also an addendum at the end about implementing collection for parallel languages. One interesting idea presented which isn’t explored in much depth is that there can be different collection procedures for different stages of each procedure. This allows for optimization such as not traversing uninitialized pointers early in the procedure, or not traversing dead local variables during collections after their last access in the function.
  • Polymorphic Type Reconstruction for Garbage Collection without Tags: This second paper revisits much of the content of the first paper, more clearly presenting the type reconstruction algorithm. It then addresses the fact that it is possible to have objects reachable from the garbage collection roots for which it is impossible to reconstruct the type, and proves that in the absence of polymorphic primitive operators, it is possible to reconstruct the value of any useful reachable object, because any object which matters will be passed to a primitive operation at some point, and types can be reconstructed from there. He later touches briefly on the issue of having a polymorphic equality operator (or any polymorphic primitive), and the problem with the operator needing to use data of the objects to do its job, without knowing the type, and mentions offhand that this is dealt with by implicitly passing a method dictionary to the operator, which amounts to a tag. This is still tag-free in the sense that there are no extra bits used up in the pointer, but not fully tag-free because the compiler still needs to pass explicit runtime hints about types.




Provably Correct Garbage Collection

25 10 2007

Working towards my honors thesis, my partners and I have read several interesting papers on using type systems to prove correctness (safety) of garbage collection implementations:

  • Type-Preserving Garbage Collectors: In this paper, Wang and Appel present a way of writing garbage collectors in the language being collected, and relying on a statically checked type system to prove safety of their collector. They implement a copying collector as a function written in the ML-like language they are collecting, basically by parameterizing all types by memory regions. Each data type and language function are parametrized over a single region as a type. The garbage collection routines are a basic collector plus per-datatype collector routines parameterized over a from-space and a to-space. Each copy function deep-copies a structure in the from-space, and returns a copy located in the to-space. They address pointer sharing by providing an implementation with forwarding pointers (for which they perform an odd version of subtyping to guarantee the user program can’t modify the forwarding pointers in objects, since the collector is being written in the language being collected). To guarantee that the collector returns no pointers to data in the from-space, they introduce a language construct ‘only’ which takes one or more regions and a body, and executes the body with only the specified regions in scope. The problem with this is that if you forget to use the object, you could actually return the original The paper is fairly well-written, and the work is presented in a series of progressively more complete garbage collector functions. A fair amount of time is spent addressing particulars of their implementation, because they make a number of simplifying assumptions. The language supported is first-order only, and is written in CPS with closures converted to datastructures. Also presented are performance results, which are interesting, but largely inconclusive because the system they present is intended only to show that it is possible to write a safe garbage collector, and not meant to be a practical system. Many optimizations are mentioned throughout the paper as future work.
  • Implementation and Performance Evaluation of a Safe Runtime System in Cyclone: This paper by Fluet and Wang aims to show that a similar approach to that above can actually be done with reasonable performance. They implement a Scheme interpreter in Cyclone (a safe dialect of C). They also build a copying collector, using Cyclone’s region primitives to do so. The approach is relatively straightforward (and largely similar to the approach above) with one exception. Instead of a safe upcast with a construct which prevents later downcasts, they use linear types (unique pointers) along with some under-specified functions to create a sequence of region types, to support forwarding pointers. They compare their interpreter running some standard Scheme benchmarks to a couple others. Their interpreter running their garbage collector actually slightly outperforms their interpreter using the standard Cyclone conservative collector – consistently. That is fairly impressive, and shows the overhead for safe collection alone is not a significant bottleneck. Comparisons to MzScheme however are not so great – with the exception of the continuation-based benchmark, MzScheme far outperforms their implementation, by a factor of 2 to 3. And this was done in 2004, before MzScheme had a JIT compiler. This may be a result of the structure of their interpreter; it is a very basic interpreter, which performs a predefined number of steps on an abstract machine structure, checks to see if a collection is necessary, and repeats. They perform a comparison however of the total amount of unsafe code in each implementation, and come in at about 1/6th the amount of unsafe code as the next ‘cleanest’ implementation. Most of this apparently is from a highly optimized version of malloc() used to implement the region primitives.

Both papers write their main programs in slighly odd ways – the Wang/Appel paper does CPS and closure conversion to make stacks and environments safely traversable, and the Fluet/Wang paper sort of cheats by writing a straight interpreter in a separate language. Not to degrade the latter – they still write a garbage collector which is (itself) free of unsafe code, which wonderful. But the constructions as state machines and the CPS and closure conversions definitely hurt performance. They’re necessary, however, in order to safely collect stacks and environments.  My thesis adviser told my partners and me that he may be aware of a way to walk these structures without the efficiency loss of the explicit structures used in these papers.  So next on the block: Continuations from Generalized Stack Inspection.  That, and figuring out just how the two mysteriously underspecified primitives in the Cyclone paper really work (time to read the corresponding technical report), and contemplate the problems of forgetting to use the ‘only’ construct in the first paper.