Typing the Stack, and Why Tag-Free Garbage Collection Doesn’t Quite Work

28 11 2007

One of the most significant barriers to implementing a completely safe garbage collector lies with collecting the stack. Heap datastructures are relatively easy to walk — that is to say, that one can generate a walk function for each variant of heap datastructures, etc., in the language being collected. The stack is more difficult because it is not an explicit structure in the language. The other work I’ve read towards this (described in the post linked above) made rather awkward contortions to get around this fact. Wang and Appel CPS’ed and converted closures for their entire programs. Wang and Fluet side-step the issue by writing a safe collector for the internals of an interpreter – which isn’t really quite garbage collection in the normal sense, since they just have a Scheme stack structure defined in Cyclone which they then traverse. The harder problem is to actually type the machine-native stack.

  • The Essence of Compiling with Continuations: Towards this, my partners and I read the work defining A-Normal Form, or ANF. The authors examined a number of CPS based compilers to see what work they actually performed (many compilers use CPS as an intermediate representation to support some transformations). They found that to mitigate the code expansion of CPS transformations, and to get rid of some of the execution overhead from relatively trivial continuations, the compilers performed a number of additional reductions, pulling the continuation out of argument lists and leaving it implicit in a register, and a few other tricks. Adding these transformations together, it turns out that the sum total of these actions is essentially CPSing the source program, performing the normal reductions, and then un-CPSing the program. The authors then present an equivalent transformation directly on the source language, which is simpler to implement. This essentially accomplishes the same thing as using SSA – giving a name to every intermediate result of the computation.

This may help with typing the stack – we might be able to generate each frame’s type from the ANF of the program. Obviously some more work needs to be done here; how this might be done isn’t immediately obvious. I complained that we would have to mark the stack frames in some way, and to suggest this was not as terrible as I thought, our advisor suggested reading a couple papers by Benjamin Goldberg which pretty much wrapped up work on garbage collection without tags. In a nutshell, it really only works for languages lacking polymorphic operators.

  • Tag-Free Garbage Collection for Strongly Typed Programming Languages: This paper describes a way to sidestep tagging pointers on the stack into the heap (the notion was mentioned earlier by Appel, but implementation details were omitted). It does this basically by embedding frame-specific garbage collection routine pointers right after function calls. It then changes compilation so that returning from a routine goes an extra instruction later. Goldberg says that this works fine because the retl instruction on SPARC is only a pseudo-instruction which compiles to a jump with a hard-coded offset – I wonder how portable this is. This of course only works in a straightforward way for monomorphically typed languages. To extend the scheme to support polymorphic functions, they begin passing type-specific traversal functions to the frame-specific collection functions. This is then extended yet again to support higher-order functions – the overall strategy is to stash location-specific pointers in select locations to provide ready access to correctly-typed collector procedures in an efficient manner. There is also an addendum at the end about implementing collection for parallel languages. One interesting idea presented which isn’t explored in much depth is that there can be different collection procedures for different stages of each procedure. This allows for optimization such as not traversing uninitialized pointers early in the procedure, or not traversing dead local variables during collections after their last access in the function.
  • Polymorphic Type Reconstruction for Garbage Collection without Tags: This second paper revisits much of the content of the first paper, more clearly presenting the type reconstruction algorithm. It then addresses the fact that it is possible to have objects reachable from the garbage collection roots for which it is impossible to reconstruct the type, and proves that in the absence of polymorphic primitive operators, it is possible to reconstruct the value of any useful reachable object, because any object which matters will be passed to a primitive operation at some point, and types can be reconstructed from there. He later touches briefly on the issue of having a polymorphic equality operator (or any polymorphic primitive), and the problem with the operator needing to use data of the objects to do its job, without knowing the type, and mentions offhand that this is dealt with by implicitly passing a method dictionary to the operator, which amounts to a tag. This is still tag-free in the sense that there are no extra bits used up in the pointer, but not fully tag-free because the compiler still needs to pass explicit runtime hints about types.




Solaris and OS X Playing Together

4 11 2007

After winning my iPhone in a raffle, I was told by those who raffled it off that it was purchased before the price drop, and as a result, I could get the $100 credit.  What better to spend it on than upgrading to Leopard?  After all, a system with DTrace, DTrace-enabled Ruby, and a friendly GUI to boot is too good to pass up when you only have to pay $40 after taxes and credit. So I spent about 6 hours backing up my data to my Solaris box.  Why so long?  Well, I could not make Tiger and Solaris 10 play nicely with file sharing, so I had to resort to using scp to move 32GB of data.  Not pleasant.  I’d tried getting Tiger to mount NFS shares, but with little (read: no) success. So naturally one of the first things I did with my machine after installing Leopard, Firefox, and Adium was start poking around in the settings.  I quickly found a “Show Advanced” button in the Directory Utility.  And what appeared?  A tab for network mounts!  All I did was add a quick entry of the form you’d expect, start NFS on my Solaris machine, and voila!  Immediate access to my backups over NFS.  This is great – it means I can easily restore my iTunes collection to the fresh install of Leopard.  The ease with which this was done makes me suspect I missed something glaringly obvious in Tiger. Of course, a few things remain to be dealt with:

  • Supporting read/write access – just unifying the UID of my account on both machines.
  • Denying read access to other clients – probably just changing the permissions in my home directory.
  • Getting my Mac to dynamically resolve the hostname of my S10 box – we use DHCP on my home network, and I don’t want to change this.

Now, here’s to hoping this will get me to do more work on my ruby-on-rails project while I’m flying between coasts the next few weeks! 

A successful NFS configuration on OS X Leopard