Provably Correct Garbage Collection

25 10 2007

Working towards my honors thesis, my partners and I have read several interesting papers on using type systems to prove correctness (safety) of garbage collection implementations:

  • Type-Preserving Garbage Collectors: In this paper, Wang and Appel present a way of writing garbage collectors in the language being collected, and relying on a statically checked type system to prove safety of their collector. They implement a copying collector as a function written in the ML-like language they are collecting, basically by parameterizing all types by memory regions. Each data type and language function are parametrized over a single region as a type. The garbage collection routines are a basic collector plus per-datatype collector routines parameterized over a from-space and a to-space. Each copy function deep-copies a structure in the from-space, and returns a copy located in the to-space. They address pointer sharing by providing an implementation with forwarding pointers (for which they perform an odd version of subtyping to guarantee the user program can’t modify the forwarding pointers in objects, since the collector is being written in the language being collected). To guarantee that the collector returns no pointers to data in the from-space, they introduce a language construct ‘only’ which takes one or more regions and a body, and executes the body with only the specified regions in scope. The problem with this is that if you forget to use the object, you could actually return the original The paper is fairly well-written, and the work is presented in a series of progressively more complete garbage collector functions. A fair amount of time is spent addressing particulars of their implementation, because they make a number of simplifying assumptions. The language supported is first-order only, and is written in CPS with closures converted to datastructures. Also presented are performance results, which are interesting, but largely inconclusive because the system they present is intended only to show that it is possible to write a safe garbage collector, and not meant to be a practical system. Many optimizations are mentioned throughout the paper as future work.
  • Implementation and Performance Evaluation of a Safe Runtime System in Cyclone: This paper by Fluet and Wang aims to show that a similar approach to that above can actually be done with reasonable performance. They implement a Scheme interpreter in Cyclone (a safe dialect of C). They also build a copying collector, using Cyclone’s region primitives to do so. The approach is relatively straightforward (and largely similar to the approach above) with one exception. Instead of a safe upcast with a construct which prevents later downcasts, they use linear types (unique pointers) along with some under-specified functions to create a sequence of region types, to support forwarding pointers. They compare their interpreter running some standard Scheme benchmarks to a couple others. Their interpreter running their garbage collector actually slightly outperforms their interpreter using the standard Cyclone conservative collector – consistently. That is fairly impressive, and shows the overhead for safe collection alone is not a significant bottleneck. Comparisons to MzScheme however are not so great – with the exception of the continuation-based benchmark, MzScheme far outperforms their implementation, by a factor of 2 to 3. And this was done in 2004, before MzScheme had a JIT compiler. This may be a result of the structure of their interpreter; it is a very basic interpreter, which performs a predefined number of steps on an abstract machine structure, checks to see if a collection is necessary, and repeats. They perform a comparison however of the total amount of unsafe code in each implementation, and come in at about 1/6th the amount of unsafe code as the next ‘cleanest’ implementation. Most of this apparently is from a highly optimized version of malloc() used to implement the region primitives.

Both papers write their main programs in slighly odd ways – the Wang/Appel paper does CPS and closure conversion to make stacks and environments safely traversable, and the Fluet/Wang paper sort of cheats by writing a straight interpreter in a separate language. Not to degrade the latter – they still write a garbage collector which is (itself) free of unsafe code, which wonderful. But the constructions as state machines and the CPS and closure conversions definitely hurt performance. They’re necessary, however, in order to safely collect stacks and environments.  My thesis adviser told my partners and me that he may be aware of a way to walk these structures without the efficiency loss of the explicit structures used in these papers.  So next on the block: Continuations from Generalized Stack Inspection.  That, and figuring out just how the two mysteriously underspecified primitives in the Cyclone paper really work (time to read the corresponding technical report), and contemplate the problems of forgetting to use the ‘only’ construct in the first paper.

Advertisements




Preparing to Build a Rails App

24 10 2007

I’ve been thinking for some time about building a system to organize all of the papers I need (or just want) to read, whether I’ve read them, etc., along with some indexing and tagging system so I can search for papers both by topic, and by my motivation for reading them.

As I’ve been enjoying playing with Ruby lately while working on crawling online postings of course textbooks for Mocha, and there’s been so much hullabaloo about Ruby on Rails, I’ve decided to give Rails a shot.

The instructions given for getting up and running are pretty straightforward, and short. Downloading, building, and installing Ruby was no problem. Downloading and installing RubyGems was also without incident. Pretty smooth so far. The next step in the instructions looks very straightforward: run gem install rails –include-dependencies. This yielded an error about not finding the rails package, which seems like a pretty big problem for one of Ruby’s flagship apps. Googling revealed that for some reason, you need to purge and regenerate your package cache sometimes before this works. So I did that. Still, no luck, but the error this time is that the package server refused my connection. More googling yielded the helpful suggestion “Your error message seems like a Rubyforge problem. It happens occasionally. Just try it again later.” For something that bills itself as being akin to apt-get, it has a long way to go when a user’s first interaction is:

[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem install rails --include-dependencies
Bulk updating Gem source index for: http://gems.rubyforge.org
ERROR:  While executing gem ... (Gem::GemNotFoundException)
    Could not find rails (> 0) in any repository
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem env
RubyGems Environment:
  - VERSION: 0.9.4 (0.9.4)
  - INSTALLATION DIRECTORY: /usr/local/lib/ruby/gems/1.8
  - GEM PATH:
     - /usr/local/lib/ruby/gems/1.8
  - REMOTE SOURCES:
     - http://gems.rubyforge.org
[colin@sliver:~/Desktop/rubygems-0.9.4]$ rm -f /usr/local/lib/ruby/gems/1.8/source_cache
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem install rails --include-dependencies
Bulk updating Gem source index for: http://gems.rubyforge.org
ERROR:  While executing gem ... (Gem::GemNotFoundException)
    Could not find rails (> 0) in any repository
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem update
Updating installed gems...
Bulk updating Gem source index for: http://gems.rubyforge.org
Gems: [] updated
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem install rails --include-dependencies
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    Errno::ECONNREFUSED reading http://gems.rubyforge.org/gems/rake-0.7.3.gem
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem install rails --include-dependencies
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    Errno::ECONNREFUSED reading http://gems.rubyforge.org/gems/actionwebservice- 1.2.5.gem
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem install rails --include-dependencies
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    Errno::ECONNREFUSED reading http://gems.rubyforge.org/gems/rails-1.2.5.gem
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem install rails --include-dependencies
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    Errno::ECONNREFUSED reading http://gems.rubyforge.org/gems/rails-1.2.5.gem
[colin@sliver:~/Desktop/rubygems-0.9.4]$ gem install rails --include-dependencies
Successfully installed rails-1.2.5
[colin@sliver:~/Desktop/rubygems-0.9.4]$

Hm. At least the next step to generate a simple rails app works fine.





Slight Changes

22 10 2007

Some slight changes have been brewing for some time at this blog.

Mainly, I will start writing here about as many papers as I can of those I read. The motivation for this is twofold: increasing frequency of writing, and understanding papers better. My thesis advisor has been telling me for some time now that I need to write a one paragraph summary of a paper after I finish reading it, that I should not need to refer back to the paper in order to do this, and that I should not move on to the next paper without producing a summary for the previous. I’ll try to group these into sets of relevant papers which also tie together my thoughts on the papers and ideas I get from them.

The next few papers will be on Singularity and provably correct garbage collectors. The JIT JavaScript VM I was originally going to write with friends this semester split into multiple groups when another team member decided to pursue another project and took half the group with him. I intended to stay with the original project, but only two others made the same choice. And three people is not enough to implement a complete JIT compiler from scratch in 3 months. We meandered through a number of topics, and still don’t have a clear final direction (which is worrisome considering this is supposed to turn into my honors thesis). We’ve settled more or less into the broad area of cross language memory management and runtime systems. We’ve read a number of papers: several on conservative garbage collection, a couple on Haskell FFI, the Xerox PARC Portable Common Runtime, and a few others. I’ll try to write about all of them at some point, again grouped into logical sets. That’s all for now – I have to spend time getting ready for upcoming midterms and interviews.





Exciting Upcoming OS Improvements

19 10 2007

I must apologize for the lack of posts – the first few weeks of the semester, plus the job hunt, are extremely time-consuming. I have half-finished drafts of 3 articles, but no time to do the solid revising and research they need. I try to make my technical entries very precise, accurate, and backed by links to reputable sources.

But this is just a brief entry, because I’m excited for upcoming updates to three wonderful OSes:

Mac OS X Leopard
This has been anticipated for some time. It ships on the 26th of October if you weren’t already aware, and has some wonderful features. TimeMachine initially excited me the most. I heard about it while watching a periodically updated blog post of WWDC 2006 coverage with coworkers at NetApp, and to us it immediately suggested that Apple finally wised up a bit and implemented snapshots in one of their filesystems. The fact that TimeMachines requires an external hard drive makes it clear that this isn’t quite the case, which is a bit surprising given that it has been acknowledged that Leopard has at least some support for ZFS. Supposedly Leopard will only support reading from ZFS – alas, my dreams of a dual-boot Solaris/OS X Macbook with a shared ZFS pool will have to wait for another day.

More exciting to me is the addition of DTrace to Mac OS X in the form of Instruments, a snazzy GUI on top of DTrace. This is going to be a killer developer application. DTrace is very powerful, and fairly flexible, but has a bit of a learning curve to do more advanced things. I’m very optimistic about how discoverable an Apple GUI can make this.

And of course, after many years of using multiple desktops on Linux, they’re finally in OS X. For those who can’t wait, or don’t want to upgrade just for multiple desktops, Desktop Manager and Virtue Desktops work reasonably well, though for obvious reasons they’re no longer under development.

That said, my Powerbook is finally going, so I’m probably just going to buy a new Macbook next time the hardware is updated, and get Leopard that way instead of shelling out money for an upgrade. If it weren’t time for a hardware upgrade for me, I think I’d probably still do it just to have DTrace on my Mac.

OpenSolaris Project Indiana
What is Project Indiana? Many things, but primarily two things: an effort to create an all-open-source version of OpenSolaris (which currently includes some binary blobs to run well), and a place to prototype things like the new installer, stable ZFS root and boot, and the new package system. It was uncertain when a prototype of all these things would arrive, but it seems that a developer release will be available in the next couple weeks. This is enough to make me hold off on finishing customization on the new workstation I just got; I’m going to wait, and install this development version from scratch. I’m sure I’ll run into plenty of bugs, but that’s fine – it’s exciting! Also, an additional benefit of doing a reinstall is that I can make an extra slice for doing live upgrades of my system, which the preinstalled configuration doesn’t support. I can’t wait.

[Update: Found a very thorough description of Project Indiana.]

KGDB in Linux
Despite being postponed, it looks like a proper kernel debugger is headed into the mainline Linux kernel. Linux has actually had kernel debuggers for some time, but they were external patches. Being in the mainline kernel will mean better stability and will likely increase use among kernel developers.

This doesn’t directly impact end users, because most users don’t debug kernels. It does however affect them indirectly, because it will help kernel developers find (and fix) bugs faster. For kernel developers, a proper kernel debugger is a blessing. I used kmdb extensively this summer working in the Solaris Kernel Group. I can’t imagine how frustrated I would have been without it. Being able to step kernel code makes it almost as easy to debug as userland code (with some exceptions, obviously). Mac OS X has also had a well integrated kernel debugger (two in fact) for some time as well.

I’m really glad Linux is finally going to integrate this – the Apple documentation on kernel debuggers is spartan, and Solaris is still (unfortunately) not as easy to get up and running as Linux, and anyone who wants to hack on a kernel benefits greatly from having a solid kernel debugger. Hopefully this will encourage more people to jump the gap to kernel work, since this makes it more approachable.