Slight Changes

22 10 2007

Some slight changes have been brewing for some time at this blog.

Mainly, I will start writing here about as many of the papers I read as I can. The motivation for this is twofold: writing more frequently, and understanding papers better. My thesis advisor has been telling me for some time now that I should write a one-paragraph summary of each paper after I finish reading it, that I should not need to refer back to the paper in order to do this, and that I should not move on to the next paper without producing a summary of the previous one. I’ll try to group these into sets of related papers, which should also tie together my thoughts on the papers and the ideas I get from them.

The next few papers will be on Singularity and provably correct garbage collectors. The group of friends with whom I was originally going to write a JIT JavaScript VM this semester split up when another team member decided to pursue a different project and took half the group with him. I intended to stay with the original project, but only two others made the same choice, and three people is not enough to implement a complete JIT compiler from scratch in three months. We meandered through a number of topics and still don’t have a clear final direction (which is worrisome, considering this is supposed to turn into my honors thesis). We’ve settled more or less into the broad area of cross-language memory management and runtime systems. We’ve read a number of papers: several on conservative garbage collection, a couple on the Haskell FFI, the Xerox PARC Portable Common Runtime paper, and a few others. I’ll try to write about all of them at some point, again grouped into logical sets. That’s all for now – I have to spend time getting ready for upcoming midterms and interviews.





STM and Haskell Attention

12 08 2007

Haskell has been garnering a fair bit of attention of late, largely as a result of Simon Peyton-Jones’s presence at OSCON (and the ensuing flurry of Reddit postings of related blog posts). He gave a keynote on Software Transactional Memory (STM), which is an alternative to using explicit locks and condition variables in writing parallel programs. He also gave an introduction to Haskell, and a smaller talk on data parallelism.

I’m quite pleased by this recent attention, for a couple reasons.

The first is that I really agree with Simon that STM is going to be the way to go soon*. I also just think it’s damn cool – that’s why I did an independent study on it with Maurice Herlihy last semester, working with a slightly more recent version of Microsoft’s SXM implementation of STM for C# than the one available at the link I just gave. Simon gives excellent motivation for STM in his talk, but there are other motivations if you’re unconvinced after watching it. Consider a data structure which you traverse in multiple directions – perhaps a grid that can be traversed in either direction along either dimension, depending on the contents at a given location, or perhaps a mutable graph representation. What locking scheme do you use for scalable (read: not coarse-grained) access to a large instance of this data structure? You could establish a total ordering over the nodes and their corresponding locks (or lock a row/column at a time, with an ordering on the rows/columns of a table), and acquire locks only in that order. But since you can traverse in any direction, you will sometimes discover mid-traversal that you need a lock that comes earlier in the order than locks you already hold; you then have to build a list of the locks you need, release the later ones you acquired in reaching the current point of the traversal, and walk the list reacquiring them in order. How expensive is this, possibly acquiring the same lock over and over? How hard is it to write that code correctly? How much effort does it take to document this locking scheme, and for a new maintainer to learn it? All of this makes fine-grained locking a poor idea at best; a sketch of the ordered-locking dance follows below.
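To make the pain concrete, here is a minimal sketch of ordered lock acquisition in Haskell, with one MVar per grid cell. The names (Cell, Grid, withCells) are mine, purely for illustration, and the sketch dodges the hardest part: it assumes the full set of coordinates is known up front, whereas the traversals described above discover the cells they need as they go, forcing exactly the release-and-reacquire dance just described.

```haskell
import Control.Concurrent.MVar
import Data.List (nub, sort)

type Cell = MVar Int
type Grid = [[Cell]]  -- one lock (and value) per cell

-- Acquire the locks for a set of coordinates in a fixed global order
-- (row-major, via sort), run an action over the cell contents, then
-- release. Acquiring only in this order is what prevents deadlock.
-- Note this sketch is not exception-safe; a real version would bracket.
withCells :: Grid -> [(Int, Int)] -> ([Int] -> IO a) -> IO a
withCells grid coords act = do
  let ordered = nub (sort coords)            -- the global lock order
      locks   = [grid !! r !! c | (r, c) <- ordered]
  vals   <- mapM takeMVar locks              -- acquire strictly in order
  result <- act vals
  mapM_ (uncurry putMVar) (zip locks vals)   -- release, restoring values
  return result
```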

How much easier would it be to write such code naïvely and mark it as atomic? And with a proper implementation of STM, how much faster could this be on a large structure than a coarse-grained lock, or than repeated acquisition and release of the same fine-grained locks? And this is a bonus on top of easier coding for simple concurrent structures (and partial failures), and on top of the composition of atomic transactions, which cannot be done with normal locks.
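For contrast, here is a minimal sketch of the same access pattern using GHC’s STM (the Control.Concurrent.STM module); Cell, Grid, bump, and bumpPath are again my own illustrative names:

```haskell
import Control.Concurrent.STM

type Cell = TVar Int
type Grid = [[Cell]]  -- transactional variables instead of locks

-- Increment one cell. This is an STM action, so it composes with
-- other transactions rather than being welded to a locking protocol.
bump :: Grid -> (Int, Int) -> STM ()
bump grid (r, c) = do
  let cell = grid !! r !! c
  v <- readTVar cell
  writeTVar cell (v + 1)

-- One atomic traversal over a whole path of cells: no lock order to
-- design, document, or relearn. The runtime detects conflicting
-- transactions and retries as needed.
bumpPath :: Grid -> [(Int, Int)] -> IO ()
bumpPath grid coords = atomically (mapM_ (bump grid) coords)
```

Because bump lives in STM rather than IO, any sequence of such actions can be wrapped in a single atomically call, which is exactly the composability that ordinary locks cannot offer.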

If, like me, you’re curious about how STM is implemented, I suggest you check out a few papers. Getting the most out of all of them requires understanding the basic ideas and terminology of concurrency, as well as a bit of computer architecture.

There are certainly many, many more implementations, many more papers, and more ways to tune an implementation than these few papers spend much time on; check out their related work sections.

I’m also pleased because I feel Haskell has been underrated by the programming populace at large for a long time.** I’m interested in becoming more familiar with Haskell largely because of some of my pet peeves with OCaml, and because the particulars of Haskell’s type system are both richer and closer to category theory, which interests me mainly as an intellectual curiosity (and for its relation to type theory in general). Well-written Haskell can also beat many better-known languages performance-wise (for example, definitely poke around the language shootout). And it’s always nice to see a real functional language (read: one in which functional style is the norm, not just an option) getting mainstream attention.

I can only hope that people continue to take note of Haskell.


* Please note, those of you who are itching to start a flamewar about shared state vs. message passing, that STM does not claim to be the be-all and end-all of concurrent programming. It is designed to make programming with shared state easier than it is today. And shared state is not going away, because it’s the model with which chips are currently designed. So at the very least, operating systems, compilers, and some runtime systems will still need to deal with it, even if all other development switches to message passing concurrency.

** I should probably mention that I have not used Haskell a great deal, though this is not for lack of desire. A couple years ago I wrote a short assignment in Haskell (largely to play with lazy evaluation). I’ve also spent a fair amount of time writing OCaml over the past couple years: for the course I TAed and head TAed, which uses OCaml to introduce students to currying and static typing (following an introduction to CS with Scheme), and last semester in a group software engineering course (perhaps I’ll elaborate on this experience another time). So I’m familiar with a similar type system.





Citations, Citations, Citations

20 07 2007

I recently got a paper published at the ASM Workshop ’07. Two of my co-authors presented it in person in Norway; I was unable to attend.

This was my first academic paper, and while it could have been better, I’m fairly satisfied with it. Finishing it off sucked up most of my time during the first two weeks I was out of school for the summer. I basically left Brown on a Saturday, slept the rest of that day, watched TV and drank soda on Sunday, and on Monday began two weeks’ worth of writing: initially cleaning up the draft submission that had resulted in our acceptance, and eventually rewriting the paper from scratch after our advisor took a closer look at it. Two days after the final submission I flew out to California to begin my internship at Sun. I must admit that the main reason this ate those two weeks is that we all put off the revisions for so long because of finals, but from other things Shriram has said, this seems to be pretty common among both inexperienced and veteran academics.

One thing he commented on while helping us revise it was that a number of the citations in the paper were originally wrong. Many were taken from the homepages of paper authors kind enough to provide BibTeX entries; most of the others were taken (and somewhat corrected) from CiteSeer. However, Shriram caught that a number of these were still incorrect, and said that much (most?) of the citation information on CiteSeer is incorrect or incomplete. When I asked what the normal way to find solid citations is, and whether it’s anything more than spending lots of time researching, he answered in the negative.

Why is there no central, reputable, correct repository for this information? Or, alternatively, why does CiteSeer go to the trouble of archiving all of these papers without getting their citation data right?