JIT VMs and a Compiler Book

30 07 2007

Well, a few friends and I are doing a group independent study this semester where we’ll be writing something at least similar to a JIT JavaScript virtual machine. At least, the original goal was to do this, but we’re now being sidetracked (perhaps productively so) by OpenMoko, an all-open-source smart phone intended to have a feature set comparable to (or exceeding) the iPhone. It’s wide open, and rather than being one-upped by new developments to Tamarin or by new/upcoming efforts by a company we’ve heard rumors might be doing their own JS VM (who would almost certainly beat our performance if they attempt this), we’ll just be the first on a new platform, and rather than focusing on the main goal we had in mind before (doing a JIT compiler for purely performance reasons) we can consider other issues related to using an embedded platform (memory usage being more imporatant, power consumption, etc.) as well as being the first on a new platform. It’s not very mature. But I think it will be interesting, and fun. And educational.

As a result of this, our professor asked us to each get a copy of one of Andrew W. Appel‘s compiler books and finish it by the end of August. Since many (/most/all) of us want to us either Haskell of OCaml, we decided to standardize on Modern Compiler Implementation in ML. I got my copy yesterday, and read 40 pages (I read none today, discarding it temporarily in favor of wine tasting in Napa and Sonoma). So far, the book is well written, concise, and explains both the theory behind, and practical aspects of, the topics it addresses. I’m partway through parsing, which has been a nice refresher on FSMs (deterministic and nondeterministic) and regular expressions. One thing which has bothered me slightly is that he sort of plays fast and loose with models of computation. The example which sticks for me is that he gives a pretty good explanation of FSMs, and then begins to talk about arbitrarily deeply nested comments. The main issue with them being that you can’t use a finite state automata to parse them, because you’d need an infinite number of states to handle an arbitrarily large nesting depth. He gives the (accurate) practical solution that when specifying this in ML-lex (used in the book), states can be explicitly specified, and hitting the open and close comment representations can run ML code which increments and decrements a counter. This works fine, but he continues referring to the resulting implementation as a finite state automata – which it isn’t. Adding the counter transforms it into a pushdown automata with a single (unspecified) letter stack alphabet.

I haven’t finished the whole group of parsing chapters – maybe he doubles back to this. Another consideration is that there are also C and Java versions of this text, both of which have been updated since initial publication, while the ML version has not. It’s possible this was changed in one of those newer versions.

I’m not even sure why this bothers me so much. I used to be all about getting the intuition to work, and leaving formalism and particulars to be secondary concerns. Perhaps the years of hard work from my sometimes overly-pedantic discrete math, models of computation, and algorithms TAs finally sunk in. Or maybe it was writing a theory paper. I’m turning into a stickler for formalism.

Regardless of this silly irritation, I’m still excited for the rest of the book.

Advertisements

Actions

Information

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




%d bloggers like this: