Mercurial + MLKit + OS X = Failure, plus workaround

21 03 2008

My thesis, as I’ve mentioned several times before, has me hacking up the source to the ML Kit compiler. Because the current build arrangement I’ve been using requires a great deal of RAM, I’ve been using my desktop machine and CS department machines exclusively for building. However, I’m going home for part of spring break this coming week, there’s no computer with appropriate specs at home, and editing code over SSH is rather painful because of the high latency involved. The obvious solution, since I’m using Mercurial to manage my thesis, was to do a checkout on my laptop (a G4 Powerbook lacking enough RAM to build, and lacking the correct processor architecture to test), edit locally, and just push intermediate versions back to the CS systems to build there. This would result in some useless crap in the commit logs, but it beats the alternatives. I had an existing checkout from back in January, when I was using OCaml, so I went to update and saw the following:

Yggdrasil:thesis colin$ hg up
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Yggdrasil:thesis colin$ hg pull
remote: ####################
pulling from ssh://cs/thesis/
searching for changes
adding changesets
adding manifests
adding file changes
added 44 changesets with 18762 changes to 17825 files (run 'hg update' to get a working copy)
remote:
remote: Forwarding you to cslab0a
remote:
remote: ####################
remote:
Yggdrasil:thesis colin$ hg up
abort: case-folding collision between mlkit/kit/basis/MLB/RI_PROF/Splaymap.sml.d and mlkit/kit/basis/MLB/RI_PROF/SPLAYMAP.sml.d
Yggdrasil:thesis colin$

I saw this, noticed it was a temp file (ML Kit makes an MLB directory for temp files), hg rm’d all such files in the CS copy, committed, and tried again. Same result. I poked around online and, after a bit, found a message from someone with a similar problem (two distinct files whose names differed only in case), along with a response pointing out that the culprit was Mac OS X’s case-insensitive filesystem.

When I installed the latest OS X, I intentionally opted for a case-insensitive filesystem because many Mac programs have a habit of breaking on case-sensitive ones. And now I can’t have a copy of the ML Kit source on my machine. It’s common practice in SML programs to use all-caps names for files containing signatures (akin to interfaces, for the non-ML folk) and “upper” camel case for files containing implementations. So I can’t just rename a couple of files; it seems you just can’t have the ML Kit source on most OS X installations’ local disks.
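To see how widespread the problem is before trying to check anything out, a small script can group file names that differ only by case. The following is a minimal sketch, not what Mercurial itself does: the function name `case_collisions` is made up for illustration, the `README` entry is a placeholder, and Python's `str.casefold` only approximates the folding rules the OS X filesystem applies.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by letter case.

    Hypothetical helper: str.casefold() stands in for the filesystem's
    own case-folding rules, which may differ in corner cases.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Only keys that map to more than one spelling are collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# The two colliding paths from the abort message, plus a placeholder file.
paths = [
    "mlkit/kit/basis/MLB/RI_PROF/Splaymap.sml.d",
    "mlkit/kit/basis/MLB/RI_PROF/SPLAYMAP.sml.d",
    "README",
]

for group in case_collisions(paths):
    print(" <-> ".join(group))
```

Feeding it the full file list of a revision (for example, the lines printed by hg manifest) would presumably show every signature/implementation pair that cannot coexist on a case-insensitive disk.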

So was it time to give up on this? No, of course not – I need to work on my damn thesis while I’m home. Screw Apple and its crappy filesystem design choice.

Fortunately Apple did make one excellent, excellent design choice with OS X, which was making disk images a standard, common object. So I used the Disk Utility tool to make a disk image containing a case-sensitive filesystem, mounted it, and did a checkout inside that. It’s pretty stupid that I needed to do this, but at least I can work on my thesis over break without running Vim over SSH.
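A quick way to confirm that a mounted image really is case-sensitive is to create a file and look for it under a differently cased name. This is a rough sketch under stated assumptions: `is_case_sensitive` is a name invented here, the directory passed in must be writable, and the `/Volumes/thesis` path in the comment is a hypothetical mount point rather than the one actually used.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if `directory` treats names differing only in case as distinct.

    Hypothetical helper: it writes a scratch file inside a temporary
    subdirectory of `directory`, so `directory` must be writable.
    """
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        lower = os.path.join(scratch, "probe")
        upper = os.path.join(scratch, "PROBE")
        # Create only the lower-case file.
        with open(lower, "w"):
            pass
        # On a case-insensitive filesystem the upper-case name resolves
        # to the same file, so it appears to exist.
        return not os.path.exists(upper)


# Probe the system temp directory; for the disk image one would pass its
# mount point instead (e.g. a hypothetical "/Volumes/thesis").
print(is_case_sensitive(tempfile.gettempdir()))
```

If this prints False for the directory where the checkout lives, hg up on the ML Kit tree can be expected to hit the same case-folding collision.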

Now if you’ll excuse me, my friend Owen is heading off to Apple’s filesystem team next year, and I need to go give him grief for his group’s terrible design choice.





Amazon’s Book Recommendation System is Quirky

15 03 2008

Like many others, I have an Amazon.com wishlist, allow them to track my purchases, and occasionally tell them that I own a particular book if it is recommended to me.  Amazon uses this along with purchase sets made by other customers to generate “better” recommendations for books I might find interesting.  One thing its recommendation algorithm doesn’t seem to take into account is general subject knowledge – it knows of subcategories, and purchases of multiple items by the same people, but has no concept of actual relationships between books in a category.  This is actually a pretty reasonable thing to omit since there’s a great deal of such knowledge which could be taken into account, and the work/reward ratio for maintaining that is probably not great.  But as a result, I’ve had some rather interesting recommendations.

Also, I’m linking to books for clarity, but there’s no Amazon referral crap going on.

Mathematics texts.  Books on my Amazon wishlist: Foundations of Mathematical Logic, The Calculi of Lambda Conversion, Mathematical Foundations of Information Theory, Categories for the Working Mathematician, Basic Category Theory for Computer Scientists, among others.  Some of the more amusing mathematics recommendations from Amazon: Pre-Algebra, Algebra 2.

Literature.  Books on my Amazon wishlist: Going After Cacciato, Letters to a Young Poet, The Norton Anthology of Poetry, ‘Tis: A Memoir, The Dharma Bums, among others.  Amazon recommends: Vocabulary Workshop: Level C.

These are just the few that I dug up in a couple of minutes of searching my emailed recommendations; I’ve seen plenty more out-of-place recommendations upon logging into Amazon.com.  While it’s certainly true that some people will buy books at both extremes of a field’s complexity (perhaps something advanced for themselves or a friend, and something basic for a child), the disparity is still an amusing quirk to see in recommendations for an individual, because of the presentation.  Amazon.com presents recommendations as a set of items you should be interested in according to Amazon’s “understanding” of your personal interests, when it’s really just correlating your purchases and explicit interests with the purchases of other users.
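To make the "just correlating purchases" point concrete, here is a toy co-purchase recommender.  It is only a guess at the general flavor of such systems, not Amazon's actual algorithm: the function `recommend`, the scoring rule, and the purchase histories below are all invented for illustration, and only the book titles come from the lists above.

```python
from collections import Counter


def recommend(my_items, baskets, top_n=3):
    """Suggest items that co-occur with `my_items` in other customers' baskets.

    Hypothetical scoring: each unseen item in a basket earns one point per
    item that basket shares with `my_items`. No subject knowledge is used.
    """
    mine = set(my_items)
    scores = Counter()
    for basket in baskets:
        basket = set(basket)
        overlap = len(mine & basket)
        if overlap:
            for item in basket - mine:
                scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


# Made-up purchase histories for three imaginary customers.
baskets = [
    ["Categories for the Working Mathematician", "Algebra 2"],
    ["Basic Category Theory for Computer Scientists", "Algebra 2", "Pre-Algebra"],
    ["Going After Cacciato", "Vocabulary Workshop: Level C"],
]
wishlist = [
    "Categories for the Working Mathematician",
    "Basic Category Theory for Computer Scientists",
]

print(recommend(wishlist, baskets))
```

With nothing but co-occurrence counts to go on, a scheme like this has no way to tell that Algebra 2 is an odd suggestion for someone shopping for category theory texts.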