Mercurial + MLKit + OS X = Failure, plus workaround

21 03 2008

My thesis, as I’ve mentioned several times before, has me hacking up the source to the ML Kit compiler. Because the current build arrangement I’ve been using requires a great deal of RAM, I’ve been using my desktop machine and CS department machines exclusively for building. However, I’m going home for part of spring break this coming week, there’s no computer with appropriate specs at home, and editing code over SSH is rather painful because of the high latency involved. The obvious solution, since I’m using Mercurial to manage my thesis, was to do a checkout on my laptop (a G4 Powerbook lacking enough RAM to build, and lacking the correct processor architecture to test), edit locally, and just push intermediate versions back to the CS systems to build there. This would result in some useless crap in the commit logs, but it beats the alternatives. I had an existing checkout from back in January, when I was using OCaml, so I went to update and saw the following:

Yggdrasil:thesis colin$ hg up
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Yggdrasil:thesis colin$ hg pull
remote: ####################
pulling from ssh://cs/thesis/
searching for changes
adding changesets
adding manifests
adding file changes
added 44 changesets with 18762 changes to 17825 files (run 'hg update' to get a working copy)
remote: Forwarding you to cslab0a
remote: ####################
Yggdrasil:thesis colin$ hg up
abort: case-folding collision between mlkit/kit/basis/MLB/RI_PROF/Splaymap.sml.d and mlkit/kit/basis/MLB/RI_PROF/SPLAYMAP.sml.d
Yggdrasil:thesis colin$

I saw this, noticed it was a temp file (ML Kit makes an MLB directory for temp files) and hg rm’d all such files in the CS copy, committed, and tried again. Same result. Poked around online, and after a bit, found a message from someone with a similar problem, with a similar case difference between two distinct files, and a response pointing out the problem with Mac OS X’s case-insensitive filesystem.

When I installed the latest OS X, I intentionally opted for case-insensitive because many Mac programs have a habit of breaking on case-sensitive filesystems. And now I can’t have a copy of the ML Kit source on my machine. It’s common practice in SML programs to use all-caps names for files containing signatures (akin to interfaces, for the non-ML folk) and “upper” camel case for implementations. So I can’t just rename a couple of files; it seems you simply can’t keep the ML Kit source on most OS X installations’ local disks.

So was it time to give up on this? No, of course not – I need to work on my damn thesis while I’m home. Screw Apple and its crappy filesystem design choice.

Fortunately Apple did make one excellent, excellent design choice with OS X, which was making disk images a standard, common object. So I used the Disk Utility tool to make a disk image containing a case-sensitive filesystem, mounted it, and did a checkout inside that. It’s pretty stupid that I needed to do this, but at least I can work on my thesis over break without running Vim over SSH.
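For anyone hitting the same collision, the whole workaround fits in a few commands, using `hdiutil` (Disk Utility’s command-line counterpart). The image size, volume name, and clone URL here are just examples:

```shell
# Create a sparse image formatted with a case-sensitive filesystem.
# "-type SPARSE" means it only consumes space as files are added;
# hdiutil appends the .sparseimage extension for you.
hdiutil create -size 2g -type SPARSE \
    -fs "Case-sensitive HFS+" -volname thesis thesis

# Mount it; the volume shows up under /Volumes/thesis
hdiutil attach thesis.sparseimage

# Now a checkout inside the image behaves like it would on Linux
cd /Volumes/thesis && hg clone ssh://cs/thesis/
```

When you’re done, `hdiutil detach /Volumes/thesis` unmounts it, and the `.sparseimage` file can live anywhere on the case-insensitive disk.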

Now if you’ll excuse me, my friend Owen is heading off to Apple’s filesystem team next year, and I need to go give him grief for his group’s terrible design choice.

Source Control and Build Systems

18 07 2007

I’ve been thinking lately about the shortcomings of SCMs (Source Control Management systems) and build systems. Mainly, that by integrating a certain kind of SCM with a build system, builds of large projects could be made to go a bit faster, while also gaining some semantic information.

SCMs I’ve used:

  • CVS – About as basic as you can get while still being useful. Basic checkout-modify-checkin model which forms the basis of most SCMs.
  • Subversion – A slightly nicer version of CVS for most intents and purposes. This is what I consider the baseline for source control. The only difference from CVS that I’ve ever cared about is that it screws up with binary files less frequently.
  • Mercurial – A nice distributed SCM.
  • SCCS / Teamware – Sun’s internal SCM. It’s a distributed SCM with very explicit parent-child repository relationships (which are of course flexible; more on that later).
  • Perforce – Pretty obnoxious in my experience, and not singularly any particular SCM paradigm.

I like distributed SCMs, largely because they also give each developer their own version control without having to deal with the main repository. There’s no need to make the SCM interoperate with a second one used locally, or with another copy of the same SCM – checkins can be made against the local repository without issue. This is important for developers who, for example, are just starting on an existing project and don’t have commit access yet, or who want to work on a laptop or other local machine under circumstances which lack access to the main repository. I also like a noticeable parent-child relationship between repositories. SCCS has this. Mercurial has a notion of a parent repository (by default the one you cloned from), but it isn’t emphasized in use of the tool, so it isn’t as semantically meaningful – this is a personal nitpick, and I don’t really have a technical reason for backing up this preference.

Perforce is mostly just obnoxious in my opinion*. It is similar to a DSCM in that each developer has their own copy of the repository, from which they do direct file checkouts. It keeps checkout state on a central server, however. And to save space, it enforces a requirement that all copies of the repository be on the same filesystem, because it uses hardlinks to do the initial clone of the repository. This is novel, and certainly saves space, but in addition to making remote work impossible it also creates an enormous problem. Developers do occasionally screw up their umask on UNIX systems, so they do occasionally need to chmod files. Or more commonly, an editor will allow users to easily (sometimes accidentally) override read-only protection on files by chmod’ing them independently. Then the editing begins. It’s bad enough when someone commits bad code to a main repository, and those who sync up before the changes are backed out have their workspaces broken. But when someone accidentally chmod’s a file in the Perforce situation above, it can simultaneously break the build in every checked-out workspace that hasn’t done a local checkout of that file (which would remove the hardlink and replace it with an actual copy). For this reason, I dislike SCMs which make hardlinks for checkouts.

One thing I like about both Perforce and SCCS/Teamware is that each keeps an explicit list of the files being edited, which is the inspiration for the main point of this post.

Build systems I’ve used:

  • make (and parallel versions of it) – The baseline build system from a UNIX/*nix perspective.
  • ant – Used for Java.

The main problem with make and ant is that they are stateless – so for partial builds, they still need to do a lot of initial work to determine what needs to be compiled, and for languages like C, linked. Both traverse the filesystem, stat’ing every file the build target depends on. For small projects, this isn’t significant. For larger projects (such as OpenSolaris), it becomes noticeable, and increases the build time – particularly on systems with other slowdowns for file operations, such as older systems with slow disks, or systems using network mounts for files. An incremental x86 build of OpenSolaris, on the build machines at work, takes between 10 and 40 minutes depending on the number of users on the build servers, after changing one file. Recompiling any one of the main files in the genunix module takes less than 30 seconds. Re-linking all of the object files which compose genunix doesn’t take more than a minute or so. And rebuilding cpio archives for the bfu script only takes about 2 to 3 minutes. Which means that each OS build I do takes between 2 and 8 times as long as it should. This is partly a result of the OS makefile structure – there has been some discussion recently on some OpenSolaris lists about the construction of the makefiles. But the main issue is that either way, hundreds of files are stat’ed which are known not to have changed – the build system simply lacks this information.
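The per-file cost is easy to see in miniature. A make-style tool decides staleness by stat’ing the target and every prerequisite and comparing modification times, so the work scales with the total number of files, not the number that actually changed. A minimal sketch (the file names in any real use would come from the makefile):

```python
import os

def is_stale(target, prerequisites):
    """make-style staleness check: the target must be rebuilt if it
    is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Every prerequisite gets stat'ed here -- including the hundreds
    # that are known (to the SCM, but not to make) to be unchanged.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With thousands of targets, each with several prerequisites, this adds up to the flood of stat calls described above, and every one of them may hit a slow disk or a network mount.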

One possible solution to this (which I might try to code up this weekend) is to combine a distributed SCM with explicit local checkout (so it has a full list of all potentially modified files) with a build system. This would allow the tools, given a dependency graph, to not even look at source files which don’t need to be rebuilt. Only modified files, and targets which depend transitively on them, would need to be checked. This could shave some noticeable time off the OS build. Of course, this doesn’t address things like header dependencies unless those are added explicitly to the dependency tree. But for object files, this could be a win.
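The core of that idea can be mocked up in a few lines: seed the build with the SCM’s list of checked-out files, then walk only the dependents of those files through the dependency graph. Everything outside that set is never examined, let alone stat’ed. This is a toy sketch with made-up file names, not the actual OpenSolaris makefile graph:

```python
def rebuild_set(deps, dirty):
    """deps maps each target to the files it depends on; dirty is
    the SCM's explicit list of locally checked-out (modifiable)
    files.  Returns every target that transitively depends on a
    dirty file -- nothing else is even looked at."""
    # Invert the graph: file -> targets that depend on it directly.
    rdeps = {}
    for target, prereqs in deps.items():
        for p in prereqs:
            rdeps.setdefault(p, set()).add(target)
    stale, frontier = set(), list(dirty)
    while frontier:
        f = frontier.pop()
        for target in rdeps.get(f, ()):
            if target not in stale:
                stale.add(target)
                frontier.append(target)  # a target can itself be a prerequisite
    return stale

# Example: editing one source file in a genunix-like module
deps = {
    "vfs.o":   ["vfs.c"],
    "proc.o":  ["proc.c"],
    "genunix": ["vfs.o", "proc.o"],
    "archive": ["genunix"],
}
print(rebuild_set(deps, {"vfs.c"}))  # vfs.o, genunix, archive -- proc.o untouched
```

Note that `proc.c` and `proc.o` are never visited at all, which is exactly the stat traffic a stateless build system can’t avoid.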

* It is probably worth noting that this is from my experience at one company where Perforce is in use. I haven’t looked into it too much, so these problems could just reflect a particular configuration of Perforce.