Source Control and Build Systems

18 07 2007

I’ve been thinking lately about the shortcomings of SCMs (Source Control Management systems) and build systems. Mainly, I think that by integrating a certain kind of SCM with a build system, builds of large projects could be made to go a bit faster, and the tools could gain some semantic information as well.

SCMs I’ve used:

  • CVS – About as basic as you can get while still being useful. Basic checkout-modify-checkin model which forms the basis of most SCMs.
  • Subversion – A slightly nicer version of CVS for most intents and purposes. This is what I consider baseline for source control. The only difference from CVS that I’ve ever cared about is that it screws up binary files less frequently.
  • Mercurial – A nice distributed SCM.
  • SCCS / Teamware – Sun’s internal SCM. It’s a distributed SCM with very explicit parent-child repository relationships (which are of course flexible; more on that later).
  • Perforce – Pretty obnoxious in my experience, and it doesn’t fit cleanly into any single SCM paradigm.

I like distributed SCMs, largely because they also give a developer their own version control without dealing with the main repository. There’s no need to make the SCM work with another one used locally, or with another copy of the same SCM – checkins can be done on the local repository without issue. This is important for developers who may, for example, just be starting on an existing project and not have commit access yet, or for a developer who wants to work on a laptop or other local machine without access to the main repository.

I also like a noticeable parent-child relationship between repositories. SCCS has this. Mercurial has a notion of a parent repository (by default the one you cloned from), but it isn’t emphasized in use of the tool, so it isn’t as semantically meaningful – this is a personal nitpick, and I don’t really have a technical reason to back up this preference.

Perforce is mostly just obnoxious in my opinion*. It is similar to a DSCM, in that each developer has their own copy of the repository which they do direct file checkouts from. It keeps checkout state on a central server, however. And to save space, it requires that all copies of the repository be on the same filesystem, because it uses hardlinks to do the initial clone of the repository. This is novel, and certainly saves space, but in addition to making remote work impossible it also creates an enormous problem. Developers do occasionally screw up their umask on UNIX systems, so they do occasionally need to chmod files. Or, more commonly, an editor will let users easily (sometimes accidentally) override read-only protection on files by chmod’ing them on its own. Then the editing begins. It’s bad enough when someone commits bad code to a main repository, and those who sync up before the changes are backed out have their workspaces broken. But when someone accidentally chmod’s and edits a file in the Perforce situation above, it can simultaneously break the build in every checked-out workspace that hasn’t done a local checkout of that file (a checkout removes the hardlink and replaces it with an actual copy). For this reason, I dislike SCMs which make hardlinks for checkouts.

One thing I like about both Perforce and SCCS/Teamware is that each has an explicit list of files being edited, which is the inspiration for the main point of this post.
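The hardlink hazard described above can be reproduced with a few lines of Python. This is only a sketch of the general filesystem behaviour, not of how Perforce itself is implemented; the file names (master.c, clone.c) and the function name are made up for illustration, and it assumes a filesystem that supports hardlinks.

```python
import os


def demo_shared_hardlink(workdir):
    """Show that editing a hardlinked 'clone' also changes the 'master' file.

    Both names point at the same inode, so permissions and contents are shared.
    Returns the master file's contents after the clone has been edited.
    """
    master = os.path.join(workdir, "master.c")  # hypothetical repository file
    clone = os.path.join(workdir, "clone.c")    # hypothetical workspace copy

    with open(master, "w") as f:
        f.write("int x = 1;\n")
    os.chmod(master, 0o444)   # read-only, as an un-checked-out file would be
    os.link(master, clone)    # the space-saving "clone": a second name, same inode

    # A developer (or their editor) overrides the read-only bit in the workspace...
    os.chmod(clone, 0o644)
    # ...and rewrites the file in place. Truncating keeps the same inode.
    with open(clone, "w") as f:
        f.write("int x = 2; /* broken */\n")

    with open(master) as f:
        return f.read()
```

Note that an editor which saves by writing a new file and renaming it over the old one would break the link instead of sharing the edit, so how badly this bites probably depends on the editor in use.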

Build systems I’ve used:

  • make (and parallel versions of it) – The baseline build system from a UNIX/*nix perspective.
  • ant – Used for Java.

The main problem with make and ant is that they are stateless – so for partial builds, they still need to do a lot of initial work to determine what needs to be compiled, and for languages like C, linked. Both traverse the filesystem, stat’ing every file that a build target depends on. For small projects, this isn’t significant. For larger projects (such as OpenSolaris), it becomes noticeable, and increases the build time – particularly on systems with other slowdowns for file operations, such as older systems with slow disks, or systems using network mounts for files. An incremental x86 build of OpenSolaris, on the build machines at work, takes between 10 and 40 minutes after changing one file, depending on the number of users on the build servers. Recompiling any one of the main files in the genunix module takes less than 30 seconds. Re-linking all of the object files which compose genunix doesn’t take more than a minute or so. And rebuilding cpio archives for the bfu script only takes about 2 to 3 minutes. That means each OS build I do takes between 2 and 8 times as long as it should. This is partly a result of the OS makefile structure – there has been some discussion recently on some OpenSolaris lists about the construction of the makefiles. But the main issue is that either way, hundreds of files are stat’ed which are known not to have changed – the build system simply lacks this information.
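To make the cost concrete, here is a rough Python sketch of a make-style timestamp scan. It is not how make or ant is actually implemented (real tools handle implicit rules, recursion and much more); the function name and the shape of the `deps` mapping are my own assumptions. The point is only that the number of stat calls grows with the size of the dependency graph, not with the number of files that changed.

```python
import os


def stale_targets(deps):
    """Make-style scan: deps maps each target path to its source paths.

    Returns (stale, stat_calls). Each target is stat'ed, and its sources are
    stat'ed until one is found to be newer, whether or not anything changed.
    """
    stale = []
    stat_calls = 0
    for target, sources in deps.items():
        stat_calls += 1
        try:
            target_mtime = os.stat(target).st_mtime_ns
        except FileNotFoundError:
            stale.append(target)  # never built, so it must be rebuilt
            continue
        for src in sources:
            stat_calls += 1
            if os.stat(src).st_mtime_ns > target_mtime:
                stale.append(target)  # a source is newer than the target
                break
    return stale, stat_calls
```

With nothing modified, every target and every source still gets stat’ed before the scan can conclude there is nothing to do.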

One possible solution to this (which I might try to code up this weekend) is to combine a build system with a distributed SCM that has explicit local checkout (so it has a full list of all potentially modified files). This would allow the tools, given a dependency graph, to not even look at source files which don’t need to be rebuilt. Only modified files and those which depend explicitly on them would need to be checked. This could shave some noticeable time off the OS build. Of course, this doesn’t address things like header dependencies unless those are added explicitly to the dependency tree. But for object files, this could be a win.
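A minimal sketch of the idea in Python, under some assumptions of mine: the SCM can hand over the set of files currently checked out for editing (`edited`), and the build system has a dependency graph (`deps`) mapping each target to its direct inputs. The names here are hypothetical, and this is not a finished design. Note that it never touches the filesystem; it only walks the graph outward from the edited files.

```python
from collections import defaultdict, deque


def targets_to_rebuild(deps, edited):
    """Return the targets that transitively depend on an edited file.

    deps:   target -> list of direct inputs (source files or other targets).
    edited: set of files the SCM reports as checked out for editing.
    """
    # Invert the graph once: input -> targets that are built from it.
    dependents = defaultdict(set)
    for target, inputs in deps.items():
        for inp in inputs:
            dependents[inp].add(target)

    dirty = set()
    queue = deque(edited)
    while queue:
        node = queue.popleft()
        for target in dependents.get(node, ()):
            if target not in dirty:
                dirty.add(target)
                queue.append(target)  # things built from this target are dirty too
    return dirty
```

A header only triggers rebuilds here if it is listed as an input in `deps`, which is the same caveat as above: header dependencies have to be added to the graph explicitly.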

* It is probably worth noting that this is from my experience at one company where Perforce is in use. I haven’t looked into it too much, so these problems could just be a particular configuration of Perforce.