Working with Large ML Code Bases

11 02 2008

It’s interesting how tool prevalence and standardization (or lack thereof) for a programming language are very strong indicators of how many people have used it to build significant projects.

My honors thesis has me hacking up ML implementations – I started with OCaml, but later switched to MLkit after realizing that it already had support for some features I was working to implement in OCaml. As you might imagine, the source for a compiler and runtime for reasonably mature languages (the ML family of languages started in the mid-70s) is a reasonably hefty chunk of code. The layouts of the MLkit source tree is fairly clear (there is a conveniently-named “Compiler” directory, containing such subdirectories as “Backend/X86/”). The layout of the OCaml source is slightly less intuitive, but it comes with an informative README which points out where various parts of the system reside. Yet even within these more focused segments of the source trees, there is a lot of code. And it’s not boilerplate – when you need functions and types for spitting out most of the relevant instructions on the x86 architecture, or handling every abstract syntax tree node of a very expressive language, you’re going to have a lot of code. But there are no good tools for managing a code base of this size in any ML descendant.

I ran into this to some degree with the class project for my software engineering class last year (CS190 – 2007′s page is missing… I need to look into that). We wrote a frontend to GDB using OCaml. The idea was to display separate, independently-controlled panes for each thread of a multithreaded program (obviously, this didn’t scale well past, oh, 5 threads). It worked reasonably well at the end of the semester, but not well enough that we would ever release it. There were a few times, in a project which totaled only a few hundred lines of code, that I wanted some mechanized help working through other modules quickly to find what I needed. Several times I resorted to emailing a peer – hardly a terrible horror, but not something which should have been necessary for simply finding a function in a small code base. The only tools we had for managing the code base were…. Subversion, and VIm.

MLkit is nearly 240K lines of code in over 1000 files (including comments, but as it is primarily a research project, those are few and far between, plus a few odd linebreaks). Even cutting it down to just the compiler (ignoring its dependencies elsewhere in the tree) we’re looking at almost 56K lines of code in over 150 files.

C, C++ and Java programmers have a plethora of tools at their disposal for working with code bases of this size. When I was at Sun this past summer, between cscope and having OpenGrok set up on the kernel code, finding definitions, declarations, types, etc. was usually a breeze. Both have their limitations, and their bugs, but they’re still great. And that’s not even using a lot of the IDE support present in major IDEs like Visual Studio or Eclipse.

ML programmers have no reliable systems which I can find.

A quick search for OCaml IDEs yields a few results. A page listing a few of what look to be mostly-dead OCaml plugins for Eclipse. Of these ODT has apparently seen activity this year, but seems to be mostly syntax highlighting and a build tool. A from-scratch IDE, Camelia, built by my friend Nate when we TAed CS017 (now unfortunately renamed CSCI0170, thanks to the abomination that is Sungard‘s Banner…). Searching for any of OCaml, SML, ML, or Haskell with “cscope” on Google yields nothing of use. Jane Street Capital uses OCaml. I wonder what tools they use.

Oddly enough, it seems the winner in this case may be the F# plugin for Visual Studio. It has support for following references around large code bases, and is actively maintained (as MS intends it to be a mainstream .NET language in the near future). Unfortunately, it can also only deal with F# files, which are close to, but not quite close enough to Standard ML files….

Perhaps I’ll build a cscope-alike for ML myself.

After I finish my thesis.

EDIT: A commenter on reddit suggested taking a look at ocamlbrowser. It seems to be at least mostly what I’m looking for (for OCaml), though I can’t seem to get it to do anything other than display the types for functions in the standard module library, so I can’t say for sure. There’s a setting to change the module search path, but it doesn’t seem to change anything for me. I also find it odd that despite the fact that it is included with the OCaml base distribution (at least in Debian), no web search for any query I can think of which expresses the sort of task ocamlbrowser is supposed to facilitate yields any results referencing it. This suggests that very few people use it – maybe I’m not the only one who can’t get it to work on an arbitrary code base easily. Might play with it some more once I get some more time.

EDIT2: Another reddit commenter mentioned otags (in the vein of ctags), which looks good.  If only I was still working in OCaml :-p On the other hand, looking up the link for ctags made me look at that again (I had foolishly assumed that it, like cscope, supported only C and C++).  In particular, apparently someone reimplemented ctags for more language support (and called it Exuberant Ctags).  The language support list claims that it in fact works on SML!  A download and compile later, I’m quite the happy camper!  It works just like the standard ctags.  Many thanks to zem on reddit for inspiring a very fortuitous Google search.

About these ads



10 responses

12 02 2008
12 02 2008
Ralph Douglass

Most ocaml programmers I know use emacs and vim. A couple of us (myself included) use Textmate, but that’s less common.

There is Dromedary, which was started for the OCaml Summer Project ( last year, but I’m not sure where it ended up.

As for tools, you can use cmigrep (you can build it from GODI), but you need to write your own nice interface for it if you want more than the command line (this is what one of the experimental code completion bundles in TextMate does).

For version control, I suggest HG.

12 02 2008
Vesa Karvonen

If you don’t mind the plug, I would recommend trying my def-use mode. I’ve used it myself to browse MLKit’s codebase. If you need help setting it up, you can ask on the MLton users mailing list or e-mail me directly.

12 02 2008

ocamlbrowser -I path is your friend.

13 02 2008

Emacs with Taureg mode has browsers and type inspection.

27 02 2008
Remainder of February Bookmarks Trawl » What the rain knows (archives)

[...] Working with Large ML Code Bases « A Ham Sandwich [...]

20 07 2008
Jon Harrop

Historically, there are no modern GUI tools for these languages because they all have dreadful FFIs and unusable GUI libraries. F# could be the exception but its Visual Studio mode is awful (e.g. very unreliable, very slow, generally buggy) and Microsoft are closing the interface to the compiler specifically because they want a monopoly over the tools available to F# programmers. This is also why they have shelved metaprogramming, which is another great loss.

The only viable option appears to be to reinvent ML on .NET yourself and write the GUI tools you need. We are seriously considering this option even though it would be an enormous undertaking.

24 02 2010

I am glad I found your blog on reddit. Thank you for the sensible critique. Me and my brother were just preparing to do some research about this. I am glad to see such good info being shared freely out there.
Fremont from Salem city

22 09 2011

Muy buen post, me ha gustado, gracias. Good Post. Thank you.

19 12 2012

Might you have even more posts similar to this particular one called, Working with Large ML Code Bases A
Ham Sandwich? We desire to read through even
far more regarding it. Thanks for your time.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

%d bloggers like this: