Giving up on Ruby on Rails for Python and Django

24 02 2008

Well, a while back I wrote about starting to use Ruby on Rails to build myself a web application to track technical papers which I had read or planned to read. I gave up on RoR after a (very short) time period. Now, despite preferring to code in Ruby over Python, I’ve switched to Django. Why?

  • While I am capable of writing SQL myself, I really, really loathe it.
  • The Django documentation is easier to find and deal with.
  • Django seems to do more busywork for you than Rails does.
  • It’s easy to poke around in the intermediate stages of development.

One of the pain points of RoR is the fact that you still have to deal directly with your database engine (likely, as in my case, MySQL). I’ve learned to deal with SQL after working on Mocha for a while. That doesn’t mean I like it. In fact, it’s still a pain. And the worst part of dealing with SQL is setting up tables, which is in fact the only database work Rails requires you to do – the worst part. Rails then does some magic to connect your Ruby to the schema you’ve specified. Django lets you define the data model directly in Python. From there, it automatically creates the SQL tables, including the join tables for many-to-many relationships. That is where I hit the main roadblock with Rails: I knew how to define the tables for the many-to-many relationship, but for whatever reason couldn’t easily find information on making a many-to-many relationship explicit in the Ruby model. The only SQL I had to write for Django was enough to create a new database, and ‘CREATE DATABASE whatever’ isn’t much to do, even if you’ve never used SQL directly before.
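For anyone who hasn’t set one up by hand, here is roughly what the many-to-many tables look like, sketched with Python’s built-in sqlite3 module rather than MySQL. The table and column names are my own illustrative choices, not what Django or Rails actually generates.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative, not
# what Django (or Rails) actually emits.
SCHEMA = """
CREATE TABLE paper  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE author (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
-- The join table: one row per (paper, author) pairing.
CREATE TABLE paper_authors (
    paper_id  INTEGER NOT NULL REFERENCES paper(id),
    author_id INTEGER NOT NULL REFERENCES author(id),
    UNIQUE (paper_id, author_id)
);
"""


def build_demo_db():
    """Create an in-memory database with two papers sharing one author."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO paper (id, title) VALUES (?, ?)",
                   [(1, "Paper One"), (2, "Paper Two")])
    db.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                   [(1, "A. Author"), (2, "B. Author")])
    db.executemany(
        "INSERT INTO paper_authors (paper_id, author_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1)])
    return db


def authors_of(db, paper_id):
    """Return the author names for one paper, following the join table."""
    rows = db.execute(
        "SELECT author.name FROM author "
        "JOIN paper_authors ON paper_authors.author_id = author.id "
        "WHERE paper_authors.paper_id = ? ORDER BY author.name",
        (paper_id,))
    return [name for (name,) in rows]
```

The third table is the part a framework can write for you once it knows the relationship exists.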

The many-to-many relationships documentation for Django is still a bit confusing, but clear enough that you can play with things, and unlike RoR, I could find it. The RoR documentation seems to consist primarily of the class API docs (only useful once you already mostly know what you’re doing) and somewhat mediocre 3rd-party tutorials. The Django documentation is actually organized as a manual. Overall the Django documentation is clear, and the initial tutorial on the Django site is incredibly useful – I’ve built everything thus far simply following along with the tutorial and tweaking things as needed – following links to more detailed documentation on what data types are available, using many-to-many instead of many-to-one, etc. Django also creates far fewer files, and almost all of them have an immediately obvious purpose. There’s less magic in the system, without making you do more work. It also makes setting up an administrative interface much easier, and includes a module for easy authentication setup for your application in the default install.

This is not to say that picking up Django has been nothing but butterflies and ponies; I’ve run into a few rough edges.

One design decision in Django that I’m not sure I’m happy with (though I’m not unhappy with it either) is the decision to orient it towards a “newsroom model” as the tutorial describes it. This means that the architecture was designed with the intention that there were distinct sets of users: publishers and readers. Django includes a big module (the one just mentioned above) for managing users and creating an administrative interface (‘django.contrib.admin’) and the documentation suggests its use. While you’re certainly free not to use it, its presence encourages its use, thereby encouraging the same model the developers had in mind. This isn’t necessarily a bad thing. I’m not sure it makes a difference to me, though it definitely encourages a split interface between the two sides of the application, which might not be right for all cases.

The real rough edges I’ve been rubbing up against mostly involve the many-to-many relationship. While I wasn’t expecting everything to be perfect, the disparity in flexibility between normal object fields and many-to-many fields is surprising. For example, the administrative interface Django suggests includes support for a lot of easy UI additions, such as searching on fields. This works great for standard fields, and not at all for many-to-many fields. In my case I have a many-to-many mapping relating papers and authors. For other fields in the Paper or Author models, I can add search_fields = ['title'] and make it easy to search the titles in the UI. Doing the same for the author list (a many-to-many field) does nothing. The semantics of such a search aren’t obvious, but there doesn’t seem to be an easy way to add this functionality. The other issue is the manager object that is actually returned when you access that field of a Paper object; it masquerades as a list, but isn’t one. It can be iterated over, and used normally in that way, but most of the standard list operations don’t apply. In my case, I wanted to use string.join() to concatenate the author names in a listing. I can accumulate a string with a for loop, but that’s not the most natural or idiomatic way to express that action. Little things like that.
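One possible way around the join problem, sketched without Django so it runs on its own: a join only needs something iterable, not a real list, so a generator expression can feed it directly. FakeManager, Author, Paper and author_line below are hypothetical stand-ins I made up for illustration; the only assumption about Django is that the object behind the field offers an all() method that can be iterated.

```python
class FakeManager:
    """Hypothetical stand-in for the object behind a many-to-many field.

    It can be iterated through all(), but it is not a list.
    """

    def __init__(self, items):
        self._items = tuple(items)

    def all(self):
        return iter(self._items)


class Author:
    def __init__(self, name):
        self.name = name


class Paper:
    def __init__(self, title, authors):
        self.title = title
        self.authors = FakeManager(authors)


def author_line(paper):
    """Join the author names without an explicit accumulation loop."""
    return ", ".join(author.name for author in paper.authors.all())
```

With a paper whose authors are named "Ada" and "Bob", author_line gives "Ada, Bob". Whether this reads as more natural than the for loop is a matter of taste.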

If it seems like I’m doing something dumb, or missing some part of the documentation on Django’s many-to-many relations, please tell me. Also, as a final note, it’s probably worth mentioning that the version of Django in Debian and Ubuntu’s repositories is slightly older than the version the main tutorial is for. I followed a link from that tutorial to the tutorial for the previous release.

Working with Large ML Code Bases

11 02 2008

It’s interesting how tool prevalence and standardization (or lack thereof) for a programming language are very strong indicators of how many people have used it to build significant projects.

My honors thesis has me hacking up ML implementations – I started with OCaml, but later switched to MLkit after realizing that it already had support for some features I was working to implement in OCaml. As you might imagine, the source for a compiler and runtime for reasonably mature languages (the ML family of languages started in the mid-70s) is a reasonably hefty chunk of code. The layout of the MLkit source tree is fairly clear (there is a conveniently-named “Compiler” directory, containing such subdirectories as “Backend/X86/”). The layout of the OCaml source is slightly less intuitive, but it comes with an informative README which points out where various parts of the system reside. Yet even within these more focused segments of the source trees, there is a lot of code. And it’s not boilerplate – when you need functions and types for spitting out most of the relevant instructions on the x86 architecture, or handling every abstract syntax tree node of a very expressive language, you’re going to have a lot of code. But there are no good tools for managing a code base of this size in any ML descendant.

I ran into this to some degree with the class project for my software engineering class last year (CS190 – 2007’s page is missing… I need to look into that). We wrote a frontend to GDB using OCaml. The idea was to display separate, independently-controlled panes for each thread of a multithreaded program (obviously, this didn’t scale well past, oh, 5 threads). It worked reasonably well at the end of the semester, but not well enough that we would ever release it. There were a few times, in a project which totaled only a few hundred lines of code, that I wanted some mechanized help working through other modules quickly to find what I needed. Several times I resorted to emailing a peer – hardly a terrible horror, but not something which should have been necessary for simply finding a function in a small code base. The only tools we had for managing the code base were… Subversion and Vim.

MLkit is nearly 240K lines of code in over 1000 files. That count includes comments and a few odd line breaks, but as it is primarily a research project, comments are few and far between. Even cutting it down to just the compiler (ignoring its dependencies elsewhere in the tree) we’re looking at almost 56K lines of code in over 150 files.

C, C++ and Java programmers have a plethora of tools at their disposal for working with code bases of this size. When I was at Sun this past summer, between cscope and having OpenGrok set up on the kernel code, finding definitions, declarations, types, etc. was usually a breeze. Both have their limitations, and their bugs, but they’re still great. And that’s not even using a lot of the IDE support present in major IDEs like Visual Studio or Eclipse.

ML programmers have no reliable systems which I can find.

A quick search for OCaml IDEs yields a few results. One is a page listing a few of what look to be mostly-dead OCaml plugins for Eclipse. Of these, ODT has apparently seen activity this year, but seems to be mostly syntax highlighting and a build tool. Another is a from-scratch IDE, Camelia, built by my friend Nate when we TAed CS017 (now unfortunately renamed CSCI0170, thanks to the abomination that is Sungard’s Banner…). Searching for any of OCaml, SML, ML, or Haskell with “cscope” on Google yields nothing of use. Jane Street Capital uses OCaml. I wonder what tools they use.

Oddly enough, it seems the winner in this case may be the F# plugin for Visual Studio. It has support for following references around large code bases, and is actively maintained (as MS intends it to be a mainstream .NET language in the near future). Unfortunately, it can also only deal with F# files, which are close to, but not quite close enough to, Standard ML files…

Perhaps I’ll build a cscope-alike for ML myself.

After I finish my thesis.

EDIT: A commenter on reddit suggested taking a look at ocamlbrowser. It seems to be at least mostly what I’m looking for (for OCaml), though I can’t seem to get it to do anything other than display the types for functions in the standard module library, so I can’t say for sure. There’s a setting to change the module search path, but it doesn’t seem to change anything for me. I also find it odd that despite the fact that it is included with the OCaml base distribution (at least in Debian), no web search for any query I can think of which expresses the sort of task ocamlbrowser is supposed to facilitate yields any results referencing it. This suggests that very few people use it – maybe I’m not the only one who can’t get it to work on an arbitrary code base easily. Might play with it some more once I get some more time.

EDIT2: Another reddit commenter mentioned otags (in the vein of ctags), which looks good.  If only I was still working in OCaml :-p On the other hand, looking up the link for ctags made me look at that again (I had foolishly assumed that it, like cscope, supported only C and C++).  In particular, apparently someone reimplemented ctags for more language support (and called it Exuberant Ctags).  The language support list claims that it in fact works on SML!  A download and compile later, I’m quite the happy camper!  It works just like the standard ctags.  Many thanks to zem on reddit for inspiring a very fortuitous Google search.
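For the curious, the core idea of a tags-style index is small enough to sketch in Python. This is not how Exuberant Ctags works internally, and it is nowhere near a real SML parser; the keyword list, the regular expression, and the index_source name are all my own assumptions for illustration.

```python
import re

# Matches a handful of SML declaration keywords followed by a plain name.
# Deliberately naive: it ignores comments and strings, skips declarations
# that start with a type variable such as 'a, and misses "and"-chained ones.
DECL = re.compile(
    r"^\s*(fun|val|structure|signature|functor|datatype|type|exception)\s+"
    r"([A-Za-z_][A-Za-z0-9_']*)"
)


def index_source(filename, text):
    """Map each declared name to a list of (filename, line number, keyword)."""
    index = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = DECL.match(line)
        if match:
            keyword, name = match.groups()
            index.setdefault(name, []).append((filename, lineno, keyword))
    return index
```

Looking up a name in the resulting dictionary gives the file and line to jump to, which is most of what I wanted in the first place.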