Problems with C and C++ Separate Compilation

11 11 2008

After graduation, a couple months of watching television, driving cross-country (if you get the chance you should drive across northern Wyoming), settling in at Microsoft and living in Seattle, I’m back.  And I’m annoyed at C.

C is a fantastic language in many ways.  It is essentially an abstract assembly language.  Almost any general-purpose operation which can be done in assembly can be done in C, and it makes building large, relatively portable systems much easier.  The only things which can’t be done directly in C are operations on specific registers (and it’s easy enough to link in short assembly routines when that’s necessary).

Most of my early interest in programming languages and most of my problems when I first started doing systems work were related to basic typing issues: the ugliness of casting things to void pointers and back, the conversions between various integer types, and other relatively mundane C errors which are easy to make and hard to debug.  I came to believe that the features other languages offer beyond type-system and memory-safety improvements, while extremely useful, were mostly great conveniences rather than fundamental improvements.

But the past several months have changed my mind.  While the ease of turning a pointer to one type of object into a pointer to another type in the same place is certainly a bane as often as it is a boon, a reasonably experienced C programmer begins to recognize common symptoms for such problems.  A more serious, though less frequently encountered problem has to do with type identity and versioning.

Consider the case where you write an application (in C) to use an external library.  Your application interfaces with this library through two means: #include-ing its public header, and being linked to the library’s object file.  Initially these two interfaces will probably be fine (if, for example, you just installed this library).  Now move forward a couple months.  Update your library.  Did your update include the object file and the header file?  If not, then any size or layout changes to the library’s data types might cause non-obvious errors; your application will happily compile and link, but the results you get back from the library may not be what you expect.

What if it’s your library, or just an object file in your project?  These tend to have a fair amount of turnover.  Most moderately-sized projects use separate compilation to separate code changes and avoid recompiling the same code repeatedly if it doesn’t change.  But when tying these object files together, there are no checks to ensure that data structures exchanged between object files are consistent; the C compilation model assumes that your data structure definitions are stable, or you recompile from scratch every time.  It also makes the reasonable assumption that the same compiler is used for every object file.  On the off chance you violate that expectation (perhaps with a compiler update), memory layouts of the same structure definition may differ between object files.

It’s possible to work around this problem with a build system if you track every header file dependency explicitly.  For large projects, this can be difficult.  Especially with fast-moving projects, it’s easy to add an include to a .c file without remembering to add the dependency to the build system configuration.  Once this missing dependency goes unnoticed for some time it becomes considerably more difficult to track down, and developers end up either spending their time debugging the build system or giving up on the broken incremental build and rebuilding from scratch every time.
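One partial mitigation, assuming a GCC-compatible toolchain and Make, is to have the compiler emit the header dependency list itself rather than tracking it by hand – a sketch:

```make
# -MMD makes the compiler write a .d file per object listing every
# header it actually included; -MP adds phony targets so that deleting
# a header doesn't break the build.  (Recipe lines must use a tab.)
CFLAGS += -MMD -MP
OBJS   := main.o parser.o

app: $(OBJS)
	$(CC) $(CFLAGS) -o $@ $(OBJS)

-include $(OBJS:.o=.d)
```

This keeps the dependency graph in sync with the actual #includes automatically, though it does nothing about mismatched compilers or stale pre-built object files.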

Another permutation of the same problem is that of unrelated structures with the same name.  It’s easy to imagine a large system with two subsystems defining structures named CALLBACK_ARGS.  What happens when one section of code needs to interact with both of these systems?  If all appropriate headers are included, then the name collision will be detected.  If only one of the conflicting headers is included, then depending on how the headers are organized it becomes trivially easy to pass the wrong structure to a function.  Especially when working on a new system, it usually seems reasonable to assume that structures of the same name are the same semantic (and in-memory) structure.

Namespaces can help alleviate the same-name problem: including only one structure’s header and trying to pass that to another function will result in an error complaining about passing an argument of type Subsystem1::CALLBACK_ARGS* to a function expecting a Subsystem2::CALLBACK_ARGS*.  This doesn’t actually prevent you from declaring two structures of the same name in the same namespace in separate header files, but if namespaces are used judiciously to separate subsystems then the likelihood of doing so accidentally is greatly reduced.

The versioning problem is a direct result of how #include works in C.  Rather than being a direct part of the language, #include is a preprocessor directive equivalent to “take the text of the specified file and pretend I typed it in place right here, then pass that result to the actual compiler.”  At its core most C compilers only handle single files at a time, so they don’t actually know anything about other object files (or at least, they don’t directly use information about other object files).  That’s the linker’s job, and the linker knows nothing about structures per se – only matching symbolic references.

One solution is to store all structure layout information in object files, and generate code for accessing those structures once at link time.  This slows the linking process, but prevents the mismatched definition problem; all code for accessing the structure is generated at the same time from the same definition.  This blurs the distinction between compiler and linker, but adds great value.

Doing this at compile time for static linking is relatively cheap and straightforward.  Doing this at load-link time is a bit trickier.  While compilers and static linkers can play any tricks they want for code which only interacts directly with itself, dynamically linked executable formats must be defined in standard ways, limiting what can be done.  I don’t know of any major executable formats which support this (most were designed in the heyday of C and C++, when they were still the best languages around), but that is a matter of format standards rather than a technical limitation.  This would be more expensive than current dynamic linking, but doable.  A compiler could choose to use a richer format for its own object files and then fall back to standard formats when asked to generate a standard library or executable.  OCaml does this; here, Test.cmx and Mod.cmx were compiled against differing interface files for the Test module:

Yggdrasil:caml colin$ ocamlopt Test.cmx Mod.cmx
Files Mod.cmx and Test.cmx make inconsistent assumptions over interface Test
Yggdrasil:caml colin$ 

Unfortunately C and C++ have a compilation and linking model which is now so well-established that any proposal to fix this in the standards for those languages would likely meet with significant resistance.  Though at the same time, I can’t think of any desired C/C++ semantics that this would break, so maybe it could happen.





Giving up on Ruby on Rails for Python and Django

24 02 2008

Well, a while back I wrote about starting to use Ruby on Rails to build myself a web application to track technical papers which I had read or planned to read. I gave up on RoR after a (very short) time period. Now, despite preferring to code in Ruby over Python, I’ve switched to Django. Why?

  • While I am capable of writing SQL myself, I really, really loathe it.
  • The Django documentation is easier to find and deal with.
  • Django seems to do more busywork for you than Rails does.
  • It’s easy to poke around in the intermediate stages of development.

One of the pain points of RoR is the fact that you still have to deal directly with your database engine (likely, as in my case, MySQL). I’ve learned to deal with SQL after working on Mocha for a while. That doesn’t mean I like it. In fact, it’s still a pain. And the worst part of dealing with SQL – setting up tables – is exactly the database work Rails requires you to do. It then does some magic to connect your Ruby to the schema you’ve specified. Django lets you define the data model directly in Python. From there, it automatically creates the SQL tables, including creating join tables for many-to-many relationships, which is where I hit the main roadblock with Rails; I knew how to define the tables for the many-to-many relationship, but for whatever reason couldn’t easily find information on making a many-to-many relationship explicit in the Ruby model. The only SQL I had to do with Django was enough to create a new database, and ‘CREATE DATABASE whatever’ isn’t much to do, even if you’ve never used SQL directly before.

The many-to-many relationships documentation for Django is still a bit confusing, but clear enough that you can play with things, and unlike RoR, I could find it. The RoR documentation seems to consist primarily of the class API docs (only useful once you already mostly know what you’re doing) and somewhat mediocre 3rd-party tutorials. The Django documentation is actually organized as a manual. Overall the Django documentation is clear, and the initial tutorial on the Django site is incredibly useful – I’ve built everything thus far simply following along with the tutorial and tweaking things as needed – following links to more detailed documentation on what data types are available, using many-to-many instead of many-to-one, etc. Django also creates far fewer files, and almost all of them have an immediately obvious purpose. There’s less magic in the system, without making you do more work. It also makes setting up an administrative interface much easier, and includes a module for easy authentication setup for your application in the default install.

This is not to say that picking up Django has been nothing but butterflies and ponies; I’ve run into a few rough edges.

One design decision in Django that I’m not sure I’m happy with (though I’m not unhappy with it either) is the decision to orient it towards a “newsroom model” as the tutorial describes it. This means that the architecture was designed with the intention that there were distinct sets of users: publishers and readers. Django includes a big module (the one just mentioned above) for managing users and creating an administrative interface (‘django.contrib.admin’) and the documentation suggests its use. While you’re certainly free not to use it, its presence encourages its use, thereby encouraging the same model the developers had in mind. This isn’t necessarily a bad thing. I’m not sure it makes a difference to me, though it definitely encourages a split interface between the two sides of the application, which might not be right for all cases.

The real rough edges I’ve been rubbing up against are mostly regarding the many-to-many relationship. While I wasn’t expecting everything to be perfect, the disparity in easy flexibility between normal object fields and many-to-many fields is surprising. For example, the administrative interface Django suggests includes support for a lot of easy UI additions, such as searching on fields. This works great for standard fields, and not at all for many-to-many fields. In my case I have a many-to-many mapping relating papers and authors. For other fields in the Paper or Author models, I can add a search_fields = ['title'] and make it easy to search the titles in the UI. Doing the same for the author list (a many-to-many field) does nothing. The semantics aren’t clear, but there doesn’t seem to be an easy way to add this functionality. The other issue is the manager class which is actually returned for member accesses to that field of the Paper object; it masquerades as a list, but isn’t one. It has an iterator, and can be used normally in that way, but most of the standard list uses don’t apply. In my case, I wanted to use string.join() to concatenate the author names in a listing. I can accumulate a string with a for loop, but that’s not the most natural or idiomatic way to express that action. Little things like that.

If it seems like I’m doing something foolish, or missing some part of the documentation on Django’s many-to-many relations, please tell me. Also, as a final note, it’s probably worth mentioning that the version of Django in Debian and Ubuntu’s repositories is slightly older than the version the main tutorial is for. I followed a link from that tutorial to the tutorial for the previous release.





Working with Large ML Code Bases

11 02 2008

It’s interesting how tool prevalence and standardization (or lack thereof) for a programming language are very strong indicators of how many people have used it to build significant projects.

My honors thesis has me hacking up ML implementations – I started with OCaml, but later switched to MLkit after realizing that it already had support for some features I was working to implement in OCaml. As you might imagine, the source for a compiler and runtime for a reasonably mature language (the ML family of languages started in the mid-70s) is a hefty chunk of code. The layout of the MLkit source tree is fairly clear (there is a conveniently-named “Compiler” directory, containing such subdirectories as “Backend/X86/”). The layout of the OCaml source is slightly less intuitive, but it comes with an informative README which points out where various parts of the system reside. Yet even within these more focused segments of the source trees, there is a lot of code. And it’s not boilerplate – when you need functions and types for spitting out most of the relevant instructions on the x86 architecture, or handling every abstract syntax tree node of a very expressive language, you’re going to have a lot of code. But there are no good tools for managing a code base of this size in any ML descendant.

I ran into this to some degree with the class project for my software engineering class last year (CS190 – 2007’s page is missing… I need to look into that). We wrote a frontend to GDB using OCaml. The idea was to display separate, independently-controlled panes for each thread of a multithreaded program (obviously, this didn’t scale well past, oh, 5 threads). It worked reasonably well at the end of the semester, but not well enough that we would ever release it. There were a few times, in a project which totaled only a few hundred lines of code, that I wanted some mechanized help working through other modules quickly to find what I needed. Several times I resorted to emailing a peer – hardly a terrible horror, but not something which should have been necessary for simply finding a function in a small code base. The only tools we had for managing the code base were… Subversion and Vim.

MLkit is nearly 240K lines of code in over 1000 files (that count includes comments and the odd stray linebreak, but as it is primarily a research project, comments are few and far between). Even cutting it down to just the compiler (ignoring its dependencies elsewhere in the tree) we’re looking at almost 56K lines of code in over 150 files.

C, C++ and Java programmers have a plethora of tools at their disposal for working with code bases of this size. When I was at Sun this past summer, between cscope and having OpenGrok set up on the kernel code, finding definitions, declarations, types, etc. was usually a breeze. Both have their limitations, and their bugs, but they’re still great. And that’s not even using a lot of the IDE support present in major IDEs like Visual Studio or Eclipse.

ML programmers have no reliable systems which I can find.

A quick search for OCaml IDEs yields a few results: a page listing a few of what look to be mostly-dead OCaml plugins for Eclipse (of these, ODT has apparently seen activity this year, but seems to offer mostly syntax highlighting and a build tool), and a from-scratch IDE, Camelia, built by my friend Nate when we TAed CS017 (now unfortunately renamed CSCI0170, thanks to the abomination that is Sungard‘s Banner…). Searching for any of OCaml, SML, ML, or Haskell with “cscope” on Google yields nothing of use. Jane Street Capital uses OCaml. I wonder what tools they use.

Oddly enough, it seems the winner in this case may be the F# plugin for Visual Studio. It has support for following references around large code bases, and is actively maintained (as MS intends it to be a mainstream .NET language in the near future). Unfortunately, it can also only deal with F# files, which are close to, but not quite close enough to Standard ML files….

Perhaps I’ll build a cscope-alike for ML myself.

After I finish my thesis.

EDIT: A commenter on reddit suggested taking a look at ocamlbrowser. It seems to be at least mostly what I’m looking for (for OCaml), though I can’t seem to get it to do anything other than display the types for functions in the standard module library, so I can’t say for sure. There’s a setting to change the module search path, but it doesn’t seem to change anything for me. I also find it odd that despite the fact that it is included with the OCaml base distribution (at least in Debian), no web search for any query I can think of which expresses the sort of task ocamlbrowser is supposed to facilitate yields any results referencing it. This suggests that very few people use it – maybe I’m not the only one who can’t get it to work on an arbitrary code base easily. Might play with it some more once I get some more time.

EDIT2: Another reddit commenter mentioned otags (in the vein of ctags), which looks good.  If only I were still working in OCaml :-p On the other hand, looking up the link for ctags made me look at that again (I had foolishly assumed that it, like cscope, supported only C and C++).  In particular, apparently someone reimplemented ctags for more language support (and called it Exuberant Ctags).  The language support list claims that it in fact works on SML!  A download and compile later, I’m quite the happy camper!  It works just like the standard ctags.  Many thanks to zem on reddit for inspiring a very fortuitous Google search.





Variant Interfaces

1 01 2008

Today I finished my first-pass reading of Colimits for Concurrent Collectors, and was struck by an interesting idea. The paper demonstrates an approach to refining specifications using the notion of colimits. I don’t fully understand the refinement process yet (hence the “first-pass reading”). The paper contains many example specifications written in a variant of Specware, specifying a safe interface between a mutator and garbage collector without stop-the-world semantics – this is the case study for the refinement.

An interesting pattern of sorts is used. A number of the specifications for portions of the system are parameterized by other specifications. A common idiom in the examples is a line similar to:

MORPHISM Cleaning-View = ONLY nodes,roots,free,marked,workset,unmark

What this line means is that the current specification, when it refers to an instance of the Cleaning-View specification, can only access the mentioned observers. This is interesting if we translate it to interfaces in an object-oriented environment. What this amounts to is essentially defining an interface (here, a specification) and then multiple subsets of this interface (here, imported subsets of observer functions). One application of this (and I think the most interesting) is using the sub-interfaces as restricted capabilities on objects. This is a reasonably common use of interfaces in OO languages.

You can already mimic this in normal object oriented languages by defining minimal subsets, and larger subsets as extending the smaller interfaces. But what about keeping siblings with overlap in sync? A nice alternative would be to be able to specify all of the siblings together, as a set of variant interfaces (similar to the notion of variant types). For example, in pseudo-Java:

public interface Primary {
    void func1();
    void func2(int);
    void func3(double);

    subset Takeargs {func2, func3};
    subset Noargs {func1};
    subset Nofloatingpoint {func1, func2};
}

This example would of course need to be cleaned up a bit to deal with method overloading.

The problem with doing the equivalent of this in normal Java is keeping these “sub-interfaces” in sync with the main interface, and keeping overlapping siblings in sync without just having an arbitrarily large hierarchy of interfaces. Allowing joint specifications, and then referring to variants through names like Primary.Takeargs (to say that a variable should be of type Primary, but the current scope should only have access to the Takeargs subset) would ease this difficulty, making the use of interfaces for restricted control slightly easier to deal with as a programmer.

I also wonder what could be done with this as an extension to a language such as Spec#, with pre- and post-conditions on methods. You could then deduce that objects using only a restricted subset of an interface might maintain certain invariants (such as making no modifications to an object), more easily than before. More importantly, you could guarantee it statically at compile time – the interface variants would be subtypes of the main interface type. Without the explicit sub- and sibling-interface relationships it would still be possible to statically verify properties of interest, as the compiler can see what methods are called on various objects, but this approach lets the programmer encode this restriction easily.

Ultimately this is just syntactic sugar for a construct which can already be used, but it encourages the programmer to make the type checker do some more verification work for them. It’s also sort of a static version of the E caretaker for capability delegation, or more generally the object capability model (lacking revocation, because all of this happens at compile time). I should read some of the EROS and Coyotos papers…

It would be interesting to hack up a prototype implementation of this, but I don’t feel like mucking about right in the internals of another compiler (thesis work…), and unfortunately I don’t think any of the good meta-programming languages are really appropriate for this. Scheme is dynamically-typed, Ruby is duck-typed (so really, dynamically typed), and you need static typing to get the programming benefits of this. Perhaps it’s finally time for me to learn MetaOCaml.





Typing the Stack, and Why Tag-Free Garbage Collection Doesn’t Quite Work

28 11 2007

One of the most significant barriers to implementing a completely safe garbage collector lies with collecting the stack. Heap datastructures are relatively easy to walk — that is to say, one can generate a walk function for each variant of heap datastructure in the language being collected. The stack is more difficult because it is not an explicit structure in the language. The other work I’ve read towards this (described in the post linked above) made rather awkward contortions to get around this fact. Wang and Appel CPS-converted and closure-converted their entire programs. Wang and Fluet side-step the issue by writing a safe collector for the internals of an interpreter – which isn’t really quite garbage collection in the normal sense, since they just have a Scheme stack structure defined in Cyclone which they then traverse. The harder problem is to actually type the machine-native stack.

  • The Essence of Compiling with Continuations: Towards this, my partners and I read the work defining A-Normal Form, or ANF. The authors examined a number of CPS based compilers to see what work they actually performed (many compilers use CPS as an intermediate representation to support some transformations). They found that to mitigate the code expansion of CPS transformations, and to get rid of some of the execution overhead from relatively trivial continuations, the compilers performed a number of additional reductions, pulling the continuation out of argument lists and leaving it implicit in a register, and a few other tricks. Adding these transformations together, it turns out that the sum total of these actions is essentially CPSing the source program, performing the normal reductions, and then un-CPSing the program. The authors then present an equivalent transformation directly on the source language, which is simpler to implement. This essentially accomplishes the same thing as using SSA – giving a name to every intermediate result of the computation.

This may help with typing the stack – we might be able to generate each frame’s type from the ANF of the program. Obviously some more work needs to be done here; how this might be done isn’t immediately obvious. I complained that we would have to mark the stack frames in some way, and to suggest this was not as terrible as I thought, our advisor suggested reading a couple papers by Benjamin Goldberg which pretty much wrapped up work on garbage collection without tags. In a nutshell, it really only works for languages lacking polymorphic operators.

  • Tag-Free Garbage Collection for Strongly Typed Programming Languages: This paper describes a way to sidestep tagging pointers on the stack into the heap (the notion was mentioned earlier by Appel, but implementation details were omitted). It does this basically by embedding frame-specific garbage collection routine pointers right after function calls. It then changes compilation so that returning from a routine goes an extra instruction later. Goldberg says that this works fine because the retl instruction on SPARC is only a pseudo-instruction which compiles to a jump with a hard-coded offset – I wonder how portable this is. This of course only works in a straightforward way for monomorphically typed languages. To extend the scheme to support polymorphic functions, they begin passing type-specific traversal functions to the frame-specific collection functions. This is then extended yet again to support higher-order functions – the overall strategy is to stash location-specific pointers in select locations to provide ready access to correctly-typed collector procedures in an efficient manner. There is also an addendum at the end about implementing collection for parallel languages. One interesting idea presented which isn’t explored in much depth is that there can be different collection procedures for different stages of each procedure. This allows for optimization such as not traversing uninitialized pointers early in the procedure, or not traversing dead local variables during collections after their last access in the function.
  • Polymorphic Type Reconstruction for Garbage Collection without Tags: This second paper revisits much of the content of the first paper, more clearly presenting the type reconstruction algorithm. It then addresses the fact that it is possible to have objects reachable from the garbage collection roots for which it is impossible to reconstruct the type, and proves that in the absence of polymorphic primitive operators, it is possible to reconstruct the type of any useful reachable object, because any object which matters will be passed to a primitive operation at some point, and types can be reconstructed from there. He later touches briefly on the issue of having a polymorphic equality operator (or any polymorphic primitive), and the problem with the operator needing to use data of the objects to do its job, without knowing the type, and mentions offhand that this is dealt with by implicitly passing a method dictionary to the operator, which amounts to a tag. This is still tag-free in the sense that there are no extra bits used up in the pointer, but not fully tag-free because the compiler still needs to pass explicit runtime hints about types.




STM and Haskell Attention

12 08 2007

Haskell has been garnering a fair bit of attention of late, largely as a result of Simon Peyton-Jones’s presence at OSCON (and the ensuing flurry of Reddit postings of related blog posts). He gave a keynote on Software Transactional Memory (STM), which is an alternative to using explicit locks and condition variables in writing parallel programs. He also gave an introduction to Haskell, and a smaller talk on data parallelism.

I’m quite pleased by this recent attention, for a couple reasons.

The first is that I really agree with Simon that STM is going to be the way to go soon*. I also just think it’s damn cool – that’s why I did an independent study on it with Maurice Herlihy last semester, working with a slightly more recent version of Microsoft’s SXM implementation of STM for C# than that available at the link I just gave. Simon gives excellent motivation for STM in his talk, but there are still others if you’re unconvinced after watching it. Consider a datastructure which you traverse in multiple directions – perhaps a grid, which can be traversed in either direction in either dimension depending on the contents at a given location – or perhaps a mutable graph representation. What locking scheme do you use for scalable (read: not coarse-grained) access to a large instance of this datastructure? You could establish a total ordering of nodes and corresponding locks, or do a row/column at a time with an ordering on rows/columns in a table, and acquire locks only in that order. But since you can traverse in any direction, this would sometimes require building lists of locks to acquire, releasing later locks which were acquired in reaching the current point of traversal, and then walking the list reacquiring locks. How expensive is this, possibly acquiring the same lock over and over? How hard is it to write that code correctly? How much effort does it take to document, and for a new maintainer to learn, this locking scheme? All of this makes fine-grained locking a poor idea at best.

How much easier would it be to write such code naïvely and mark it as atomic? And with a proper implementation of STM, how much faster could this be on a large structure than a coarse-grained lock, or repeated acquisition and release of the same fine-grained locks? And this is a bonus on top of easier coding for simple concurrent structures (and partial failures), and composition of atomic transactions, which cannot be done with normal locks.

If, like me, you’re curious about how STM is implemented, I suggest you check out a few papers. All of them are easier to follow with the basic ideas and terminology of concurrency, as well as a bit of computer architecture knowledge.

There are certainly many, many more implementations, and many more papers, and more ways to tune an implementation than these couple papers spend much time on. Check out the related work in these papers.

I’m also pleased, because I feel Haskell has been underrated by the programming populace at large for a long time.** I’m interested in becoming more familiar with Haskell largely because of some of my pet peeves with OCaml, and because Haskell’s type system particulars are both richer and closer to category theory, which interests me mainly as an intellectual curiosity (and for its relation to type theory in general). Well-written Haskell can also beat out many more well-known languages performance-wise (for examples, definitely poke around the language shootout). And it’s always nice to see a real functional language (read: one in which functional style is the norm, not just an option) getting mainstream attention.

I can only hope that people continue to take note of Haskell.


* Please note, those of you who are itching to start a flamewar about shared state vs. message passing, that STM does not claim to be the be-all and end-all of concurrent programming. It is designed to make programming with shared state easier than it is today. And shared state is not going away, because it’s the model with which chips are currently designed. So at the very least, operating systems, compilers, and some runtime systems will still need to deal with it, even if all other development switches to message passing concurrency.

** I should probably mention that I have not used Haskell a great deal, though this is not for lack of desire. A couple years ago I wrote a short assignment in Haskell (largely to play with lazy evaluation). I’ve also spent a fair amount of time writing OCaml over the past couple years, for the course I TAed and head TAed which uses OCaml to introduce students to currying and static typing (following an introduction to CS with Scheme), and last semester in a group software engineering course (perhaps I’ll elaborate on this experience another time). So I’m familiar with a similar type system.





Alternative Languages on .NET and the JVM

3 08 2007

The other day I stumbled across this link on Reddit about the formation of a coherent project (the JVM Language Runtime) for implementors of alternative (non-Java) languages on the JVM to get useful information and code for using the JVM. The article points out something I’d taken note of before, which is the difference in Microsoft’s and Sun’s attitudes towards third parties implementing languages on their virtual machines. Really, until relatively recently, Sun did not care much about such projects. They did nothing to actively stop them, but little if anything to assist them. They only started helping out (and hiring) members of projects like JRuby after Microsoft took the lead! Microsoft, almost as soon as projects like IronPython appeared, started actively helping them and hiring their developers – what better way to draw programmers to your platform than to let them use any language they want? On the other hand, Sun also spent many years trying to push Java, Java, Java, while Microsoft’s developer tools have always included a variety of languages, most notably Visual Basic and Visual C++ in the pre-.NET era. So perhaps Microsoft just planned support for things like this from the start, because they already needed/wanted support for multiple languages on the same runtime – really, C#, VB.NET, ASP.NET, and a number of other .NET upgrades to major Microsoft product lines ran and inter-operated on the CLR right from the get-go.

For some reason it took Sun some time to see the light on this, and it’s one of the things for which I admire Microsoft (don’t get me wrong – I have all sorts of gripes with other things they do and have done). Mainly that, and the fact that they really spend a lot of money on pure, corporate-hands-off research. Sun also does this, though not as much, with good reason: Sun is doing much better since the pain of the dot-com bust, but still has a ways to go to return to its previous state.

Really, I’m just looking forward to the time that I can write a program in Python or Ruby, have it run normally on any *nix (BSDs, OS X, Solaris, Linux, etc.), and run correctly on the JVM or .NET (user’s choice) on Windows through IronPython, Jython (which last I heard was semi-dead), JRuby, and IronRuby. I suppose I’ll have to wait for one of the proper native Ruby VMs for Ruby to be adequately fast – I’m rooting for Rubinius.

See also Microsoft’s framework for webapps, which from what I’ve seen is basically a .NET-esque multi-language Flash competitor (currently Windows and OS X versions): Silverlight.







