
The Current State of Bioinformatics Software


I have been planning to write about the current state of bioinformatics software development for quite some time. Nothing scientific, just what I see day after day and week after week in the lab.

This post on Flags and Lollipops showed the current state of availability of bioinformatics software published in the past years. The post itself says it is not scientific, but it gives (along with the comments) an excellent basis for what I see daily.

In no particular order, in my opinion there are three classes of bioinformatics software developed in academia:

  • for-publication programs – these are developed for the sole purpose of publication. They are poorly maintained, if at all, sometimes (in a small percentage of cases) lack good documentation, and usually perform one type of analysis that is already available in many different packages

  • closed source programs – these are the relatively well maintained packages, with good or sufficient documentation. The issue here is that their source is not available, and in some cases they require extra installation steps in order to work, not to mention that you sometimes have to fax the license agreement in order to obtain them. If you work on a closed system (my case) with no admin access, it might take a week or two to have such a package installed, counting the time to send the fax, to receive the application and to ask the sys admin to install it and its requirements

  • and finally open source packages – these split 50/50 regarding maintenance: they can be well maintained or basically abandonware. The well maintained cluster is usually well documented and has a very active group of developers, and its users receive good support. The abandonware group is in some cases for-publication software, and in other cases abandoned because the grad student/post-doc left the lab/job and now has to pay the bills with a day job that does not leave him/her time to code.

I have developed and released software in two of the three categories above. Personally, I don’t like to release closed source programs, but sometimes the institution where you work makes you do that. All my extra-lab software is open source, or at least you can ask me for the source.

Regarding the for-publication software, this is the most pernicious category. Sometimes it is a by-product of some larger research, sometimes it is just bad software that replicates another widely used and available package, and sometimes it is a tool that has no scientific merit. I respect projects, large and small, that release a very well programmed application with good documentation and strive to support their users: projects that clearly show they went through planning, development and testing phases along the way, and that actually enhance our scientific knowledge or make scientific analysis more straightforward. But lately I have seen Greasemonkey scripts being published, which can be compared to the MS Excel macro published some years ago in Bioinformatics.

As always, just my 2 cents.