March 3, 2017

Bioinfo-C and the merits of open-sourcing

I write this post in two simultaneous states of mind. On the one hand, I know the author of the new Bioinfo-C project and am happy that he and his employer have chosen to open up their work in this way. On the other, I find myself wondering if he knows just what is required to make it succeed, and if he has really thought about the implications. I wish him well with this project, so here are a few tips for him and others considering doing the same in future:

  1. Ask yourself - why are you open-sourcing this? The answer is probably because you want others to benefit from the effort you have put into developing the code, so that they do not need to do it again themselves. This is a noble goal, but if your answer is anything else, don't bother. In particular this applies if you're doing all this in the hope that someone out there will contribute a great idea that will make your code so much better in ways you couldn't currently imagine or achieve. Such an innovation is as rare as hen's teeth.
  2. Is it documented? Would a complete novice who has never seen your stuff before be able to tell what it is supposed to be doing and how to use it? Many programmers think their code is obvious and can be interpreted merely by reading the source code. Believe me, it is not. There is no such thing as too much documentation - both for the end-user and the developer.
  3. If this is existing code, not a new project, how closely tied to your other existing (non-open-source) systems is it? Can it easily be extracted and run in an entirely different environment, different filing systems, different paths, different versions of supporting libraries, without jumping through hoops or following a complex list of actions? If not, don't open-source it.
  4. Is it a core library? Core to who exactly? It might be incredibly useful to you internally, but that doesn't mean anyone else will need it (or if they do, not necessarily in the form you created it in). You might be surprised how much of your code was created in response to specific internal ways of doing things rather than a common need across the industry.
  5. Can you support it? Open-source is much more than just free code to download. To succeed and to reach its target audience a project must actively engage with the world. There must be community forums, high quality documentation, updates via a reliable and predictable release cycle that takes into account community feedback, mechanisms for accepting and vetting code contributions, managing development priorities for future releases, running a website, organising the occasional hackathon, presenting at conferences, responding to support requests, and the list goes on. If you and your project team can't do all of this, then your project will never get the exposure it requires to become a success.
  6. Similarly, make it easy for people to get your code using modern repository tools such as Git or SourceForge. A library that is a simple download from an independent website does not encourage an expectation of reliability or longevity in the end-user.
  7. If there is a commercial motive behind your project, be very transparent and clear about the boundaries and purpose of the commercial involvement, and about what the wider community gains from participating and contributing regardless.
  8. If you do accept feedback or code contributions, who will own the improved code? Your licence and copyright statements need to be extremely clear on this front.

That's all from me this week. Summer is generally quiet on the bioinformatics front which in turn means there is much less for me to rant about...

Topics: Bioinformatics, commercial, community, copyright, dependencies, documentation, features, licence, open, requests, reusable, source, support, tips