March 3, 2017

Open source vs. open development

On the second day of BOSC 2010, Ross Gardler of the Apache Software Foundation gave a keynote that triggered no end of discussion afterwards about the difference between sharing your project's source code and sharing its development too.

When people hear the term 'open source', they commonly assume that the software project is collaboratively developed and shared freely with whoever wants to see it (some licences prohibit commercial use or abstraction, but in general anybody who wants to can look at the code and modify it for their private non-commercial use). They imagine a world where anyone can join the developer community, submit code, patches and other resources, and contribute to the project's roadmap. A new joiner may not get commit access until their credentials are proven, but the project is generally community based and community driven.

In the bioinformatics world this is often not the case. Larger projects produced by universities freely share their source code, but they provide no community mechanisms for contributing to it or helping to guide its direction. Sure, you can email in a patch or suggestion and hope that it gets applied, but there is no guarantee that it will. The only way to get your voice heard is either to know the right people within the developer team or to establish a formal collaboration between your institutes so that you can co-develop the project together.

Universities do this because they need to keep control of what their research budget is spent on. That is fair enough, really. Those budgets are limited and intended for specific goals, and if the university gets distracted by community-voted goals that do not align with the original research intentions outlined in the funding proposal, this can be a major problem. Managing widely collaborative development can also be time consuming and divert the university's resources away from its primary research goals. The only way to shift the project's goals and develop new features is to get grant funding for the new goal, or to formally collaborate with another institute on the project.

So what universities are often doing is open sourcing their projects only in the strictest literal sense of the term: the source code is open for anyone to see. What they are not doing is open development.

Of course, none of the above applies to projects developed without the aid of university grants, in the developers' own time. They are truly open source and openly developed. The Bio* suite of projects (BioPerl, BioJava, BioRuby, BioPython etc.) is a classic example of how well this can work. A growing number of enlightened university projects are also opening up their development process to the wider community, but they are nowhere near a majority yet.

My point, then, is that projects need to be very clear about what they mean when they claim to be open source, because open source does not always imply open development. Conferences like BOSC insist that a project's licence be declared before it can be considered open source. Perhaps there should also be a system for categorising projects according to their community development practices, so that users know what response to expect from the developers should they run into problems or wish to request improvements.

There is one other solution. If a commercial entity is having trouble getting the improvements or feature requests it needs from a university-managed project because the code is not openly developed, then vendors like Eagle can act as a third-party go-between. Eagle can manage the implementation of bugfixes, enhancements and new features in-house without draining university resources, and then handle contributing those improvements back to the main project in its own time, whilst the commercial user gets the immediate response and development work it needs to continue its research. In effect, Eagle builds an open development layer on top of the strict open source model used by so many projects.

Topics: Bioinformatics, Open source