March 3, 2017

Solutions without problems

There are so many good ideas floating around in the bioinformatics open-source world that it's hard to know what to do about them all. Sometimes it feels like almost every day yet another paper has been published on some fantastic piece of software that outclasses everything that's gone before.

Trouble is, most of them are fairly limited in their application, which limits their audience too. There are very few globally useful, innovative toolkits that can be applied to a wide variety of situations without needing any major tweaking; examples include Taverna, the Ensembl APIs, BioMart, and the Bio* programming toolkits. Working out which tools are generally useful and which are very niche is a hard task - papers on niche applications tend to be presented optimistically but inaccurately as more general than they are, whilst papers on general applications tend to focus on one area and overlook the general case. But that's what we're paid to do - to sort out and identify the most appropriate tools for the task - so we mustn't complain!

More of an issue, though, is the worrying number of publications and presentations on pieces of software that appear to solve a problem that has never been demonstrated to exist. Sure, the problem might exist in theory, but without a clearly defined path from real-life problem to design to solution, how can authors hope to convince people of the efficacy of their efforts? It seems it's easier to imagine a hypothetical problem, solve it, and then look for a real-life problem to apply the solution to afterwards, to justify all the effort that's gone before.

This isn't an efficient or worthwhile use of the limited resources available to bioinformatics research. If more time were spent researching real problems and proving they exist before a single line of code was written in an attempt to solve them, the general standard, consistency of quality, and usefulness of the resulting software would be massively improved.

Topics: Big data technology, Bioinformatics, bioinformatics problems, Open source