Update: Go to this year's survey
As promised, and without further ado, here are the full results of the survey that Eagle Genomics ran prior to our 1st Annual Symposium on 5th April 2011 at Babraham, on the subject of "Provisioning Bioinformatics for the Next Decade: Are we prepared?".
Let's go through the questions one by one to see how the responses panned out. There were 118 respondents in total.
No surprises there - the majority of respondents were academics and non-profits. This may have skewed some of the subsequent responses, but when we broke the results down by academic vs. commercial we in fact found very little difference, except in one area which we have detailed below.
Why were most of the respondents academics? It could have been because we heavily promoted the survey to the London BioGeeks network, whilst commercial outfits are generally more reticent about offering their opinions.
Given that the majority of respondents were academic, you would expect to see a greater number of bioinformaticians in the organisation (light blue = >10). However, only about a quarter of organisations relied on a single bioinformatician - most either had none, or at least 5. Do bioinformaticians only do things by extremes?
Good to see that the majority of respondents were experienced bioinformaticians with at least 5 years' experience, many with 10 or more. This suggests that the responses are based on real experience of the real world as opposed to a perception of it.
Most respondents were sole operators, not surprising if tallied with the earlier response that most of them had no dedicated bioinformaticians in their organisation - this suggests that most of the respondents were postdocs or similar having had bioinformatics tasks delegated to them in addition to their normal duties. Of those that do manage people, most only had 2-4 people under them, suggestive of small academic groups rather than larger commercial hierarchies.
What's hot and what's not? Gene expression, genomic variation, and other genomics activities are the current area of focus. In future there may be a shift towards proteomics, metagenomics, systems biology and pathways. Metabolomics is not a popular field and comparative genomics appears to be in decline even though it is currently very popular.
People are currently most concerned about integrating disparate data sources, followed by genome assembly, resequencing, RNA-seq and comparative genomics. Microarrays are already on their way out, and future use of related technologies, including proteomics and mass spec, looks to be heading for a serious decline.
This one question is the only area where academic/non-profit and commercial respondents significantly differed. Overall, in-house computing, development and analysis are the status quo. Open-source software is wildly popular, with not many people seeing any increase in the use of commercial solutions. All this looks unlikely to change, with the exception of cloud computing. Almost a third of respondents said they would be using cloud computing in future - a big leap in terms of potential market share for cloud computing vendors?
When it came to academics vs. commercial respondents, the attitudes show a small but appreciable increase in outsourcing in the commercial sector, and much greater use of commercial software solutions amongst the same people.
Well, surprise surprise, everyone owns a big cluster and lots of servers! Although interestingly a quarter of respondents say they do their bioinformatics on their desktop PCs. Are PCs more powerful than before, or are datasets getting smaller and more manageable? Only a tiny proportion presently run their analyses on the cloud.
[Apologies for the missing purple legend, this should say 'On the cloud']
Of all the positive terms given as options, not even close to a majority considered open-source bioinformatics tools to be worthy of any of these accolades. In fact, the majority were ambivalent - suggestive of an audience who realise that the tools are not great but still have the technical skills to overcome those shortcomings. The worst score of all went to ease of integration where most people believe open-source bioinformatics tools are hard to integrate with each other. This is true - think of the hundreds of poorly documented formats and data IO methods there are out there. Maybe some serious thought should be put into making tools work with each other nicely (is that why there are so many workflow tools on the market?).
We were not very surprised by this last question. The biggest concerns of bioinformaticians are that their tools are scientifically validated and won't fall over halfway through a big analysis. Ease of use, integration, and security were all way down the priority list. Why is that? Probably because the respondents were back-end users who are technical experts and capable of working around sticky issues like usability. If the respondents had been front-end users who saw nothing except the interface, then the responses might have been very different. Note that the majority of people thought training was good to have, but not essential (most tools can be self-taught?), and in support of the previous question the biggest request was to have tools better integrated.
Overall, this survey shows no surprises, but the takeaway messages for the open-source bioinformatics community of developers are:
- Integrate your tools better.
- Make them stable.
- Scientifically validate them by publishing the algorithms.
- Offer training.
- Make them server/cluster/cloud aware by default - most people don't run stuff on desktop PCs.
- Genomics is the major growth area.
- Most bioinformatics teams are one-man bands with no dedicated resources - so make it easy for them to install your stuff.
Hope you enjoyed this review of the survey results! Full (anonymised) raw data is available on request.