March 3, 2017

Annual Bioinformatics Survey Results

Eagle recently completed its second annual bioinformatics survey, intended to monitor trends in the industry and predict its future direction. Congratulations to Oliver Deusch, who was randomly selected from all the respondents to win the Amazon voucher prize draw. The response was good this year: 108 in total, from a broad range of bioinformaticians working in industrial and academic settings. There were nine questions with chartable answers, which this blog report will summarise one at a time.

The majority of respondents were university academics (51.6%), followed by employees at smaller biotechnology companies (20%), non-profits (15.8%), and pharmaceutical companies (11.6%). This gave a slant to the responses in favour of cutting-edge research as opposed to established operational requirements.

The question on team size appeared last on the survey, but it logically belongs just below the first one, so here it is. The largest group of respondents were single-person teams (46/108), with almost all the remainder (45/108) having just 1-5 employees reporting to them. Only two respondents indicated that they managed teams of 15 or more. Compared with the total number of bioinformaticians in responding organisations (a fairly even spread across the 1-5, 5-15 and 15-50 categories), this suggests that the majority of respondents were in non-management roles within larger organisations, rather than working as lone-ranger bioinformaticians. About half the respondents had 5-15 years' experience, whilst the bulk of the rest had less than 5. Around 10% had more than 15 years' experience.

Current research focuses on gene expression and genomic variation analysis (61/108 each), with pathway analysis, genome annotation and comparative genomics close behind, each with current users in the mid-40s out of 108. Based on these results, future attention is expected to fall most heavily on proteomics (25/108), systems biology and epigenetics (19/108 each), with a reduced but still significant emphasis on pathway analysis (20/108) and comparative genomics (16/108), supported by an increase in metabolomics (15/108). This suggests that, despite the increasing availability and use of genomic sequencing facilities, genomics may be losing its place as the most important field of research in future bioinformatics labs.

Respondents state that the key technologies they rely on at present are those around meta-analysis/data integration (57/108), array expression (41/108, a surprisingly popular choice given the antiquity of the technology), and various forms of sequencing (RNA-Seq at 40/108, Genome-seq at 43/108). Other array-based methods remain popular too, with array genotyping (26/108) and array-enriched sequencing (35/108) still playing a major role. Of all the technologies listed in the survey, only two showed a reduction in predicted future demand (array expression and meta-analysis/data integration), whilst the others seemed to remain steady in their anticipated share of the work.

The majority of respondents (91/108) indicate that the bulk of their work is still done on in-house computational resources, using in-house developed analyses (83/108) and open-source tools (76/108) backed with in-house custom software (69/108). Web-services and commercial software account for most of the remainder (47/108 and 39/108 respectively). There is very little use of outsourced analysis, development, or computing resource, with the exception of cloud. Cloud was interesting because the survey indicates that, whilst still relatively poorly adopted (24/108), it is the fastest growing area listed and shows the greatest number of people intending to use it in future (26/108). Only web-services showed anywhere near this level of potential growth, and even then at only a fraction of the figure for cloud (9/108).
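For readers who prefer percentages to counts, the figures above can be converted with a few lines of Python. This is only a minimal sketch: the names `current_use` and `as_percentage` are made up for illustration, and it assumes that each count is out of all 108 respondents, with respondents free to select more than one option.

```python
TOTAL_RESPONSES = 108  # total number of survey responses reported in this post

# Counts quoted in the paragraph above.
# Assumption: each is out of all 108 respondents (multiple selections allowed).
current_use = {
    "in-house computational resources": 91,
    "in-house developed analyses": 83,
    "open-source tools": 76,
    "in-house custom software": 69,
    "web-services": 47,
    "commercial software": 39,
    "cloud": 24,
}


def as_percentage(count, total=TOTAL_RESPONSES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Convert every count and print one line per option.
percentages = {name: as_percentage(count) for name, count in current_use.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

On that assumption, the 91/108 using in-house computational resources works out at a little over 84% of respondents.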

The physical hardware currently used to run analyses holds no surprises: compute clusters and individual servers are each used by roughly a third of respondents. A quarter use their own desktop PCs, as might be expected given the smaller-sized or academic organisations of most respondents, whilst only 5.6% currently use cloud resources to obtain sufficient computational power. Cloud usage for bioinformatics is therefore currently reaching only a fraction of its potential market.

The current state of open-source bioinformatics software can be summarised as generally secure (68%), but only marginally (52-53%) tending towards sufficient support, user-friendliness, and ease of maintenance/installation. A small majority of 53% agreed that it was hard to integrate open-source software with their own data sources, but by far the biggest current issue was integrating open-source tools with other tools of the same type: 72% disagreed that this was easy to do. In summary, open-source appears to be trusted and its user-friendliness (or lack of it) generally tolerated, but users really struggle to adapt and integrate it with their data and with other tools.

When choosing open-source tools for deployment, the most important features (nearly 100%) were the availability of scientific validation of the tool (e.g. through publication/peer review) and its computational efficiency. Ease of maintenance and installation (~85%) and, surprisingly, access to a command-line interface (80%) were also considered very important. Whilst visualisation of results was important (~75%), there was only minor emphasis on choosing tools for their ease of use by less technical staff, availability of training, or commercial support. Although integration was widely criticised in the previous question, here only 60% of respondents check the ability to integrate when choosing a tool: the science is always more important than the ease of deployment. Only 60% were worried about security testing of new tools, which is unsurprising given the large proportion of respondents who work in institutions dealing with public rather than private research data.

The question of whether in-house or outsourced services are better proved, as expected, fairly conclusive in favour of in-house. The only area where outsourcing was considered superior was in its ability to manage workload scalability and iron out peaks and troughs in demand. Whether the responses to this question are based on experience or perception is unknown. Of course, here at Eagle we believe that the reality of outsourcing is generally the exact opposite of what the responses show, but this question clearly demonstrates the need for us to better articulate (and demonstrate) the benefits of working with companies like ourselves.

So, all in all another interesting year! We'll run the survey again next year to see how things have changed.

Topics: Announcements, Bioinformatics, bioinformatics survey