On July 9th, Eagle co-hosted a one-day workshop on plant bioinformatics with NIAB at their brand new Innovation Farm facilities just outside Cambridge, UK. The conference centre had opened only six weeks previously and proved an excellent venue for the meeting. Of particular interest to many delegates was the one-hour tour of the greenhouses and fields that contain NIAB's demonstration projects. Seeing the end result of the research under discussion gave a real sense of purpose to the conversations during the rest of the day.
The speaking programme opened with a brief introduction from Eagle, followed by the John Innes Centre's Martin Trick discussing his technique for mapping physical traits to genetic variations in organisms that lack complete reference genomes. This technique is now well established under the TraitTag brand name and is offered commercially by Eagle. It has been applied to a range of species including rubber, oil palm, and canola/rapeseed (and other brassicas), and works even on complex hexaploid genomes.
Following Martin was Eagle's Will Spooner, who set out his vision of where plant informatics has come from and where it is heading, using a couple of real-life case studies along the way to support further research into the idea of the pan-genome. Traditionally, genomics has worked by establishing a reference genome for just one individual and then calculating the differences between that individual and the subject of interest. However, the variability between individuals of the same species is so great that it is hard to separate genuine differences from the background noise of the total variation an individual can carry before it becomes another species altogether. Understanding the total set of allowable combinations within a single species makes it possible to build a pan-genome that represents the species as a whole, including genetic outliers still defined as part of that species. Measured against such a pan-genome, individual variation is much more limited, which gives greater confidence when associating that variation with observed phenotypic differences.
After a coffee break, the second session placed two commercial researchers alongside two academic research groups to hear the challenges each faced and how they were addressing them.
Mark-Christoph Ott from Bayer reminded everyone that it is not all about big data, or about managing and storing data, but about generating knowledge. There is no point having lots of data unless you can properly mine and integrate it to support reproducible experiments with consistent data analysis and reliable results. What is missing is not the ability to understand each individual piece of data, but the ability to look at the big picture, the metadata around the data, and see what the landscape as a whole looks like.
Chris Rawlings of Rothamsted and Mario Caccamo from TGAC each presented an overview of the current activities within their respective groups and of the new challenges that increasing volumes of data were beginning to present. The implicit message was that collaboration, whilst always important to good quality research, is now a necessity.
The second commercial speaker, Tim Swaller of Ceres, presented Persephone, a genome browser optimised to run very fast and to present complex data in an intuitive visual manner, making it easier for scientists to understand the implications of their results. Well-tuned C++ code keeps the browser highly responsive even when displaying millions of data points, although this relies on good network connectivity to the backing databases, and reliable high-speed, high-bandwidth internet is not something every research lab has access to. Ceres are planning to offer Persephone to other companies once they have established a business model around it (Ceres being a research company, offering software commercially is not part of their usual mode of operation), and they already have some instances deployed at partner sites on a trial basis.
The second session closed with presentations on funding opportunities and requests for input into plans for future funding rounds from the Technology Strategy Board (TSB) and the BBSRC. The event sponsor, Arkivum, led the home run to lunch with a brief but useful checklist on how to ensure data is archived securely whilst remaining retrievable and readable long into the future.
The afternoon session consisted of the facility tour and a participatory session where people were asked to prioritise the things they'd like to see changed in plant bioinformatics. Unsurprisingly, 'don't reinvent the wheel' came out top - but it is anyone's guess as to when people will actually start to listen to that plea!
All in all the day went very well, and feedback so far has been nothing but positive. This was a one-off event but similar things may be planned in future if demand merits - so if you have a topic or a theme you'd like to see covered, and have a suggestion for a co-host to help organise it and share costs, then please do get in touch!