Earlier this month I was privileged to be asked to help judge the Best of Show winners at this year's Bio-IT World Expo in Boston. Whilst I can't give too much away about the judging process or the heated discussions that took place in the judges' chamber over lunch (although the sandwiches were very tasty...), here's my personal take on the winners and what they imply about the state of the bioinformatics industry.
Web-Based Services and Software: Biofortis, for Labmatrix Chameleon 5.5
"Labmatrix Chameleon is a research management software system that can integrate clinical, specimen, genetic and molecular assay data, and the full life-cycle of Next Generation Biobanking sample management."
What does this mean? It means data integration: that old chestnut that keeps cropping up more and more frequently now that volumes of data are growing and old-fashioned Excel spreadsheets and even traditional relational databases are beginning to struggle under the load. Now I am no expert on biobanking or clinical specimen management, but I do know that data integration tasks are hard and good solutions are as rare as hen's teeth, so it is nice to see something as good as this one on the market.
It raises the question, though: specialist or generic? The actual under-the-hood algorithms in this product are unlikely to be significantly different to those found in any similar product, in my opinion (not having seen the source code I can't say for certain though!), as data integration is currently an area with just two solution spaces: divide and distribute, or merge and summarise. What makes the difference between one tool and the next is how good the user interface is, i.e. whether the end-user who works with the system on a daily basis can find what they need without having to learn some convoluted procedure to get at their own data. This is where the specialist tools have the advantage over generic ones - and Labmatrix Chameleon certainly does well in the UI space.
Informatics Tools and Data: Seven Bridges Genomics, for the IGOR platform
"The IGOR Platform is a cloud based platform run on Amazon Web Services that allows researchers from any technical background to run, customize, and share peer-reviewed pipelines and tools, including sophisticated tools like BWA+GATK for whole genome analysis and TopHat+CuffDiff for RNA-seq analysis."
IGOR has one brilliant feature - its user interface. I have not seen anything before that comes close to IGOR in cleanliness, usability, and functionality. Technical knowledge is still required to be able to take full advantage of the pipeline design tools, but to the end-user biologist IGOR makes it easy to put together basic analysis pipelines and execute pre-defined ones over their datasets. Pipeline platforms are a dime a dozen these days, as was demonstrated by the multitude on display in the Bio-IT exhibition hall, but few can match IGOR for its UI design.
Do we need another pipeline platform? Not sure. Given that so many are on offer now, there need to be real differences between them in order for the market to develop and mature. The ones that are too similar will most likely merge or fade, leaving the niche offerings to expand and become the dominant players. A split between interface and engine would also be nice - at present everyone has to reinvent both, which is a bit of wasted effort. There is also now a huge market for people to develop apps or plugins to run on all these various platforms - so that no matter which platform wins, the app developer stays in business. I suspect therefore that IGOR is a great tool in its own way, but the killer pipeline platform is yet to be seen.
IT Infrastructure and Hardware: Bright Computing, for the Bright Cluster Manager 6.0
"Bright’s cluster manager is a tool for provisioning, scheduling, monitoring and managing servers—either in the cloud or on-premise—and allowing users to move between the two locations dynamically according to need."
The idea of moving servers and data seamlessly from local data centre to a cloud location, and back again, seems magical. It would truly enable burst deployment to cope with peaks in demand, and it would greatly improve migration paths as data centres slowly depreciate and servers are moved into the cloud a rack at a time until eventually the entire system is virtual.
How is it done though? First of all, you have to make sure that all the cloud options you have chosen are compatible with one another without requiring any software changes. This is not as easy as it sounds - each provider has its own API or standards, hardly any of which are compatible with the competition. You are therefore likely to need to restrict your planning to just one or two selected cloud providers, and who's to say which ones will be the best choice 5 or 10 years from now when the hype has passed? Bright attempts to handle this by disguising the differences behind their own toolkit, but you still have to integrate this toolkit into your existing servers and filesystems before they become moveable.
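To make the idea of disguising provider differences behind a single toolkit concrete, here is a minimal sketch of the general adapter approach. It is not Bright's actual design or API, which I haven't seen; every class and function name below is hypothetical and exists purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class Provider(ABC):
    """Hypothetical common interface that hides each provider's own API."""

    @abstractmethod
    def start_node(self, name: str) -> str:
        """Start a node and return an identifier for it."""


class LocalDataCentre(Provider):
    # Stand-in for provisioning a server in your own racks.
    def start_node(self, name: str) -> str:
        return f"local:{name}"


class CloudA(Provider):
    # Stand-in for one cloud vendor; a real adapter would call that vendor's API here.
    def start_node(self, name: str) -> str:
        return f"cloud-a:{name}"


def burst(nodes_needed: int, local_capacity: int,
          local: Provider, cloud: Provider) -> List[str]:
    """Fill local capacity first, then overflow ("burst") into the cloud."""
    started = []
    for i in range(nodes_needed):
        target = local if i < local_capacity else cloud
        started.append(target.start_node(f"node{i}"))
    return started


print(burst(3, 2, LocalDataCentre(), CloudA()))
```

The calling code only ever sees `Provider`, so swapping or adding a cloud vendor means writing one new adapter class. The hard part, as noted above, is that your existing servers have to be brought under that interface first.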
As with many cloud projects, this technology works brilliantly for a blank sheet - where a project is being built from the ground up and can follow the design specs and requirements of the software toolkits and hardware infrastructure being used. It works far less well as a retrofitted product to be applied to existing systems. Now if someone could solve that latter problem then I would be properly impressed.
Clinical Trials: ePharmaSolutions, for PatientLive
"PatientLive is a geo-therapeutic matching algorithm that links patients who disqualify from one study with other studies for which they are better suited. The system then refers prequalified patients to the closest study and remaining patients can register to receive new study alerts. More than 20 major pharmaceutical companies and CROs have agreed to pilot the program and “share” patients."
Clinical trials are not my specialist area, but my naive point of view is that it must be pretty hard to recruit sufficient patients for a trial and mighty frustrating when you fail to gather enough who present the correct phenotype. If a patient is turned down for a trial they've applied for, I assume this means they are less likely to apply for another trial in future because of the 'disappointment' factor (or am I assuming too much here?). Finding a way to rescue the non-selected patients and offer them alternative trials must therefore save the trial organisers a lot of time and hassle by offering a shortcut to at least part of their recruitment, and it makes the volunteers feel more wanted and more likely to continue volunteering for future trials.
PatientLive seems like a great thing then - it scans through a database of all trial volunteers past and present and matches them up against all current trial requirements. Data protection and privacy requirements permitting, they can then be contacted to see if they want to take part, rather than waiting for them to volunteer themselves. The flaw in the plan though, from what my colleagues tell me, is that human psychology just doesn't work this way when it comes to trial participation, and also that the way in which existing cohorts have been recruited historically leaves the users of a system like this with a very limited self-selecting pool of patients who are not particularly diverse in their phenotypes.
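For the curious, the matching step described above (filter the studies a patient qualifies for, then pick the closest) might look something like the sketch below. This is my own guess at the general shape, not PatientLive's actual algorithm; the `Study` and `Patient` fields, the eligibility rule, and the use of straight-line (haversine) distance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional


@dataclass
class Study:
    # Hypothetical, heavily simplified trial requirements.
    name: str
    condition: str
    min_age: int
    max_age: int
    lat: float
    lon: float


@dataclass
class Patient:
    age: int
    condition: str
    lat: float
    lon: float


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km is the mean Earth radius


def closest_eligible_study(patient: Patient, studies: List[Study]) -> Optional[Study]:
    """Return the nearest study the patient qualifies for, or None if there is none."""
    eligible = [
        s for s in studies
        if s.condition == patient.condition and s.min_age <= patient.age <= s.max_age
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda s: distance_km(patient.lat, patient.lon, s.lat, s.lon))


studies = [
    Study("Cambridge asthma study", "asthma", 18, 40, 42.37, -71.11),
    Study("New York asthma study", "asthma", 18, 65, 40.71, -74.01),
]
# A 50-year-old in Boston is too old for the Cambridge study, so gets New York.
match = closest_eligible_study(Patient(50, "asthma", 42.36, -71.06), studies)
print(match.name if match else "no match")
```

Note that the code can only match against whoever is already in the database, which is exactly the limitation discussed here: a pool that isn't diverse to begin with yields few matches however clever the matching.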
This type of tool will come into its own only when a better way is found to recruit a more diverse base of volunteers in the first place. Until then, I feel it is a great effort that is limited not by its own features but by the very nature of clinical trials as currently practised!
All in all then, an interesting selection of winners. One thing was very clear this year though: the categories themselves need a serious revamp! Web-based software and services are so ubiquitous now that it seems silly to attempt to separate them from the informatics tools and data category, and the IT infrastructure and hardware category sees so little actual hardware activity these days that it might better be repurposed as data centre management tools. It is encouraging that the organisers of Bio-IT already acknowledge this and are actively seeking suggestions for new categories for 2014. I'm sure they'd love to hear from you! (If you'd like to leave your thoughts in the comments below, I'll make sure to pass them on.)