On 21st July the Pistoia Alliance holds an initial community meeting on AI & Deep Learning in Healthcare and Life Science. The aim is to understand what barriers there are to using these technologies and how to work collaboratively as an industry. As I outline below, this is a timely initiative.
It is worth appreciating that Artificial Intelligence (AI) has been used extensively in pharma R&D since the expert systems of the 1980s. Deep Learning is simply the latest incarnation of the Machine Learning branch of AI, one that is intuitively appealing because it happens to mimic some mechanistic aspects of the human brain. I predict that the significance of Deep Learning to our community will be reinforced by a happy coincidence: Deep Learning is likely to reach its height at the same time as the life science R&D industry finally embraces data-driven discovery at a fundamental level.
The key benefit of Deep Learning and other predictive analytics technologies is to lower the cost and increase the quality of predictions (or relationships, or hypotheses) drawn from data. Having lots of predictions is great for early-stage/translational research (biomolecular data), and also for late-stage/post-marketing work (real-world data). The opportunity lies in using data-driven predictions to inform decisions, from the discovery of new drugs (early) to the matching of patient to treatment pathway (late). AI/Deep Learning thus has a pivotal role in precision medicine.
As experienced practitioners, we are all too aware of the challenges for the adoption of AI/Deep Learning in R&D:
- Technically, the key challenge is datasets with huge numbers of dimensions (features) but small numbers of records. AI/Deep Learning will never get over the problem that multi-level biological data is inherently noisy and many studies are simply underpowered. Increasing power by combining datasets into meta-analyses raises issues of data integration, batch effects and all that. The problem rapidly moves from the tractable world of slick statistics and computational horsepower to the murky world of garbage-in-garbage-out, data cleansing/preparation, incomplete and incompatible metadata, and "feature engineering".
- But the biggest problem is cultural: AI/Deep Learning needs access to large and valuable datasets, exactly the assets that data owners keep under lock and key. The traditional hypothesis-driven community is generally unimpressed by the demands of the data-driven aficionados to have a speculative "rifle through" to see what the old guard might have missed.
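
To make the first, technical point concrete, here is a minimal sketch of what "many features, few records" does to a model. It is purely illustrative and uses synthetic random numbers rather than real biological data; the function name `fit_noise` and the sizes (20 records, 500 features) are my own assumptions, and ordinary least squares stands in for any flexible learner. Because there are far more features than records, the model can reproduce pure noise exactly on the training set while learning nothing that carries over to new records.

```python
import numpy as np


def fit_noise(n_records=20, n_features=500, seed=0):
    """Fit least squares to pure noise; return (train_r2, test_r2).

    Hypothetical illustration: features and outcomes are independent
    random draws, so there is no real signal to find.
    """
    rng = np.random.default_rng(seed)
    X_train = rng.standard_normal((n_records, n_features))
    y_train = rng.standard_normal(n_records)
    X_test = rng.standard_normal((n_records, n_features))
    y_test = rng.standard_normal(n_records)

    # Minimum-norm least-squares solution. With more features than
    # records, it can pass through every training point exactly.
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    def r2(X, y):
        # Coefficient of determination: 1 - SS_residual / SS_total.
        resid = y - X @ coef
        centred = y - y.mean()
        return 1.0 - (resid @ resid) / (centred @ centred)

    return r2(X_train, y_train), r2(X_test, y_test)


train_r2, test_r2 = fit_noise()
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The training R² comes out at essentially 1 even though nothing real was there to learn, while the held-out R² typically sits near or below zero. An apparently perfect fit on a small, wide dataset is therefore no evidence of a finding on its own.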
Solutions to both the technical and the cultural challenges lie, in part, in better data governance, for which there is growing appreciation, as exemplified by the FAIR Guiding Principles for scientific data management and stewardship. Principles are nothing, of course, without supporting tools and methods, which is precisely the focus of Eagle software and services.