One bat, one wet food market, one virus.

Sven Sewitz, Head of Biodata Innovation at Eagle Genomics, discusses linear versus connected thinking in the time of COVID-19.

 

In a highly connected world, only by enabling connected thinking will we be able to find new solutions to the challenges we face.

The current global COVID-19 health crisis has led to large volumes of data being released, calling for efforts in data unification and improved collaboration, as featured in our previous post.

Even with this unprecedented influx of new information, we tend to resort to overly simplistic explanations and search for monocausal narratives: one clearly identifiable animal (a bat?) that came into contact with a human at one particular location (a wet food market?), from where the causative agent has now spread across the globe. Yet even during this short journey, as we now know, a variety of sub-strains have already evolved, meaning that it is, strictly speaking, no longer ‘one’ virus.

Nonetheless, the aims of such extensive data sharing are clear and reasonable. In order for a country to mount a significant and effective response to the oncoming wave of infections, information about the virus has to be transmitted and acted upon faster than the incoming virus. Yet, one important aspect remains sorely lacking. The data are largely unconnected – there are missing connections between and even within data sources. Internal links and extensive metadata collection notwithstanding, the degree to which the data can be traversed, queried and explored is limited by our current capability to technically and conceptually link data and insights. This is not to slight any of the organisations who have, at tremendous expense and with great effort, made the data available. It points more to the challenge that is being recognised across industries and sciences alike. As early as 2016, Griff Weber and colleagues of the IBM Almaden Research Center said of the diverse sets of data being produced that:

“If these datasets could be combined and connected to other data types, insights from each dataset could be pieced together to unlock understanding about the origins and processes of many diseases.”

Biology represents one of the most connected domains we have so far encountered, in which alterations to one pathway can have hard-to-foresee effects, where ecological interactions can ripple across continents, and where pathogens can cross species boundaries and lead to simultaneous or sequential organ failures. Our ability to fight this one disease (COVID-19) will be a function of how well we can organise our information and, accordingly, how easily we can traverse such connections and how accessible we can make not just the data, but also the links between data entities and data sets.

We need only to look at current efforts to combat COVID-19 as proof. Recently, several UK scientists have called upon the Secretary of State for Health to evaluate the role the human gut microbiome plays in the aetiology of the disease. 

This is a sensible request, as there are well-known links between the microbiome and immune function, just as there are thoroughly researched links between diet and gut microbiome composition. There is also a slew of information pertaining to anti-inflammatory diets. The challenge is to draw clear functional connections between diets, gut microbiome composition, anti-inflammatory function and viral defence. If one were to query the world’s largest public literature database with all of these terms, this would result in zero hits!

The above query fails today because we cannot ask a system to show all connections between the search terms: to return not just an answer that incorporates all or any of the terms, but one that shows all of the existing links between them as a connected and structured path. One area that tackles this problem head-on is the burgeoning field of ‘network medicine’, which maps and leverages connections between biological entities, enabling relevant and prescient conclusions to be drawn.
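To make the idea of a "connected and structured path" concrete, here is a minimal sketch in Python. It is an illustration only: the edge list is hand-written and hypothetical, not curated biological data, and the function names `build_graph` and `all_simple_paths` are invented for this example. It does not describe how any particular platform or literature database works. Instead of asking for one document that mentions every term, it asks for every chain of links that connects two terms.

```python
from collections import defaultdict

# Hypothetical, hand-written links for illustration only (assumption:
# each pair stands for a relationship reported somewhere in the literature).
EDGES = [
    ("diet", "gut microbiome composition"),
    ("gut microbiome composition", "immune function"),
    ("diet", "anti-inflammatory function"),
    ("anti-inflammatory function", "immune function"),
    ("immune function", "viral defence"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def all_simple_paths(graph, start, goal, max_nodes=5):
    """Return every path from start to goal that visits no node twice.

    max_nodes caps the path length so the search stays small.
    """
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        if len(path) >= max_nodes:
            continue
        for neighbour in sorted(graph[node]):
            if neighbour not in path:
                stack.append((neighbour, path + [neighbour]))
    # Shortest paths first, then alphabetical, for stable output.
    return sorted(paths, key=lambda p: (len(p), p))


graph = build_graph(EDGES)
paths = all_simple_paths(graph, "diet", "viral defence")
for path in paths:
    print(" -> ".join(path))
```

In this toy graph no single edge joins "diet" to "viral defence", which mirrors the zero-hit query, yet the traversal still surfaces the indirect routes that run through the microbiome and through anti-inflammatory function.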

Still, it is becoming increasingly evident that we are only at the beginning of developing the sufficiently advanced toolset and widespread conceptual expertise needed to capture, represent and explore a highly connected data sphere: a toolset that enables accurate, exploratory and causal inference. Humans, being what they are, tend to narrow down the universe of possibilities far too quickly, looking for direct connections where highly indirect ones are, in fact, the norm. We can observe the challenge in real time, as we struggle to understand the various phenotypes with which COVID-19 presents itself.

What the emergence of recent data has highlighted is that we need to create a place where we can evaluate and explore connected datasets, and where we can develop justified and well-reasoned connected thinking that remains on the path of the well established but leads us to areas unknown.

The answers are clearly located in the data and in their connections. Nowhere is this more evident than when looking at the microbiome. The challenge is that there are far too many connections to navigate and assess unassisted. Even if we know that diets and the immune system are linked, which exact foods would boost my personal variant of the immune system to help me fight this particular viral infection? Does the same diet have an effect on all members of a certain population? Or can microbiome composition and variety help explain why certain people seem to be far more susceptible to developing serious symptoms, while others remain almost unaffected? Answers to these questions would help us develop remedial strategies as well as point towards causal links between certain commensal gut bacteria and disease susceptibility. Even further, exploring a highly connected dataset could allow us to find links between viral infection and the newly seen increase in hyperinflammatory diseases that seem to occur weeks after infection with the SARS-CoV-2 virus. Are some of the same pro-inflammatory pathways affected, and does the unusual intracellular behaviour of the virus fit with the observed hyperinflammatory symptoms?

Enabling the exploration of connected data is the mission of Eagle Genomics. Leveraging graph technology, we have built and are deploying a highly user-friendly data management and exploration platform that enables users to onboard data into a connected framework, explore it functionally, identify actionable insights and make more effective scientific and business decisions.

This is important now, because of COVID-19. It will only become more important as the demand for more effective data sharing initiatives continues to grow. The exponential increase in data volumes and complexity will become the most significant bottleneck to improving our understanding in all areas of the life sciences, as the drive towards ever more in silico science increases. So while we are fast approaching a post-COVID-19 world, these challenges remain. Faced with the unchanging complexity of biological systems and their interrelationship with business insights and economic decision making, we must fully embrace this fact to enhance our ability to deliver results and deliver impact!