Radouane is Chief Technology Officer and Data Scientist at Eagle Genomics. In this article he discusses the role of data governance in the era of digital research and its impact on the future of precision medicine and product development.
“Effective analysis of high-quality, contextualised data is vital to understanding everything from human genomics and the microbiome, to disease treatment and product development.”
Q: What is your role at Eagle Genomics?
I’m Chief Technology Officer and Chief Data Scientist at Eagle Genomics. Currently I am leading the strategy for our technical vision. This means participating in many areas and working closely with a number of teams, from product management and development, to innovation and support. My job is to help people think outside of the box!
Q: What is data governance and what is its role within the life sciences?
Data governance is a framework of processes and policies for making decisions and taking actions regarding data. It is always closely related to decision-making processes and the actions which result, providing a control structure for how those decisions are made.
In general, governance is not only related to data; it is about the mechanisms and controls an organisation has in place around all kinds of decision-making. The only difference in the context of data governance is that these are decisions made about the data itself, as well as the context surrounding the data. It’s not about the execution process on a day-to-day level but about enabling an organisation to ensure it is making the right decisions surrounding a given aspect; in this context, data.
Life science is becoming a data science discipline and industry. In the past, most innovation happened in the wet lab, but the value and volume of biological data are changing the way researchers and organisations are carrying out science and discovery. Within the entirety of the life sciences, from pharma to personal care, data science is going to have a massive impact.
In the past, most discoveries happened in the wet lab, but now data science is changing the way research happens.
Q: For you, what is the most important area of data governance and why?
You can’t talk about governance without talking about policy as a guiding mechanism. Unfortunately, what often happens in industry is that as soon as a business is dealing with compliance or regulation it becomes a tick-box exercise. Often the thought process becomes, “I have achieved compliance, so I’ve done my job.” Governance then becomes a bureaucratic audit control mechanism and becomes ‘government’ rather than ‘governance’.
It’s really important that this doesn’t happen. It’s a killer because it leads to a bad attitude towards governance. The more regulated an industry is, the more prevalent the tendency for there to be a specific group of people who are in charge of governance, instead of responsibility being a cultural attitude within the organisation.
The most important thing is culture, which, in terms of governance, comes down to asking two very important questions. The first is, ‘Are we doing the right things?’, which is a question of validity: does the organisation have the right data, and is that data of the appropriate quality? The second is, ‘Are we getting things right?’, which is more a case of process and checking that the organisation is following the processes and guidance that are in place for the entire data management lifecycle.
It is always very important to ask the first question before the second! Starting the other way around results in what is called ‘local optimisation’, which is the tick-box style attitude I was referring to before and which undermines the entire system.
If there is a cultural attitude of shared responsibility for governance then, over time, it should become almost unnecessary to have a specific group of people policing it, because effective governance should have become an organisation-wide habit; this is what we at Eagle Genomics call ‘governance by design’.
Q: What role will data governance play in managing the rapidly expanding volumes of life sciences data?
I believe that data will save lives, because uncovering and understanding the connections and insights hidden within life sciences data will result in better health outcomes for clinical patients and the development of precision, customised medicine for individuals.
Data donations are as vital as blood donations, because effective analysis of high-quality, contextualised data is vital to understanding everything from human genomics and the microbiome, to disease treatment and drug development. Just as governance surrounding blood donations is needed to successfully and safely treat patients receiving transfusions, so is a strong governance system for handling life sciences data.
Life sciences research produces a huge volume of data, which continues to rapidly expand.
Strong data governance goes beyond the General Data Protection Regulation (GDPR). GDPR only addresses social data, such as that collected by social media platforms and e-commerce, but, to my mind, it does not effectively deal with healthcare data. The control of health and scientific data governance needs to be taken out of the hands of private companies in order to create consistent, distributed governance. This process is only at its very beginning, but things in the data science industry will change. Without governance there can be no trust between the people whose data is being collected and those using it to make discoveries.
Effective data governance cannot be centralised; it needs to be distributed in order for it to be scaled up effectively and integrated into the cultural fabric of industry.
Q: How does effective data governance help to enable the capabilities of Eagle’s e[datascientist] platform?
At Eagle Genomics we have a duty and a responsibility to our customers to protect their data; having an effective data governance structure in place is vital to achieving this.
Control over which users can see which data is part of the core governance we have in place. Controlling security privileges and how we approach the private institutional data held by our customers is central to the function of our knowledge discovery platform. Security privileges must be managed at the most atomic level and any use or versioning of the data has to be transparent and easy to understand and access.
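To make the idea of record-level privileges and transparent versioning concrete, here is a minimal Python sketch. It is purely illustrative and is not how the e[datascientist] platform is implemented: the names `GovernedStore`, `RecordVersion`, `grant_read` and the audit log fields are all hypothetical. It assumes read permission is granted per user and per record, that writes never overwrite earlier versions, and that every attempted read is logged, including denied ones. For brevity it does not restrict who may write.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RecordVersion:
    record_id: str
    version: int
    payload: str


@dataclass
class GovernedStore:
    """Hypothetical store: per-record read grants, append-only versions, audit log."""
    versions: dict = field(default_factory=dict)   # record_id -> [RecordVersion, ...]
    grants: set = field(default_factory=set)       # {(user, record_id), ...}
    audit_log: list = field(default_factory=list)  # one dict per attempted action

    def _log(self, user, action, record_id, allowed):
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user, "action": action,
            "record_id": record_id, "allowed": allowed,
        })

    def write(self, user, record_id, payload):
        # Never overwrite: each write appends a new version to the history.
        history = self.versions.setdefault(record_id, [])
        record = RecordVersion(record_id, len(history) + 1, payload)
        history.append(record)
        self.grants.add((user, record_id))  # the writer can read what they wrote
        self._log(user, "write", record_id, True)
        return record

    def grant_read(self, user, record_id):
        if record_id not in self.versions:
            raise KeyError(record_id)
        self.grants.add((user, record_id))
        self._log(user, "grant_read", record_id, True)

    def read(self, user, record_id):
        allowed = (user, record_id) in self.grants
        self._log(user, "read", record_id, allowed)  # denied attempts are logged too
        if not allowed:
            raise PermissionError(f"{user} may not read {record_id}")
        return self.versions[record_id][-1]  # latest version


store = GovernedStore()
store.write("alice", "sample-001", "raw reads")
store.write("alice", "sample-001", "raw reads, QC filtered")
print(store.read("alice", "sample-001").version)  # prints 2
```

In this sketch the grant is keyed on the individual record rather than on a dataset or project, which is one reading of “the most atomic level”, and the audit log keeps access and versioning history inspectable after the fact.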
Visualisation of the e[datascientist] platform, from big data to insight
The platform also integrates open-source datasets from organisations such as EMBL’s European Bioinformatics Institute. The governance processes we have in place for integrating high-quality open-source datasets are fundamental to providing our users with the most informed view of an area of interest through the contextualisation offered by external data.
Q: As artificial intelligence (AI) and machine learning (ML) technologies and definitions continue to evolve, how can data governance ensure these fields are well regulated and clearly defined?
Data governance is going to move from being organisationally centralised to becoming a distributed community process. Today governance rules and policies are defined a priori but, in order for them to scale, governance needs to be a learning mechanism. AI and ML will therefore be fundamental to enforcing governance, both as a leading case study for distributed processes and as technologies that themselves enable effective governance across industries.
For example, at the moment we know that AI and ML are biased because we as humans are biased. This is demonstrated in the case of a non-touch hand-soap dispenser which is only capable of detecting white skin. This ultimately loops back to the question of ‘Is the right data being used?’, which is where the role of governance steps in.
Q: Where do you do your best thinking?
Generally good ideas come to me when I stop thinking about the challenge in hand. It happens when I’m most relaxed, when I’m walking or cycling. Then the problem is that I forget - it’s not easy to write down ideas when you’re on a bicycle! But even if I remember one or two of those ideas, it’s great!