The value of ontologies to Eagle Genomics 

Business value of ontologies 

Ontologies are essential to many aspects of R&D data integration and data governance. Using structured metadata, including ontologies and controlled vocabularies or nomenclatures, can rapidly increase an organisation's data management maturity. To bridge the gap between “big data” and “biological insight”, Eagle uses pioneering value-driven biocuration to help answer business and scientific questions where data harmonisation is essential to improving the quality of the underlying datasets.

Ontologies Mapping Project

Eagle Genomics has been a member of the Pistoia Alliance for a number of years.

The Pistoia Alliance is a global, not-for-profit alliance of life science companies, vendors, publishers and academic groups that work together to lower barriers to innovation in R&D.

One of the Pistoia Alliance's currently active projects is the Ontologies Mapping Project, for which Eagle Genomics was invited onto the Project Team. I have represented Eagle in this project for the last two years and have been involved at various stages, including contributing to documentation and assessing the functionality of available academic and commercial ontology mapping tools. It has been a very interesting journey that has given us both knowledge of and insight into some of the better ontologies and how some of the best ontology mapping tools work.

I was given a month's notice to prepare as a panelist on the Pistoia Alliance Debates Webinar "Ontologies Mapping for more effective data integration and knowledge management", which was to be aired live in February 2017. I was very nervous at the thought of lots of people hearing me talk live over the web. My largest audiences to date had been a class of 28 ten-year-olds at my son’s school and around the same number at a conference, where I talked about my poster. I had never spoken in front of a large audience (200+ in this case) before, and this was also my first experience of participating as a panelist on a webinar!

My presentation in the webinar followed this outline:

Ontologies support the bridge between data and insight

The Life Sciences face increasing volumes, variety, veracity, velocity and sources of data, all of which require integration and interoperability. Eagle provides software solutions that bridge between “big data” and “innovative biological insight”.

 
Ontologies allow disparate data from a range of sources to be harmonized, federated and integrated into a single resource. Various high-performance computational analyses, such as data processing, statistical analysis and data mining, can then be carried out on that resource in pursuit of novel biological insights.

Data curation

Curation is a multistep activity performed on datasets, involving their collection, characterisation, contextualisation and categorisation, thereby allowing for better data management.

 

Eagle plays an active role in curating, organising and federating a variety of customer multi-omics datasets and associated metadata into a knowledge management platform (eaglecatalog), making data more visible and available for searching, sharing and further analyses, including data valuation.

Data valuation

Eagle pioneers the measurement of data value (i.e. its usefulness and relevance) in the context of specific scientific questions. Value-driven biocuration is used to harmonise data and to improve the quality of the underlying datasets for value modelling. This is the basis for our product eaglediscover, which objectively assesses the value of the data and discovers relationships between the components contributing to that value.

 

We can measure the value of data before and after the use of ontologies, according to quality metrics and to value metrics based on methods such as AHP (Analytic Hierarchy Process; a structured technique for organizing and analyzing complex decisions, based on mathematics and psychology) and QFD (Quality Function Deployment; a structured approach to defining customer needs or requirements and translating them into specific plans to produce products to meet those needs).
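
As a rough illustration of the AHP part of this idea, the Python sketch below turns pairwise judgements about a few criteria into weights and then compares a weighted score for a dataset before and after curation. The criteria names, the pairwise judgements and the before/after scores are all invented for illustration; they are not Eagle's metrics and do not describe how eaglediscover works. The weights are approximated with the row geometric-mean method, a common shortcut for the principal-eigenvector calculation used in AHP.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def value_score(weights, scores):
    """Weighted sum of per-criterion scores, each expected to lie in [0, 1]."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical criteria for judging a dataset against one scientific question.
CRITERIA = ["completeness", "annotation consistency", "relevance to question"]

# PAIRWISE[i][j] says how much more important criterion i is than criterion j.
# These judgements are invented; the matrix must be reciprocal (a_ji = 1 / a_ij).
PAIRWISE = [
    [1.0, 0.5, 0.25],
    [2.0, 1.0, 0.5],
    [4.0, 2.0, 1.0],
]

WEIGHTS = ahp_weights(PAIRWISE)

# Invented per-criterion scores for the same dataset, before and after
# its metadata has been tagged with ontology terms.
before = [0.7, 0.3, 0.5]
after = [0.7, 0.8, 0.6]

if __name__ == "__main__":
    for name, weight in zip(CRITERIA, WEIGHTS):
        print(f"{name}: weight {weight:.3f}")
    print(f"score before: {value_score(WEIGHTS, before):.3f}")
    print(f"score after:  {value_score(WEIGHTS, after):.3f}")
```

Because the weights come from explicit pairwise judgements, the same scoring can be rerun whenever the scientific question changes and the relative importance of the criteria shifts.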

Data governance

Data governance is emerging as an important activity within the biopharma and healthcare industries. It is a complex initiative concerned with validity (are we doing the right things?) and consistency (are we doing things right?) throughout the organisation, in support of efficient data and knowledge management.

 

Data governance helps ensure that everyone refers to the same entity (a drug, disease or gene) in the same way across all organisational departments and sites (R&D -> clinical trials -> sale of drug to treat disease), which is essential. It should be an activity by design and not a “tick box” exercise. Hence, it can be initiated by using ontologies or controlled vocabularies to tag and link experiments and datasets across the different departments of an organisation.
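
To make the tagging idea concrete, here is a minimal Python sketch in which two departments describe the same entities with different words and a small controlled vocabulary resolves them to shared identifiers. The vocabulary, the "EX:" identifiers and the function names are hypothetical and are not taken from any real ontology or from eaglecatalog; a real deployment would load terms and synonyms from a published ontology instead.

```python
# Hypothetical controlled vocabulary. Identifiers with the "EX:" prefix are
# made up for illustration and do not come from any real ontology.
VOCABULARY = {
    "EX:0001": {"label": "paracetamol", "synonyms": ["acetaminophen"]},
    "EX:0002": {"label": "myocardial infarction", "synonyms": ["heart attack"]},
}


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_index(vocabulary):
    """Map every normalised label and synonym to its term identifier."""
    index = {}
    for term_id, term in vocabulary.items():
        for name in [term["label"], *term["synonyms"]]:
            index[normalise(name)] = term_id
    return index


def tag_dataset(free_text_terms, index):
    """Return (tagged, unmapped): raw term -> identifier, plus unmatched terms."""
    tagged, unmapped = {}, []
    for raw in free_text_terms:
        term_id = index.get(normalise(raw))
        if term_id is None:
            unmapped.append(raw)  # left for a curator to review
        else:
            tagged[raw] = term_id
    return tagged, unmapped


INDEX = build_index(VOCABULARY)

# Two departments describing the same entities in different words.
rd_tags, rd_unmapped = tag_dataset(["Acetaminophen", "Heart  attack"], INDEX)
clinical_tags, clinical_unmapped = tag_dataset(
    ["paracetamol", "myocardial infarction", "fever"], INDEX
)
```

Once both datasets carry the same identifiers, they can be linked on those identifiers rather than on free text, and anything returned as unmapped is an explicit work item for a curator instead of a silent mismatch.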

Conclusion

Despite the need for better data standards to support better data management, integration and interoperability, such standards remain poorly defined and, even when present, are often overlooked or ignored. Ontologies are an important solution because they serve as the “smart glue” for data integration, giving data and knowledge management a semantic foundation. However, this is hampered by multiple standards and by many differing ontologies that can overlap in the same data domain. Hence, the Ontologies Mapping Project has been working towards supporting better tools, services and best practices for ontology management and mapping in the Life Sciences.

Other panelists on this webinar included Martin Romacker (Roche), Simon Jupp (EMBL-EBI) and Lee Harland (SciBite), who each described different aspects of the importance of ontologies for standardising datasets in the life sciences.

Listen to the Pistoia Alliance Ontology Mapping webinar in full here.

If you would like to learn more about how Eagle Genomics utilises ontologies in both eaglecatalog and eaglediscover, please get in touch! Call us on +44 (0) 1223 654481 or drop us a line using our specialised form.

 

 

 

Yasmin Alam-Faruque

About Yasmin Alam-Faruque

Biocurator Yasmin Alam-Faruque is a member of Eagle Genomics' Biocuration team, having joined in early 2014. "Why do I enjoy data curation at Eagle? It gives me the opportunity to find out about new industries, their areas of research, investigate and organise new datasets and work with the biomedical scientists who create and submit the data to make the data more accessible." Yasmin came to biocuration from a start as a bench scientist and brings an understanding of biomedical science from an academic perspective, with an MSc in immunology comparing the immunological mechanisms involved in corneal and skin graft rejection, a PhD in differential gene expression in mucosal cancers and postdoctoral experience in autoimmune skin disease. In her previous role as a scientific database curator at the European Bioinformatics Institute (EMBL-EBI), she worked on the Renal Gene Ontology Annotation Initiative, a project funded by the charity Kidney Research UK. The project aimed to produce a resource that can be used to interpret data from small- and large-scale experiments investigating the molecular mechanisms of kidney function and development, providing new biological insights and thereby helping to alleviate renal disease. She also worked on the curation of various proteins, across species, in the UniProt Knowledgebase, including contributions to the Gene Ontology Annotation and the IntAct protein-protein interaction databases.