
Translational medicine research application #2: patient selection and cohort builder for correlation and association studies

November 28, 2017 / in translational medicine
Selecting valuable patient cohorts for cancer biomarker discovery

Providing scientists with datasets prioritised by their scientific relevance to the research they want to carry forward improves data selection and encourages data reuse, which in turn makes the datasets more valuable.

Systematic and explicit data prioritisation is at the heart of Eagle's translational medicine platform. In a case study, we show how the platform was used to select and prioritise the most valuable patients in the context of a specific customer project: the identification of genetic (haplotype) associations with skin cancer prognosis from publicly available information.
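To make the idea of explicit prioritisation concrete, here is a minimal Python sketch of scoring and ranking donors against stated criteria. It is an illustration only and does not reflect how the platform itself is implemented: the `Donor` fields, the `WEIGHTS` values and the `prioritise` function are all hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Donor:
    """Hypothetical donor record; the field names are assumptions for this sketch."""
    donor_id: str
    cancer_type: str
    has_sequence_data: bool
    has_survival_data: bool


# Hypothetical weights: each explicit criterion contributes a fixed amount
# to the relevance score, so the ranking is easy to explain and reproduce.
WEIGHTS = {
    "cancer_type_match": 3.0,
    "has_sequence_data": 2.0,
    "has_survival_data": 1.0,
}


def relevance_score(donor: Donor, target_cancer_type: str) -> float:
    """Sum the weights of every criterion the donor satisfies."""
    score = 0.0
    if donor.cancer_type == target_cancer_type:
        score += WEIGHTS["cancer_type_match"]
    if donor.has_sequence_data:
        score += WEIGHTS["has_sequence_data"]
    if donor.has_survival_data:
        score += WEIGHTS["has_survival_data"]
    return score


def prioritise(donors, target_cancer_type, min_score=0.0):
    """Return (score, donor) pairs at or above min_score, highest score first."""
    scored = [(relevance_score(d, target_cancer_type), d) for d in donors]
    kept = [(s, d) for s, d in scored if s >= min_score]
    # Ties are broken by donor_id so the ordering is deterministic.
    return sorted(kept, key=lambda pair: (-pair[0], pair[1].donor_id))


# Made-up donors, for illustration only.
donors = [
    Donor("D1", "skin", True, True),
    Donor("D2", "skin", True, False),
    Donor("D3", "breast", True, True),
    Donor("D4", "skin", False, False),
]
ranked = prioritise(donors, "skin", min_score=4.0)
for score, donor in ranked:
    print(donor.donor_id, score)
```

Because every criterion and weight is written down, the same cohort can be rebuilt later, and changing a weight makes the effect on the ranking easy to inspect.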

We used the well-known patient dataset from the International Cancer Genome Consortium (ICGC), which covers over 20,000 patient donors. ICGC is unique in providing links to primary sequence data across many contributing projects. This allowed our association analysis to include a greater number of samples than would be available from any single project, such as The Cancer Genome Atlas (TCGA).

The following figure describes the stepwise process enabled by our translational medicine platform, from data modelling to usage and exploitation.


Several software components are used: e[catalog] for cataloguing the datasets, e[discover] for valuing and prioritising the data, and e[hive] for running the association analysis.

The systematic data organisation and valuation model provided by Eagle’s translational medicine platform allows for fast and effective patient selection for cohort building, followed by robust and reproducible correlation and association analysis.

We demonstrated the benefits of our prioritisation approach: we were able to select and prioritise the most relevant patients on explicit, well-understood criteria and access their associated datasets. This made it possible to run complex comparison analyses between groups of patients in order to identify biomarkers, assist with patient stratification and perform biological analysis of targets.
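As an illustration of what a simple comparison between two patient groups can look like, the sketch below tests whether carrying a haplotype is associated with poor prognosis, using a 2x2 contingency table. Fisher's exact test and the odds ratio are chosen here purely as a familiar example; the post does not state which statistical method the association analysis used. The function names are hypothetical and the counts are made up for the example, not results from the case study.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Made-up counts, for illustration only:
#                  poor prognosis   good prognosis
# carriers               8                2
# non-carriers           1                5
a, b, c, d = 8, 2, 1, 5
ratio = odds_ratio(a, b, c, d)
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"odds ratio = {ratio:.1f}, p = {p_value:.4f}")
```

A real association study would also need to account for multiple testing across haplotypes and for confounders such as age or tumour stage, which this sketch leaves out.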

If you want to know more, please reach out to us.





Yasmin Alam-Faruque

About Yasmin Alam-Faruque

Biocurator Yasmin Alam-Faruque is a member of Eagle Genomics' Biocuration team, which she joined in early 2014. "Why do I enjoy data curation at Eagle? It gives me the opportunity to find out about new industries, their areas of research, investigate and organise new datasets and work with the biomedical scientists who create and submit the data to make the data more accessible." Yasmin came to biocuration after starting out as a bench scientist, and brings an understanding of biomedical science from an academic perspective: an MSc in immunology comparing the immunological mechanism involved in corneal and skin graft rejection, a PhD in differential gene expression in mucosal cancers, and postdoctoral experience in autoimmune skin disease. In her previous role as a scientific database curator at the European Bioinformatics Institute (EMBL-EBI), she worked on the Renal Gene Ontology Annotation Initiative, a project funded by the charity Kidney Research UK. The project produced a resource for interpreting data from small- and large-scale experiments that investigate the molecular mechanisms of kidney function and development, providing new biological insights and thereby helping to alleviate renal disease. She also curated various proteins, across species, in the UniProt Knowledgebase, and contributed to the Gene Ontology Annotation and IntAct protein-protein interaction databases.