
Linking cancer genome resources: ICGC, TCGA and EGA


In recent years science has moved rapidly from identifying the first genes involved in cancer (1) to characterising entire cancer genomes (2). These advances herald an understanding of the molecular basis of this entire class of complex diseases, and the development of personalised medicines to treat them.

This massive worldwide collaborative undertaking has resulted in several genomic resources for cancer, which provide a vital foundation for continued advances in the field. Each resource has its own strengths and weaknesses, and these are complementary. Here we focus on one important task that these resources enable: the reanalysis of whole exome sequences (WXS) from large numbers of cancer patients. Integrating WXS data files from multiple resources increases the number of genomes available, and hence the statistical power of downstream analyses.


A single resource often contains multiple samples for an individual patient donor, notably paired tumor/normal samples, but also multiple tumor samples. There is also significant overlap in samples between the resources. The overlap in samples between EGA, ICGC and TCGA is shown in the figure below.

[Figure: overlap in samples between EGA, ICGC and TCGA]

Resolving this overlap required extensive semi-automated curation of records spanning the resources, and yields a total of over 17,000 genomes with WXS data available for analysis. It also allows us to combine the unique characteristics of each resource: ICGC, for example, has an extensive collection of standardised clinical metadata for donors, which adds considerable value to the primary sequences in EGA and TCGA.
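To make the idea of resolving overlap concrete, here is a minimal sketch in Python. It assumes that curation has already assigned each sample record a shared donor key. The record layout, the function names and every identifier in it are invented for illustration and do not reflect the real EGA, ICGC or TCGA schemas or the curation pipeline used here.

```python
from collections import defaultdict

# Toy records: (resource, sample_id, donor_id). All identifiers are invented
# for illustration and do not correspond to real EGA, ICGC or TCGA entries.
RECORDS = [
    ("EGA", "ega-s1", "donor-A"),
    ("ICGC", "icgc-s1", "donor-A"),
    ("ICGC", "icgc-s2", "donor-B"),
    ("TCGA", "tcga-s1", "donor-B"),
    ("TCGA", "tcga-s2", "donor-C"),
]


def group_by_donor(records):
    """Map each curated donor key to the set of resources holding its samples."""
    resources_by_donor = defaultdict(set)
    for resource, _sample_id, donor_id in records:
        resources_by_donor[donor_id].add(resource)
    return dict(resources_by_donor)


def overlap_counts(resources_by_donor):
    """Count donors per combination of resources (the regions of a Venn diagram)."""
    counts = defaultdict(int)
    for resources in resources_by_donor.values():
        counts[tuple(sorted(resources))] += 1
    return dict(counts)


donors = group_by_donor(RECORDS)
counts = overlap_counts(donors)
print(len(donors), counts)
```

Counting distinct donor keys rather than raw records means a donor present in two resources is counted once. The per-combination counts correspond to the regions of an overlap diagram.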


  1. “A point mutation is responsible for the acquisition of transforming properties by the T24 human bladder carcinoma oncogene” E. P. Reddy, R. K. Reynolds, E. Santos & M. Barbacid. Nature 300, 149-152 (1982)
  2. “A small-cell lung cancer genome with complex signatures of tobacco exposure” Erin D. Pleasance, Philip J. Stephens, Sarah O’Meara, ..., Michael R. Stratton, P. Andrew Futreal & Peter J. Campbell. Nature 463, 184-190 (2010), doi:10.1038/nature08629


Eleanor Stanley

About Eleanor Stanley

Eleanor Stanley is a scientific data and information security specialist. She is a biocurator at Eagle Genomics and is also responsible for information security. She joined the company in mid-2014 from the Wellcome Trust Sanger Institute (WTSI), where she worked as a bioinformatician building a pipeline for genome annotation within the 50 Helminth Genomes Initiative, which is part of the Global health research project at WTSI. Eleanor’s entire career since university has been in biocuration, though she had a flutter and gained a Master's degree in bioinformatics in 2012. She began as a literature curator with FlyBase at the University of Cambridge and then with UniProt at the European Bioinformatics Institute (EMBL-EBI), focusing on Drosophila, worms, alternative splicing and complete proteome sets. From there she mixed bioinformatics and biocuration at WTSI, building a gene annotation pipeline and manually improving her automatically generated gene models for Onchocerca volvulus for WormBase. "While fly biology and biocuration of worm datasets isn't the most common route into human genomics, it's all about getting new data and understanding its potential for scientific discovery. Eagle has given me a great opportunity to keep learning, such an energetic company."