March 3, 2017

Using Data Hubs in Ensembl to display your organisation's genomic data

Configuring Data Hubs in Ensembl

Data hubs are a convenient way to show preconfigured tracks of internal (or external) genomic data in Ensembl. Users do not need to upload their own data, and once the hub has been registered (a one-off change described below) no per-track configuration is needed inside Ensembl. More data can be added to the hub later, and it will appear as preconfigured tracks for all users. This makes data hubs a good way to view large internal datasets on an internal Ensembl mirror.

What is a data hub?

A data hub is a simple web-accessible site that holds genomic data files, with the layout described in a few metadata files.

UCSC, which calls them track hubs, gives a longer definition:

“Track hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser alongside native annotation tracks. Hubs are a useful tool for visualizing a large number of genome-wide data sets. For example, a project that has produced several wiggle plots of data can use the hub utility to organize the tracks into composite and super-tracks, making it possible to show the data for a large collection of tissues and experimental conditions in a visually elegant way, similar to how the ENCODE native data tracks are displayed in the browser.”

Creating a Data Hub

Creating a data hub is fairly straightforward. It can be hosted on an FTP or HTTP server, and the required format is well documented by UCSC.
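
To make the layout concrete, here is a minimal Python sketch that writes the three metadata files a UCSC-style hub usually has: hub.txt, genomes.txt and a per-genome trackDb.txt. The function name, hub name, labels, email address and BigWig URL are all placeholders made up for illustration; only the genome name CE_1.0 is taken from the example later in this post. Check the UCSC documentation for the full list of settings before relying on it.

```python
from pathlib import Path


def write_minimal_hub(root, genome="CE_1.0",
                      bigwig_url="http://yourdatahub.com/CE_1.0/example.bw"):
    """Write hub.txt, genomes.txt and <genome>/trackDb.txt under root.

    All labels, the email address and bigwig_url are hypothetical
    placeholders; replace them with your own values.
    """
    root = Path(root)
    genome_dir = root / genome
    genome_dir.mkdir(parents=True, exist_ok=True)

    files = {
        # Top-level description of the hub.
        root / "hub.txt": [
            "hub our_data_hub",
            "shortLabel Our Data Hub",
            "longLabel Internal genomic data (placeholder description)",
            "genomesFile genomes.txt",
            "email admin@example.org",
        ],
        # One genome/trackDb pair per assembly the hub covers.
        root / "genomes.txt": [
            f"genome {genome}",
            f"trackDb {genome}/trackDb.txt",
        ],
        # One stanza per track; a single BigWig track is shown here.
        genome_dir / "trackDb.txt": [
            "track example_coverage",
            f"bigDataUrl {bigwig_url}",
            "shortLabel Example coverage",
            "longLabel Example coverage track (placeholder)",
            "type bigWig",
        ],
    }
    for path, lines in files.items():
        path.write_text("\n".join(lines) + "\n")
    return list(files)
```

Calling write_minimal_hub("/path/to/webroot/hub") would create the files in that directory, which can then be served over HTTP or FTP. The data files themselves (BigWig, BAM, VCF and so on) still have to be placed at the URLs named in trackDb.txt.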

Enabling the Data Hub in Ensembl

A prerequisite for enabling data hubs is that your Ensembl installation already has the software needed to upload files such as BAM, VCF or BigWig, because Ensembl uses the same tools to display these data types. The Ensembl configuration itself is then quite straightforward. Add the following to the species.ini file for the species that the data hub refers to (if the data hub contains data for more than one species, add it to each of the other species.ini files too).

[ENSEMBL_INTERNAL_DATAHUBS]

OUR_DATA_HUB = http://yourdatahub.com

Also in the species.ini file, make sure that UCSC_GOLDEN_PATH in the “general” section is set to match the genome name in the data hub. For example:

UCSC_GOLDEN_PATH = CE_1.0
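
A mismatch between these two names is easy to miss, so a quick check can help. The following is a rough Python sketch, not part of Ensembl: it assumes the hub's genomes.txt uses the usual "genome <name>" lines and that species.ini uses simple "KEY = value" lines. The function names are made up for illustration.

```python
def hub_genome_names(genomes_txt):
    """Return the genome names listed in the text of a hub's genomes.txt."""
    names = []
    for line in genomes_txt.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "genome":
            names.append(parts[1].strip())
    return names


def ucsc_golden_path(ini_text):
    """Return the UCSC_GOLDEN_PATH value from species.ini text, or None."""
    for line in ini_text.splitlines():
        stripped = line.strip()
        # Skip comment lines (assumed to start with ';' or '#').
        if stripped.startswith((";", "#")):
            continue
        key, sep, value = stripped.partition("=")
        if sep and key.strip() == "UCSC_GOLDEN_PATH":
            return value.strip()
    return None


def golden_path_matches_hub(ini_text, genomes_txt):
    """True if the species.ini value is one of the hub's genome names."""
    value = ucsc_golden_path(ini_text)
    return value is not None and value in hub_genome_names(genomes_txt)
```

Both functions take file contents as strings, so they can be fed from open(...).read() for the local species.ini and from whatever you use to fetch genomes.txt from the hub.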

Lastly, make sure that data hub parsing is enabled in Ensembl by uncommenting the following line in the file /usr/local/ensembl/modules/EnsEMBL/Web/ConfigPacker.pm:

# Internal flatfile data sources

$self->_summarise_datahubs;

This is at about line 61.

Once all these changes have been made and the data hub is up and running, remove the Ensembl packed config files and restart Ensembl. The new tracks should then appear in the 'Configure this page' section of the Ensembl website!
