March 3, 2017

NCBI phases out the SRA, bails from the data tsunami

Apologies in advance, but the analogy is too tempting to pass up... Here is the scene: the sun is setting over a glassy sea. A surfer starts to paddle as the waters stir, accelerating as the wave builds to three or four times their height. The surfer struggles to their feet and glances back over the face of the wave to the lip breaking ominously above. This is the moment that they lose their nerve and bail, left to watch the back of the giant wave continuing on its inexorable journey. The surfer is, of course, the NCBI, riding the encroaching next-gen data tsunami on their Sequence Read Archive (SRA).
We can all guess why (apart from their budget shortfall) the NCBI has decided to phase out the SRA, right? Most labs have enough problems managing sequence output from a single next-gen instrument, let alone all the public sequences in the world. As the next-next-gen technologies produce more, faster, cheaper, is it even technologically possible to keep up? And then there's utility: who will use the data and how? SRA interfaces have come under fire recently, for example from the tree of life, hyphal tip and Mendelian disorder. In short, how could the NCBI have continued to justify this extreme act of molecular stamp collecting at the public expense?
There are reasons to defend the SRA, in particular for its role in the rapid release of large-scale sequence data sets as mandated by the Fort Lauderdale Agreement. Will the community's valuable and hard-won commitment to "open data" be damaged by the loss of this important portal? The SRA also provided an important mechanism for labs to outsource the long-term hosting of their large sequence data sets to a trusted third party; I'm already hearing "what will we do now?" from several quarters. The technologist in me is also disappointed: what 'big data' advances were waiting to be discovered from riding the next-gen tidal wave?
There is a twist to this tale - the NCBI was not the only one waiting out back; the EBI and DDBJ have their own SRA database mirrors. Their position will be watched with increasing scrutiny in the light of NCBI's apparent cold feet. A final thought: once this particular wave has been missed, it's gone for good. [The NCBI had not made an official announcement by the time of writing, but a leaked email on the subject can be read here and here, and also on the Nature blog.]

Topics: Big data, Bioinformatics, Open data