Back in December I wrote a post arguing that the best place to store DNA data was in the individual it came from, rather than trying to make electronic copies. My reasoning was that it would soon become cheaper (and more effective as technology evolved) to re-sequence each time the data was required than to purchase storage to archive an electronic copy.
A Nature article from the EBI on 23rd Jan announced research from a team led by Ewan Birney and Nick Goldman into the use of DNA as a data storage mechanism. Base-2 binary data (two symbols, i.e. 0 and 1) transcoded into base-4 (four symbols, i.e. 0/1/2/3) can happily be represented by the four bases of DNA (AGCT), and so by using simple DNA synthesis methods that already exist a molecule can be created that contains an exact copy of the base-4 encoding of the original data (with a suitable level of redundancy, indexing, and checksumming to ensure minimal loss in case of molecule damage). Standard DNA sequencing technology can then be used to read the molecule back into electronic form, from which the encoded information can be decoded and reconstituted. The team reported an impressive 100% round-trip accuracy.
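The core idea can be sketched in a few lines. This is a deliberately simplified illustration of the base-2 → base-4 → bases mapping described above, not the published scheme (which adds homopolymer avoidance, indexing, and error correction on top); the mapping of digits to bases below is an arbitrary choice for the example.

```python
# Sketch of encoding binary data as DNA bases: each byte (8 bits)
# becomes four base-4 digits (2 bits each), and each digit maps to a base.
# The digit-to-base assignment here is arbitrary, chosen for illustration.
BASE4_TO_DNA = {0: "A", 1: "C", 2: "G", 3: "T"}
DNA_TO_BASE4 = {base: digit for digit, base in BASE4_TO_DNA.items()}

def encode(data: bytes) -> str:
    """Transcode bytes into a string of DNA bases, 4 bases per byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):          # take the byte two bits at a time
            bases.append(BASE4_TO_DNA[(byte >> shift) & 0b11])
    return "".join(bases)

def decode(dna: str) -> bytes:
    """Reverse the transcoding: pack each run of 4 bases back into a byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | DNA_TO_BASE4[base]
        out.append(byte)
    return bytes(out)

message = b"DNA"
strand = encode(message)
assert decode(strand) == message            # lossless round trip
```

A real synthesis scheme would not use a direct mapping like this: long runs of the same base are error-prone to sequence, which is why the actual research transcoded via base-3 with a rotating code to guarantee no base ever repeats.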
Importantly, the team behind this research believe that it could soon be cheaper and easier to store and retrieve data in this way than on traditional electronic or optical media. The universality of DNA means there should be no need to migrate archives from one medium to another as technology becomes obsolete - it should always be possible to read DNA for as long as life is based on it. The specific technology needed to read it may change, but the format and the information contained within never will.
This ties in nicely with my original argument - that it is soon going to be more cost-effective to sequence and discard the data after analysis than to attempt to store the reads generated.
We shouldn't be investing in giant data centres to store petabytes of sequencing data. We just need bigger freezers.