March 3, 2017

Quantum bioinformatics

The future has arrived - commercial quantum computers are now on the market. Admittedly they're only 128 qubits (roughly equivalent to being able to process 16 bytes of data at a time) and require everything to be coded in the form of a Markov random field (a statistical model that is a special case of an undirected graph) which prohibits any current algorithms from running out-of-the-box, but its a start, and its a very important one.

Quantum computing, in a grossly oversimplified nutshell, allows binary bits (the 0s and 1s) to exist in both states at the same time - a quantum bit, or qubit. Thus it is possible to represent every possible combination of 0s and 1s simultaneously. As all data processed by current computers is just 0s and 1s, and all problems revolve around manipulating the state of those bits until they match some predetermined requirement, a quantum computer can bypass a lot of this computational heavy lifting by representing every possible combination of bits simultaneously, therefore knowing at the same time every single possible solution/outcome given the inputs.

Why is this important? There is a class of computer science problems called NP-hard which cannot be solved by any other means than constructing every possible solution then measuring them all to see which one comes out best. The classic example of this is the travelling salesman problem - given a map with distances and roads between cities that the salesman has to visit, which route will be shortest? It sounds simple, but it really is NP-hard. 

So what about bioinformatics? Genome assembly is probably NP-hard - depending on which way you look at it. The only way to get the genuinely best assembly is to try out all possible combinations, score them, and select the best. All current approaches attempt to avoid having to do this by clustering or preprocessing the components of the assembly in order to bypass the necessity of trying out every combination. Given a quantum computer with sufficient qubits to represent the completed assembly you can straight away discover all possible variants of the assembly and score each in turn, all in an instant, turning a job that used to take hours or even days into one that takes seconds, if that.

Other areas that could be impacted are biomarker discovery (which combination of SNPs best predict a certain disease? check them all simultaneously and find out instantly), and comparative genomics (which species are the closest to the sample? align/compare them all simultaneously and find out instantly). The possibilities are endless.

The only drawback is the previously mentioned requirement to code everything in a particular way in order for the machine to be able to process the problem. That's not new - people working with GPUs already have to do this - so although it is a short-term roadblock it won't be long before tools are made available to make this job easier. The hardest part will be gaining acceptance. As with GPU and other alternative processors and new algorithms it can take time for the scientific community to convince itself of the accuracy and reliability of a new approach. But given the huge benefits that this technology has to offer I'm sure it won't take long for people to be convinced.

Quantum computing really will change everything, once it takes hold.

Given the extreme cost of purchasing and running one of these early quantum computer models, I'm waiting for Amazon to put one in their cloud first before giving it a go (unless one of our customers happens to have very deep pockets and a strong urge to test the unknown). Well, one can hope!

Topics: Bioinformatics, Cloud, quantum bioinformatics