March 3, 2017

What is bioinformatics?

What is bioinformatics? Where did it come from? What does it do and what is it used for? I decided to try to answer these questions as part of a series of blogs explaining how bioinformatics is applied in different life science sectors: plant and animal breeding, biofuels, drug discovery, cosmetics. This blog is the introduction to the series.

What is bioinformatics?

In one sentence: it is the science of managing, analysing, storing and merging biological data using advanced computing techniques.

Bioinformaticians and computational biologists are the field experts who use bioinformatics in their research. Both can define bioinformatics easily, but their definitions may differ, because they come from slightly different backgrounds: the bioinformatician is a biologist who uses software tools to work with biological data, while the computational biologist is a computer scientist who develops the theories, algorithms and techniques behind the tools that bioinformaticians use.

Thanks to bioinformatics, there is another way to do experiments: in silico, that is, on a computer.
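To give a feel for what an in silico "experiment" can look like at its very simplest, here is a minimal Python sketch that measures two properties of a DNA sequence without touching a test tube. The function names `gc_content` and `reverse_complement` are my own, chosen for illustration, and the sketch assumes the sequence contains only the letters A, C, G and T.

```python
# Translation table mapping each base to its complement (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGC"
    print(gc_content(dna))          # fraction of G and C bases
    print(reverse_complement(dna))  # opposite strand
```

Real analyses work on far larger sequences and usually rely on established libraries, but the principle is the same: the biological question is answered by computation on the data.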

Where did the term bioinformatics come from?

The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 (not 1978, as many Google search results claim), when they were researching informatic processes in biotic systems. An essay about the roots of bioinformatics is available here.

To Paulien and Ben, it seemed that the defining properties of life were various forms of information processing. Their experiments distinguished bioinformatics as a separate research field.

What does it do?

At the beginning of the "Genomic Revolution", bioinformatics was applied to the creation and maintenance of databases that store biological information such as nucleotide and amino acid/protein sequences. Examples of well-known and freely available databases include:

  • ENA - nucleotide sequence data resource
  • UniProtKB - protein sequence database
  • Protein Data Bank - macromolecular structures
  • ArrayExpress - gene expression data
  • Ensembl - genome databases for vertebrates and other eukaryotic species
  • IntAct - protein interaction data

Development of these types of databases involved creating complex interfaces where researchers could access existing public data and submit new or revised data.

A goal of bioinformatics is to discover the meaning of the biological information extracted from the DNA of different species and to gain better insight into the biological processes that happen inside organisms. To study how cell mechanisms are altered in different disease states, the biological data must be analyzed and interpreted. To do this, bioinformatics is broken down into two sub-disciplines:

  • the development and implementation of tools that enable efficient access to, and use and management of, various types of biological information;
  • the development of new algorithms (step-by-step computational procedures) and statistics for assessing relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.
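As a rough illustration of the "locate a gene within a sequence" idea, the Python sketch below scans a DNA string for open reading frames (ORFs): stretches that begin with the start codon ATG and run, in steps of three bases, to the first stop codon. This is a deliberately simplified sketch, not how production gene finders work. The function name `find_orfs` and its `min_codons` parameter are my own, and it only looks at the three forward reading frames of a single strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF here starts at an ATG and ends just after the first in-frame
    stop codon. `end` is exclusive, so seq[start:end] is the ORF.
    `min_codons` counts every codon in the ORF, including start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None                            # index of the current ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # close it and keep scanning
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATGATT"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```

Real genes are harder to find than this: they may sit on the opposite strand, be interrupted by introns, or be too short to distinguish from chance, which is exactly why the statistics mentioned above matter.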

Typical tasks for bioinformatics tools and algorithms include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.
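To make sequence alignment a little more concrete, here is a small Python sketch of the classic Needleman-Wunsch dynamic programming method for global alignment. It computes only the score of the best alignment, not the alignment itself. The function name `global_alignment_score` and the scoring values (+1 for a match, -1 for a mismatch, -1 for a gap) are assumptions chosen for illustration; real tools use more refined scoring schemes.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the best global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the processed prefix
    # of a with the first j characters of b. Row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                         # first i chars of a against nothing
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + pair            # align a[i-1] with b[j-1]
            up = prev[j] + gap                   # a[i-1] against a gap
            left = curr[j - 1] + gap             # b[j-1] against a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Identical sequences score one point per base.
    print(global_alignment_score("GATTACA", "GATTACA"))
    # One missing base costs a single gap penalty.
    print(global_alignment_score("GATTACA", "GATACA"))
```

Keeping only two rows of the scoring table keeps memory use low, at the cost of not being able to trace back the alignment itself; recovering the alignment would require storing the full table.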

What is it used for?

By making sense of the massive amounts of biological data now available, humanity hopes to improve human health, animal and plant breeding, energy and biotechnology.

Bioinformatics is currently most frequently used in the following fields:

  1. Molecular medicine
  2. Cosmetics and personal hygiene/health
  3. Microbial genome applications
  4. Agriculture/crop science
  5. Animal health and improvement
  6. Comparative studies

My next blog will cover one of these fields in more detail.
