New DNA search engine a ‘bug-busting Google’
A new search engine which effectively acts as a Google for bug-busting genomic scientists has been unveiled by researchers at EMBL’s European Bioinformatics Institute at the Wellcome Trust Genome Campus in Cambridge.
They have combined their knowledge of bacterial genetics and web search algorithms to build a DNA search engine for microbial data.
The search engine, described in a paper published in Nature Biotechnology, could enable researchers and public health agencies to use genome sequencing data to monitor the spread of antibiotic resistance genes.
By making this vast amount of data discoverable, the search engine could also allow researchers to learn more about bacteria and viruses.
The search engine, called Bitsliced Genomic Signature Index (BIGSI), fulfils a similar purpose to internet search engines, such as Google.
The amount of sequenced microbial DNA is doubling every two years. Until now, there has been no practical way to search this data. This type of search could prove extremely useful for understanding disease.
Google and other search engines use natural language processing to search through billions of websites. They can exploit the fact that human language is relatively unchanging.
By contrast, microbial DNA shows the imprint of billions of years of evolution, so each new microbial genome can contain new ‘language’ that has never been seen before. The key to making BIGSI work was finding a way to build a search index that could cope with the diversity of microbial DNA.
Take, for example, an outbreak of food poisoning, where the cause is a Salmonella strain containing a drug-resistance plasmid (a ‘hitchhiking’ DNA element that can spread drug resistance across different bacterial species).
For the first time, BIGSI allows researchers to easily spot if and when the plasmid has been seen before.
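To give a feel for how a lookup of this kind might work, here is a minimal Python sketch. It is an illustration only, not the EMBL-EBI implementation: it assumes each sequenced sample is broken into short overlapping DNA 'words' (k-mers), that each sample's words are recorded in a small Bloom filter, and that the filters are stored row by row so one query checks every sample at once. The names `ToyIndex`, `kmers` and `bit_positions`, and the values of `K`, `NUM_BITS` and `NUM_HASHES`, are hypothetical choices made for this example.

```python
import hashlib

K = 5            # k-mer length (toy value chosen for this example)
NUM_BITS = 256   # bits per Bloom filter (toy value)
NUM_HASHES = 3   # hash functions applied to each k-mer (toy value)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer):
    """Map a k-mer to NUM_HASHES bit positions using salted SHA-256."""
    positions = []
    for salt in range(NUM_HASHES):
        digest = hashlib.sha256(f"{salt}:{kmer}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


class ToyIndex:
    """One Bloom filter per sample, stored row-wise.

    rows[r] is an integer whose bit c is set when sample c has
    bit r set in its Bloom filter.
    """

    def __init__(self):
        self.names = []
        self.rows = [0] * NUM_BITS

    def add(self, name, seq):
        """Index a sample by setting its column bit in every relevant row."""
        col = len(self.names)
        self.names.append(name)
        for kmer in kmers(seq):
            for r in bit_positions(kmer):
                self.rows[r] |= 1 << col

    def search(self, query):
        """Return names of samples that may contain every k-mer of query."""
        if len(query) < K:
            raise ValueError("query must be at least K bases long")
        hits = (1 << len(self.names)) - 1  # start with every sample as a candidate
        for kmer in kmers(query):
            for r in bit_positions(kmer):
                hits &= self.rows[r]  # keep only samples with this bit set
        return [n for c, n in enumerate(self.names) if (hits >> c) & 1]


index = ToyIndex()
index.add("sample_A", "ACGTACGTTGCAAGCT")   # made-up sequences
index.add("sample_B", "TTTTTTTTTTTTGGGG")
print(index.search("GTACGTTGC"))            # sample_A is always among the results
```

A Bloom filter never misses a word it has stored, so a sample that truly contains the query is always reported. It can occasionally report a sample that does not, which is why hits from an index like this would normally be confirmed against the original data.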
Zamin Iqbal, research group leader at EMBL-EBI, said: “We know that bacteria can become resistant to antibiotics either through mutations or with the help of plasmids.
“We also know that we can use mutations in bacterial DNA as a historical record of bacterial ancestry. This allows us to infer, to some extent, how bacteria might spread across a hospital ward, a country or the world.
“BIGSI helps us study all of these things at massive scale. For the first time, it allows scientists to ask questions such as ‘has this outbreak strain been seen before?’ or ‘has this drug resistance gene spread to a new species?’.”
The search engine complements other existing tools and offers a solution that can scale to the vast amounts of data the lab is now generating.
Iqbal adds: “As DNA sequencing becomes cheaper, we will see a whole new host of users outside basic research, and a rapid increase in the volume of data generated.
“We will very likely see DNA sequencing used in clinics, or in the field, to diagnose patients and prescribe treatment, but we could also see it used for a range of other things, such as checking what type of meat is in a burger. Making genomics data searchable at this point is essential and it will allow us to learn a huge amount about biology, evolution, the spread of disease, and much more.”