Scientists aim to sequence DNA of every living thing
Scientists at the Wellcome Sanger Institute in Cambridge recently announced that they had created the first comprehensive summary of all genes known to be involved in human cancer, the ‘Cancer Gene Census’.
Now the Sanger Institute has started work on unlocking the genetic codes of 66,000 UK species as part of a global initiative to sequence the genomes of all 1.5 million known species of animals, plants, protozoa and fungi on Earth.
Sanger's work on the Darwin Tree of Life Project is part of the wider global Earth BioGenome Project (EBP). At the EBP's official launch yesterday, key scientific partners and funders from around the world gathered to discuss progress in organising and funding the project. The EBP will ultimately create a new foundation for biology to drive solutions for preserving biodiversity and sustaining human societies.
The Sanger Institute was founded in 1993 by Professor Sir John Sulston as part of the Human Genome Project. The Institute made the largest single contribution to the gold-standard sequence of the first human genome, which was published in 2003. Of the 23 pairs of human chromosomes, eight were sequenced by researchers at the Sanger Institute in Cambridge and their collaborators.
A genome is an organism’s complete set of genetic instructions written in DNA. Each genome contains all of the information needed to build that organism and allow it to grow and develop.
Since the landmark completion of the human genome, the Sanger Institute has become a globally recognised leader in the field of genomics.
Numerous important reference genomes have already been sequenced – from the mouse and zebrafish to the pig, gorilla and mosquito, among others. Beyond animal species, infectious diseases and bacteria also feature prominently on the list of reference genomes, from salmonella and MRSA to chlamydia and malaria. All of these have offered up important insights into these species in health and disease.
Describing all genes strongly implicated in the causes of cancer, the Cancer Gene Census explains how they function across all forms of this disease.
Reported in Nature Reviews Cancer, the resource catalogues over 700 genes to help scientists understand the causes of cancers, find drug targets and design treatments.
The study characterises the increasing understanding that many genes have multiple different roles in different cancers. This paves the way for improvements in personalised medicine, and building combinations of anti-cancer drugs for any given set of genetic functions or mutations.
To help scientists make sense of this complexity, researchers working with the Catalogue of Somatic Mutations in Cancer (COSMIC) have created the Cancer Gene Census. While the COSMIC database characterises over 1,500 different forms of human cancer and types of mutations, the Cancer Gene Census describes which genes are fundamentally involved and how these genes cause disease.
For the first time in history, functional changes to these genes are summarised in terms of the 10 cancer hallmarks – biological processes that drive cancer. Mutations in some genes lead to errors in repairing DNA, whereas mutations in other genes can suppress the immune system or promote tumour invasion or spreading.
Across the 700 genes in the Cancer Gene Census, many have two or more different ways of causing cancer. These are often different and sometimes contradictory depending on the tumour type, and are all described in this single resource.
The scientists have manually condensed almost 2,000 papers to draw together strong evidence of a gene’s role in cancer and describe which genetic functions go wrong to cause cancer. This knowledge is crucial to developing new therapeutics.
When combined with the COSMIC database, it is now possible to characterise exactly which cancers are impacted by which genes and which mutations are likely involved.
Cancer is a genetic disease, and mutations in DNA can affect cells so that they are able to grow uncontrollably. Cancer can form in over 200 parts of the body and hundreds of different genes are known to be involved.
To understand individual cancers and design specific treatments, many complex details need to be combined. However, this information is often spread out across thousands of different scientific publications and public databases.
Dr Zbyslaw Sondka, the lead author on the project from the Wellcome Sanger Institute, said: “Scientific literature is very compartmentalised. With the Cancer Gene Census, we’re breaking down all those compartments and putting everything together to reveal the full complexity of cancer genetics.
“This is the broadest and most detailed review of human cancer genes and their functions ever created and will be continually updated and expanded to keep it at the forefront of cancer genetics research.”
The Tree of Life
The Darwin Tree of Life project is expected to cost £100 million in its first five years, and the sequencing of 66,000 species’ genomes will take around 10 years.
The project has been made possible due to recent advances in sequencing and information technology that will enable the reading and interpretation of thousands of species’ genomes each year by the Sanger Institute and its partner institutions across the UK and internationally.
The data will be stored in public domain databases and will be made freely available for research use.
To mark the 25th anniversary of the Wellcome Sanger Institute, the institute and its collaborators used PacBio® long-read technology and protocols developed by the Vertebrate Genomes Project (VGP) to sequence the genomes of 25 UK species for the first time, including red and grey squirrels, the European robin, Fen raft spider and blackberry.
The insights gained from the 25 Genomes Project form a basis for scaling up to sequence the genomes of 66,000 species.
Professor Sir Mike Stratton, Director of the Wellcome Sanger Institute, said: “Globally, more than half of the vertebrate population has been lost in the past 40 years, and 23,000 species face the threat of extinction in the near future.
“Using the biological insights we will get from the genomes of all eukaryotic species, we can look to our responsibilities as custodians of life on this planet, tending life on Earth in a more informed manner using those genomes, at a time when nature is under considerable pressure, not least from us.”
Sir Jim Smith, Director of Science at Wellcome, said: “When the Human Genome Project began 25 years ago, we could not imagine how the DNA sequence produced back then would transform research into human health and disease today.
“Embarking on a mission to sequence all life on Earth is no different. From nature we shall gain insights into how to develop new treatments for infectious diseases, identify drugs to slow ageing, generate new approaches to feeding the world or create new bio materials.”
The Sanger Institute will serve as the genomics hub in the UK and will collaborate with the Natural History Museum in London, Royal Botanic Gardens, Kew, Earlham Institute, Edinburgh Genomics, University of Edinburgh, EMBL-EBI and others in sample collection, DNA sequencing, assembling and annotating genomes and storing the data.
Sanger will also work with other groups contributing to the EBP, such as the G10K Vertebrate Genomes Project (VGP) and the 10,000 Genomes Plant Project, to ensure there is no redundancy of effort, and that each project contributes to the other.
Sequencing the eukaryotic species in the UK and worldwide will revolutionise our understanding of biology and evolution, bolster efforts to conserve, help protect and restore biodiversity, and in return create new benefits for society and human welfare.
Professor Harris Lewin, University of California, Davis, United States and Chair of the Earth BioGenome Project, said: “The Darwin Tree of Life Project is a tremendously important advance for the Earth BioGenome Project and will serve as a model for other parallel national efforts.
“The Wellcome Sanger Institute brings decades of experience in genome sequencing and biology to help build the global capacity necessary to produce high quality genomes at scale. The Earth BioGenome Project and its partner organisations welcome the outstanding leadership that the Wellcome Sanger Institute brings to our efforts to sequence all known eukaryotic life on our planet.”
The estimated cost of the Earth BioGenome Project (EBP) is $4.7 billion. Accounting for inflation, the Human Genome Project today would cost $5 billion.
The Wellcome Sanger Institute will use core funding from The Wellcome Trust to introduce a research programme in Tree of Life genomics. Further funding support for sample collection, sequencing machines and data infrastructure is required.
Activities of the EBP are currently being funded by the participating organisations as well as private foundations, governmental organisations and crowd-funding sources.
Significant funds have been raised by taxon-based communities and by national and regional projects towards the $600 million goal necessary to complete Phase 1 of the project, which aims to produce approximately 9,000 reference-quality genomes across all taxonomic families.