HPC revolutionising speed of life science research
Until recently, High Performance Computing (HPC) was largely the preserve of the automotive, aerospace and financial services industries. Increasingly, however, the life sciences sector has become one of its fastest-growing areas of demand.
Never has this been more evident than in the last six months of the pandemic, which has seen global HPC resources pooled in an unprecedented effort to halt the progress of COVID-19.
HPC refers to the practice of aggregating computing power in a way that delivers much higher performance than a typical desktop computer or workstation.
It enables researchers to analyse vast datasets, run extraordinarily large numbers of calculations in epidemiology, bioinformatics and molecular modelling and solve highly complex scientific problems.
Most significantly, it enables scientists and researchers to do in hours or days what would otherwise have taken months or years on slower, traditional computing platforms.
According to HPC expert Adrian Wander, over the next few years we will start to see an ever-increasing “democratisation of HPC”, which will be central to supercomputing becoming a routine part of life sciences and pharmaceutical research.
With a PhD in theoretical solid-state physics, Adrian was a chemistry lecturer at Cambridge University, working under Professor Sir David King (later Chief Scientific Adviser to Tony Blair’s government), before devoting the next 30 years to working in HPC.
During this time, Adrian was an integral part of the team that set up the Hartree Centre – home to some of the most advanced computing, data and AI technologies in the UK – and ran the scientific computing department at the Science & Technology Facilities Council, before moving to the European Centre for Medium-Range Weather Forecasts (ECMWF) to deliver highly critical, time-sensitive computations that relied heavily on HPC modelling for speed and accuracy.
He says: “Historically, HPC was seen as a complex, specialist activity requiring special machines and expert techniques. But, especially in recent months, we have seen a dramatic change in who wants and needs access to HPC.
“Traditionally, the biggest users were the automotive and aerospace sectors. But in the current COVID-19 climate, aeroplanes aren’t being purchased and consumers aren’t buying cars, whereas we are seeing a dramatic rise in the use of HPC within the life sciences industry – not least because the need has never been greater.”
The life sciences sector was initially slow to catch on to HPC compared with the physical science community. In 2007, the Biotechnology & Biological Sciences Research Council (BBSRC) paid for 10 per cent of Britain’s academic national supercomputer service, HECToR (High End Computing Terascale Resource) – but it ended up dropping the partnership because the service wasn’t being used by its community.
More recently, there has been a huge upsurge of interest across the life sciences in exploring workloads more actively and efficiently – for example, through simulation and modelling of protein folding and protein structures. And, of course, artificial intelligence is being relied on heavily in the search for new drugs, automatically screening huge numbers of drug candidates against the target treatment.
Adrian adds: “Over the next few years, HPC will become both simpler to use and more easily available – and it will inevitably become a more mainstream part of research portfolios just as incubators and wet benches are now.
“Even with genome sequencing, the Oxford Nanopore system brings this kind of work within reach of quite small organisations, because the new sequencing machines make sequencing quite easy to do.
“The tricky part is putting all the bits together to assemble the full genome – and this requires increasing amounts of compute power, irrespective of the size of the organisation doing the assembly.
“The technological advance in this field has been incredible: sequencing the first human genome was an international effort that cost around $1 billion and took 13 years to complete. Today, genomic studies and meta-genomics are routinely run for between $3,000 and $5,000 and take little more than a couple of days to complete.”
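The scale of that improvement is easy to sanity-check. A minimal sketch in Python, using the round figures quoted above (the $4,000 cost used below is an assumed midpoint of the $3,000–$5,000 range, chosen purely for illustration):

```python
# Rough arithmetic behind the sequencing figures quoted above.
# All inputs are the article's round numbers; $4,000 is an assumed
# midpoint of the $3,000-$5,000 range, used purely for illustration.

first_genome_cost_usd = 1_000_000_000  # ~$1 billion for the first human genome
first_genome_days = 13 * 365           # ~13 years, expressed in days

today_cost_usd = 4_000                 # assumed midpoint of $3,000-$5,000
today_days = 2                         # "little more than a couple of days"

cost_reduction = first_genome_cost_usd / today_cost_usd
time_reduction = first_genome_days / today_days

print(f"Cost reduction: ~{cost_reduction:,.0f}x")  # on the order of 250,000x
print(f"Time reduction: ~{time_reduction:,.0f}x")  # on the order of 2,000x
```

Even allowing for the rough inputs, the cost of sequencing has fallen by roughly five orders of magnitude, and the turnaround time by more than three.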
To quantify the remarkable difference HPC is making to scientific discovery, one only has to note the following: After HIV-1 (the main cause of AIDS in humans) was first identified in 1981, it took almost three more decades to genetically decode it.
By contrast, the genome behind the 2003 SARS outbreak (caused by another coronavirus) was decoded within three months. This year, the genome behind COVID-19 was decoded and published globally within days. Things are indeed changing in the life sciences sector. And rapidly so.
At the end of May, this country – led by UK Research and Innovation – became the first European supercomputing partner to join the COVID-19 High Performance Computing Consortium, contributing more than 20 petaflops of HPC capability to the global effort addressing the coronavirus crisis.
The consortium currently has 56 active projects and more than 430 petaflops of compute – collectively equal to the computing power of almost 500,000 laptops.
For perspective, a supercomputer with just eight petaflops can perform a million calculations per second for every person in the world. By pooling supercomputing capacity, the consortium offers more than 50 times that, and hopes to reduce the time it takes to discover new molecules that could lead to coronavirus treatments and a vaccine.
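Those figures check out with back-of-the-envelope arithmetic. A quick sketch in Python (the world-population value of roughly 7.8 billion is an assumption for the period, not a figure stated in the article):

```python
# Back-of-the-envelope check of the consortium figures quoted above.
# The ~7.8 billion world-population value (circa 2020) is an assumption.

PFLOP_PER_SEC = 1e15       # one petaflop = 10^15 calculations per second
WORLD_POPULATION = 7.8e9   # assumed, not from the article

per_person = 8 * PFLOP_PER_SEC / WORLD_POPULATION
print(f"8 petaflops = {per_person:,.0f} calculations per person per second")
# comes out at roughly one million, matching the figure in the text

scale_up = 430 / 8         # pooled consortium capacity vs an 8-petaflop machine
print(f"Pooled capacity is ~{scale_up:.0f}x an 8-petaflop machine")
# just over 50x, consistent with the "50 times that" claim
```

The same arithmetic also supports the laptop comparison: 430 petaflops spread across 500,000 laptops implies roughly 860 gigaflops per laptop, a plausible figure for a modern machine.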
Last week, it was revealed that Summit, the world’s second-fastest supercomputer, had been employed to produce a genetic study of COVID-19 patients, analysing 2.5 billion genetic combinations in just two weeks.
The insights Summit has produced through HPC and AI are significant for understanding how the coronavirus causes COVID-19, and also point to potential new therapies to treat the worst symptoms.
Of course, return on investment is also important in driving the move towards HPC. Hyperion Research recently reported that every dollar spent on HPC in the life sciences generates $160 in return, of which $41 is profit or cost saving.
Adrian explains: “On the face of it, when you look at the cost of hosted services, it might seem expensive. But you need to weigh that against the fact that you no longer need an in-house team of high voltage engineers and all that comes with them.
“And for the big drug companies and pharmas using increasing amounts of HPC cycles for drug discovery, personalised healthcare has the potential to become a huge profit-making market.”
With astronomical volumes of computational life sciences data being produced daily, secure storage to house it – and advanced computing infrastructure to analyse the vast datasets rapidly – is becoming paramount. That’s why more and more research institutions are outsourcing or co-locating to specialist data centres such as the cutting-edge Kao Data campus in Harlow.
Kao Data’s director, Adam Nethersole explains: “Staying static isn’t an option within a highly competitive sector where being first to market is everything – especially when we’re talking about life-saving treatments, medicines and vaccines.
“So across the life sciences sector, most universities, laboratories or research institutes are looking to expand their access to high performance computing.
“But, of course, many of these organisations are pretty landlocked, with old architecture, and there isn’t space available that can be turned into a hyper-efficient data centre unless you’re in new, state-of-the-art facilities.
“And even if you are able to scale internally, and have the technical expertise in-house to do this, you still need to consider how you’re going to power and cool the additional servers – especially in locations like Cambridge, where there simply isn’t a vast surplus of electricity ready and available to be utilised.
“One solution is using hyperscale cloud services – but, while these are great for streaming videos and music and playing video games, they aren’t optimal for specialist computing, which requires servers located close together (not virtualised in the cloud) and, in many cases, bespoke, tailored IT architecture. Cloud platforms also tend to be expensive when moving large amounts of data.
“This is why we’re seeing an increasing number of enquiries about moving computing infrastructure off-premise and into an advanced industrial scale data centre like the one we operate at Kao Data.
“With multi-megawatts of power and space available immediately, and excellent connectivity back to Cambridge’s research parks, we’re ideally placed to support them.
“One of Cambridge’s most forward-thinking research institutions, EMBL-EBI, has already done this and we are in conversation with others about helping them plan their computing footprint for the next 10, 15 and 20 years.”