You could say the Wellcome Genome Campus, near Cambridge, UK, is the CERN of bio-sciences. It leads the world’s efforts to apply genomic research to benefit human health, just as CERN leads the world’s particle physics research from its base in Geneva.
The Campus has grown up around the Wellcome Sanger Institute, founded in 1992, and now includes a rapidly expanding cluster of other bio-science and bio-informatics organizations (see box: Wellcome to the world of genomics).
Everything on the campus seems new, and DCD’s visit begins in the Ogilvie Building (opened 2016) with a graphic illustration of progress in DNA sequencing. On display is some of the early equipment from the Human Genome Project, which took 13 years to sequence the first reference human genome. Next to it are later systems, which do the job much more quickly (see box: Faster genomes).
The lights dim, and a video plays on one wall, illustrating the rapid progress in the field. Today, we are told, new genomes are sequenced every day by banks of dozens of machines. There’s a glut of data: sequences for humans, for cancer cells, for parasites, and for bacteria.
This feature appeared in the November issue of DCD Magazine.
Life-saving data centers
These genomes hold the keys to new medicine. So far, we have failed to eliminate malaria, which kills half a million people a year in Africa. One problem is that the parasite Plasmodium falciparum develops resistance to drugs. Sequencing the genomes of malaria samples lets health bodies track that resistance and stay one step ahead, targeting new drugs where they are needed.
Meanwhile, hospitals live in fear of the “superbug” MRSA, which is resistant to standard antibiotics. Yet analysis of MRSA’s genome has shown that several strains can be treated with specific antibiotics, so sequencing and analysis could save the lives of patients in hospitals hit by the superbug.
A full genome sequence could also improve the healthcare an individual receives. Our genetic makeup influences how likely we are to suffer specific diseases or conditions. Beyond that, our genome affects which treatments will work for us, and which will cause side effects if we fall ill.
As the video ends, the wall it’s projected on slides apart. Through a two-way mirror, in a bright, gleaming laboratory, we see a bank of NovaSeq 6000s, the latest sequencing machines from Illumina.
Each is fed a continuous stream of genetic samples, and each takes just one day to repeat the task which took the Human Genome Project 13 years. Between them, they are pumping out petabytes of genomic data.
Everything that happens at the Wellcome Genome Campus flows from this firehose. Keeping up with the data deluge is the mission of Simon Binley, the Wellcome Sanger Institute’s data center manager. It’s his job to share that data and make it useful to scientists on the campus and around the world.
“We are the single largest user of sequencing consumables in the world,” says Binley. He’s proud of his facility, but has no illusions about who is the star of the show: “Our priority is to make sure the science gets the grunt it needs to perform world-class science.”
That goal places unique demands on the Institute’s data center, he says: “The original sample, that piece of human tissue, or organic matter, that will be lost. Eventually, it will decay. So the only reference we've got to that is the data stored here. If it is referenced in a paper, we have to retain the data forever.”
Welcome to Wellcome
To see the data center, Binley walks us to the Morgan Building, an older building that opened in 2005. On the lowest floor, there are four 250 sq m (2,700 sq ft) data halls, color-coded red, yellow, green, and blue, which hold some 35,000 computer cores in 400 racks.
The life-cycle of this data center is far more interesting than even these raw figures suggest. When it opened in 2005, the Sanger Institute planned to adapt to technology changes by reserving a “fallow” hall. Three rooms were gradually populated, while the blue hall was left empty, waiting for a new generation of equipment.
The fallow hall stayed empty for a long time, while the first three halls received updates over the data center’s first 14 years. As a result, the three “legacy” halls contain some quite recent equipment, and the Institute has recently implemented comprehensive cloud-based data center infrastructure management (DCIM) using Schneider Electric’s EcoStruxure.
The infrastructure management extends beyond the data center to communications rooms throughout the campus, and also to the crucial sequencers in the Ogilvie Building. While we were there, Binley pointed out individual UPS systems sitting beside each sequencer, all under DCIM control.
Binley shows us one of the legacy halls: there is conventional air conditioning with no aisle containment, and the air around us is quite chilly as it’s drawn upwards through the racks.
Each rack carries about 10kW of load, and the room totals 750kW. Binley is bringing the hall’s PUE (power usage effectiveness, the ratio of total facility power to the power used by IT equipment) down from around 1.8 towards 1.4, partly by raising the operating temperature from 19°C (66°F) to 21°C (70°F).
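As a rough illustration of what that PUE target means, the Python sketch below works through the arithmetic for one legacy hall. It assumes, for illustration only, that the 750kW quoted is IT load, that the load is constant all year, and that PUE is total facility power divided by IT power; the function names are hypothetical.

```python
"""Rough PUE arithmetic for one legacy hall (illustrative only).

Assumptions (not figures from the Institute): the 750 kW quoted is IT load,
the load is constant year-round, and PUE = facility power / IT power.
"""

IT_LOAD_KW = 750.0          # assumed IT load of one legacy hall
HOURS_PER_YEAR = 24 * 365   # ignores leap years


def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by a given PUE."""
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Non-IT power: cooling, power conversion losses, lighting, etc."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw


if __name__ == "__main__":
    before = overhead_kw(IT_LOAD_KW, 1.8)
    after = overhead_kw(IT_LOAD_KW, 1.4)
    saved_kw = before - after
    # Convert a constant kW saving into MWh over a year.
    saved_mwh_per_year = saved_kw * HOURS_PER_YEAR / 1000
    print(f"Overhead at PUE 1.8: {before:.0f} kW")
    print(f"Overhead at PUE 1.4: {after:.0f} kW")
    print(f"Saving: {saved_kw:.0f} kW, about {saved_mwh_per_year:,.0f} MWh a year")
```

Under those assumptions, moving from a PUE of 1.8 to 1.4 halves the non-IT overhead, from 600kW to 300kW, which works out to roughly 2,600MWh a year. Real savings will differ, since IT load and cooling demand vary through the year.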
In the blue hall we feel the difference. Engineers are installing racks, but some equipment is already running, and the hall is noticeably warmer than the others. More power is available here, and it is directed to where it is needed. This hall has a potential capacity of 2.2MW, nearly as much as the other three halls combined.
“We needed something that we could start wrapping much heavier workloads into,” Binley explains. “We needed a number of racks where up to 30kW could be accommodated.”
This density demands liquid-cooled back-of-rack coolers. “These coolers can handle 35kW and burst to 40kW,” Binley says. “The temperature is 34 degrees (93°F) at the back of the rack, and goes up to 50 or 60 (122-140°F) six inches later. The back-of-rack coolers take it back to half a degree below the inlet temperature.”
This method allows Binley to tailor the cooling in different parts of the room. There are 25 racks with back-of-rack cooling, and half of them were occupied when DCD visited. There are also air-cooled rows, which now have state-of-the-art aisle containment.
Fully populated, the 25 water-cooled racks would consume some 750kW, as much as an entire legacy hall. This still leaves more than 1MW to take up, he says, adding: “We’re planning to use this for the next 15 years, and technology is not going to stand still.” A quick mental sum suggests that he could pretty much fill the room with 30kW racks: 2.2MW is enough for more than 70 of them.
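The “quick mental sum” can be spelled out. The Python sketch below uses only figures quoted in this article (2.2MW for the blue hall, 25 liquid-cooled racks at 30kW, 750kW per legacy hall) and, as a simplifying assumption, treats the full 2.2MW as available to IT equipment, with no allowance for cooling or power losses. The function name is hypothetical.

```python
"""Blue-hall capacity arithmetic using figures quoted in the article.

Simplifying assumption: the 2.2 MW capacity is treated as available to IT
equipment, with no allowance for cooling or distribution losses.
"""

BLUE_HALL_KW = 2200    # potential capacity of the blue hall
LEGACY_HALL_KW = 750   # load of one legacy hall
LIQUID_RACKS = 25      # racks with back-of-rack cooling
RACK_KW = 30           # target density per heavy rack


def capacity_summary() -> dict:
    """Return the headline capacity figures as integers (kW or rack counts)."""
    liquid_load = LIQUID_RACKS * RACK_KW       # 750 kW for the 25 racks
    headroom = BLUE_HALL_KW - liquid_load      # 1,450 kW left over
    return {
        "liquid_load_kw": liquid_load,
        "headroom_kw": headroom,
        "extra_30kw_racks": headroom // RACK_KW,     # whole racks that still fit
        "max_30kw_racks": BLUE_HALL_KW // RACK_KW,   # whole hall at 30 kW per rack
        "legacy_halls_kw": 3 * LEGACY_HALL_KW,       # three legacy halls combined
    }


if __name__ == "__main__":
    for name, value in capacity_summary().items():
        print(f"{name}: {value}")
```

With these numbers, the 25 liquid-cooled racks use 750kW, leaving 1,450kW: room for another 48 racks at 30kW, or 73 across the whole hall. The three legacy halls total 2,250kW, which is why 2.2MW is “nearly as much as” all three combined. Floor division is used so that only whole racks are counted.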
The blue hall’s extra energy demand could be a concern, as the campus’s power supply is of rather average quality. The campus is stuck at the end of the grid, Binley says, and suffers occasional brownouts and outages.
One current approach to unreliable grids is the microgrid, in which some power is generated locally to increase reliability. The Sanger Institute is on trend here: for more than seven years it has had a combined cooling, heat and power (CCHP) system on the Morgan Building’s roof. It uses natural gas to generate 2MW of electrical power and captures the waste heat, using it both to warm the buildings and to provide energy for cooling systems that can deliver about 1MW of cooling.
This makes good economic sense, since gas costs around half as much as electricity per kWh, and the CCHP could conceivably provide primary power, with the grid relegated to backup and office use.
But that’s not how the Wellcome campus works, says Binley: “We are not the only essential service on campus. We've got to keep the sequencers running. If they dry up, our reason to be here goes away.” So the CCHP doesn’t support the data center directly. Instead, it feeds 2MW into the campus ring, increasing reliability overall.
Another data center efficiency trend fell foul of economics, however. Some data centers that are keen to reduce their environmental footprint are making their waste heat available for use by their neighbors.
It’s normally thought that liquid cooling makes this easier, because it delivers waste heat in a more concentrated and easily usable form. But it’s not that simple, as the Sanger Institute found.
The warm air from the legacy halls was relatively easy to plumb into the CCHP’s heat reclamation system, but in the blue hall, liquid cooling means the waste air is no longer warm enough to be worth recovering. “The heat recycling in the legacy hall is legacy heat recycling,” he explains. “It supplements the building’s heating in the winter. In the new hall, the air is cooler.”
Recycling the heat from the cooling water, meanwhile, turned out not to be viable because of the investment required: “This is a £9 million ($11m) room, directly supporting the science,” he explains. Heat reuse would have added £4m ($4.8m) to the cost, while saving far less.
“That £4m could pay for a number of PhD students,” he says. “They could work on a program like eradicating malaria.”
“For us, the important thing is to enable world-class science,” he says. “The Wellcome Trust is a charity. It gives its results away and doesn’t get government grants. We have to be as careful with our money as a commercial organization.”
The future of gene sequencing
All data center managers look to the future, as the demands on their systems grow and the technology evolves. For Binley, there are multiple developments to follow, since sequencing technology is advancing in parallel with IT.
Once again, the sequencers get priority, and the Institute has to balance the available resources between doing more gene sequencing and ensuring that there’s enough IT resource to handle the output.
This balance could have interesting consequences. As IT evolves into ever more compact forms, Binley thinks he may be able to keep growing the data center’s computing power while shrinking its footprint.
It could soon be possible to replace all three legacy halls with a larger IT resource in a single hall. In five years, he can imagine fitting all the IT resources the campus needs into two halls.
When that happens, he says, it might be possible to convert two halls into lab space, allowing more sequencers to be placed right next to the data halls.
Why would he consider this? Sequencers there would benefit from short network links to the data center, and they would sit directly on its protected power supply, no longer needing the separate UPS units that now sit beside each machine.
Once again, it comes back to the primacy of the research. This could be the largest biosciences data center in Europe, but it’s still just there to serve the scientists.