Managing a data center remotely has always made sense. Facilities are often in out-of-the-way locations, and it is quicker and cheaper to fix problems remotely instead of getting an engineer on site.
At the extreme, it is possible to run a data center with virtually no staff activity - the so-called “lights out” facility. But the reality has often not lived up to the promise. On the one hand, the tools to provide remote control have often been hard to integrate. On the other hand, colocation providers and their customers have been reluctant to trust the remote systems, preferring to touch servers and other equipment directly.
In 2020 all that changed - of necessity. As we go to press, large parts of the world are going in and out of lockdown, with travel restrictions still in place. Getting into a data center is awkward, even though data center staff are generally categorized as “essential” and exempt from the restrictions, because digital infrastructure is essential to the economy. But data center reliability experts at the Uptime Institute have advised that visits to a facility should be minimized.
In colocation facilities, customers must visit the site less, says Uptime SVP Fred Dickerman, and staff access should also be restricted and handled very carefully: “When teams come on and off-site, they should do handovers from a distance or by phone.”
This feature was written in April for the Scaling with Confidence Colocation supplement.
In March, colocation giant Equinix responded to the lockdowns that were being applied, and severely restricted customer access to its data centers. Visitors, customers, contractors, and non-critical Equinix staff were banned from Equinix IBX facilities in France, Germany, Italy, and Spain, with other countries moving to an appointment-only regime.
This move placed a heavy requirement on remote functionality, which may have been used rarely in the past, or been incompletely implemented. Products for data center infrastructure management (DCIM) or service management (SM) present themselves as a complete solution, but most betray their origins in one sector or another, or need careful implementation to deliver fully.
When the crisis hit, those who had fully functional systems, and a culture of using the tools available, had a head start in dealing with the crisis.
When the world changed
Brent Bensten, CTO at QTS Data Centers, counts himself lucky. The data center firm deals with a range of companies from small to large, but it has a service delivery platform (SDP) developed from that of Carpathia Hosting, a 2015 acquisition.
The lockdown created a significant change in customer behavior, he said. The number of logins to the SDP went up by 30 percent in the first three weeks of restrictions, and the top users nearly doubled the time they spent on the system - from 36 minutes to 62 minutes.
Over the same period, customers were still welcome onsite, but visits went down by a similar proportion to the increased traffic on the SDP. “We want them to come if they need to,” Bensten told us in April. “But Covid-19 is a perfect case to use the tools, so they can do remotely what used to be done on-site.”
Visitor statistics vary widely between sites, depending on the profile of the customers and their stage of deployment. QTS’s largest site in Atlanta could have anywhere from 400 to 700 visitors in a month, but comparing the lockdown period with an equivalent earlier one, he reckons visits went down about 40 percent: “The curves mirrored each other.”
If customers are realizing that unnecessary visits are a risk, new procedures may be contributing to this. “We haven’t had to put in place a hard rejection at any site. We require disclosure of where visitors have been, we use biometrics, and sanitizing wipes when they touch things.”
The reduction in customer visits is even more striking against a background of data center hardware which is working harder to meet greater traffic demands: “By every statistic we have, power consumption is up, bandwidth is up significantly. With all those indicators going up, you would normally see visitor profiles go up.”
QTS was fortunate in having a full-featured SDP, said Bensten: “It’s high-touch, high-need, for people to get what they need in the data center without going in there. It’s the single way to integrate with QTS, all the way from buying the service. It’s available for the iPhone, through a portal, or with an API so you can do everything programmatically.”
That range is important. Smaller firms like cloud startups just need a quick check on an app, while big hyperscalers have the resources to get the most out of programmatic access: “How they get used is wildly different. A one-to-two cabinet guy will use his iPhone app. But a large hyperscaler customer with 1MW of capacity will move loads around to consume less energy and keep the service up reliably, based on the data we share through the API. In the old world, they would have needed to go to the site to do that.”
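QTS does not publish its SDP API schema in this piece, so the endpoint data and field names below are invented for illustration. The sketch only shows the kind of programmatic check a large customer might automate - flagging cabinets whose power draw warrants shifting load - instead of visiting the site.

```python
# Hypothetical sketch: checking per-cabinet power draw fetched from a
# colocation provider's API and flagging cabinets to migrate load from.
# Cabinet IDs, field names, and figures are invented for illustration.

def cabinets_over_budget(readings, budget_kw):
    """Return IDs of cabinets whose draw exceeds the power budget."""
    return [r["cabinet_id"] for r in readings if r["power_kw"] > budget_kw]

# A real client would fetch these readings over HTTPS with an API token;
# the data is inlined here so the sketch runs standalone.
readings = [
    {"cabinet_id": "ATL1-A01", "power_kw": 4.2},
    {"cabinet_id": "ATL1-A02", "power_kw": 6.8},
    {"cabinet_id": "ATL1-A03", "power_kw": 5.1},
]

print(cabinets_over_budget(readings, budget_kw=5.0))
# → ['ATL1-A02', 'ATL1-A03']
```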
You might expect the tech-savvy big players to adapt to remote use more easily, but that’s not what Bensten found: “The reduction in visits is across the board for every size of customer, including enterprise, and government business.”
A remote check with the SDP can actually be more effective than a site visit, as it has access to more data, he said: “We have a massive data lake built over the years, based on data we collect from the millions of sensors in our customer space.”
It also includes wider world data such as weather patterns, and effectively looks at the “weather” inside the data center: “We have a team of data scientists using advanced analytics, so we can project our power consumption in seven day intervals to predict future patterns - and the data lake can be mined by our customers as well as by us.”
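QTS’s analytics platform is far richer than anything shown here, but the core idea - mining historical sensor data to project power consumption a week ahead - can be sketched with a simple least-squares trend. All figures and the model itself are illustrative assumptions, not QTS’s method.

```python
# Hypothetical sketch: projecting facility power draw seven days ahead
# from daily readings, using a plain least-squares linear trend.

def project_power(daily_kw, days_ahead=7):
    """Fit a linear trend to daily readings and extrapolate forward."""
    n = len(daily_kw)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_kw) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_kw)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + days_ahead)

# Two weeks of invented daily facility draw in kW, trending upward.
history = [900, 905, 910, 918, 921, 930, 933, 940,
           947, 951, 958, 962, 970, 975]

print(round(project_power(history), 1))
```

In practice a data lake at this scale would feed models that also account for weather, seasonality, and customer deployment schedules; a linear extrapolation is only the simplest possible stand-in.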
If remote control is good for customers, it’s also good for staff, so QTS implemented home working where possible - using a different view of the same tools: “Our NOC support center is now working remotely, using a mapper with a 3D view of all our buildings down to customers’ cabinets.”
Of course the tools can’t do everything, but when something physical has to happen, it’s best for operator staff to go in and do it for the customer, directed by the support center, said Bensten: “Our employees are considered essential workers. When we need physical things our ‘smart hands’ can do the physical work, so the customer doesn’t need to.”
The work is directed by the SDP, but staff physically open the cabinets: “We don’t have robots yet.” The staff also operate a slightly different shift pattern, but there’s no dramatic change, said Bensten: “The number of our folks on site at a time hasn’t changed.”
QTS also shares its building security, giving customers access to CCTV feeds for their enclosures, said Bensten: “It’s Nest for your cages, you can see who came in and who left.” The operator has the same ability extended to the shared areas, so it can track staff and customers from the entrance through the mantrap to the data halls.
Remote management brings up issues of demarcation for colocation vendors and their customers. The customers want to know about the building facilities, such as cooling and power, but those are under the control of the operator. Meanwhile, the operator draws a line at looking inside the IT at operating systems and workloads, leaving those for the customer to manage.
“We capture the IT as assets, like servers and storage controllers, so the customers can load in IP configurations and VLANs. Our technology doesn’t interrogate their guest OSs.”
Both groups see a different view: “Our employees need to see a macro picture, while customers need to see a more drilled-in micro view.”
Smaller facilities also got a head start on remote working, simply because of the overhead involved in covering multiple small locations.
“Our whole business premise was based on lights out data centers,” said Lance Devin, CIO of EdgeConneX, a colocation provider specializing in built-to-order facilities for smaller cities around the world. “We have 2MW sites, not 100MW behemoths. I can’t afford to put three engineers and 17 security people and two maintenance people in a site like that.”
With 600 of these facilities, the company had an incentive to enable remote control from the start. “The business justification was already there - it’s more cost-effective and cheaper.” And moving further to the edge, with the possibility of 100kW or 200kW sites, makes remote management even more important.
But the Covid-19 crisis provided a workout for the company’s data center infrastructure management (DCIM) system, EdgeOS, Devin told DCD in April. “This is the way we run our business. This was not a change.”
The systems manage EdgeConneX’s equipment and the customer equipment in the racks - but the data views have to be managed. Despite the size of its facilities, EdgeConneX is a wholesale vendor, dealing with cloud players: “Our customers don’t want us to know what is in their stuff or vice versa.”
So EdgeConneX’s system remotely manages equipment like Liebert cooling systems, which have computerized predictive maintenance, showing the equipment’s details, when it was certified and tested, and its history, said Devin.
SCADA monitors everything every 100ms, spots when something is out of line, and then checks the root cause - for instance finding the faulty remote patch panel (RPP) upstream of the PDUs that suddenly show errors. The system then talks to the vendors of the hardware: “Our ops people don’t have to get in the middle, the system automatically sends a ticket directly to the vendors.”
The system also communicates to the customer. It knows the location and status of PDUs and other kit, what racks they serve, who will be impacted - and whether it will affect their service level agreement (SLA). “The ticketing system tells our customers the vendor is working on it, automatically.”
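The pattern Devin describes - several PDUs erroring at once, the monitor walking the power topology upstream to a common feed, and a vendor ticket being raised automatically - can be sketched in a few lines. The device names, topology model, and ticketing call below are all invented for illustration; EdgeOS itself is not public.

```python
# Hypothetical sketch of upstream root-cause analysis: if every faulty
# PDU shares one upstream feed, blame that feed and ticket the vendor.
# Device names and the topology map are invented for illustration.

# Map each device to the device feeding it (child -> parent).
UPSTREAM = {
    "PDU-1": "RPP-A",
    "PDU-2": "RPP-A",
    "PDU-3": "RPP-B",
}

def likely_root_cause(faulty_devices):
    """Return the single shared upstream device, or None to escalate."""
    parents = {UPSTREAM[d] for d in faulty_devices}
    if len(parents) == 1:
        return parents.pop()
    return None  # no common ancestor; hand off to ops

def open_vendor_ticket(device):
    # A real system would call the hardware vendor's ticketing API here.
    return f"Ticket opened with vendor for {device}"

faults = ["PDU-1", "PDU-2"]
root = likely_root_cause(faults)
if root:
    print(open_vendor_ticket(root))
# → Ticket opened with vendor for RPP-A
```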
EdgeConneX also lets customers monitor their equipment visually, by integrating their own CCTV cameras into the system. “When you think about everything you’ve seen from automation and remote working, you do have everything you need at your fingertips,” said Devin.
Views and data are carefully controlled: “One tenant may only see Denver, and within that their real-time load and their tickets. They see their cabinets.”
Actual visits are an issue in a lightly-staffed facility, pandemic or no pandemic. “We built a mantrap, and a callbox system that worked with the security system, so we can let people in remotely,” said Devin. “We take a picture of them in the mantrap, and then ask a challenge system for dual authentication or a remote biometric read.”
Their pass has photo ID, but has to be issued securely, and the biometric recognition has to be low-maintenance for a lights-out site: “We tried an iris scanner,” said Devin, but it was too complex, with visitors having to repeat the scan at different distances. “You’ve gotta be kidding, people aren’t that good at following instructions.”
Fingerprints were rejected as the scanners get greasy. EdgeConneX uses a vascular image of the back of the visitor’s hand - “they don’t touch the lens.”
It’s a complex system which EdgeConneX put together from partial solutions. “I looked at four off-the-shelf DCIM products,” said Devin. “I would guarantee you, any single system did two things really well. But the reality is there isn’t one system that does it all, from ticketing to management to reporting.”
Back at QTS, Bensten agreed that customers need more than DCIM. “We are a big believer in DCIM - we need it to run our building. But it is a small piece of our platform. We love our DCIM, but without our data lake on top of it, using it in ways DCIM was never intended to be used, our service delivery platform would not be able to do what it does.”
Bensten thinks the pandemic has changed behavior. “We think our toolset is better for the customer - and the pandemic has pushed people to adopt that.”
But what happens after the lockdown? “I guess I hope things won’t go back to the way they were,” said Bensten. “I’ve worked much of my career in managed services, and one of my goals is the cloudification of the data center. I want to see the data center working the way the cloud works.
“A few months from now, when this is over, the last thing anyone is going to do is hop on a plane to visit a data center.”