With 1.47 billion daily users on its social network, another 1.5 billion on WhatsApp and 500 million on Instagram, Facebook has had to face scaling issues that few in this industry have ever had to contemplate.

“It feels to me like we’re always laying the tracks in front of the train and moving as quickly as we can so that the software team and the product teams can deliver whatever solutions they want without necessarily being concerned whether they have enough data center capacity,” Facebook’s Infrastructure and Site Services VP Delfina Eberly told DCD.


Engineers on tap

“When you’re running and operating these data centers at the scale that we run and operate, one of the things that is very cool about being part of a software company is having the software engineering talent to be able to develop specific solutions that make data center operations and facility operations more efficient.”

One of the systems the company relies upon is called Facebook Auto-Remediation (FBAR), an automated service for handling hardware and software failures, and the first step towards building a fully self-healing data center.

“Machine learning is not something that you normally would think you would apply to a repair function in a data center, and we’re using it in multiple places in how we run and operate facilities,” Eberly said. “There are just so many things to keep track of, at some point human beings are no longer the most effective thing to use.”

This also extends to logistics, Eberly said: “We’ve done some really innovative things for managing the amount of material moving through a data center: parts that are being replaced and replenished, parts that are being returned, and so on.”

Sometimes the technology is used to augment humans, rather than replace them, for example by building “a single system where you can do a lookup and see where a part is at any data center or location around the world. You can see what previous problems existed in that space, without having to go run a specific report or having somebody else give you additional insights into the problem you may be looking at.”

She added: “Logistics isn’t necessarily something a lot of people innovate on and we think it’s a game changer for us - simply that handling and managing of materials, things that have to happen across multiple data centers.”

This has enabled Facebook to have just one data center operator per 25,000 servers, an unprecedented ratio. “We’ve done some very cool things in places where people have historically not focused,” Eberly said.

“I think that’s where we can credit the server-to-tech ratio. And with our use of machine learning, we’re saying ‘let’s try this thing in a place where most people would likely not look first,’ and we’re excited about the early performance of that technology in this space.”

This article appeared in the June/July issue of DCD magazine.