
Web scale data centers: software defined because they have to be


Many of the innovative ideas IT vendors and service providers eventually productize are implemented first, and often originate, at big Internet or cloud companies, out of necessity. This has been the case with software defined data center technologies. Companies like Google and Microsoft designed their own versions of solutions in this class because they needed them and could not find anything on the market that quite fit the bill.

 

Rules change once you reach the scale of a Google or a Microsoft. The latter, as an example, provides 200 cloud and online services to 1bn individual customers and 20m businesses around the world. Companies of this caliber have realized that nobody is better suited than they are to design the infrastructure that supports their scale.

 

Google’s software defined WAN

Google’s homegrown hardware includes switches, and the company created its own Software Defined Network (SDN) technology to manage them. While Google was getting economies of scale out of its compute and storage infrastructure, the Wide Area Network (WAN) its services rely on so heavily did not deliver in the scale department. “We need to manage the WAN as a fabric and not as a collection of individual boxes,” Google engineers wrote in a white paper the company released on its SDN approach.

 

Google’s WAN is organized into two backbones: an Internet-facing one and an internal one that carries traffic between Google data centers. The latter is managed via SDN. Data center sites – each with multiple switch chassis – are all interconnected and managed through OpenFlow controllers. Google also runs a proprietary traffic-engineering service, which collects utilization and topology data from the network, assigns paths to traffic flows and programs the switches through OpenFlow.
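The control loop described above – a central service that watches link utilization and picks paths for flows – can be sketched in miniature. This is an illustrative toy under invented assumptions (the link costs, utilization map and 0.8 hot-link threshold are made up for the example), not Google’s actual system:

```python
import heapq

def shortest_path(links, src, dst):
    """Dijkstra over directed links given as {(a, b): cost}; returns a node list."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:  # reconstruct the path back to the source
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    return None

def assign_path(links, utilization, src, dst, threshold=0.8):
    """Centralized TE decision: route around links that are running hot.
    A real controller would then program the result into switches via OpenFlow."""
    usable = {l: c for l, c in links.items() if utilization.get(l, 0.0) < threshold}
    return shortest_path(usable, src, dst)
```

With links A-B-C of cost 1 each and a direct A-C link of cost 5, the controller picks the cheap two-hop path until A-B runs hot, then falls back to the direct link – a decision no individual box could make on its own.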

 

Google admits in its white paper that OpenFlow is not perfect, but says it does the job for many network applications. Google spokespeople declined to comment further.

 

Microsoft is software defined

Microsoft has gone beyond SDN, building out a global infrastructure managed entirely in software. Christian Belady, its general manager of data center services, says the company outstripped the capacity of tools on the market long ago. “We approach every aspect of the physical, software, hardware and operational environment as an integrated system, and use software to engineer in resiliency and provide data analytics for our operations teams,” he says.

 

With advanced telemetry and tools, Microsoft can debug software faster, and its management solutions let the company handle incidents very quickly. Because applications are not bound to the hardware they are deployed on, workloads can be moved easily between data centers in case of failure.
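Decoupling workloads from hardware turns failover into a scheduling decision rather than a physical one. A minimal sketch of the idea – the site names and the least-loaded placement rule are invented for illustration, not Microsoft’s algorithm:

```python
def rebalance(placements, healthy_sites):
    """Move every workload on a failed site to the least-loaded healthy site.

    placements:    {workload: site}
    healthy_sites: ordered list of sites still accepting workloads
    """
    loads = {site: 0 for site in healthy_sites}
    for site in placements.values():
        if site in loads:
            loads[site] += 1
    new_placements = dict(placements)
    for workload, site in placements.items():
        if site not in loads:  # site has failed: re-place the workload
            target = min(loads, key=loads.get)
            new_placements[workload] = target
            loads[target] += 1
    return new_placements
```

If a site drops out of the healthy list, its workloads are spread across the surviving sites in load order; nothing about the workloads themselves has to know which hardware they land on.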

 

“From the cloud platforms to the network to the hardware, our data centers today are more automated and integrated with software, and these solutions are critical in helping us maintain high service availability for customers,” Belady says.

 

Facebook dipping toes in SDN

While Facebook is a much younger company than Google and Microsoft are, its breakneck growth has forced it to develop a serious in-house infrastructure competency very quickly. The company has designed its own servers and storage hardware and has recently kicked off an effort to design its own network switch for reasons similar to Google’s: off-the-shelf switches create a performance bottleneck that prevents it from reaping the full benefits of innovation in other layers of the infrastructure.

 

Najam Ahmad, director of technical operations at Facebook, says the company does not have any definite SDN plans yet, but there is a lot of activity around it. Facebook uses BGP (an Internet routing protocol) heavily in IP networks to make routing decisions, he says.

 

There is never one best network path for services like Facebook’s, so the company needs the ability to manage traffic dynamically, based on some business logic, and BGP does not allow for such capabilities. This is why the company’s engineers are experimenting with some Facebook-specific use cases for having a forwarding plane controlled by software that sits outside of the network switch.
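The gap is easy to illustrate: BGP’s best-path selection is baked into the protocol, while a controller sitting outside the switch can rank candidate next hops with whatever logic the operator writes. A hypothetical sketch – the cost function and path attributes are invented for the example, not Facebook’s:

```python
def pick_next_hop(candidates, cost):
    """Rank candidate next hops with an operator-supplied cost function:
    the kind of business-logic decision BGP's fixed algorithm cannot express."""
    return min(candidates, key=cost)

def business_cost(hop):
    # Hypothetical policy: prefer low latency, but heavily penalize paid transit.
    return hop["latency_ms"] + (100 if hop["paid_transit"] else 0)

paths = [
    {"peer": "transit-1", "latency_ms": 30, "paid_transit": True},
    {"peer": "peering-1", "latency_ms": 50, "paid_transit": False},
]
best = pick_next_hop(paths, business_cost)
```

Here the controller chooses the free peering path despite its higher latency – swap in a different cost function and the forwarding behavior changes without touching the switches’ protocol stack.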

 

Whether the solution they eventually come up with will qualify as SDN or use OpenFlow remains to be seen. Ultimately, that will not matter: terms like SDN and OpenFlow are simply names for tools, and Facebook’s tool may not be one of them but still do the job Facebook needs.

 

It has been clear for a while that Big Internet is becoming less reliant on the traditional vendors and service providers by the minute. It seems that once a company like Facebook reaches a certain critical mass, there is little an outside vendor can do to serve it better than it can serve its own needs.

 

This leaves traditional enterprises and service providers as the primary markets for software defined data center technology. As with other products, the vendors’ challenge will be delivering solutions that satisfy everyone – a challenge much greater than designing something exclusively for one company’s use.

 

This piece originally ran in the 30th edition of the DatacenterDynamics FOCUS magazine. Subscribe for free on the DCD website.

