Increasing cloud adoption is driving rapid change in the webscale ecosystem. In response, hyperscale cloud providers are building large, global data centers and backbone networks to support regional markets, positioning data centers in important hubs and at the network edge.

Regional cloud providers are building carrier-neutral and colocation data centers to meet increasing demand for a more direct and higher-performing interconnection infrastructure.

Demand for cloud services is also compelling cloud providers to increase the capacity, performance, and resiliency of the network interconnection infrastructures that support these data centers. Providers want to reduce operational complexity and cost per bit at the same time.

Equinix’s most recent Global Interconnection Index projects that global interconnection bandwidth will grow at a 40 percent five-year CAGR, reaching 27,762 Tbps – equivalent to 110 zettabytes of data exchanged annually.
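The two figures in the Equinix projection can be cross-checked with simple arithmetic. The sketch below assumes decimal (SI) units, a 365-day year, and that the projected rate is sustained for the whole year. These are simplifying assumptions, not part of the index's stated methodology.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year

def tbps_to_zettabytes_per_year(tbps: float) -> float:
    """Convert a sustained rate in terabits/s to decimal zettabytes per year."""
    bits_per_year = tbps * 1e12 * SECONDS_PER_YEAR
    bytes_per_year = bits_per_year / 8
    return bytes_per_year / 1e21  # 1 ZB = 10**21 bytes (SI)

def cagr_multiplier(rate: float, years: int) -> float:
    """Total growth factor implied by a compound annual growth rate."""
    return (1 + rate) ** years

if __name__ == "__main__":
    print(f"{tbps_to_zettabytes_per_year(27_762):.1f} ZB/year")  # 109.4
    print(f"{cagr_multiplier(0.40, 5):.2f}x over five years")  # 5.38
```

Roughly 109 ZB per year agrees with the quoted figure of about 110 ZB. A 40 percent CAGR compounds to a little more than a fivefold increase over five years.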

Operational challenges

The latest IP routing and optical technology will help cloud providers meet new capacity and performance demands. But the more daunting challenge may be to address new operational requirements that are being amplified by:

  • The need to manage a much larger volume and scale of applications with the same operational staff and tooling.
  • Cloud-native application architectures that require more responsiveness and agility from the network.
  • Sky-high service quality and performance expectations for applications that used to be delivered on a best-effort basis.
  • A dramatic increase in network complexity and connectivity requirements because of the large growth in regional and edge data centers.
  • An increase in configuration errors, bugs and other network maintenance issues.

Cloud infrastructure providers need new solutions that will enable them to address these challenges and accelerate business growth.

Network automation to the rescue

Operational requirements are increasing exponentially, so simply adding staff won’t work. A different approach is needed.

To tackle these requirements, cloud providers need automation that can increase the network’s operational productivity. Network automation will provide improvements in three key areas: network lifecycle management, service assurance, and path control and traffic optimization.

Network lifecycle management

The manual effort needed for configuration, provisioning and maintenance is growing fast in webscale networks. Cloud providers can use separate tools and scripts for the different phases of the network’s lifecycle (day 0 design, day 1 deployment and day 2+ operations), but this piecemeal approach does not scale.

A single automation platform that spans the entire network and all its lifecycle phases can enable cloud providers to decrease OPEX, boost operational productivity and reduce human error.

Intent-based automation enables cloud providers to abstract away the complexity of manual configuration and provisioning tasks. Operators specify higher-level parameters for the desired end state of the network in an intent file that serves as the “single source of truth” for the network.

This intent file deliberately omits low-level configuration details. Instead, it defines how the network should behave. The detailed configuration is generated automatically and deployed to the network to achieve that behavior. For routine, highly repeatable tasks, this approach can eliminate the need to write hundreds or thousands of lines of configuration by hand across the network.
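The sketch below illustrates how a small intent can expand into much larger per-device configuration. The intent schema (`FabricIntent`), the ASN numbering scheme, and the CLI-like output syntax are all hypothetical. For readability, the sketch uses hostnames where a real configuration would need peer IP addresses. Real platforms define their own intent models and device syntaxes.

```python
from dataclasses import dataclass

@dataclass
class FabricIntent:
    """Hypothetical intent: the desired end state of a leaf-spine fabric."""
    spines: list[str]
    leaves: list[str]
    base_asn: int  # spines share base_asn; leaf N gets base_asn + N

def render_configs(intent: FabricIntent) -> dict[str, list[str]]:
    """Expand the intent into per-device BGP config lines (hypothetical syntax)."""
    leaf_asn = {leaf: intent.base_asn + i for i, leaf in enumerate(intent.leaves, start=1)}
    configs: dict[str, list[str]] = {}
    for spine in intent.spines:
        # Each spine peers with every leaf.
        configs[spine] = [f"router bgp {intent.base_asn}"] + [
            f"  neighbor {leaf} remote-as {asn}" for leaf, asn in leaf_asn.items()
        ]
    for leaf, asn in leaf_asn.items():
        # Each leaf peers with every spine.
        configs[leaf] = [f"router bgp {asn}"] + [
            f"  neighbor {spine} remote-as {intent.base_asn}" for spine in intent.spines
        ]
    return configs

if __name__ == "__main__":
    intent = FabricIntent(spines=["spine1", "spine2"],
                          leaves=[f"leaf{n}" for n in range(1, 5)],
                          base_asn=65000)
    configs = render_configs(intent)
    print(len(configs))  # 6 devices
    print("\n".join(configs["leaf1"]))
```

The intent remains a few lines regardless of fabric size. The generated configuration, by contrast, grows with the number of spine-leaf pairs.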

Service assurance

Automation can simplify and augment service assurance by using reporting and prediction tools to troubleshoot network problems efficiently, pinpoint root causes and improve network performance. It can also use machine learning to detect and diagnose issues quickly and, in some cases, prevent them proactively.
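As a much simpler stand-in for ML-based detection, the sketch below uses a rolling z-score to flag telemetry samples that deviate sharply from recent history. This is a basic statistical baseline, not machine learning. The window size and threshold are illustrative assumptions.

```python
from statistics import mean, stdev

def zscore_anomalies(samples: list[float], window: int = 10,
                     threshold: float = 3.0) -> list[int]:
    """Return indices whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` samples (window must be >= 2)."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as anomalous.
            if samples[i] != mu:
                flagged.append(i)
        elif abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    latency_ms = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.0, 10.1, 25.0, 10.0]
    print(zscore_anomalies(latency_ms))  # [11]
```

An ML model would learn richer patterns, such as daily traffic cycles. The basic flow of scoring new telemetry against a learned baseline is the same.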


For example, cloud providers can improve service assurance with closed-loop automation that constantly monitors the network and compares its behavior and state with pre-programmed intent.

When the automation platform identifies a discrepancy, it can raise an alarm that triggers remediation. Remedial actions, such as reconfiguring a network device or upgrading its software, can then be automated to close the loop.
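One pass of such a closed loop might look like the sketch below. The state model (`{device: {attribute: value}}`), the attribute names, and the action strings are hypothetical. The observed state is passed in as a dictionary, whereas a real platform would collect it continuously from network telemetry.

```python
from dataclasses import dataclass

# Intended and observed state: {device: {attribute: value}} (hypothetical model).
State = dict[str, dict[str, str]]

@dataclass(frozen=True)
class Discrepancy:
    device: str
    key: str
    intended: str
    observed: str

def detect_drift(intent: State, observed: State) -> list[Discrepancy]:
    """Compare observed device state against intent, attribute by attribute."""
    drift = []
    for device, wanted in intent.items():
        actual = observed.get(device, {})
        for key, value in wanted.items():
            seen = actual.get(key, "<missing>")
            if seen != value:
                drift.append(Discrepancy(device, key, value, seen))
    return drift

def remediate(d: Discrepancy) -> str:
    """Map a discrepancy to a remedial action (hypothetical action names)."""
    if d.key == "sw_version":
        return f"schedule-upgrade {d.device} {d.intended}"
    return f"push-config {d.device} {d.key}={d.intended}"

def run_loop_once(intent: State, observed: State) -> list[str]:
    """One pass of the loop: compare, raise alarms, choose remediations."""
    actions = []
    for d in detect_drift(intent, observed):
        print(f"ALARM {d.device}: {d.key}={d.observed!r}, intent is {d.intended!r}")
        actions.append(remediate(d))
    return actions

if __name__ == "__main__":
    intent = {"leaf1": {"sw_version": "23.10", "mtu": "9212"},
              "leaf2": {"sw_version": "23.10", "mtu": "9212"}}
    observed = {"leaf1": {"sw_version": "23.10", "mtu": "1500"},
                "leaf2": {"sw_version": "22.7", "mtu": "9212"}}
    print(run_loop_once(intent, observed))
```

In practice, automatic remediation would usually be gated by policy, for example by limiting which actions may run without operator approval.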

Closed-loop automation will become significantly more effective as network telemetry and AI/ML capabilities continue to improve. In short, AI/ML-powered automation will move the network closer to autopilot, and self-healing capabilities will become mainstream.

Path control and traffic optimization

Networks are becoming more complex, and traffic can take multiple routes from source to destination. Cloud providers need the ability to control traffic and optimize its path through the network. This will allow them to increase the network’s virtual capacity and improve quality of experience for end users by avoiding congestion and anticipating capacity constraints.

Automation can address this issue through centralized policy control that maintains an active view of the network state. A standards-based Path Computation Element (PCE) computes the best placement for network-wide paths and tunnels, so that traffic can be steered around delays and congestion.
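The article does not say which algorithm a PCE runs. The PCE architecture is defined in IETF RFC 4655, and constrained shortest path first (CSPF) is one widely used technique for this kind of computation. The toy version below prunes links that lack the requested bandwidth, then runs Dijkstra on latency. The link tuple format and the units are assumptions made for illustration.

```python
import heapq
from typing import Iterable, Optional

Link = tuple[str, str, float, float]  # (node_a, node_b, latency_ms, available_gbps)

def cspf(links: Iterable[Link], src: str, dst: str,
         min_bw: float) -> Optional[tuple[float, list[str]]]:
    """Lowest-latency path from src to dst using only links with >= min_bw free."""
    graph: dict[str, list[tuple[str, float]]] = {}
    for a, b, latency, bw in links:
        if bw < min_bw:
            continue  # constraint: prune links that cannot carry the demand
        graph.setdefault(a, []).append((b, latency))  # links are bidirectional
        graph.setdefault(b, []).append((a, latency))
    heap: list[tuple[float, str, list[str]]] = [(0.0, src, [src])]
    settled: dict[str, float] = {}
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in settled:
            continue  # already reached more cheaply
        settled[node] = cost
        for nxt, latency in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (cost + latency, nxt, path + [nxt]))
    return None  # no path satisfies the constraint

if __name__ == "__main__":
    links = [("A", "B", 5, 100), ("B", "D", 5, 10), ("A", "C", 8, 100), ("C", "D", 8, 100)]
    print(cspf(links, "A", "D", min_bw=0))   # (10.0, ['A', 'B', 'D'])
    print(cspf(links, "A", "D", min_bw=40))  # (16.0, ['A', 'C', 'D'])
```

Requesting 40 Gbps steers the path away from the lower-latency but bandwidth-limited B-D link. A production PCE would also account for existing reservations and other policy constraints.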

This capability requires careful monitoring of the network along with measurement of KPIs for the various network paths and their suitability to meet the SLAs of the overall service.
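Checking measured path KPIs against an SLA could be sketched as follows. The three KPIs (latency, jitter, loss), the thresholds, and the rule of choosing the lowest-latency compliant path are all illustrative assumptions. A real system would use whatever KPIs and selection policy the service defines.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PathKpis:
    """Measured KPIs for one candidate path (e.g., from active probing)."""
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float

@dataclass
class Sla:
    """Thresholds a path must stay within to meet the service SLA."""
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float

def violations(kpis: PathKpis, sla: Sla) -> list[str]:
    """Return the names of the SLA thresholds this path currently exceeds."""
    checks = [
        ("latency", kpis.latency_ms > sla.max_latency_ms),
        ("jitter", kpis.jitter_ms > sla.max_jitter_ms),
        ("loss", kpis.loss_pct > sla.max_loss_pct),
    ]
    return [name for name, failed in checks if failed]

def pick_path(paths: list[PathKpis], sla: Sla) -> Optional[PathKpis]:
    """Choose the lowest-latency path that meets every threshold, if any."""
    compliant = [p for p in paths if not violations(p, sla)]
    return min(compliant, key=lambda p: p.latency_ms, default=None)

if __name__ == "__main__":
    sla = Sla(max_latency_ms=15.0, max_jitter_ms=2.0, max_loss_pct=0.1)
    paths = [PathKpis("primary", 12.0, 1.5, 0.2),
             PathKpis("alternate", 14.0, 1.0, 0.05),
             PathKpis("backup", 18.0, 0.8, 0.01)]
    for p in paths:
        print(p.name, violations(p, sla))
    chosen = pick_path(paths, sla)
    print(chosen.name if chosen else "no compliant path")  # alternate
```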

IP-optical automation

The growing availability of pluggable optical modules for routers has made physical IP-optical integration a reality. This transition is transforming the relationship between IP routing and optical transport while simultaneously restructuring some operational responsibilities and practices.

As IP routers cross over into the domain of coherent optical transport, they are exposed to new port and link management practices that were traditionally in the jurisdiction of the optical transport network.

Operational tasks such as optical channel provisioning, connectivity discovery, and the maintenance and troubleshooting of optical connectivity will be incorporated into the scope of overall router management.

Automation must address this change by unifying network visualization and coordination across the IP and optical layers. It needs to leverage powerful correlation capabilities to make these multi-layer, multi-domain networks more efficient and resilient.
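Cross-layer correlation can be illustrated with a small example. If several IP links go down at the same moment that the optical resource they share (such as a fiber span) raises an alarm, the optical fault is the likely root cause. The inventory mapping and resource names below are hypothetical, and a real platform would derive them from its multi-layer topology.

```python
from collections import defaultdict

def build_index(ip_to_optical: dict[str, str]) -> dict[str, list[str]]:
    """Invert {ip_link: optical_resource} into {optical_resource: [ip_links]}."""
    index = defaultdict(list)
    for ip_link, resource in ip_to_optical.items():
        index[resource].append(ip_link)
    return dict(index)

def correlate(optical_alarms: list[str], ip_links_down: set[str],
              index: dict[str, list[str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Group IP link-down events under the optical fault that explains them."""
    root_causes: dict[str, list[str]] = {}
    explained: set[str] = set()
    for resource in optical_alarms:
        affected = [link for link in index.get(resource, []) if link in ip_links_down]
        if affected:
            root_causes[resource] = affected
            explained.update(affected)
    # Anything left over needs separate (IP-layer) troubleshooting.
    unexplained = sorted(ip_links_down - explained)
    return root_causes, unexplained

if __name__ == "__main__":
    index = build_index({"r1-r2": "span-7", "r1-r3": "span-7", "r2-r4": "span-9"})
    causes, rest = correlate(["span-7"], {"r1-r2", "r1-r3", "r2-r4"}, index)
    print(causes)  # {'span-7': ['r1-r2', 'r1-r3']}
    print(rest)    # ['r2-r4']
```

In this example, three separate IP alarms reduce to one optical root cause plus one event that still needs investigation.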

The recipe for automation success

To add real value, the automation capabilities described above need to be executed from a single automation platform. Any other approach will defeat the two main aims of automation: simplification and abstraction. An effective automation platform will be:

  • Comprehensive, offering the ability to orchestrate and control optical transport and IP routing networks across multiple domains, technologies and equipment vendors.
  • Adaptable, supporting a modular approach that allows cloud providers to match current and future deployment requirements while interworking with existing systems.
  • Open, aligning with the evolution towards multivendor environments to provide more flexibility in building and growing the network and its services.
  • Simple to use, featuring intuitive, user-friendly capabilities that make jobs quicker and more productive.

A platform with these characteristics provides the right foundation for abstracting and simplifying today’s webscale operational landscape. It is also the foundation for implementing the automation tools needed to manage, assure and optimize webscale networks now and into the future.