Network test automation is one of the hot topics in the telecommunications industry, and data center networks are no exception.
Automated network testing becomes even more important in exceptional situations such as the one we are all experiencing worldwide, when network operations and maintenance teams may be forced to work remotely.
The perfect storm
While the industry in general is well aware that network traffic continues to expand exponentially, no one expected (or at least planned for) the situation caused by the Covid-19 pandemic.
The challenges can make you dizzy: network capacity reaching its limits, supply chains struggling to balance production and demand, and the teams responsible for network turn-up and maintenance working from home over networked services. It is a perfect storm for network performance should an outage occur. Streaming content such as Netflix, interactive gaming such as Fortnite, communication services such as Skype or Zoom, and massive enterprise VPN usage have raised global network traffic by 30 percent on average. This is putting service provider and data center networks under strain, with very little margin for error.
Data centers are prepared for disaster recovery in many respects, such as power, cooling and even network connectivity, at least on paper. In the real world, even with redundant mechanisms, outages can have a serious business impact on providers and their customers. When it comes to data center interconnects (DCIs), waiting for a problem to occur is no longer an option.
In the event of service degradation on a DCI fiber link, the network management systems will automatically switch the traffic over to another route so that it is carried safely. That route may not be the fastest, though, and with load balancing, shifting traffic onto the remaining link may push it to its performance limits.
Latency performance therefore plays an important role here. Let's face it: nobody likes rerouting, and it should be a last resort as far as service availability is concerned. So how can we foresee these situations? If any section of the fiber is damaged, or simply underperforming, technicians may need to be dispatched to a specific location or data center.
During lockdown, you can count yourself lucky if staff are available at all, and arranging truck rolls and troubleshooting may take time. This is what drives MTTR (mean time to repair), and minimizing this response time protects the organization's revenue.
The importance of test automation and predictability
Data center operators know that to reduce MTTR and build visibility into the network, they need to integrate, and automate, network test functions as much as they can. Reacting to network events is too risky because the business impact is unknown, so the best strategy is to invest in fault prevention, using automated test functions to increase visibility and control.
The cost of an outage is high (an average of $9,000 per minute, according to the Ponemon Institute). The goal is 100 percent network uptime, so investing in network visibility is not optional; it is a must.
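To put that figure in perspective, a rough outage-cost estimate is simply downtime multiplied by cost per minute, so every minute removed from MTTR translates directly into savings. The sketch below assumes this simplified linear model, uses the Ponemon average quoted above, and compares hypothetical repair times (illustrative values, not measured data) for a truck roll versus remote diagnosis supported by automated tests. The function names are hypothetical.

```python
# Simplified outage-cost model: cost = downtime (minutes) x cost per minute.
# The repair times used below are hypothetical, for illustration only.

COST_PER_MINUTE_USD = 9_000  # Ponemon Institute average quoted in the text


def mttr(repair_times_minutes: list[float]) -> float:
    """Mean time to repair: average repair duration across incidents."""
    if not repair_times_minutes:
        raise ValueError("need at least one incident")
    return sum(repair_times_minutes) / len(repair_times_minutes)


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimated cost of an outage lasting the given number of minutes."""
    if minutes < 0:
        raise ValueError("downtime cannot be negative")
    return minutes * cost_per_minute


# Hypothetical incident histories (minutes to repair)
truck_roll_mttr = mttr([180, 240, 300])  # technicians dispatched on site
automated_mttr = mttr([30, 45, 60])      # fault located remotely by automated tests

with_truck_roll = outage_cost(truck_roll_mttr)
with_automation = outage_cost(automated_mttr)

print(f"truck roll: ${with_truck_roll:,.0f} per incident")   # $2,160,000
print(f"automated:  ${with_automation:,.0f} per incident")   # $405,000
print(f"saved:      ${with_truck_roll - with_automation:,.0f} per incident")  # $1,755,000
```

A real cost model would account for partial degradation, SLA penalties and traffic actually affected; the linear form is kept here only to show how strongly MTTR drives the result.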
If we analyze the typical network issues seen today, a large percentage of problems are still found in the physical layer. These issues are often caused by external factors outside our control. Building automated test functions that proactively test the fiber interconnects helps prevent performance degradation and, ultimately, outages. In addition, it is critical to be able to monitor the live network without impacting traffic, in other words, "in-service" testing.
But the story does not end there. Engineers can feed the data collected by these functions into machine learning mechanisms that make decisions based on network measurements (optical power, link degradation, latency, etc.), something that is not possible with a reactive test approach. Similarly, artificial intelligence algorithms require data, and network status data cannot be left out of the equation. In short, predictability will be an essential pillar of next-generation networks.
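As a very simple illustration of turning periodic in-service measurements into a prediction, the sketch below fits a straight line (ordinary least squares) to received optical power readings and estimates how many hours remain before the trend crosses an alarm threshold. This linear trend is a deliberately basic stand-in for the machine learning mechanisms described above, not the approach of any particular product; the readings, the -14 dBm threshold and the function names are all hypothetical.

```python
# Minimal sketch of predictive link monitoring: fit a linear trend to
# received optical power samples and estimate when it will cross an alarm
# threshold. Threshold and readings are hypothetical.
from typing import Optional


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours: list[float], power_dbm: list[float],
                          threshold_dbm: float) -> Optional[float]:
    """Hours after the last sample until the fitted trend reaches
    threshold_dbm, or None if power is stable or improving."""
    slope, intercept = linear_fit(hours, power_dbm)
    if slope >= 0:
        return None  # no degradation trend to extrapolate
    crossing_hour = (threshold_dbm - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical daily readings from a DCI link that is slowly degrading
eta = hours_until_threshold([0, 24, 48, 72], [-8.0, -8.5, -9.0, -9.5], -14.0)
if eta is not None:
    print(f"threshold expected in about {eta:.0f} hours")  # about 216 hours
```

A warning days in advance gives the team time to schedule a single planned intervention instead of an emergency truck roll. In practice, noisy measurements and non-linear degradation would call for more robust models.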
How will this impact network engineering teams? Will they disappear? Definitely not. Human intervention will always be needed, but in a more productive way. As mentioned above, engineers can't afford to wait until a network problem occurs, and their value lies not in spending hours testing the network at random, but in building a more intelligent network capable of making smart decisions to ensure optimal performance.
Most data centers today run on open systems, both open hardware and open software, which make life easier when it comes to multi-vendor interoperability and to scaling their fast-growing networks. Engineers need to spend time not just optimizing network parameters but also building test automation. Network test functions must be an integral part of this strategy to build network visibility and intelligence.
Data center operators will continue to demand a huge amount of network capacity to support their customers, which will add complexity for maintenance teams. Ultimately, network test automation needs to be part of their network expansion plans to make the best use of human and technical resources. We can no longer afford a reactive test approach, so engineers will need to build test routines into their expansion plans that drive automation and improve fault predictability.