European cloud provider OVHcloud suffered a 40-minute network outage on Monday March 30. The glitch impacted services to customers for around two hours.
OVHcloud provides infrastructure-as-a-Service (IaaS), offering hosted cloud, and private servers. The outage hit a lot of French customers operating on the platform, but appears to be fixed now. It is believed that the cause was a problem with network hardware at the company's flagship data center in Roubaix. A post mortem is due later today.
Panne du réseau
"At 5.01pm [French time] we noticed a network incident at Roubaix," said a spokesperson. "It impacted access to services - not services directly." The network problem was fixed by 5.40pm French time (Central European Summer Time), OFH says,. DownDetector shows that the service rapidly returned, with a few effects lingering till around 7pm.
The outage appears to be due to a faulty piece of network hardware, rather than an overload by home-workers. OVHcloud's network status updates report that there was a fault in a piece of network equipment, which was isolated. Links were rerouted, so normal service could be resumed. "Following the diagnostic confirmation by the vendor support, we are re-activating gradually the infrastructure links that we had to totally isolate," reported the company's status update page.
It seems that the equipment has already been replaced. Another status update reports that a switch managing top-of-rack switches has been replaced: "There is no expected impact," it says because of OVHcloud's active/active infrastructure. "Some perturbations may be observed between March 31st 2am and 4am CEST."
OVHcloud itself seems to have weathered the current lockdown in France, due to the Covid-19 pandemic . "All our offices round the world shifted to remote working two weeks ago," said the spokesperson.
"We saw the situation evolve from Singapore to Milan, and we have been able to anticipate this and to deploy the most relevant infrastructure as soon as guidance was issued. Everything switched pretty smoothly, with no impact on customer service.
As well as France, OVHcloud has data centers in Sydney, Singapore, Warsaw, and Frankfurt, along with Virginia and Oregon in the US. The company uses its own liquid cooling system and rack infrastructure, with standard GPUs and CPUs. The company rebranded from plain OVH, late in 2019.
Update: OVHcloud's final post mortem on the event says the cause of the fault was a malfunction in one of the connection cards of a backbone router at Roubaix. The fault was partial, and the card appeared to be working, so "traffic was not efficiently re-routed automatically, causing packet and connection losses."
In response OVHcloud isolated the router, and restored access to services, later replacing the faulty equipment.
"Only access to services was impacted, no data was affected or lost during the incident," says the OVHcloud statement. "The abnormal functioning of the router card has been reported to the manufacturer and is the subject of a 'case' being analyzed by the manufacturer's support functions."