Critics say a single engineer shouldn’t be able to bring down the entire network
Australia’s largest telecoms company, Telstra, has issued an apology following a major outage that affected millions of customers on Tuesday.
According to the Sydney Morning Herald, the mobile network went down due to an “embarrassing human error” after an engineer took a malfunctioning network node offline without first provisioning a replacement.
According to Telstra, the outage lasted around four hours. During this time, thousands of people went online to vent their frustration through social media.
Under the bus
From 12:45 on Tuesday, millions of Telstra customers lost the ability to make calls or use their mobile data allowance. The outage impacted all major Australian cities including Brisbane, Sydney, Melbourne, Adelaide and Perth.
Telstra’s Australian network is broken up into ten major connection points it calls ‘nodes’, which host all of the hardware necessary to deliver voice and data services. This structure was built with redundancy in mind, so when a few nodes go offline, the rest can continue operating as normal.
According to Telstra, which admitted it hasn’t yet had time to conduct a thorough investigation, the outage began when one of the nodes was taken offline to fix a non-critical fault, but the staff responsible did not follow the correct procedure for rerouting traffic.
“We took that node down, unfortunately the individual that was managing that issue did not follow the correct procedure, and he reconnected the customers to the malfunctioning node, rather than transferring them to the nine other redundant nodes that he should have transferred people to,” explained Kate McKenzie, Telstra’s chief operating officer.
“Normally we could take down three or four of those nodes and do work on them, fix them up, and it would have no impact, but on this occasion … the correct procedure was unfortunately not followed and the consequences you can see.”
The service was completely restored by 4pm. The same evening, the company announced it would be giving all of its mobile customers a day of free data on Sunday.
Some observers have criticized Telstra for being too quick to shift all the blame onto an unnamed engineer, instead of looking at its network architecture and asking whether the outage could have been prevented through best practices and the use of automated systems.
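To illustrate the kind of automated safeguard those observers have in mind, here is a minimal sketch in Python. It is not based on Telstra's actual systems, which have not been made public; the `Node` class, the `take_offline` and `reconnect` functions and the customer counts are all hypothetical. The idea is simply that the tooling, rather than the engineer, refuses to send customers to a node that is faulty or offline, and drains a node onto the healthy ones before it is taken down.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one network node (not Telstra's real design)."""
    name: str
    healthy: bool = True
    online: bool = True
    customers: int = 0


class UnsafeMaintenanceError(Exception):
    """Raised when an operation would leave customers on a bad node."""


def take_offline(target: Node, nodes: list[Node]) -> dict[str, int]:
    """Move target's customers to healthy, online nodes, then take it offline.

    Returns a mapping of receiving node name to customers moved there.
    """
    receivers = [n for n in nodes if n is not target and n.healthy and n.online]
    if not receivers:
        raise UnsafeMaintenanceError(
            f"no healthy node available to absorb traffic from {target.name}"
        )

    # Spread the customers as evenly as possible across the receivers.
    share, remainder = divmod(target.customers, len(receivers))
    moved: dict[str, int] = {}
    for i, node in enumerate(receivers):
        count = share + (1 if i < remainder else 0)
        node.customers += count
        moved[node.name] = count

    target.customers = 0
    target.online = False
    return moved


def reconnect(target: Node, customers: int) -> None:
    """Attach customers to a node, refusing if it is faulty or offline."""
    if not (target.healthy and target.online):
        raise UnsafeMaintenanceError(
            f"{target.name} is not fit to receive customers"
        )
    target.customers += customers
```

In this sketch, a request to reconnect customers to the malfunctioning node would fail with an error instead of going through, leaving the operator to choose one of the remaining healthy nodes.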