While code hosting company Bitbucket is not considering taking legal action against Amazon for the money lost during prolonged downtime caused by a DDoS attack on one of its server instances hosted in Amazon’s Elastic Compute Cloud (EC2), it is not ruling out switching hosts.
“One thing’s for sure: we’re investing a lot of man-hours into making sure this won’t happen again,” Bitbucket CEO Jesper Nohr wrote in his detailed account of the incident and the recovery process. “If this means moving to a different host, so be it. We haven’t decided yet.”
Several hosting firms pitched their services to Bitbucket right after the incident, some of which were “quite tempting.”
Still, Nohr said in an interview that he understood why it took the provider’s team so long to identify and resolve the issue and added that he was satisfied with the adjustments Amazon representatives said they would undertake to prevent similar problems in the future.
What happened and what took so long?
The attacker flooded the instance with a large number of UDP packets, consuming its bandwidth and preventing it from exchanging traffic with the company’s storage capacity hosted in Amazon’s Elastic Block Store (EBS), which provides storage volumes for use with EC2 instances.
As a result, the Web site was down for about 16 hours after its team reported the problem to Amazon on October 2 and then for another two-plus hours on October 4. That was on top of eight hours of downtime that the Bitbucket team spent trying to fix the problem itself before it “decided to shell out for more expensive (Amazon) support.”
Bitbucket was the only EC2 customer that was affected by the attack, Amazon spokesperson Kay Kinton said.
“We did not immediately look beyond the reported problem and spent too much time focusing on what was believed to be an issue with the Amazon EBS volume,” Kinton said, explaining what Nohr described as an eight-hour delay between the time the client reported the issue and the time Amazon identified the cause.
“While the customer perceived this issue to be slowness of their EBS volume, what we ultimately found was not a problem with Amazon EBS, but rather that the customer’s Amazon EC2 instance was receiving a very large amount of network traffic.”
What now?
Amazon representatives conducted a “post-mortem” meeting with Nohr’s team after the problem was resolved to talk about measures the provider was planning to take to prevent a replay.
“We sort of initiated the meeting but they seemed to want to do it as well,” the Bitbucket CEO said. “They were very apologetic about what happened and assured us that many of the procedures (were) going to be changed. That’s good enough for me at this point.”
Amazon EC2 VP Peter DeSantis said the changes would focus primarily on increasing network visibility during the customer support process.
“We need to get better visibility on the network traffic going into the customer’s firewall,” he said, explaining that at this point, customers cannot easily see such traffic.
Existing preventative measures
Within the past six months, Amazon EC2 has also introduced a series of features that can potentially avert downtime during unexpected traffic spikes. These tools provide clients with traffic monitoring, automatic scaling and load balancing capabilities.
“In dealing with unanticipated traffic, adding capacity by adding additional servers is kind of a fundamental first tool,” DeSantis said.
While there are countless cloud computing service providers, Amazon and Google lead in their ability to provide and execute a complete solution, according to a recent report by IT industry market research and analysis group Evans Data Corporation. Amazon is slightly ahead of Google in terms of completeness of solution but behind in terms of ability to execute.
Amazon does not disclose the number of clients that use its cloud services.