IT Architecture

The latest news and information on Information Technology and how it impacts the data center


Lessons learned from an Amazon cloud computing client’s prolonged downtime
EC2 executive says adjustments will be made to increase customer visibility into network traffic

While code hosting company Bitbucket is not considering legal action against Amazon over the money lost during prolonged downtime caused by a DDoS attack on one of its server instances hosted in Amazon’s Elastic Compute Cloud (EC2), it is not ruling out switching hosts.

“One thing’s for sure: we’re investing a lot of man-hours into making sure this won’t happen again,” Bitbucket CEO Jesper Nohr wrote in his detailed account of the incident and the recovery process. “If this means moving to a different host, so be it. We haven’t decided yet.”

Several hosting firms pitched their services to Bitbucket right after the incident, some of which were “quite tempting.”

Still, Nohr said in an interview that he understood why it took the provider’s team so long to identify and resolve the issue and added that he was satisfied with the adjustments Amazon representatives said they would undertake to prevent similar problems in the future.

What happened and what took so long?
The attacker flooded the instance with a large number of UDP packets, saturating its bandwidth and preventing it from exchanging traffic with the company’s storage hosted in Amazon’s Elastic Block Store (EBS), which provides storage volumes for use with EC2 instances.

As a result, the Web site was down for about 16 hours after the Bitbucket team reported the problem to Amazon on October 2, and then for another two-plus hours on October 4. It was also down for eight hours that the Bitbucket team spent trying to fix the problem itself before it “decided to shell out for more expensive (Amazon) support.”

Bitbucket was the only EC2 customer that was affected by the attack, Amazon spokesperson Kay Kinton said.

“We did not immediately look beyond the reported problem and spent too much time focusing on what was believed to be an issue with the Amazon EBS volume,” Kinton said, explaining what Nohr described as an eight-hour delay between the time the client reported the issue and the time Amazon identified the cause.

“While the customer perceived this issue to be slowness of their EBS volume, what we ultimately found was not a problem with Amazon EBS, but rather that the customer’s Amazon EC2 instance was receiving a very large amount of network traffic.”
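The distinction Kinton draws, between a slow storage volume and an instance swamped by inbound traffic, can be illustrated with a short sketch. It is not Amazon's diagnostic tooling: the PacketSample record, the function names, and the thresholds are all hypothetical, and the code assumes someone has already captured a sample of inbound packets over a known interval.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class PacketSample(NamedTuple):
    """One captured inbound packet (hypothetical record for illustration)."""
    protocol: str    # e.g. "udp" or "tcp"
    size_bytes: int


def inbound_share_by_protocol(samples: Iterable[PacketSample]) -> Dict[str, float]:
    """Return each protocol's fraction of the total inbound bytes."""
    totals: Dict[str, int] = defaultdict(int)
    for sample in samples:
        totals[sample.protocol] += sample.size_bytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {proto: count / grand_total for proto, count in totals.items()}


def looks_like_flood(
    samples: List[PacketSample],
    link_capacity_bytes: int,
    protocol: str = "udp",
    share_threshold: float = 0.8,        # assumed cutoff, not a standard value
    utilization_threshold: float = 0.9,  # assumed cutoff, not a standard value
) -> bool:
    """Flag a capture where the link is nearly full and one protocol dominates."""
    total_bytes = sum(sample.size_bytes for sample in samples)
    shares = inbound_share_by_protocol(samples)
    link_nearly_full = total_bytes >= utilization_threshold * link_capacity_bytes
    protocol_dominates = shares.get(protocol, 0.0) >= share_threshold
    return link_nearly_full and protocol_dominates
```

A capture in which the link is close to capacity and UDP accounts for most of the bytes would be flagged, which points the investigation at the network rather than at the storage volume.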

What now?
Amazon representatives conducted a “post-mortem” meeting with Nohr’s team after the problem was resolved to talk about measures the provider was planning to take to prevent a recurrence.

“We sort of initiated the meeting but they seemed to want to do it as well,” the Bitbucket CEO said. “They were very apologetic about what happened and assured us that many of the procedures (were) going to be changed. That’s good enough for me at this point.”

Amazon EC2 VP Peter DeSantis said the changes would focus primarily on increasing network visibility during the customer support process.

“We need to get better visibility on the network traffic going into the customer’s firewall,” he said, explaining that at this point, customers cannot easily see such traffic.

Existing preventative measures
Within the past six months, Amazon EC2 has also introduced a series of features that can potentially avert downtime during unexpected traffic spikes. These tools provide clients with traffic monitoring, automatic scaling and load balancing capabilities.

“In dealing with unanticipated traffic, adding capacity by adding additional servers is kind of a fundamental first tool,” DeSantis said.
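The idea of adding servers in response to unanticipated traffic can be sketched as a simple sizing rule. This is a minimal illustration, not Amazon's scaling logic or API: the function name, the per-server capacity figure, and the minimum and maximum fleet sizes are all assumptions.

```python
def desired_server_count(
    requests_per_second: int,
    capacity_per_server: int,
    min_servers: int,
    max_servers: int,
) -> int:
    """Return how many servers are needed for the current load.

    The count is the load divided by per-server capacity, rounded up,
    then clamped to the configured minimum and maximum fleet size.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-requests_per_second // capacity_per_server)
    return max(min_servers, min(max_servers, needed))
```

With an assumed capacity of 100 requests per second per server and a fleet bounded between 2 and 20 servers, a load of 950 requests per second yields 10 servers, while a very large spike is capped at the configured maximum of 20.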

While there are countless cloud computing service providers, Amazon and Google lead in their ability to provide and execute a complete solution, according to a recent report by IT industry market research and analysis group Evans Data Corporation. Amazon is slightly ahead of Google in terms of completeness of solution but behind in terms of ability to execute.

Amazon does not disclose the number of clients that use its cloud services.


Keywords: Amazon, EC2, Elastic Compute Cloud, cloud downtime, Bitbucket outage, cloud computing, Elastic Block Store, EBS



© DatacenterDynamics 2010