

Verizon data center failure causes JetBlue air travel delays

Data center maintenance causes unexpected shutdown

Customers of US low-cost airline JetBlue suffered travel delays for several hours this week after a power outage took down a data center operated by Verizon. The incident suggests that the airline may not have failover provisions in place.

All of JetBlue's data center-based services failed on January 14 after a maintenance issue caused a power failure at the airline's Verizon-owned and -operated data center. Stranded travelers began reporting problems at approximately 11:30 AM ET, and JetBlue reported that all services from the facility were restored by 8 PM, with major services coming back up throughout the day, starting at around 2:30 PM ET.

Bumpy landing

Verizon reported that the data center power outage occurred at 11:37 AM. JetBlue had its online reservation service and airport check-in operational by 2:30 PM, approximately 40 minutes after the airline announced on its blog that power had been restored. Its complete suite of online services, including features such as flight tracking, was not restored until almost 8 PM, however.

JetBlue attributed the failure entirely to its data center partner Verizon and directed questions about the actual cause to the provider. However, no information has been forthcoming from Verizon beyond JetBlue's report that a maintenance issue caused the failure.

As with any flight service interruption, the outage caused a cascade of problems, with travelers reporting long lines at JetBlue hubs, some flights cancelled and many more delayed. Various news reports indicate that JetBlue did a good job of keeping customers informed about the status of the problems, but it would appear that the firm has been keeping all of its data center eggs in one basket: the failure at the single Verizon site brought down the customer support infrastructure, including airport check-in and gate operations, until that location was restored and the applications running there were brought back online.

The fact that JetBlue apparently uses a single site with no failover for its operations may highlight one area where discount airlines cut corners to keep prices low. Maintaining a failover site for customer support services could have prevented the traveler delays and flight cancellations, but the risk of revenue loss would have to be balanced against the cost of maintaining that second site. The apparent reliability of the single-site operation so far could have made failover seem like a poor investment for the company; a rough comparison of the two options is sketched below.
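
As a purely illustrative back-of-the-envelope sketch, the short Python snippet below compares the expected annual cost of a single-site operation with that of a warm standby site. Every figure in it (outage hours, revenue lost per hour, cost of the standby site, switchover time) is a hypothetical placeholder, not a number reported by JetBlue or Verizon.

    # Hypothetical comparison of a single site versus a warm standby (failover) site.
    # All figures are illustrative placeholders, not JetBlue or Verizon numbers.
    HOURS_OF_OUTAGE_PER_YEAR = 8             # assumed downtime with no failover site
    REVENUE_LOST_PER_HOUR = 250_000          # assumed lost bookings, rebooking and goodwill costs
    ANNUAL_COST_OF_STANDBY_SITE = 1_500_000  # assumed cost of keeping a second site ready
    SWITCHOVER_HOURS = 0.5                   # assumed residual outage while failing over

    single_site_loss = HOURS_OF_OUTAGE_PER_YEAR * REVENUE_LOST_PER_HOUR
    failover_cost = SWITCHOVER_HOURS * REVENUE_LOST_PER_HOUR + ANNUAL_COST_OF_STANDBY_SITE

    print(f"Expected annual loss, single site:  ${single_site_loss:,.0f}")
    print(f"Expected annual cost, with standby: ${failover_cost:,.0f}")

Whether a standby site pays for itself depends entirely on how often a serious outage is expected and how much each hour of downtime really costs, which is exactly the trade-off a discount carrier has to weigh.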


Readers' comments (3)

  • Never mind 'failover', they clearly run in a non-concurrently maintainable facility - so if they rely on such a third-class facility, why would they have a DR solution?


  • I am sorry to hear this happened and how it affected so many of their customers. Any application that directly relates to customer services should have a priority level that maintains full availability. There may be some tangible cost from this incident that JetBlue can point to, but it's the larger cost of clients not returning for future flights, and telling all their associates, "I will not fly with them again," that really hurts in the long term.
    When doing failure analysis for clients, I have seen highly redundant data centers fail because they are complicated. Even a basic data center can be designed to be concurrently maintainable.
    And let's remember that human intervention is a common cause of failures, which could very well be the issue in this case. BUT! Whenever a maintenance procedure is taking place at a critical facility, there should always be a contingency plan, or back-up, in place to abort the maintenance and get the facility back online quickly, even if someone messed up. Five to seven hours is unacceptable.


  • As a data center professional, I would find it very helpful if we eventually learned what maintenance caused the outage. Without that information, we can speculate all day long about what should have been done and what needs to be done in the future.

    There is no doubt about the impact on their customers and business operations, but how can we learn as an industry if experiences aren't openly shared?


  • Very good point. Until causes of failure are shared more openly, the industry will keep repeating them.



