A hardware issue at one of MailChimp’s three data centers in the US caused outages for hundreds of thousands of customers of the email marketing services company on Monday.
In a blog post on its website, the company’s representative said three groups of servers supporting databases in the company’s first and oldest data center failed. The failure led the team to disable user access to these servers, affecting about 400,000 customers.
“We then began the long, painstaking process of replacing hardware and then restoring data,” the MailChimp staff member wrote.
About a third of those affected had their data and services restored “nearly perfectly with no campaign or data loss”. For the other two-thirds, campaigns created or sent between about 1 am EST and 3 am EST were lost because the team reverted to backups from 1 am that morning for these users.
While the investigation to identify the exact cause of the failure continues, the servers that failed were among those MailChimp had installed to replace older machines in preparation for capacity spikes during the holiday season. These newer servers were equipped with solid-state drives (SSDs).
“The upgrades did manage to sustain delivering [more than 100] million emails per day during peak periods, but the RAID controllers for the SSDs weren’t working as reliably as we hoped,” the MailChimp team member wrote in the blog post.
“When those things fail, they apparently break the SSDs along with them.”
Now that the holiday season is over, MailChimp is planning to switch back from SSDs.
A MailChimp spokesperson did not respond to a request for comment in time for publication.