Study points to the real enemy of hard drive reliability
Humidity is a greater threat to hard drive reliability than temperature variations, according to a study led by Rutgers University in New Jersey. Failures caused by humidity could have a knock-on effect, reducing the cost benefits of free-cooled systems in particular.
Free cooling is meant to lower datacenter costs while reducing environmental impact, but its full effects on hardware failure rates, and on the costs that follow, are not well understood. The Rutgers researchers, Ioannis Manousakis and Thu Nguyen, were joined by Sriram Sankar of GoDaddy and by Gregg McKnight and Ricardo Bianchini of Microsoft, and together they have been totting up the statistics to find out.
In their paper, Environmental Conditions and Disk Reliability in Free-cooled Datacenters, the team said the most notable result was that, although conditions varied across the nine large datacenters studied, failures of disk controllers and adapters clearly increased as humidity rose.
Wet and windy
[Image: HDD suffering from corrosion]
According to the study, disk failures accounted for 89 percent of component failures in a datacenter, with DIMMs coming second at only 10 percent, then CPUs (5 percent), and PSUs (2 percent).
The findings suggest that relative humidity is the biggest negative factor in disk reliability, even when the datacenter is operating within industry-standard limits. As humidity rises, the incidence of disk controller and connectivity failures increases, while the mechanical parts are less affected.
The humidity-related failures were so marked that annualized failure rate (AFR) statistics could be used to distinguish between free-cooled facilities with humidity controls and those without.
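AFR expresses failures per drive-year of operation, which is what makes facilities of different sizes and observation periods comparable. The sketch below uses that common definition; the two fleets and their failure counts are hypothetical and are not figures from the study.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """AFR as failures per drive-year of operation (one common definition)."""
    drive_years = drive_days / 365.0
    return failures / drive_years


# Hypothetical fleets, NOT data from the study: 10,000 drives each, observed for one year.
controlled = annualized_failure_rate(failures=150, drive_days=10_000 * 365)
uncontrolled = annualized_failure_rate(failures=450, drive_days=10_000 * 365)
print(f"humidity-controlled AFR: {controlled:.1%}, uncontrolled AFR: {uncontrolled:.1%}")
```

A real comparison between facilities would also need confidence intervals, since failure counts are small relative to fleet size.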
Data from more than a million drives, in service for between 18 months and four years, led the team to suggest several considerations for anyone deploying a free-cooled datacenter.
In regions where high humidity occurs naturally, drives in free-cooled datacenters would benefit from humidity control. Alternatively, placing drives in the hot region at the back of a server improved reliability under those conditions, because the warmer air there has a lower relative humidity. The higher temperatures are not harmless, but they proved less damaging than humidity.
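Why would hotter air help? When air is heated without gaining or losing moisture, its actual water-vapour pressure stays the same while the saturation pressure rises, so relative humidity falls. The sketch below uses a commonly quoted form of the Magnus approximation for saturation vapour pressure; the inlet conditions (25 °C at 80 percent RH) and the 40 °C exhaust temperature are hypothetical, and it assumes the drive sits in air at the exhaust temperature.

```python
import math


def saturation_vapour_pressure_hpa(temp_c: float) -> float:
    """Magnus approximation for saturation vapour pressure over water, in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity_after_heating(rh_in_pct: float, temp_in_c: float, temp_out_c: float) -> float:
    """RH (percent) of air heated from temp_in_c to temp_out_c with no moisture added.

    The vapour pressure is unchanged, so RH scales with the ratio of the
    saturation vapour pressures at the two temperatures.
    """
    vapour_pressure = rh_in_pct / 100.0 * saturation_vapour_pressure_hpa(temp_in_c)
    return 100.0 * vapour_pressure / saturation_vapour_pressure_hpa(temp_out_c)


# Hypothetical conditions: humid inlet air at 25 °C / 80 % RH, server exhaust at 40 °C.
rh_back = relative_humidity_after_heating(80.0, 25.0, 40.0)
print(f"Relative humidity at the back of the server: {rh_back:.0f}%")
```

With these inputs the relative humidity at the back of the server drops to roughly a third, even though no moisture has been removed from the air.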
Operators could also accept a compromise and absorb the extra cost of replacing drives. Humidity control can cost more than simply tolerating the additional disk failures, in which case software can be used to protect the stored data. The software could manage data redundancy more aggressively, especially in large datacenters with hundreds of thousands of hardware components, where failures are common anyway.
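The trade-off between paying for humidity control and paying for extra drive replacements can be framed as a simple break-even check. This is a minimal sketch: the class name and every input value are hypothetical, and it ignores the cost of the extra storage that more aggressive redundancy would need.

```python
from dataclasses import dataclass


@dataclass
class SiteCosts:
    """Hypothetical inputs for one free-cooled site; every value is an assumption."""

    drives: int
    afr_without_control: float    # AFR if humidity is left uncontrolled
    afr_with_control: float       # AFR with humidity control installed
    cost_per_replacement: float   # hardware plus labour per failed drive
    control_cost_per_year: float  # running plus amortised cost of humidity control

    def extra_failures_per_year(self) -> float:
        return self.drives * (self.afr_without_control - self.afr_with_control)

    def extra_replacement_cost(self) -> float:
        return self.extra_failures_per_year() * self.cost_per_replacement

    def humidity_control_pays_off(self) -> bool:
        # Control is worthwhile only if it costs less than the replacements it avoids.
        return self.control_cost_per_year < self.extra_replacement_cost()


site = SiteCosts(drives=100_000, afr_without_control=0.045, afr_with_control=0.015,
                 cost_per_replacement=300.0, control_cost_per_year=1_200_000.0)
print(f"{site.extra_failures_per_year():.0f} extra failures/year, "
      f"${site.extra_replacement_cost():,.0f} in replacements, "
      f"control pays off: {site.humidity_control_pays_off()}")
```

With these made-up numbers the avoided replacements are worth less than the humidity control, which is the situation in which leaning on software redundancy makes sense.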
In free-cooled datacenters in challenging locations, where AFRs can be about three times higher than normal, the researchers found that failure rates are still not high enough to threaten the reliability of stored data. In the worst case, compensating within applications would add only slightly more redundancy, and the extra storage space required would be tolerable.
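To get a feel for why a roughly threefold rise in AFR need only add a little redundancy, here is a deliberately simple model; it is not the researchers' own analysis. It assumes data stored in hypothetical (k, m) erasure-coded stripes (k data fragments plus m parity fragments on separate drives), independent drive failures, a one-day repair window and an arbitrary target loss probability per stripe per window. Humidity-driven failures may well be correlated, which this model ignores.

```python
import math


def window_failure_prob(afr: float, window_days: float) -> float:
    """Chance that one drive fails within a repair window, assuming a constant
    failure rate consistent with the AFR: 1 - (1 - AFR) ** (days / 365)."""
    return -math.expm1(math.log1p(-afr) * window_days / 365.0)


def stripe_loss_prob(k: int, m: int, p: float) -> float:
    """Chance that more than m of the k + m fragments fail in the same window
    (binomial tail), so the stripe cannot be rebuilt. Assumes independence."""
    n = k + m
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m + 1, n + 1))


def min_parity(afr: float, k: int = 10, window_days: float = 1.0,
               target: float = 1e-13, max_m: int = 16) -> int:
    """Smallest number of parity fragments that keeps stripe loss at or below target."""
    p = window_failure_prob(afr, window_days)
    for m in range(max_m + 1):
        if stripe_loss_prob(k, m, p) <= target:
            return m
    raise ValueError("target not reachable within max_m parity fragments")


for afr in (0.02, 0.06):  # hypothetical baseline AFR and roughly three times that
    m = min_parity(afr)
    print(f"AFR {afr:.0%}: {m} parity fragments, storage overhead {(10 + m) / 10:.1f}x")
```

Under these assumptions the tripled AFR costs one extra parity fragment, raising storage overhead from 1.3x to 1.4x, which is consistent with the idea that software can absorb the higher failure rate at modest cost.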
If software cannot absorb the impact of higher AFRs, datacenter operators must either tightly control relative humidity or relocate their datacenters to more temperate regions.