Much focus is put on controlling and monitoring the data center environment up to the front of the rack. But the following research shows that around two thirds of high ITE inlet temperatures are caused, either wholly or in part, by effects inside the rack, hidden from view. These silent problems have big consequences in reducing the efficiency and utilization of the data center. Using CFD simulation as a test lab to check rack stack-ups will give visibility into these issues, and allow most of them to be avoided.
The full rack stack
A recent survey of over 16,500 racks showed 11 percent were above the ASHRAE recommended maximum of 27°C/80.6°F. It is an impressive piece of research, but having put pen to paper a few times about the tendency of monitoring to miss information, I wondered whether the survey really gave the full picture. In a recent DCD opinion piece, Ian Bitterlin highlights two big concerns with simply relying on ASHRAE compliance at the rack door.
Firstly, do you need to be ASHRAE compliant? Certainly, lots of modern IT equipment is rated up to 40°C/104°F or higher. However, the increased fan speeds required to maintain performance at elevated temperatures can cause some big headaches, not only from an airflow perspective but also by increasing power draw, noise and general wear on components. And remember, there is still plenty of equipment in use today that will start to fail at 32°C/89.6°F. In fact, there’s at least one new Dell server that, in certain configurations, requires inlet temperatures below 25°C/77°F!
Secondly - and more importantly - it’s the air temperature at the server inlet that really matters, not at the rack door. When we perform our top-tier Platinum assessment service for clients, we simulate down to the server inlets. In the 10-odd years I’ve been doing data center airflow assessments, our untested assertion has always been that half of the thermal problems we face are internal rack issues while the other half are caused by room problems. But this has always been a rule of thumb and, to be honest, we are in the business of showing that rules of thumb rarely work with airflow! So I decided to apply some science to the results from our recent work and see what the data said. It is quite surprising!
Out of the 49,000 pieces of IT equipment I looked at, 8 percent (4,168) had inlet temperatures above the ASHRAE recommended maximum of 27°C/80.6°F. This would seem to agree with the rack level survey results and certainly matches with my experience (I should stress, as Ian noted, this equipment probably will be able to operate at those temperatures and may not even be giving an alarm). The interesting part is that of that number, 70 percent (2,918) were installed in racks where the air entering the front door was less than 27°C/80.6°F (i.e. compliant with ASHRAE recommendations).
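The classification behind those numbers is simple to sketch. The snippet below is a hypothetical illustration only - the readings are invented, not taken from the study - but it shows the two-step test: flag each server whose own inlet exceeds the ASHRAE recommended maximum, then check whether the air at its rack's front door would have looked compliant.

```python
# Hypothetical illustration of the analysis described above: classify each
# server by its own inlet temperature, then by the temperature at the front
# door of its rack. All readings here are invented for demonstration.

ASHRAE_RECOMMENDED_MAX_C = 27.0

# (server_inlet_C, rack_door_C) pairs - invented sample readings
readings = [
    (25.1, 24.0),   # compliant at both the server inlet and the rack door
    (29.3, 24.5),   # hot server hidden inside a "compliant" rack
    (31.0, 28.2),   # hot server in a rack that is also hot at the door
    (28.4, 25.9),   # another problem invisible from the rack door
]

# Step 1: servers breaching the recommended maximum at their own inlet
hot_servers = [(s, d) for s, d in readings if s > ASHRAE_RECOMMENDED_MAX_C]

# Step 2: of those, how many sit behind a rack door that still measures
# within the recommended envelope (and so would pass door-level monitoring)?
hidden = [(s, d) for s, d in hot_servers if d <= ASHRAE_RECOMMENDED_MAX_C]

print(f"{len(hot_servers)} of {len(readings)} servers above the recommended max")
print(f"{len(hidden)} of {len(hot_servers)} hidden behind a compliant rack door")
```

Run over real per-server inlet data, this same two-step split is what produces the 8 percent / 70 percent breakdown described above.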
Let’s think about what that means for a second. These are the things that spring to my mind.
- If the temperature is only being monitored at the rack door, 70 percent of high temperature server issues go undetected.
- While that equipment might not be alarming now, it is all operating with significantly less headroom than thought. This means changes to the environment such as CRAC maintenance, new equipment deployments or even just increased workload could push it over the edge.
- Once equipment is alarming, there is a tendency to reduce CRAC supply temperatures and/or freeze further deployments to those racks. This kills efficiency and leaves you with stranded capacity within your racks. Of the 16 data halls I analyzed, only five were supplying above 20°C. Another five had average supply temperatures below 18°C, and the average capacity utilization across all the halls was only 53 percent.
What causes these issues and how can they be avoided? Well, the simple answer is: inadequate airflow management. And the solution is better segregation of hot and cold air, but in my experience that is rarely a simple challenge. The near-infinite combinations of IT equipment, airflow paths, equipment stack and rack design make it tricky to get rack level airflow management right first time, every time. Blanket application of standard best practices can, unfortunately, have a negative impact. In at least one data hall I analyzed, a lot of work had gone into designing the racks to segregate the air within, ready for classic front-in, back-out servers, only for side-breathing switches and other network equipment to be installed in many of them!
Ideally, each deployment would be tested in a lab and any tweaks made upfront, but doing this physically is impractical (if not impossible for many). This is where CFD simulation comes into its own. Rapid prototyping of IT stacking options can be done in a virtual test lab across a range of airflow conditions, with results generated quickly. This allows rack level issues to be identified, and airflow management solutions devised and tested, before deployment takes place. It may not be possible to eliminate all rack level issues* but the majority should be simple to fix and significant improvements can be made.
The data I have gathered shows that what is going on inside the rack is just as important as what is happening outside it. If you’re a co-location customer, consider the implications for your responsibility to avoid environmental issues if your SLA is stated up to the rack door. A good room cannot make a bad rack work. Worse, it is likely that most rack level problems go undetected by monitoring systems and only show themselves when they go critical. However, the tools to remedy these issues are available today. IT-level temperature monitoring and CFD simulation can both give you visibility into these “silent” issues. What’s more, by using CFD’s ability to simulate and test future layouts, you can proactively circumvent new silent issues before they occur.
*I have worked on at least one project where the combination of the equipment in the rack, the cabling requirements and the construction of the rack itself meant that it was impossible to fully segregate hot and cold air. The client had to take a hit on efficiency and oversupply the cold aisles to mitigate the risk caused by the lack of full segregation.
A quick word on where I got the data, and how I analyzed it: I looked through models we have made for clients at our Platinum service level. A Platinum Virtual Facility model requires our engineers to spend considerable time on site taking thorough measurements of airflow, pressure and temperature, including the use of a thermal camera. These results are then compared against the simulation data as part of our model calibration. I also restricted myself to models made in the last three years.
The 16 data halls I analyzed were from the US, UK, Europe and Middle East and ranged from 90 racks up to 900. In total I looked at 5,300 racks and 49,000 pieces of IT equipment, which equated to 10.8MW of load.
David King is product manager at Future Facilities