As physical infrastructure systems age beyond their warranties, software tools no longer reflect or comprehend reality, and operations & maintenance (O&M) programs grow outdated and/or become under-staffed, the risk of an interruption in service goes up significantly.
Aging data centers must either be modernized or have their IT outsourced to cloud service and colocation providers to minimize the risk of business disruption. Sites that delay modernizing also fail to benefit from recent technological advances, which make data centers simpler, more efficient, easier to manage, and less expensive to operate. This article lays out a simple four-step framework for modernizing a facility.
It begins with defining a design and operations standard, which is then used to perform a gap analysis identifying risks and needs. This approach, developed by a team of Solution Architects at Schneider Electric, should be used to cover the three key domains of data center modernization: (1) equipment hardware (electrical & mechanical), (2) software systems, and (3) operations & maintenance programs.
Keeping the IT systems running depends on all three of these domains. So, it is critical that all are considered in a modernization project.
Four steps of the modernization framework
Following these fundamental steps helps ensure a measured and methodical approach to figuring out what and how to modernize regardless of what you might need or where you are in the process.
1. Develop design standards - Start by documenting the specific objectives and goals of the modernization project. What do you want the data center to look like at the end of the project? How should it perform and what is needed to achieve that? It is useful to begin with the larger business and IT objectives. These may well have changed since you first built the data center, and the criticality and power capacity needs may have changed significantly along with them.
Re-evaluating your needs in the context of today’s organizational objectives will help you figure out, for example, what level of electrical redundancy is really needed or what the operations team staffing levels should be at a given site. A design standard for each of the key domains should be written down and documented. If, for example, the decision is that the data center should meet a particular tier or criticality standard, then what it takes specifically to meet those requirements should be documented in the design standard. Make sure you have buy-in from all the key stakeholders and an understanding of what the IT outsourcing strategy is. An example design standard is shown below for the electrical distribution and UPS.
2. Benchmark performance - With the design standards clearly documenting in detail where you want to be, the next step is to evaluate the current state of the data center across all three domains. This involves physically investigating the infrastructure equipment and its interconnections.
You want to understand each device’s age, maintenance contract status, load vs. capacity, etc. It also means interviewing the O&M team and reviewing their methods of procedure and training documentation. You should not rely on drawings or written reports alone. Data center infrastructure management (DCIM) tools should be checked against the equipment benchmark to see how well the software map of assets and their interconnections matches reality. Use the design standard documents as scorecards to record the current reality.
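As a rough illustration of checking a DCIM asset map against what the physical audit found, the comparison can be sketched as below. This is a minimal sketch, not a feature of any particular DCIM product: the `Asset` record, the `reconcile` function, and all asset names are hypothetical, and it assumes both inventories can be exported as simple records keyed by an asset ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Asset:
    asset_id: str
    kind: str                 # e.g. "UPS", "PDU", "CRAH"
    upstream: Optional[str]   # asset_id of the device that feeds this one


def reconcile(dcim: Dict[str, Asset], audited: Dict[str, Asset]) -> Dict[str, List[str]]:
    """Compare the DCIM asset map with what the physical audit found."""
    dcim_ids, audited_ids = set(dcim), set(audited)
    return {
        # On the floor, but unknown to the software.
        "missing_from_dcim": sorted(audited_ids - dcim_ids),
        # In the software, but not found during the walk-through.
        "not_found_on_site": sorted(dcim_ids - audited_ids),
        # Present in both, but type or interconnection disagrees.
        "details_differ": sorted(
            i for i in dcim_ids & audited_ids if dcim[i] != audited[i]
        ),
    }


# Hypothetical records for illustration only.
dcim_records = {
    "UPS-A": Asset("UPS-A", "UPS", None),
    "UPS-B": Asset("UPS-B", "UPS", None),
    "PDU-1": Asset("PDU-1", "PDU", "UPS-A"),
    "PDU-2": Asset("PDU-2", "PDU", "UPS-A"),
}
audit_records = {
    "UPS-A": Asset("UPS-A", "UPS", None),
    "UPS-B": Asset("UPS-B", "UPS", None),
    "PDU-1": Asset("PDU-1", "PDU", "UPS-B"),  # fed from a different UPS than DCIM says
    "PDU-3": Asset("PDU-3", "PDU", "UPS-A"),
}
report = reconcile(dcim_records, audit_records)
```

Each of the three lists in `report` is a candidate entry for the scorecard: an undocumented device, a stale record, or an interconnection the software has wrong.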
3. Identify gaps and consider options - With the current situation documented, the next step is to identify the gaps, i.e., where the current reality or performance is not meeting the future requirements of the data center. Consider and document what it would take to bridge each of the gaps.
Vendors and consulting engineers may be needed to clearly understand what your options are, as well as their costs. This effort will begin to form a picture of what it will take in time, money, and labor to achieve the project goals. This, in turn, could cause you to re-evaluate the design standards. And that’s OK: this is designed to be an iterative process.
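One way to picture the gap analysis is as a comparison of each scorecard line against the design standard. The sketch below is only an assumption about how such a scorecard might be laid out; the `ScorecardItem` fields, the `find_gaps` function, and every number in the sample data are hypothetical and not taken from any actual standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScorecardItem:
    domain: str            # "hardware", "software", or "operations"
    requirement: str
    required: float        # target from the design standard
    measured: float        # value recorded during benchmarking
    higher_is_better: bool = True


def find_gaps(items: List[ScorecardItem]) -> List[Tuple[ScorecardItem, float]]:
    """Return the items that miss the design standard, with the size of the shortfall."""
    gaps = []
    for item in items:
        if item.higher_is_better:
            shortfall = item.required - item.measured
        else:
            shortfall = item.measured - item.required
        if shortfall > 0:
            gaps.append((item, shortfall))
    return gaps


# Hypothetical scorecard entries for illustration only.
scorecard = [
    ScorecardItem("hardware", "redundant UPS modules", required=1, measured=0),
    ScorecardItem("hardware", "peak UPS load (% of capacity)", required=80,
                  measured=92, higher_is_better=False),
    ScorecardItem("operations", "trained staff per shift", required=2, measured=2),
]
gaps = find_gaps(scorecard)
```

Here two of the three entries come back as gaps. Each would then need a documented option (and cost) for closing it, and if those costs look unreasonable, the `required` values themselves may be revisited as part of the iteration described above.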
4. Prioritize needs - The last step before the actual implementation of upgrades and replacements begins is to prioritize the actions needed to close the gaps and bring the data center to the performance levels spelled out in the design standards. Because this is (presumably) a mission-critical data center, all gaps need to be evaluated based on the amount of risk they represent to the continued functioning of the IT. For each gap uncovered in the audit, you must calculate the risk of not addressing it.
Obviously, gaps with the biggest risk go to the top of your list of needs to focus on. This risk needs to be balanced against cost, time, how disruptive it might be to on-going operations, and any other objectives deemed important, such as energy efficiency goals.
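The ranking described in this step can be sketched as a simple sort. The scoring scheme here is an assumption, not part of the framework itself: it uses the common likelihood-times-impact risk score on 1-to-5 scales, then breaks ties in favor of less disruptive and cheaper fixes. The `Gap` fields and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gap:
    name: str
    likelihood: int   # 1 (rare) to 5 (almost certain); assumed scale
    impact: int       # 1 (minor) to 5 (IT outage); assumed scale
    cost: float       # estimated cost to close the gap
    disruption: int   # 1 (none) to 5 (requires shutdown); assumed scale

    @property
    def risk(self) -> int:
        # Assumed scoring: risk of NOT addressing the gap.
        return self.likelihood * self.impact


def prioritize(gaps: List[Gap]) -> List[Gap]:
    """Highest risk first; among equal risks, least disruptive and cheapest first."""
    return sorted(gaps, key=lambda g: (-g.risk, g.disruption, g.cost))


# Hypothetical gaps for illustration only.
needs = prioritize([
    Gap("replace end-of-life UPS batteries", likelihood=4, impact=5, cost=40000, disruption=2),
    Gap("update as-built drawings", likelihood=2, impact=3, cost=2000, disruption=1),
    Gap("add redundant chiller", likelihood=4, impact=5, cost=250000, disruption=4),
    Gap("retune DCIM alarm thresholds", likelihood=3, impact=2, cost=500, disruption=1),
])
```

A strict sort like this is only a starting point. In practice you might weight cost, schedule, or energy efficiency goals differently, or pull a cheap low-risk fix forward simply because it is easy to do.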
Note that there are third-party vendors who can assist you with, or even lead, this evaluation process. Not only would they simplify and likely accelerate the process for you, but you would also benefit from their experience with many data centers. Their independence might also make for a more accurate, unbiased judgement of what risks exist in your facility.
Identify & address the basics (low-hanging fruit)
During the processes of creating the design standards and benchmarking performance, you will likely uncover easy-to-fix issues, i.e., items involving relatively little to no CAPEX and time to implement. These should be addressed right away, of course. Low-hanging-fruit actions we often see include:
- Power: conducting preventative maintenance (PM) services on units that are past due, removing unused power modules from UPSs, redistributing unbalanced loads, correcting mistakes in PDU/Rack PDU assignment if redundancy rules are found to be broken, etc.
- Cooling: conducting past-due PM services, adding blanking panels to racks, plugging holes in raised floors, removing obstructions from underfloor air pathways, making sure floor tiles are in the right places, making sure racks are aligned properly, etc.
- Operations: updating/correcting as-built drawings, ensuring MOPs and EOPs are correct and in the right places, verifying staff are properly trained on emergency procedures, etc.
- Software systems: making sure all software tools have an accurate map of assets and resources and that their dependencies are mapped correctly; reviewing alarm thresholds and notification policies.
Adhering to this framework will simplify the process and reduce risk. It will optimize costs by focusing spending on the process improvements, hardware upgrades, and replacements that have the biggest impact on reducing critical incidents and failures that can cause downtime of the IT systems and applications. New business requirements may also mean that the infrastructure needed today is much less than what you needed when the facility was first built.
When you combine that with the likely efficiency gains that modern infrastructure and its management tools bring, the real total cost of ownership of a newly modernized facility is often less than you expected.