The rapid adoption of new cooling and power technologies for artificial intelligence (AI) and high-performance computing (HPC) has exacerbated resource availability challenges in data center maintenance, specifically capability and capacity.
As technology and operational demands evolve, the traditional interval-based preventive maintenance approach can be improved to further reduce the risk of costly equipment outages. Advances in AI and machine learning (ML) algorithms have laid the groundwork for making maintenance scheduling predictive.
Condition-based maintenance (CBM) and advanced monitoring services leverage equipment data to generate health scores and alerts. Site staff can use this information to assess the condition of assets and schedule maintenance as needed, improving on the typical method of fixed intervals. With advanced monitoring and data center services, operators can enhance operational efficiency, reduce downtime risks, and improve risk management.
Understanding condition-based maintenance functionality
Condition-based maintenance and advanced monitoring services support data centers in optimizing maintenance activities and enhancing asset availability. This approach involves monitoring equipment, capturing its data, and alerting staff to potential issues.
1. Real-time connectivity and data collection
Data center operators can employ Condition-Based Maintenance and Advanced Monitoring Services by enabling 24/7 connectivity via a secure Vertiv Life Services Gateway or a direct IoT connection supported by a communications card, which transmits asset health data to a cloud-based platform. Vertiv collects detailed data at appropriate intervals for precise monitoring and superior analytics.
2. Data centralization and processing
Once the cloud-based monitoring platform captures data from the equipment, the system centralizes and transfers this information into a private and secure global data lake. The data is curated and transformed for advanced analysis using the extract, load, transform (ELT) process. Leveraging AI and ML tools, algorithms process the relevant information and generate outputs by:
- Defining clear problem statements from subject matter experts (SMEs) and component manufacturers to guide analysis
- Using equipment behavior and feedback from operating conditions to refine algorithms continually
- Benchmarking devices with the same design, installation, and configuration to identify potential anomalies
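The peer-benchmarking idea in the last bullet can be sketched in a few lines. This is only an illustration under stated assumptions, not Vertiv's proprietary method: the function name `flag_peer_anomalies`, the modified z-score technique (median and median absolute deviation), the 3.5 cutoff, and the sample readings are all hypothetical.

```python
from statistics import median


def flag_peer_anomalies(readings, threshold=3.5):
    """Return IDs of units whose reading sits far from the peer-group median.

    `readings` maps a unit ID to one metric (for example, battery
    temperature in deg C) taken from devices assumed to share the same
    design, installation, and configuration. Hypothetical helper.
    """
    values = list(readings.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that one outlier cannot inflate.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the peers share one value, so the score is undefined.
        # This sketch flags nothing in that case.
        return []
    flagged = []
    for unit_id, value in readings.items():
        modified_z = 0.6745 * (value - center) / mad
        if abs(modified_z) > threshold:
            flagged.append(unit_id)
    return flagged


# Illustrative (made-up) readings from five identical units.
fleet = {"ups-01": 31.0, "ups-02": 30.5, "ups-03": 31.4, "ups-04": 30.8, "ups-05": 44.2}
print(flag_peer_anomalies(fleet))
```

A median-based score is used here because a single failing unit would distort a plain mean and standard deviation computed over a small peer group.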
3. Alert generation and action trigger
Vertiv's proprietary OEM algorithms generate analytics like health scores, remaining useful life, anomaly detection, and quality assessment, highlighting deviations that signal potential performance deterioration. Each alert is tailored to the specific component and technology, considering unique working conditions and degradation patterns. When an anomaly is detected, an alert is sent to operators at the Network Operating Centers, who manage the output and direct the required service response:
- Trend performance
- Investigate at the next planned maintenance visit
- Intervene immediately
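The three service responses above amount to a triage decision. The sketch below shows one way such a rule could be written; the `triage` function, its inputs, and every threshold are assumptions chosen for illustration, since the article does not describe how the Network Operating Centers actually decide.

```python
from enum import Enum


class Response(Enum):
    TREND = "trend performance"
    NEXT_VISIT = "investigate at the next planned maintenance visit"
    IMMEDIATE = "immediate intervention"


def triage(health_score, drop_per_week):
    """Map a health score (0-100) and its weekly decline to a service response.

    All thresholds are hypothetical placeholders; a real service would tune
    them per component type and technology.
    """
    if health_score < 40 or drop_per_week >= 10:
        return Response.IMMEDIATE
    if health_score < 70 or drop_per_week >= 3:
        return Response.NEXT_VISIT
    return Response.TREND


print(triage(92, 0.5).value)
print(triage(65, 1.0).value)
print(triage(85, 12.0).value)
```

Using both the score level and its rate of decline reflects the article's point that alerts highlight deviations signaling deterioration, not only low absolute values.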
4. Operational implementation
Condition-based maintenance and advanced monitoring services provide operators with more information about the condition and behavior of assets within the system, including insights into how environmental factors, controls, and usage drive service needs.
Because these services recommend actions for preventing downtime and extending asset life, teams can focus on high-impact items instead of tasks that don't immediately affect asset reliability or lifespan. These items include lifecycle parts replacement, optimizing preventive maintenance schedules, managing parts inventories, and optimizing control logic. The effectiveness of a service visit can subsequently be validated as the actions taken are reflected in asset health analyses.
Advanced AI data center reporting
Condition-based maintenance and advanced monitoring services include a customer portal for efficient equipment health reporting. Detailed dashboards display site health scores, critical events, and degradation patterns.
The typical view of the portal includes the following information:
- Health score: Overview of the data center campus’s current state, including component, equipment, and site condition
- Health score trendlines: Graphical representations of the rapid or gradual declines in health scores
- Health score per site: Health scores by site, highlighting healthy and potentially problematic locations
- Average health score by site trendlines: Graphical representations of the rapid or gradual declines in health scores by site
- Number of critical events by site: Display of sites with frequent critical events
- Critical alarms: List of alarms needing immediate attention by Vertiv's teams
- Number of critical alarms by description: Types of frequent critical events for better preparation and prevention (e.g., stocking parts)
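Several of the views listed above are simple aggregations over asset and alarm records. The following sketch shows how the per-site average health score, the critical event count by site, and the critical alarm count by description might be computed. The record layout, the field names, and the `summarize_portal` function are hypothetical and do not describe the actual portal.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_portal(assets, alarms):
    """Build three illustrative dashboard aggregates from raw records.

    `assets`: dicts with assumed keys "site" and "health" (0-100).
    `alarms`: dicts with assumed keys "site", "description", "critical".
    """
    scores_by_site = defaultdict(list)
    for asset in assets:
        scores_by_site[asset["site"]].append(asset["health"])
    avg_health_by_site = {
        site: round(mean(scores), 1) for site, scores in scores_by_site.items()
    }

    critical = [alarm for alarm in alarms if alarm["critical"]]
    critical_events_by_site = Counter(alarm["site"] for alarm in critical)
    critical_by_description = Counter(alarm["description"] for alarm in critical)
    return avg_health_by_site, critical_events_by_site, critical_by_description


# Made-up sample records.
assets = [
    {"site": "site-a", "health": 90},
    {"site": "site-a", "health": 80},
    {"site": "site-b", "health": 60},
]
alarms = [
    {"site": "site-b", "description": "fan failure", "critical": True},
    {"site": "site-b", "description": "fan failure", "critical": True},
    {"site": "site-a", "description": "filter clogged", "critical": False},
]
avg, events, by_desc = summarize_portal(assets, alarms)
print(avg, dict(events), by_desc.most_common(1))
```

Counting critical alarms by description is what supports the preparation use case in the last bullet: the most frequent descriptions indicate which parts are worth stocking.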
Harnessing AI/ML in data centers with condition-based maintenance
Customers can gain viewing access to health scores and early warnings processed by Vertiv's proprietary OEM algorithms. Meanwhile, Vertiv services use this information to enable proactive maintenance, helping data centers enhance performance.
1. Reduce equipment downtime risk
Equipment downtime in data centers causes financial losses and decreases customer satisfaction due to service disruptions. Condition-Based Maintenance and Advanced Monitoring Services help data centers reduce these risks by enabling proactive maintenance schedules. Leveraging AI/ML for advanced asset monitoring, Vertiv teams can assess health scores and identify issues before they lead to failures. This allows them to help operators plan repairs and replacements proactively, maximizing asset life.
2. Boost operational efficiency
These solutions maximize the benefits of maintenance visits by focusing on assets with critical events beyond the usual checklist. This targeted approach lets engineers monitor degradation and irregular patterns, estimate useful life, plan maintenance, and extend asset longevity more efficiently, while reducing unnecessary early intervention on equipment.
3. Streamline lifecycle management
These services calculate the remaining useful life of equipment and the replacement requirements for its lifecycle components. Vertiv field engineers use this information to help data centers plan replacements and repairs more effectively. Operators can leverage analytical insights to enhance service part availability and minimize supply chain disruption risks, which can reduce the mean time to repair assets.
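As a rough illustration of how a remaining-useful-life figure can be derived from a health score history, the sketch below fits a straight line by least squares and extrapolates to an end-of-life score. It is a deliberately simple stand-in, not the proprietary algorithm the article refers to; the function name, the linear-degradation assumption, and the end-of-life score of 50 are all hypothetical.

```python
def estimate_remaining_days(days, scores, end_of_life_score=50.0):
    """Extrapolate a linear health-score trend to an end-of-life threshold.

    `days` are observation times in days and `scores` the matching health
    scores. Returns the estimated days remaining after the last observation,
    or None when the fitted trend is flat or improving.
    Assumes degradation is roughly linear, which real assets may not follow.
    """
    n = len(days)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_x = sum(days) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("observations must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        return None  # no downward trend to extrapolate
    crossing_day = (end_of_life_score - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Made-up history: the score falls 3 points every 30 days.
print(estimate_remaining_days([0, 30, 60, 90], [100, 97, 94, 91]))
```

An estimate like this could feed replacement planning and parts ordering, which is the lifecycle use the paragraph above describes.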
4. Enhance asset management
These advanced monitoring solutions provide health scores and comprehensive data analytics that let data centers benchmark equipment performance. This systematic approach to monitoring performance helps customers make more informed decisions on load management and temperature adjustments, enabling readiness and efficiency.
5. Improve energy efficiency
Continual monitoring through these services helps data center operators identify inefficiencies and energy consumption trends, then take corrective actions to optimize equipment performance. Valuable insights from the system can be used to:
- Maximize airflow
- Adjust temperature set points
- Implement targeted energy-saving measures in specific hotspots
- Replace worn components
- Recalibrate equipment to peak efficiency
Optimize equipment care for improved efficiency and reliability
The future of data center maintenance is here – smarter, more efficient, and more reliable than ever. With condition-based maintenance and advanced monitoring services, data centers can anticipate risks and benchmark assets, leading to improved risk management and enhanced availability.
An advanced AI model can gather data from many assets and provide composite health scores across units and systems. These reports can empower data center stakeholders to make more proactive and informed decisions, potentially achieving unprecedented levels of operational efficiency and resilience.