As enterprises move into the next phase of their digital transformation journey, the data center is more crucial than ever. Well-run data centers, which host IT infrastructure and strategically important data repositories, represent a competitive advantage as industries embrace and accelerate digitalization.

Intelligent O&M should make data center management easier and more effective – iStock (exclusively for Huawei)

As organizations increasingly seek to extract insights from data, data center operations and maintenance (O&M) systems can no longer stay the same. Traditionally seen as static systems that run unchanged for years, O&M systems must become more agile to meet fluid, demanding new requirements without compromising their inherent reliability.

For a start, O&M systems are no longer seen as passive support systems but as part of core production systems. Bringing new value to services requires new capabilities and deeper integration, which may be a bridge too far for traditional systems.

Another transition under way is the shift from human-centric to autonomous processes. Where O&M processes used to revolve around machines assisting humans, the new paradigm sees AI-based machines reconstructing the service process and even autonomously driving IT O&M. Manual interventions, if required at all, are targeted and focused on novel challenges. In a sense, it is humans who will be assisting the machines.

Intelligent O&M with Huawei DME

Huawei Data Management Engine (DME) is an intelligent O&M platform for modern data infrastructure. It enables full lifecycle management and automation of data storage, covering planning, construction, optimization, and O&M. To deliver the versatility, integration, and autonomous O&M that advanced data centers require, DME incorporates built-in AI capabilities, tightly integrated across its operations, to support AIOps (Artificial Intelligence for IT Operations).

First proposed by Gartner, AIOps envisions the use of big data analytics and machine learning to automate IT operations processes. AIOps helps IT O&M personnel process massive volumes of data to determine the root causes of errors and proactively predict risks in existing systems. Because AIOps enhances decision-making by contextualizing large volumes of operational data, Gartner says its adoption is growing rapidly across enterprises[1].

AIOps systems also automate mundane software maintenance activities and orchestrate the many layers of IT systems, enabling them to become increasingly autonomous and self-regulating. Indeed, IDC predicts that AIOps will become an important O&M capability, adopted by at least 50% of large enterprises by 2024, and will become the new normal for IT operations[2].

Enabling AIOps

Huawei DME uses AI technologies to enable a range of capabilities, from predicting performance and capacity to quickly identifying, or even predicting, impending faults. With DME delivering continuous insight into and optimization of IT infrastructure and services, IT O&M personnel can identify system exceptions and quickly locate their root causes, while the platform proactively predicts system running risks and generates alarms.
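The article does not describe how DME predicts capacity, so the following is only a generic illustration of trend-based prediction, not DME's algorithm. It fits an ordinary least-squares line to daily used-capacity samples and estimates how many days remain before a storage pool crosses a warning threshold. The function names, the daily sampling interval, and the 80% default threshold are all assumptions made for this sketch.

```python
from typing import Optional, Sequence


def fit_linear_trend(values: Sequence[float]) -> tuple[float, float]:
    """Least-squares fit of values against day index 0..n-1; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(
    used_tb: Sequence[float], capacity_tb: float, threshold: float = 0.8
) -> Optional[float]:
    """Estimate days after the last sample until usage crosses threshold * capacity.

    Returns 0.0 if the pool is already over the limit, or None if usage is flat or shrinking.
    """
    slope, intercept = fit_linear_trend(used_tb)
    limit = threshold * capacity_tb
    last_day = len(used_tb) - 1
    if used_tb[-1] >= limit:
        return 0.0
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - last_day)


# Hypothetical pool: 100 TB, growing by 2 TB per day.
print(days_until_threshold([50, 52, 54, 56, 58], capacity_tb=100))  # → 11.0
```

A straight line is the simplest possible trend model. Real AIOps tooling would typically account for seasonality and noise, but the output is the same kind of signal: an estimated lead time that lets administrators expand capacity before it runs out.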

Some of the AI-powered capabilities in Huawei DME include:

  • Performance warning: By analyzing a performance trend model, AI algorithms detect performance bottlenecks. When needed, administrators are prompted to perform capacity expansion, service balancing, or migration to address performance risks.
  • Load imbalance detection: DME periodically checks the load on key components such as storage controllers and Fibre Channel (FC) ports. If user-defined thresholds are exceeded, administrators are alerted to optimize the configuration.
  • Fault warning: AI algorithms periodically check resource and hardware health indicators such as hard disk drive (HDD) S.M.A.R.T. status and solid-state drive (SSD) wear indicators. Where necessary, DME prompts users to optimize the configuration or replace failing hardware.
  • Root cause analysis: DME constructs a knowledge graph of all objects in the storage subnet and the associations between them. By adding ancillary information such as historical alarms, alarm causes, and recovery actions, DME can use this graph to infer the causes of root alarms and determine the most likely recovery actions and suggestions.
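The load imbalance and fault warning checks described above both boil down to comparing periodically collected metrics against limits. The sketch below shows that idea with plain rule-based thresholds rather than the AI models the article mentions. The `PortLoad` and `DriveHealth` types, their fields, and every threshold value are hypothetical stand-ins, not DME's data model or defaults.

```python
from dataclasses import dataclass


@dataclass
class PortLoad:
    name: str
    utilization: float  # hypothetical metric: fraction of port bandwidth in use (0.0 to 1.0)


@dataclass
class DriveHealth:
    drive_id: str
    kind: str                     # "HDD" or "SSD"
    reallocated_sectors: int = 0  # HDD: S.M.A.R.T. reallocated sector count
    wear_pct: float = 0.0         # SSD: percentage of rated write endurance used


def load_alerts(ports, max_util=0.8, max_spread=0.3):
    """Flag overloaded ports, plus a large gap between the busiest and idlest port."""
    alerts = [f"{p.name}: {p.utilization:.0%} exceeds {max_util:.0%} threshold"
              for p in ports if p.utilization > max_util]
    if len(ports) >= 2:
        busiest = max(ports, key=lambda p: p.utilization)
        idlest = min(ports, key=lambda p: p.utilization)
        spread = busiest.utilization - idlest.utilization
        if spread > max_spread:
            alerts.append(f"imbalance: {busiest.name} vs {idlest.name} "
                          f"({spread:.0%} apart); consider rebalancing")
    return alerts


def drive_alerts(drives, max_realloc=10, max_wear=90.0):
    """Flag HDDs with many reallocated sectors and SSDs close to their endurance limit."""
    alerts = []
    for d in drives:
        if d.kind == "HDD" and d.reallocated_sectors > max_realloc:
            alerts.append(f"{d.drive_id}: {d.reallocated_sectors} reallocated sectors; plan replacement")
        elif d.kind == "SSD" and d.wear_pct >= max_wear:
            alerts.append(f"{d.drive_id}: {d.wear_pct:.0f}% endurance used; plan replacement")
    return alerts


ports = [PortLoad("FC-1", 0.95), PortLoad("FC-2", 0.20), PortLoad("FC-3", 0.50)]
drives = [DriveHealth("HDD-3", "HDD", reallocated_sectors=24),
          DriveHealth("SSD-9", "SSD", wear_pct=93.0)]
for alert in load_alerts(ports) + drive_alerts(drives):
    print(alert)
```

Because the thresholds are plain parameters, they map naturally onto the "user-defined thresholds" the article mentions. A learned model would replace the fixed comparisons, not the overall check-and-alert loop.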
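The article does not say how DME reasons over its knowledge graph. One simple, commonly used heuristic is to treat an alarmed object as a root alarm when none of the objects it depends on is also alarmed, then rank past recovery actions for that alarm type by frequency. The sketch below implements that heuristic. The topology, object names, alarm types, and recovery history are all invented for illustration.

```python
from collections import Counter


def upstream_of(depends_on, obj):
    """All objects that obj depends on, directly or transitively (iterative DFS)."""
    seen, stack = set(), list(depends_on.get(obj, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, ()))
    seen.discard(obj)  # ignore the object itself if a dependency cycle leads back to it
    return seen


def root_alarms(depends_on, alarms):
    """Alarmed objects with no alarmed upstream dependency are treated as root alarms."""
    alarmed = set(alarms)
    return sorted(o for o in alarmed if not (upstream_of(depends_on, o) & alarmed))


def suggest_actions(history, alarm_type, top_n=2):
    """Rank recovery actions that previously resolved this alarm type, most frequent first."""
    counts = Counter(action for past_type, action in history if past_type == alarm_type)
    return [action for action, _ in counts.most_common(top_n)]


# Hypothetical storage subnet: each object maps to the objects it relies on.
depends_on = {"LUN-01": ["Pool-A"], "LUN-02": ["Pool-A"],
              "Pool-A": ["Disk-07"], "FC-Port-3": ["Controller-B"]}
alarms = {"LUN-01": "lun_degraded", "LUN-02": "lun_degraded", "Pool-A": "pool_degraded",
          "Disk-07": "disk_failure", "FC-Port-3": "port_down"}
history = [("disk_failure", "replace disk"), ("disk_failure", "replace disk"),
           ("disk_failure", "reseat disk"), ("port_down", "replace optical module")]

for root in root_alarms(depends_on, alarms):
    print(root, alarms[root], suggest_actions(history, alarms[root]))
# Disk-07 disk_failure ['replace disk', 'reseat disk']
# FC-Port-3 port_down ['replace optical module']
```

Five alarms collapse into two root alarms here, because the LUN and pool alarms are explained by the failed disk. That reduction is what lets operators go straight to the most likely fix.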

Intelligent O&M in action

Intelligent O&M applies across industries as diverse as finance and telecommunications. One of the largest banks in China, with more than 540 branches, replaced its traditional storage system with Huawei’s high-end OceanStor Dorado all-flash storage system in 2020. Huawei DME was implemented to manage its converged systems.

Previously, fulfilling service requests was a challenging and time-consuming process. Because of the multiple systems and interfaces involved, the bank's O&M personnel had to spend up to an hour planning the requested configuration, then another hour preparing the required configuration scripts. This complexity necessitated a separate review by the storage administrator before the change was queued for manual execution by on-duty personnel in the evening.

With Huawei DME, service change requests are handled through a single, unified management interface. No cumbersome scripts are needed, and changes can be queued for unattended execution, significantly improving efficiency. Moreover, automatic detection of resource dependencies and automatic resource provisioning ensure that new configurations don’t fail because of insufficient resources.
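The article does not explain how DME detects resource dependencies, so the following only sketches the general idea of validating a change before it is queued for unattended execution. A hypothetical request to create and map LUNs is checked against a hypothetical inventory. A missing host is provisioned automatically as a prerequisite step, while insufficient pool capacity blocks the change. The `Inventory` and `LunRequest` types and the `plan_change` function are assumptions for illustration, not DME interfaces.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    pool_free_gb: dict[str, float]  # hypothetical: free capacity per storage pool
    hosts: set[str]                 # hypothetical: hosts already registered


@dataclass
class LunRequest:
    pool: str
    host: str
    count: int
    size_gb: float


def plan_change(req, inv, auto_create_hosts=True):
    """Return (actions, problems); the change is safe to queue only if problems is empty."""
    actions, problems = [], []
    # Dependency 1: the target host must exist before LUNs can be mapped to it.
    if req.host not in inv.hosts:
        if auto_create_hosts:
            actions.append(f"create host {req.host!r}")  # provision the missing dependency first
        else:
            problems.append(f"host {req.host!r} is not registered")
    # Dependency 2: the pool must exist and have enough free capacity.
    if req.pool not in inv.pool_free_gb:
        problems.append(f"pool {req.pool!r} does not exist")
    else:
        needed, free = req.count * req.size_gb, inv.pool_free_gb[req.pool]
        if needed > free:
            problems.append(f"pool {req.pool!r} needs {needed:g} GB but has {free:g} GB free")
    actions.append(f"create {req.count} x {req.size_gb:g} GB LUNs in pool {req.pool!r}")
    actions.append(f"map {req.count} LUNs to host {req.host!r}")
    return actions, problems


inv = Inventory(pool_free_gb={"Pool-A": 500.0}, hosts={"db-01"})
actions, problems = plan_change(LunRequest("Pool-A", "app-02", 4, 100.0), inv)
print(problems or actions)
```

Running every check before anything is executed means an unattended change either has all of its prerequisites in place or is rejected up front, rather than failing partway through overnight.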

The largest carrier in Indonesia holds nearly 50 percent of the country's market. After years of growth, it faced a rising total cost of ownership (TCO) and low efficiency in its IT systems. With systems spread across five data centers and storage from multiple vendors on its production network, management was complex and inefficient, which also made it harder to incorporate cloud resources.

After adopting Huawei DME, the carrier could manage its O&M with a single storage pool, bringing IT hardware and resources across different public cloud platforms into one management console. AIOps also enabled automatic resource provisioning, cutting a task that used to take six hours to just five minutes. This gave the operator a more than 10-fold increase in efficiency and enabled a smooth journey to the cloud.

As we build the next generation of data centers to support accelerating digitalization, there is no question that AIOps will be a vital driving force in delivering the agility to meet new requirements, supporting efficient operations, and enhancing reliability. Designed to simplify and automate storage management with the power of AI, Huawei DME centrally manages both Huawei and third-party storage systems through a unified management interface.
