Making a smart data center is one thing. Making smart data infrastructure is another. That’s the job that Indian bank ICICI took on when it upgraded infrastructure which drives more than 14,000 ATMs and 4,800 branches around the world. The result won it DCD’s Smart Data Center Award in 2017.
Smart data centers apply automation at all levels: they instrument the hardware to improve performance, and gather and harness data so the facility itself can learn and improve.
ICICI (the Industrial Credit and Investment Corporation of India) embodied all of this in a project that collected data and automated processes at multiple data centers. The job combined IoT-based data center environmental management, centralized building management systems (BMS), adaptive capacity management and predictive analytics into a single software-defined data center (SDDC) approach.
To complete its vision, the team had to build its own application management tool, called App360. The end result is a more stable data center ecosystem that operates more efficiently and requires less energy. It also demands less management effort, as it handles requests and incidents automatically rather than manually, and can even fix problems before they happen.
The project was all about the results, not technology for its own sake, ICICI managing partner Imran Shaik told DCD in ICICI’s Award submission: “Being one of the topmost banks in India, it is imperative to ensure that the availability and performance are given the topmost priority since technology is the backbone of all banking services.”
The bank has two data centers: a primary facility and a disaster recovery data center. Its IT hardware is virtualized using VMware and delivered as a service - apart from five percent of the x86 systems, which are not virtualized because of requirements for compliance or performance.
The bank has already deployed hyper-converged infrastructure using software-defined all-flash storage arrays, and software-defined networks (SDN) within the data center and across all the branches.
ICICI has unified performance management, linked to business availability, for the infrastructure and the application workloads that run on it. Across the bank, services are backed by a service level agreement (SLA), with tiered priority similar to the platinum, gold or silver service tiers offered by commercial data centers. The cost of usage is apportioned to each group within the company and recovered through a chargeback mechanism.
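The article does not say how the bank apportions costs. As a rough illustration only, a usage-proportional chargeback could be sketched in Python as follows; the function name, group names and figures are all hypothetical.

```python
def apportion_cost(total_cost, usage_by_group):
    """Split total_cost across groups in proportion to their metered usage."""
    total_usage = sum(usage_by_group.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        group: round(total_cost * used / total_usage, 2)
        for group, used in usage_by_group.items()
    }


# Hypothetical monthly infrastructure cost and VM-hours per business group.
charges = apportion_cost(
    100_000.0, {"retail": 500, "treasury": 300, "cards": 200}
)
print(charges)
```

A real scheme would probably also weight usage by SLA tier, so that a platinum workload costs more per unit than a silver one.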
To enable all this, numerous tools have been implemented. IoT sensors were installed for environmental monitoring and capacity planning, managed by Vertiv’s Trellis data center infrastructure management (DCIM) platform. Trellis aggregates data from temperature and humidity sensors to create real-time thermal mapping and visualization at the rack level, so the heat load can be monitored and space usage optimized.
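To make the idea of rack-level thermal mapping concrete, here is a minimal Python sketch that averages sensor readings per rack and flags racks above a limit. It is not how Trellis works internally. The rack IDs and readings are invented, and the 27 °C limit is an assumption, taken from the upper end of the ASHRAE recommended inlet range rather than from ICICI's own settings.

```python
from statistics import mean

# ASSUMPTION: illustrative inlet limit, not ICICI's actual threshold.
INLET_LIMIT_C = 27.0


def rack_thermal_map(readings):
    """Average (rack_id, temperature_c) sensor readings per rack."""
    by_rack = {}
    for rack_id, temp_c in readings:
        by_rack.setdefault(rack_id, []).append(temp_c)
    return {rack: mean(temps) for rack, temps in by_rack.items()}


def hotspots(thermal_map, limit_c=INLET_LIMIT_C):
    """Return the racks whose average temperature exceeds the limit."""
    return sorted(rack for rack, t in thermal_map.items() if t > limit_c)


# Hypothetical readings from two sensors on R01 and R02, one on R03.
readings = [
    ("R01", 24.0), ("R01", 25.0),
    ("R02", 29.0), ("R02", 30.0),
    ("R03", 22.5),
]
thermal_map = rack_thermal_map(readings)
hot_racks = hotspots(thermal_map)
print(thermal_map, hot_racks)
```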
“Implementation of DCIM enabled us to ensure online real-time measurement, making power and performance trade-offs while focusing on uptime, availability, performance and power usage,” said Shaik.
Trellis helps explore the data center’s total consumption, energy costs, and PUE. Combined with intelligent PDUs, this helps ICICI manage power and efficiency proactively. The DCIM operates as a closed-loop system, generating SNMP traps and various alerts for upstream systems such as the centralized building management system.
The BMS monitors and controls, from one place, the mechanical and electrical components of five buildings across the country. This includes chillers, UPS, diesel generators, precision air conditioning (PAC) units and safety equipment.
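PUE (power usage effectiveness) is the ratio of the total energy a facility draws to the energy delivered to its IT equipment, so a value of 1.0 would mean no overhead for cooling or power distribution. The arithmetic can be sketched in Python; the meter readings below are hypothetical and are not ICICI's figures.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical readings over the same period: 1,450 kWh drawn by the whole
# facility, of which 1,000 kWh reached the IT equipment.
example_pue = pue(1450.0, 1000.0)
overhead_kwh = 1450.0 - 1000.0  # cooling, power distribution losses, lighting
print(round(example_pue, 2), overhead_kwh)
```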
To track and manage applications, ICICI Bank developed its own customized application monitoring tool called App360, which acts as a single repository for all application details, backup policies and purging policies.
App360 provides a complete mapping of all applications, virtual machines, physical servers, storage devices, backup systems and networks. It has a built-in alerting mechanism for events like SSL certificate expiry, produces reports, and tracks server activity and incidents. It also runs automated scripts and sends reminder emails. “No such consolidated tool existed in the market place,” said Shaik.
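App360 is an in-house tool and its internals are not public. As a hedged illustration of the kind of check described, a certificate-expiry alert might look like this in Python; the certificate names, dates and the 30-day warning window are all assumptions.

```python
from datetime import date, timedelta


def expiring_certificates(certs, today, warn_days=30):
    """Return names of certificates that expire within warn_days of today."""
    cutoff = today + timedelta(days=warn_days)
    return sorted(name for name, expires in certs.items() if expires <= cutoff)


# Hypothetical inventory: certificate name -> expiry date.
certs = {
    "netbanking.example": date(2024, 1, 10),
    "cards-api.example": date(2024, 6, 1),
}
due = expiring_certificates(certs, today=date(2024, 1, 1))
print(due)
```

A list like `due` could then feed the reminder emails the tool is said to send.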
The App360 tool ensures any unforeseen eventualities can be managed efficiently, avoiding the business impact of downtime. In an incident, the support team can check App360 for infrastructure details: when a base server goes down, it provides information about the applications which are hosted on it, so the right teams can be notified.
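The lookup described here, from a failed server to the applications it hosts and on to the teams that own them, can be sketched in a few lines of Python. This is an illustrative model and not App360's actual design; every server, application and team name is hypothetical.

```python
# Hypothetical inventory: which applications run on which base server.
HOSTED_APPS = {
    "srv-01": ["net-banking", "cards-api"],
    "srv-02": ["atm-switch"],
}

# Hypothetical ownership: which team supports each application.
APP_OWNERS = {
    "net-banking": "digital-team",
    "cards-api": "cards-team",
    "atm-switch": "atm-team",
}


def teams_to_notify(server):
    """Return the teams that own applications hosted on the given server."""
    apps = HOSTED_APPS.get(server, [])
    return sorted({APP_OWNERS[app] for app in apps if app in APP_OWNERS})


affected = teams_to_notify("srv-01")
print(affected)
```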
The bank has an incident management process with a centralized IT command center, which uses multiple monitoring tools and a customized Service Manager tool from HP to give management control over incidents and their follow-ups. The tools include Oracle Enterprise Manager, HP Operations Manager, Windows SCOM, NetApp OCI, Appnomics, AppDynamics, Dynatrace, HPOVM and OpsCentre - all cascaded with the in-house App360 tool.
Rack-level power is handled by Sentry Power Manager (SPM), which enables socket-level monitoring. Predictive analysis enables power management and capacity planning. The bank deployed modular power distribution units (PDUs), which helped manage the dynamic load and provided a further saving on capex.
The monitoring systems provide historical data and enable trend analysis of incidents, so users can identify issues and choke points, and take proactive action before failures actually occur.
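The article does not describe the bank's analytics, but the basic idea of trend analysis can be shown with a least-squares slope over historical incident counts. In this Python sketch the weekly counts and the alert threshold are invented for illustration.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical weekly incident counts for one component.
weekly_incidents = [2, 3, 3, 5, 6]
slope = trend_slope(weekly_incidents)

# ASSUMPTION: flag the component if incidents grow by more than 0.5 per week.
needs_attention = slope > 0.5
print(round(slope, 2), needs_attention)
```

A rising slope on one component is a hint that it deserves attention before it fails outright, which is the proactive stance the bank describes.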
All this data is held in a Hadoop data lake, which stores structured, unstructured and semi-structured data and supports data discovery, optimization and analysis. It can be accessed quickly and searched by multiple factors, including the IP address of servers, appliances, load balancers, switches and storage, all from a single menu. This helps staff to generate reports quickly and respond to contingencies.
All this has brought results. The bank has reduced the man-hours spent on operations by 20 percent while delivering faster responses to new requirements, enabling it to operate more flexibly.
The intelligent system predicts outages, and fixes them before they happen with proactive, preventive measures.
Rack-level metrics have prompted the operations team to balance cooling against the IT heat load, refine alarm management and notification, and set thresholds for environmental sensors.
“Hotspots in any data center are always the biggest concern,” said Shaik. “Hotspots identified through DCIM are addressed through optimizing IT equipment placement, realignment of the raised floor tiles, adding active tiles, adding baffling/blanking panels, deploying rack mount fan trays for hot air exhaust and fine tuning the cold aisle containment.”
This leaves the facilities operating in optimal environmental conditions that meet ASHRAE standards. The precision air conditioning (PAC) units use 48 percent less power, and chiller power consumption has been cut by 13 percent - generating a direct benefit to operating expenditure (opex).
Even in the hot and semi-arid climate of Hyderabad, India, the data center has achieved a PUE of less than 1.5 - an achievement of which the bank can be justly proud.
This article appeared in the April/May issue of DCD Magazine.