Designing two-site and three-site data centre topologies
Part 3 of IBM Global Technology Services’ paper on data centre topology design

Deciding on a data centre topology design is an essential element of planning an organisation’s mission-critical IT infrastructure. In Part 1 of the series, IBM engineers explored in detail the factors that must be taken into consideration when designing an organisation’s data centre topology. Part 2 explained key measures that need to be taken when designing for availability. Today’s instalment of the five-part series takes a close look at two-site and three-site topologies.

Two-site topologies

The most common data centre topology in use today for mission-critical business systems is the two-site topology, with two sites within 50 km of each other and operated in either active/standby or active/active mode.

In an active/standby configuration, production workload is placed in the primary (active) site A and non-production workload, such as application development or disaster recovery testing, is placed in the secondary (standby) site B (see Figure 3).

Unidirectional synchronous disk replication is used to replicate the production data from the primary to the secondary site. The secondary site’s capacity is sized to support the production workload should the primary site fail. Recovery at the secondary site involves restarting the mission-critical business system and reconnecting end users.

In an active/active configuration, production workload is split between the two sites, using either a single (clustered) application instance distributed over a two-node cluster or two separate application instances with load balancing.

The two-site active/active shared configuration changes the business logic at site B (shown in Figure 3) from standby to active. Clustering in the application tier is used to route transactions to active business logic at both the primary site A and the secondary site B. Clustering in the database tier is accomplished with a shared database and a split mirror disk subsystem.

Under normal operation, the primary-site database manager updates the database. Unidirectional synchronous disk replication then transmits a copy of the production data from the primary site to the secondary site.
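The synchronous behaviour described here can be illustrated with a minimal sketch. The `SiteStorage` class and `replicated_write` function are hypothetical names invented for this illustration and do not correspond to any particular disk subsystem. The only point shown is that a write is acknowledged after both sites hold the data, so the secondary copy never lags an acknowledged update.

```python
class SiteStorage:
    """Hypothetical stand-in for the disk subsystem at one site."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True

    def write(self, key, value):
        if not self.reachable:
            raise ConnectionError(f"site {self.name} is unreachable")
        self.data[key] = value


def replicated_write(primary, secondary, key, value):
    """Acknowledge a write only once both sites hold it (synchronous mode).

    Simplification: rollback or suspension of the mirror after a failed
    remote write is omitted here.
    """
    primary.write(key, value)
    secondary.write(key, value)  # the caller waits for the remote copy
    return True                  # acknowledgement to the database manager


site_a = SiteStorage("A")
site_b = SiteStorage("B")
replicated_write(site_a, site_b, "order-1", "paid")
```

Because the acknowledgement waits on the remote write, every acknowledged update is already present at site B when a restart there becomes necessary.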

Following a significant primary-site failure event, recovery at the secondary site involves restarting the database using the secondary-site database manager and reconnecting end users.

With two separate application instances, intelligent network load balancing directs transactions under normal operation to either the primary site A or the secondary site B, as shown in Figure 4. The database tier uses bidirectional asynchronous data replication to ensure that the two independent database instances have identical content. Such a configuration, with load balancing and separate databases, is known as the shared-nothing approach. Following a significant site failure, the intelligent network load balancer directs all transactions to the surviving site.
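The routing behaviour of the load balancer can be sketched as follows. This is an illustration only: the `LoadBalancer` class, its round-robin policy and the `mark_failed` method are assumptions made for the example, since the paper does not say how the balancer distributes transactions or detects a failed site.

```python
import itertools


class LoadBalancer:
    """Hypothetical round-robin balancer that skips failed sites."""

    def __init__(self, sites):
        self.healthy = {site: True for site in sites}
        self._cycle = itertools.cycle(sites)

    def mark_failed(self, site):
        # Assumed to be driven by some external health check.
        self.healthy[site] = False

    def route(self):
        # Try each site at most once per call, skipping failed ones.
        for _ in range(len(self.healthy)):
            site = next(self._cycle)
            if self.healthy[site]:
                return site
        raise RuntimeError("no surviving site")


balancer = LoadBalancer(["A", "B"])
normal = [balancer.route() for _ in range(4)]        # alternates between A and B
balancer.mark_failed("A")
after_failure = [balancer.route() for _ in range(4)]  # all traffic goes to B
```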

Clearly, additional capacity has to be provided at each site to support the entire workload in case of site failure. The main weakness of the two-site topology is site proximity, which leaves mission-critical business systems vulnerable to a simultaneous failure of both sites in the event of a regional disaster, such as a hurricane.

Three-site topologies

A three-site topology adds a third out-of-region site to the two-site topology. It may be operated in either active/active/standby or all-active configuration.

The active/active/standby configuration is illustrated in Figure 5. The two in-region sites A and B operate as the two-site active/active (shared) configuration described previously. Under normal operation, the out-of-region site C acts as the standby site. Non-production workload such as application development is supported by the standby site. Database updates are sent asynchronously from production site B to the standby site C. The capacity of standby site C is sized to support the entire production workload in case of a regional failure event that renders A and B non-operational.

Start-up at site C involves restarting the mission-critical business system and reconnecting end users. Because database updates are sent asynchronously from B to C, there is a risk of some data loss. Switching to the out-of-region site C is more complex and therefore takes longer than switching to an in-region site. This configuration ensures high availability for most outages that affect only one site, while also protecting against those risks that affect both in-region sites.
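The data-loss risk of asynchronous replication can be made concrete with a small model. The `AsyncReplicator` class and its methods are hypothetical and exist only for this illustration. Updates are acknowledged at site B immediately and shipped to site C later, so anything still queued when the region fails never arrives.

```python
from collections import deque


class AsyncReplicator:
    """Hypothetical model of asynchronous replication from site B to site C."""

    def __init__(self):
        self.source = []           # updates committed at site B
        self.target = []           # updates applied at site C
        self.in_flight = deque()   # committed at B, not yet applied at C

    def commit(self, update):
        # Site B acknowledges at once, without waiting for site C.
        self.source.append(update)
        self.in_flight.append(update)

    def ship(self, count=1):
        # Deliver up to `count` queued updates to the remote site, in order.
        for _ in range(min(count, len(self.in_flight))):
            self.target.append(self.in_flight.popleft())

    def lost_on_regional_failure(self):
        # Updates still queued when A and B fail never reach site C.
        return list(self.in_flight)


replicator = AsyncReplicator()
for n in range(5):
    replicator.commit(f"txn-{n}")
replicator.ship(3)
lost = replicator.lost_on_regional_failure()
```

In this run five transactions are committed and three are shipped, so the last two would be missing at site C after a regional failure.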

The three-site all-active configuration adds a third active site to the previously described two-site active/active, shared-nothing configuration. Each site’s capacity is sized to support half of the entire production workload, so that the two surviving sites can together carry the full workload if one site fails.
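The sizing rule can be written as a short calculation. The sketch below assumes that all sites are active, that the workload is spread evenly across the surviving sites, and that only a given number of simultaneous site failures needs to be tolerated; the function names are invented for this example.

```python
from fractions import Fraction


def per_site_capacity(site_count, tolerated_failures=1):
    """Fraction of the full production workload each active site must support."""
    survivors = site_count - tolerated_failures
    if survivors < 1:
        raise ValueError("at least one site must survive")
    return Fraction(1, survivors)


def total_provisioned(site_count, tolerated_failures=1):
    """Total capacity across all sites, as a multiple of the workload."""
    return site_count * per_site_capacity(site_count, tolerated_failures)


two_site = per_site_capacity(2)    # each site must carry the whole workload
three_site = per_site_capacity(3)  # each site must carry half the workload
```

Under these assumptions a two-site all-active design provisions twice the production workload in total, while a three-site all-active design provisions one and a half times the workload.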

Following a site failure, an intelligent network load balancer redirects transactions to the surviving two sites.

Organisations that deploy three-site configurations expect additional protection against failures. The main weakness of three-site topologies is vulnerability to a cascading failure, which occurs when one site fails and the remaining sites, lacking sufficient capacity, fail in turn under the workload surge.
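A cascading failure of this kind can be illustrated with a toy simulation. It assumes, purely for the example, that the workload is split evenly across the surviving sites and that a site fails outright as soon as its share exceeds its capacity; `simulate_cascade` and the capacity figures are hypothetical.

```python
def simulate_cascade(capacities, workload, first_failure):
    """Return the set of sites still running after the first failure.

    capacities: mapping of site name to capacity (same unit as workload).
    Assumes an even split of the workload across surviving sites.
    """
    alive = {site for site in capacities if site != first_failure}
    while alive:
        share = workload / len(alive)
        overloaded = {site for site in alive if capacities[site] < share}
        if not overloaded:
            return alive       # survivors can carry the workload
        alive -= overloaded    # overloaded sites fail, load shifts again
    return alive               # empty set: every site has failed


# Capacities expressed as a percentage of the full production workload.
undersized = simulate_cascade({"A": 40, "B": 40, "C": 40}, 100, "A")
adequate = simulate_cascade({"A": 50, "B": 50, "C": 50}, 100, "A")
```

With each site sized at 40 per cent of the workload, the loss of site A overloads both B and C and no site survives; with each site sized at 50 per cent, B and C absorb the load.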

Depending on the nature of the first failure event, the affected sites may not be operational for six months or longer, during which time a single point of failure exists. A subsequent failure event affecting the surviving data centres may leave insufficient data centre capacity to run the full production workload.

In Part 4 of the series (coming Monday), IBM Global Technology Services engineers explore four-site topologies and compare all three types of topologies presented in the series.

Authors: Richard Cocchiara, Distinguished Engineer and the Chief Technology Officer for Business Continuity and Resiliency Services at IBM
Dr. Hugh Davis, Lead Architect in IBM’s Global Business Resilience Consulting Practice
Doug Kinnaird, Executive IT Architect in IT Strategy and Architecture Practice at IBM


© DatacenterDynamics 2010