Understanding Stretched Clustering and Disaster Avoidance

Stretched clustering is one of the most challenging topics I get when meeting with customers. Many customers think that stretched clustering is the ultimate disaster recovery solution and that it makes SRM obsolete. They assume that HA will solve all their DR problems and that vMotion still gives them workload mobility between the two data centers.

This is NOT true. But a stretched cluster does make a very good disaster avoidance solution!

A vSphere Metro Storage Cluster (vMSC) is a typical solution that, most of the time, still needs a good DR solution alongside it.

Disaster Avoidance

This is a process that allows proactive behavior to avoid an impending outage to services. Disasters tend to affect an entire site or have an impact on the services of the entire site even if only a partial site failure is sustained. Disaster avoidance technologies allow for configuration of a vSphere host, cluster or an entire site in such a fashion that irrespective of disaster, the services being provided will continue with minimum interruption. In most cases, disaster avoidance involves brief outages to services at a site followed by an orderly restart at a recovery site. A minimum outage sustained under controlled circumstances is typically considered acceptable as an alternative to sustaining an uncontrolled and extended outage associated with a true disaster.

Downtime Avoidance

Downtime avoidance differs from disaster avoidance as the former migrates the workloads between systems or sites with no downtime and no loss of data. vSphere technologies such as vMotion and Storage vMotion facilitate moving virtual machines or virtual machine storage with no interruption of the services they provide. Configuring vMotion and Storage vMotion requires that vSphere hosts are managed within a single VMware vCenter Server datacenter object and are configured with shared access to storage and network segments.

Disaster Recovery

This process assists rapid recovery from unplanned outages that bring down services in a fashion that makes local recovery within an acceptable time unlikely. In disaster recovery scenarios the goal is to rapidly return to operational status of the services, usually in a different datacenter in a safe location. Disaster recovery solutions will help automate return to operations of services that have stopped due to catastrophic failure of infrastructure.

Host Level

  • Disaster avoidance = vMotion to avoid disaster and outage (non-disruptive)
  • Disaster recovery = HA restarts VMs (disruptive)

Site Level

  • Disaster avoidance = vMotion over distance to avoid disaster and outage (non-disruptive)
  • Disaster recovery = SRM or scripted register/power-on of VMs at recovery site (disruptive)
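The host-level and site-level mapping above can be written down as a small lookup table. This is only a sketch to make the four combinations explicit; the names `MECHANISMS` and `mechanism_for` are my own illustration and not part of any VMware tooling.

```python
# Illustrative lookup of the host-level / site-level mapping listed above.
# Keys are (scope, goal); values are (mechanism, is_disruptive).
# All names here are hypothetical and exist only for this example.
MECHANISMS = {
    ("host", "avoidance"): ("vMotion", False),
    ("host", "recovery"): ("HA restarts VMs", True),
    ("site", "avoidance"): ("vMotion over distance", False),
    ("site", "recovery"): (
        "SRM or scripted register/power-on of VMs at recovery site",
        True,
    ),
}


def mechanism_for(scope, goal):
    """Return (mechanism, is_disruptive) for a failure scope and a goal."""
    try:
        return MECHANISMS[(scope, goal)]
    except KeyError:
        raise ValueError(f"unknown combination: {scope!r}, {goal!r}") from None


if __name__ == "__main__":
    for (scope, goal), (mechanism, disruptive) in MECHANISMS.items():
        label = "disruptive" if disruptive else "non-disruptive"
        print(f"{scope:4} {goal:9} -> {mechanism} ({label})")
```

The point of the table is that only the avoidance rows are non-disruptive; every recovery row implies a restart of the VMs.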

Types of vSphere Metro Storage Cluster (vMSC) Implementations

Single stretched vSphere cluster

  • Intra-cluster vMotions are parallelized
  • vMotion network requirements = 622 Mbps and 5 ms RTT (10 ms with vSphere 5 Enterprise Plus/Metro vMotion), plus L2 equivalence for VMkernel (support requirement) and VM network traffic (operational requirement). The RTT is the round-trip time without factoring in replication traffic.

Multiple vSphere clusters

  • Inter-cluster vMotions are serialized
  • vMotion network requirements = 622 Mbps and 5 ms RTT (10 ms with vSphere 5 Enterprise Plus/Metro vMotion), plus L2 equivalence for VMkernel (support requirement) and VM network traffic (operational requirement). The RTT is the round-trip time without factoring in replication traffic.
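To make the network requirements concrete, here is a minimal sketch that checks a measured inter-site link against the numbers quoted above (622 Mbps, 5 ms RTT, or 10 ms with Metro vMotion, plus stretched L2). The `Link` class and the function name are assumptions of mine for illustration; the thresholds are simply the ones from this post, so verify them against the current VMware support statements before relying on them.

```python
from dataclasses import dataclass

# Thresholds as quoted in this post; treat them as assumptions to verify.
MIN_BANDWIDTH_MBPS = 622
MAX_RTT_MS = 5
MAX_RTT_METRO_MS = 10  # vSphere 5 Enterprise Plus / Metro vMotion


@dataclass
class Link:
    """Hypothetical description of a measured inter-site link."""
    bandwidth_mbps: float
    rtt_ms: float        # round-trip time, excluding replication traffic
    stretched_l2: bool   # same L2 segment for VMkernel and VM networks


def vmotion_link_problems(link, metro_vmotion=False):
    """Return a list of human-readable problems; empty means the link fits."""
    rtt_limit = MAX_RTT_METRO_MS if metro_vmotion else MAX_RTT_MS
    problems = []
    if link.bandwidth_mbps < MIN_BANDWIDTH_MBPS:
        problems.append(
            f"bandwidth {link.bandwidth_mbps} Mbps is below {MIN_BANDWIDTH_MBPS} Mbps"
        )
    if link.rtt_ms > rtt_limit:
        problems.append(f"RTT {link.rtt_ms} ms exceeds {rtt_limit} ms")
    if not link.stretched_l2:
        problems.append("no stretched Layer 2 network between the sites")
    return problems


if __name__ == "__main__":
    link = Link(bandwidth_mbps=1000, rtt_ms=8, stretched_l2=True)
    print(vmotion_link_problems(link))
    print(vmotion_link_problems(link, metro_vmotion=True))
```

A link with 8 ms RTT fails the plain 5 ms check but passes once Metro vMotion is taken into account, which is exactly the difference the 10 ms figure makes.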

My Experience

On a previous project we implemented a stretched cluster solution at a greenfield container terminal, a typical use case for a vSphere Metro Storage Cluster (vMSC) solution! We built two datacenters 7 km apart and established a very low RPO and RTO. The need for these two datacenters to be close to the key cranes (7 km apart) makes this a perfect fit for stretched clustering.

The question that came to my mind was: what happens when there is a big disaster and we lose the key cranes? Then no operation is possible whatsoever!

If the complete port is gone, we can allow for a much longer RTO (Recovery Time Objective), but we still can’t afford to lose much data (RPO, Recovery Point Objective).

This allowed us to use replicated backups to a second port 100 km away as the DR solution, and to use stretched clustering on the site itself to stay very flexible and keep a very good RPO and RTO for smaller disasters (let’s say a fire in one datacenter, or the loss of the building in which one of the datacenters is located).

“Sidedness / preferred side” and other tips

If the dedicated connectivity between the VPLEX Metro clusters is lost while both clusters are still up, there is a very real possibility of split brain.


To prevent this split brain scenario and ensure that only one side of the Metro Cluster continues to allow writes to the stretched LUN, VPLEX introduces the concepts of preferred LUNS and


Also, without running VMs on a preferred side, VMs in one site could be accessing storage in the other site, which adds latency to every I/O operation (in the cross-connect case).

With sidedness, VMs run on their preferred side and storage is accessed locally.
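A quick way to reason about sidedness is to compare, per VM, the site it runs in with the site its storage prefers. The sketch below does just that on a hand-made inventory; the `VM` class, the field names, and the site labels are hypothetical and do not come from vCenter or VPLEX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical inventory record, for illustration only."""
    name: str
    running_site: str   # site of the host the VM currently runs on
    storage_site: str   # preferred side of the LUN backing the VM


def cross_site_vms(vms):
    """Return the names of VMs whose I/O has to cross the inter-site link."""
    return [vm.name for vm in vms if vm.running_site != vm.storage_site]


if __name__ == "__main__":
    inventory = [
        VM("db01", running_site="A", storage_site="A"),
        VM("web01", running_site="B", storage_site="A"),
        VM("app01", running_site="B", storage_site="B"),
    ]
    print(cross_site_vms(inventory))
```

Any VM that shows up in the result pays the inter-site latency on every I/O, so those are the ones to move back to their preferred side.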

Up to and including vSphere 4.1, you can’t control HA/DRS behavior for “sidedness”.

There is no supported way to control VMware HA primary/secondary node selection with vSphere 4.x, and methods for increasing the number of primary nodes are not supported by VMware either. Because HA in vSphere 4.x uses at most five primary nodes, this limits the cluster size to 8 hosts (4 in each site), so that a site failure can never take out every primary node.

As of vSphere 5.x, you can use DRS host affinity rules to control HA/DRS behavior.

The VMware HA implementation in vSphere 5 changes things.

You’ll need to use multiple isolation addresses in your VMware HA configuration, at least one on each side!
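As a sketch of what "at least one isolation address on each side" means in practice, the helper below builds a set of HA advanced options from per-site address lists and refuses to continue if a site has none. The function name and the example addresses are made up for illustration; the `das.isolationaddressN` option naming is how I recall the HA advanced settings, so check it against the HA documentation for your vSphere version.

```python
def isolation_address_options(addresses_by_site):
    """Build HA advanced options with isolation addresses from every site.

    addresses_by_site maps a site label to a list of pingable IP addresses.
    Raises ValueError unless at least two sites each supply one address.
    """
    if len(addresses_by_site) < 2:
        raise ValueError("a stretched cluster needs addresses for both sites")
    empty = [site for site, addrs in addresses_by_site.items() if not addrs]
    if empty:
        raise ValueError(f"no isolation address for site(s): {', '.join(empty)}")

    options = {}
    index = 0
    # Sort the sites so the generated option numbering is deterministic.
    for site in sorted(addresses_by_site):
        for address in addresses_by_site[site]:
            # Assumed option naming: das.isolationaddress0, 1, 2, ...
            options[f"das.isolationaddress{index}"] = address
            index += 1
    return options


if __name__ == "__main__":
    # Documentation-range example addresses, not real gateways.
    print(isolation_address_options({
        "site-a": ["192.0.2.1"],
        "site-b": ["198.51.100.1"],
    }))
```

The validation is the important part: with only one address, or with all addresses in the same site, hosts in the other site cannot tell a site partition from their own isolation.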

One downside: it needs smart people. What if you’re the smartest person in the room, your organization requires operational simplicity, and you’re caught up in the disaster yourself? SRM is an easy push-button mechanism.

A second downside: stretched HA/DRS clusters (and inter-cluster vMotion as well) require a stretched Layer 2 network, which complicates the network infrastructure.

The network lacks site awareness, so stretched clusters introduce new networking challenges!

I have collected a lot of documents and links to share.

VPLEX Metro with VMware HA


VPLEX Validating Host Multipathing

VPLEX Hardware Installation Guide

VPLEX Security Configuration Guide

Understanding vSphere Stretched Clusters, Disaster Recovery

Chad explains: Understanding vSphere Disaster Recovery/Avoidance options (Virtual Geek)

EMC VPLEX Metro Witness Technology and High Availability (EMC)

Guide to multisite disaster recovery (EMC)

Implementing vSphere Metro Storage Cluster (vMSC) using EMC VPLEX (VMware)

Using VMware vSphere with EMC VPLEX Best Practices (EMC)

vSphere 5.0 HA and metro / stretched cluster solutions (Yellow-Bricks)

vSphere 5.0–update 1 and stretched clusters (Virtual Geek)

Originally Posted on Virtual Clouds.

One Response to Understanding Stretched Clustering and Disaster Avoidance

  1. Gaja Vaidyanatha, May 8, 2012 at 11:27 am

    Nice write up! The issue of “sidedness” that you surface is very relevant and is complex when write-intensive databases are involved. In my experience, it is simpler/easier to control the “write affinity” and also the problem of “avoiding a split brain”, by maintaining control outside the scope of vSphere. This is because “write affinity” needs to be controlled by the application and the database that supports it (if and when relevant).

    If we go down the path that databases need to be the “single source of truth” on whether or not they are “read/write” or “read only”, it opens some very interesting options in managing this issue. The addition of network traffic management (local and global), using technology such as the intelligent switches provided by F5, allows for a very simple yet powerful abstraction layer, which directs traffic ONLY to where it belongs. The issue of “split brain” is very complex when looked at in the context of databases, as data integrity is key to the consistency of any database and is difficult to control/maintain in a distributed environment without the right controls in place.

    Controlling writes in a “single location/database” at any given time and controlling the traffic that is directed to this “current read write database” is a KEY factor in maintaining consistency across multiple environments and overcoming this problem. Let me also add that we did not require the added functionality that VPLEX provided, as the replication of data was controlled by the relevant databases in question, across multiple geographically distributed private clouds.