Nowadays more and more companies have, or are considering, a second datacenter at another site. The main reason for this second datacenter is usually disaster recovery. There are two ways to use it. You can choose an active-passive setup, in which bare-metal servers in the second datacenter wait to be used when disaster strikes. Alternatively, you can choose an active-active setup, in which you spread all active servers across both datacenters, reducing the impact when disaster strikes in one of them.
Implementing an active-active datacenter setup is also another step toward true cloud computing, because IMHO a single datacenter can’t be a cloud by itself. When it comes to VMware in an active-active datacenter setup, there are several important things to keep in mind.
On June 29th VMware and Cisco posted a proof of concept document for VMotion between datacenters. As VMware and Cisco mention in their article, you have to stretch the L2 networking domain between the sites. This is one of the most important requirements if you want to stretch your active footprint across datacenters. VMware and Cisco also mention that after the VMotion the VM has to remotely access its disk in the other site until a Storage VMotion occurs. This storage part is one of the major challenges you encounter when designing your virtual infrastructure across sites.
Running an active-active datacenter in a VMware environment got me thinking. What are the possibilities, and what is the smart choice with today’s technology? I already mentioned the first requirement, which is stretching the L2 networking domain across the datacenters. So let’s assume that this requirement is already in place.
Figure 1
Let’s take a step back and have a look at an active-passive setup. These setups have some sort of storage replication in place. The most common design I encounter is shown in figure 1. In the main datacenter there’s an ESX cluster with some sort of SAN-based replication/mirroring to a second datacenter. In the second datacenter there is a passive ESX cluster available to start up the virtual servers in case of disaster. Let’s use this setup as a starting point and turn this active-passive setup into an active-active one.
The solid blue lines represent active storage connections and the dashed brown lines represent passive storage connections.
Scenario 1: Divide ESX cluster between datacenters
Figure 2
This design mirrors the active-passive setup example and just divides complete ESX clusters between datacenters. This design of course requires you to have multiple ESX clusters. Using this design you lower the impact when disaster strikes, because you have divided your active ESX clusters between 2 datacenters.
Advantage:
- Quite simple design because it’s a mirror of the active-passive design.
- There is no need to change existing cluster setups. Just move a complete cluster to the 2nd datacenter.
- All active disk-IO stays local to the datacenter.
Disadvantage:
- Requires an extra passive ESX cluster in the main datacenter for disaster recovery.
Scenario 2: Stretched ESX cluster 1
This design stretches the ESX cluster over 2 datacenters by putting half of the ESX hosts in the cluster in datacenter 2. All active storage stays in datacenter 1, which is still a single point of failure, because when datacenter 1 goes down, the VMs in datacenter 2 will go down with it. Because of the SAN-based replication you could assign the mirrored storage to the hosts in datacenter 2 and start up the VMs as a disaster recovery solution. If datacenter 2 goes down, the ESX hosts in datacenter 1 can take over the crashed VMs if your resources in datacenter 1 allow it.
Another point of concern is the placement of your HA primaries. If you have a large cluster with more than 4 ESX hosts in a datacenter, it is theoretically possible that all 5 HA primaries reside in one datacenter. If that particular datacenter goes down, VMware HA will not work. The maximum supported number of HA primaries is 5 and you cannot control their placement, although there are some undocumented and unsupported possibilities.
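To make the "more than 4 hosts per datacenter" concern concrete, here is a minimal Python sketch. It says nothing about how VMware HA actually picks its primaries; it only checks whether a worst-case placement could put all 5 primaries in a single site. The function name and the site names are hypothetical and only serve as illustration.

```python
HA_PRIMARIES = 5  # maximum number of HA primaries, as stated in the text


def sites_that_can_hold_all_primaries(hosts_per_site, primaries=HA_PRIMARIES):
    """Return the sites with enough ESX hosts to hold every HA primary.

    hosts_per_site is a hypothetical mapping of site name -> number of hosts.
    An empty result means the primaries must span more than one site.
    """
    return sorted(
        site for site, count in hosts_per_site.items() if count >= primaries
    )


# Hypothetical stretched clusters:
print(sites_that_can_hold_all_primaries({"dc1": 4, "dc2": 4}))  # []
print(sites_that_can_hold_all_primaries({"dc1": 5, "dc2": 3}))  # ['dc1']
```

With at most 4 hosts per datacenter the result is empty, so at least one primary has to sit in each site. As soon as one site has 5 or more hosts, that site could in the worst case hold all primaries.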
Figure 3
Because the active storage resides in datacenter 1, VMs running in datacenter 2 have to access their storage across the WAN link. This introduces additional latency and will degrade performance. Every write IO from such a VM crosses the WAN link twice: once for the write itself and once for its acknowledgement. If you use synchronous SAN replication or mirroring, this is multiplied by 2 for all writes, because the primary storage box has to wait for the remote copy to datacenter 2 and its acknowledgement before it can acknowledge the write to the VM. This means that every write IO operation will suffer 4 times the one-way latency of the WAN link. When using an asynchronous replication technology, you don’t have to wait for the replication to finish and every write IO operation will suffer only 2 times the one-way latency of the WAN link.
The path a write IO operation follows is illustrated as the numbered red lines in figure 3:
1. Write IO operation from the VM to the disk (primary storage box)
2. Remote copy write operation from the primary storage box to the remote disk
3. Write IO acknowledgement from remote storage box to primary storage box
4. Write IO acknowledgement from primary storage box to the VM
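The four steps above can be turned into a small back-of-the-envelope calculation. The Python sketch below only counts one-way WAN crossings and ignores local SAN and disk latency. The function name, its parameters and the 5 ms example latency are assumptions for illustration, not measurements.

```python
def added_write_latency_ms(one_way_wan_ms, vm_remote_from_storage, synchronous):
    """Estimate the WAN latency added to a single write IO.

    Only one-way WAN crossings are counted; everything else is ignored.
    """
    crossings = 0
    if vm_remote_from_storage:
        crossings += 2  # steps 1 and 4: write to the primary storage box and its ack
    if synchronous:
        crossings += 2  # steps 2 and 3: remote copy and its ack
    return crossings * one_way_wan_ms


# Hypothetical WAN link with 5 ms one-way latency:
print(added_write_latency_ms(5, vm_remote_from_storage=True, synchronous=True))   # 20
print(added_write_latency_ms(5, vm_remote_from_storage=True, synchronous=False))  # 10
print(added_write_latency_ms(5, vm_remote_from_storage=False, synchronous=True))  # 10
```

The last line shows that a VM running local to its storage still pays for the synchronous copy, but only for steps 2 and 3.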
Advantage:
- No passive ESX hosts needed in either datacenter, but you might need extra capacity depending on your disaster recovery requirements. Although the extra capacity is shown as passive in figure 3, it can of course be fully utilized as active in the cluster.
Disadvantage:
- Shared storage is active in only one location, which is a single point of failure for the running VMs. If this location goes down, all VMs go down.
- There is no control over your HA primaries, which could result in VMware HA not working. To ensure that HA primaries reside in both datacenters, your cluster can’t exceed 4 hosts per datacenter if you stretch across 2 datacenters.
- All VMs in datacenter 2 have to access their storage in datacenter 1, which will decrease performance. If you use synchronous SAN mirroring this latency is multiplied by 2 for all writes.
- If VMware DRS is enabled, VMs can be automatically moved between datacenters, which impacts performance if the VM is moved to a host that is not local to the storage.
Scenario 3: Stretched ESX cluster 2
This design is similar to scenario 2, but now we also divide the active storage between the two datacenters. This way every VM accesses its storage local to the datacenter.
Figure 4
DRS can be a killer in this setup. Because DRS is unaware of the stretched design, it could VMotion a VM to the other datacenter, away from its storage.
Advantage:
- No passive ESX hosts needed in either datacenter, but you might need extra capacity depending on your disaster recovery requirements.
- Active storage in both datacenters, so active disk-IO stays local to the datacenter.
Disadvantage:
- As in scenario 2, there is no control over your HA primaries, which could result in VMware HA not working. To ensure that HA primaries reside in both datacenters, your cluster can’t exceed 4 hosts per datacenter if you stretch across 2 datacenters.
- You can’t use DRS, as DRS has no such thing as site affinity. If you enable DRS, it might VMotion a VM to a host in the other datacenter, which results in all active disk-IO for that VM traveling across the WAN link and consequently impacts VM performance.
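Since DRS has no notion of sites, you would have to detect VMs on the "wrong" side yourself. The Python sketch below shows the idea on plain dictionaries. It does not talk to vCenter or any VMware API; the function name, the VM names and the site names are all hypothetical.

```python
def vms_on_wrong_site(vm_host_site, vm_storage_site):
    """Return the VMs whose ESX host and active storage are in different sites.

    Both arguments are hypothetical mappings of VM name -> site name.
    """
    return sorted(
        vm
        for vm, host_site in vm_host_site.items()
        if vm_storage_site[vm] != host_site
    )


# Hypothetical inventory after DRS moved db01 to a host in dc2:
host_site = {"web01": "dc1", "db01": "dc2", "app01": "dc2"}
storage_site = {"web01": "dc1", "db01": "dc1", "app01": "dc2"}
print(vms_on_wrong_site(host_site, storage_site))  # ['db01']
```

Every VM in the result sends all its active disk-IO across the WAN link until it is moved back or its storage is moved with it.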
Scenario 4: Split ESX cluster 1
Figure 5
This design simply splits an ESX cluster into 2 separate ESX clusters, which are divided across the datacenters. On closer inspection, this design is in fact a variant of scenario 1. Besides splitting an ESX cluster, you can also combine two separate clusters if business policy allows it, but you need to make sure that storage and networking are configured identically for both clusters before combining them, or else they can’t act as each other’s failover in case of disaster.
Advantage:
- No passive ESX hosts needed in either datacenter, but you might need extra capacity depending on your disaster recovery requirements.
- Active storage in both datacenters, so active disk-IO stays local to the datacenter.
Disadvantage:
- You can’t split ESX clusters with fewer than 4 hosts, because that would result in a cluster that is not redundant. To resolve this issue, you need to add extra ESX hosts so that you have at least 2 servers in each datacenter.
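The minimum of 2 hosts per datacenter translates into a simple calculation of how many hosts you would have to add before splitting. This is a minimal Python sketch with a hypothetical function name; it only covers the host count and ignores capacity requirements.

```python
def extra_hosts_needed(cluster_hosts, min_hosts_per_site=2, sites=2):
    """Return how many ESX hosts to add so every site gets a redundant cluster."""
    return max(0, min_hosts_per_site * sites - cluster_hosts)


print(extra_hosts_needed(3))  # 1
print(extra_hosts_needed(4))  # 0
```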
Scenario 5: Split ESX cluster 2
This design is simply a combination of scenario 3 and scenario 4, which at first sight has the additional ability to VMotion/Storage VMotion VMs between the clusters or datacenters. But beware! If you configure this, both the active storage and its replicated counterpart are assigned to the same cluster. As far as I know, this will generate errors on your ESX hosts like: Clash between snapshot (vml.xxx…x:1) and non-snapshot (vml.xxx…x:1) device. So to take advantage of this setup, you have to un-assign the replicated storage from both clusters. When disaster strikes, you have to re-assign the replicated storage to the surviving cluster before you can start recovering VMs.
Figure 6
Advantage:
- No passive ESX hosts needed in either datacenter, but you might need extra capacity depending on your disaster recovery requirements.
- Active storage in both datacenters, so active disk-IO stays local to the datacenter.
- Possibility to VMotion/Storage VMotion between sites manually.
- You can utilize both VMware HA and VMware DRS without the drawbacks from scenario 3, because you have 2 separate clusters and both technologies operate at the cluster level.
Disadvantage:
- Complex and possibly error-prone storage configuration, because 2 different clusters share a common set of storage LUNs.
- You can’t assign both the active storage and its replicated counterpart to the same cluster as this would generate errors.
- Disaster recovery becomes more complex as you have to re-assign the replicated storage to the surviving cluster before you can start recovering VMs.
- VMotioning a VM to the other datacenter results in all active disk-IO for that VM traveling across the WAN link, which impacts VM performance because of the extra WAN latency. This action should always be followed by a Storage VMotion to correct this, but one might forget.
- You can’t split ESX clusters with fewer than 4 hosts, because that would result in a cluster that is not redundant. To resolve this issue, you need to add extra ESX hosts so that you have at least 2 servers in each datacenter.
Conclusion
Stretching an ESX cluster across datacenters (scenario 2 and scenario 3) is a bad idea, because you can’t rely on VMware HA or VMware DRS. VMware DRS currently has no functionality that takes different datacenters/sites into account, so it can leave VMs running on the “wrong” side of the cluster compared to their storage. VMware HA has a similar problem: it might put all primaries on one side of the cluster. I even doubt whether scenario 2 and scenario 3 are supported by VMware, so let’s just assume they’re not!
If you want to design your virtual infrastructure across datacenters, I recommend choosing scenario 1, scenario 4 or scenario 5. I would choose scenario 4, as this is a rather simple setup and doesn’t require any passive ESX hosts. Things might be different in the future, though, as VMware is continuously developing new features and may one day provide something like site affinity/awareness for VMware HA and VMware DRS. Time will tell…
If you have any other opinions, insights, ideas, options or comments, please share! I’m very interested in hearing from you.
Additional readings
Long-distance VMotion
http://blogs.vmware.com/networking/2009/06/vmotion-between-data-centersa-vmware-and-cisco-proof-of-concept.html
http://virtualgeek.typepad.com/virtual_geek/2009/09/vmworld-2009-long-distance-vmotion-ta3105.html
http://www.virtuallifestyle.nl/2009/09/vmworld-09-long-distance-vmotion-ta3105/
http://vinf.net/2009/06/30/long-distance-vmotion-heading-to-the-vcloud/
http://www.yellow-bricks.com/2009/09/21/long-distance-vmotion/
http://www.simonlong.co.uk/blog/2009/06/30/wan-vmotion-a-step-closer-to-a-private-cloud/
Stretched clusters / cloud
http://virtualgeek.typepad.com/virtual_geek/2008/06/the-case-for-an.html
http://rodos.haywood.org/2009/01/moving-workloads-split-cluster-or-cloud.html
http://thevirtualdc.com/?p=135
http://blogs.cisco.com/datacenter/comments/what_is_not_networking_for_the_cloud
http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns836/white_paper_c11-557822.pdf