As of vSphere 5.1, VMware started including vSphere Replication with the base licenses without needing its own license or a Site Recovery Manager (SRM) license. What this means is that whether you’re running vSphere Essentials Plus or vSphere Enterprise Plus, you can use vSphere Replication without any additional cost. I can hear some readers out there noting, “Well, sort of, except you need an additional vCenter license and licensing to cover the replication side hosts!” You’d be right if you were to replicate your VMs to another vSphere environment, but what many people don’t realize is that you can replicate your VMs locally within your single vCenter Server to separate storage. With vSphere 6, the vSphere Replication 6 component gains a few new features over previous versions that make it more flexible and attractive – sometimes people don’t even realize they can leverage it as a part of their base license!
It’s pretty often that I see environments with third-party replication/backup solutions implemented and when asked “Hey, ever think about not renewing with [product] and use vSphere Replication?” the response is “Nah, it’s too expensive.”
How does it work?
In a traditional disaster recovery (DR) plan, one would likely be required to provide evidence that their data is being stored in a remote, secured location. To satisfy that requirement, many people use SAN replication to a DR site or use Veeam cloud providers to put their VMs on remote storage. SAN replication is great but requires a SAN on each side of a reliable connection with, presumably, hosts on both sides to hold the VMs in a DR fail-over. Further, with SAN replication you usually end up replicating LUNs, so you have to place VMs on LUNs that are replicated and have to replicate the whole LUN even if you only care about a single VM.
Another approach is to use vSphere Replication to replicate the VMs from the vSphere layer rather than from the SAN layer – to make this a true DR solution, one would need a vCenter Server on the DR side along with hosts to support the VMs. Of course you can go further and use SRM in order to automate some of the fail-over process, but that’s not free. However, by using vSphere Replication you can replicate only the VM(s) you require, set the schedule of how often to true-up the replica, and how many replicas to keep for how many days!
For smaller installations, though, you don’t need to replicate to a remote location. In fact, if you had a single host running a virtual vCenter Server with redundant local storage where all of your VMs run you could simply add another source of storage (be it local, iSCSI, NFS, whatever) and replicate to that. So, really, the end result is that you leverage the vSphere Replication appliance to handle replicating the VM(s) on a schedule to storage other than your default, active storage. This does not protect you from a host failure or power failure, obviously. But, one of the biggest concerns small installations have is what to do if their storage solution fails – this satisfies that.
Getting Started with vSphere Replication 6
There are really only a few requirements in order to get started with vSphere Replication in vSphere 6:
- Must have vCenter Server 6.0 deployed in your vSphere 6 environment
- Must have at least one ESXi 6.0 host associated with the above vCenter Server
- Must deploy at least one vSphere Replication 6.0 virtual appliance in one (or more) vSphere 6 environments with vCenter Server 6
- If using a single vCenter Server for local replication you should have more than one storage target
- Must be able to allocate 2 vCPU and 4GB of RAM to the vSphere Replication 6.0 appliance (though this can be changed after the deployment)
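If you like to sanity-check things before deploying, the requirements above can be written down as a small pre-flight check. This is only a sketch in Python: the `Environment` fields and the `preflight` function are names I made up for illustration, and you fill them in by hand from your own inventory (nothing here talks to vCenter).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Environment:
    """Hand-entered facts about the environment (hypothetical structure)."""
    vcenter_version: str            # e.g. "6.0.0"
    esxi_versions: List[str]        # one entry per host managed by that vCenter
    datastores: List[str]           # datastore names visible to the host(s)
    spare_vcpu: int                 # vCPUs you can give the appliance
    spare_ram_gb: int               # RAM (GB) you can give the appliance
    local_replication: bool = True  # replicating within a single vCenter?


def preflight(env: Environment) -> List[str]:
    """Return a list of unmet requirements; an empty list means good to go."""
    problems = []
    if not env.vcenter_version.startswith("6.0"):
        problems.append("vCenter Server 6.0 is required")
    if not any(v.startswith("6.0") for v in env.esxi_versions):
        problems.append("at least one ESXi 6.0 host is required")
    if env.local_replication and len(set(env.datastores)) < 2:
        problems.append("local replication needs a second datastore as a target")
    if env.spare_vcpu < 2 or env.spare_ram_gb < 4:
        problems.append("the appliance needs 2 vCPU and 4GB of RAM")
    return problems


# Example: a single-host lab that only has one datastore so far.
lab = Environment("6.0.0", ["6.0.0"], ["local-raid50"], spare_vcpu=2, spare_ram_gb=4)
for problem in preflight(lab):
    print("Missing:", problem)
```

In this example the only complaint is the missing second datastore, which is exactly the thing to fix before attempting local replication.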
Once you have all of the requirements satisfied, deploying the vSphere Replication 6.0 Appliance is quite simple! We’ll go over that next.
The first thing you’ll want to do is log in to your my.vmware.com portal and download the latest “VMware vSphere Replication 6.1 Appliance – *.iso disk image” – note that at the time of this blog entry the latest version is 126.96.36.19919 Build 3051487.
Once you’ve downloaded the ISO image you will want to extract the contents to a folder since this is not the type of ISO that can be booted from. I use WinRAR and extracted the ISO to my desktop. Once extracted, navigate to C:\Users\[username]\Desktop\VMware-vSphere_Replication-188.8.131.5219-3051487\bin:
Note in the image above that there are two OVF files included. The one we want to use is “vSphere_Replication_OVF10.ovf”. The other file, “vSphere_Replication_AddOn_OVF10.ovf”, is for deploying additional vSphere Replication 6 appliances within the same vSphere environment. Since this is our first appliance, we’ll want to deploy the standard OVF.
From the vSphere Web Client, right-click a resource pool or VM folder, or even a host, and choose “Deploy OVF Template…“:
The next screen will require you to specify a URL or local file for the OVF. For this tutorial we’re using a local file:
Once you hit next, the OVF will be validated and the review summary page will be displayed:
Click next and you’ll be offered the option to accept or decline the EULA – you have to accept to continue. You are then prompted to select a VM Folder to deploy the OVF to:
After picking the folder, you will be able to select whether you want to assign 2 vCPU or 4 vCPU to the appliance. The default is 4 vCPU but 2 vCPU works well for a lab or any small implementation:
You will then be prompted to select where you’d like to store the appliance. You can see I have many datastores available on my lab host, but choose any datastore with enough space (18GB thick provisioned) that is available to each of your hosts in case the appliance is moved around by DRS:
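The datastore rule of thumb above (enough free space, visible to every host) is easy to express as a filter. The sketch below is plain Python over a hand-written inventory; the dictionary keys, datastore names, and host names are all hypothetical and only there to show the logic.

```python
from typing import Dict, Iterable, List

APPLIANCE_GB = 18  # thick-provisioned size mentioned above


def usable_datastores(datastores: List[Dict], hosts: Iterable[str],
                      needed_gb: int = APPLIANCE_GB) -> List[str]:
    """Names of datastores with enough free space that every host can see."""
    hosts = set(hosts)
    return [
        ds["name"]
        for ds in datastores
        if ds["free_gb"] >= needed_gb and hosts <= set(ds["hosts"])
    ]


# Hypothetical two-host cluster with one local and two shared datastores.
inventory = [
    {"name": "local-raid50", "free_gb": 400, "hosts": ["esx1"]},
    {"name": "shared-iscsi", "free_gb": 250, "hosts": ["esx1", "esx2"]},
    {"name": "shared-nfs", "free_gb": 10, "hosts": ["esx1", "esx2"]},
]
print(usable_datastores(inventory, ["esx1", "esx2"]))  # ['shared-iscsi']
```

The local datastore is dropped because only one host can see it, and the NFS datastore is dropped because it is too small. On a single-host lab every datastore passes the visibility test, so only free space matters.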
After you have selected where to store the appliance you will have to set up your networking options. The default is to use IPv4 and DHCP. I personally prefer a manual/static IP. You also have to select your network/port group. If you choose to set the IP by static/manual, you’ll have to provide the DNS server(s), gateway, and netmask for the appliance. Oddly enough, the actual IP isn’t set on this window:
The next window will allow you to set the password for the appliance, NTP server(s), and management IP of the appliance:
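Because the netmask and gateway are entered on one window and the management IP on the next, it is easy to mistype one of them. A quick way to double-check the values before clicking through is Python's standard `ipaddress` module. This is just a sketch; the function name and the addresses are made up for the example.

```python
import ipaddress


def static_settings_ok(ip: str, netmask: str, gateway: str) -> bool:
    """True if the gateway sits in the same subnet as the appliance IP."""
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    # The gateway must be inside the subnet and must not be the appliance itself.
    return gw in iface.network and gw != iface.ip


print(static_settings_ok("192.168.10.50", "255.255.255.0", "192.168.10.1"))  # True
print(static_settings_ok("192.168.10.50", "255.255.255.0", "192.168.20.1"))  # False
```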
After you have the network settings configured, including the management IP address, you are presented with a window regarding the vService Bindings. There are no options at this point. You are just being informed that vSphere Replication will register itself as a vCenter Extension and the appliance will “gain unrestricted access to the vCenter server APIs”:
Finally, you’ll have the typical summary window to review showing all of your settings – you just need to click on “Finish” and the OVF will be deployed. I chose to check the box to power the VM on after deployment:
Once the VM is deployed you can access it by going to https://[ipaddress]:5480/ and logging in as the username “root” and the password you specified during the OVF deployment. You can also consider creating a DNS entry for the appliance if you wish:
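If the management page does not load, it helps to know whether the appliance is answering on port 5480 at all. Here is a minimal sketch using only the Python standard library; the helper names and the example address are my own, and it only tests that the TCP port accepts a connection (it does not log in or validate the certificate).

```python
import socket

VAMI_PORT = 5480  # management port used in the URL above


def vami_url(host: str) -> str:
    """Build the management URL for an appliance IP or DNS name."""
    return f"https://{host}:{VAMI_PORT}/"


def vami_reachable(host: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection to the management port succeeds."""
    try:
        with socket.create_connection((host, VAMI_PORT), timeout=timeout):
            return True
    except OSError:
        return False


host = "192.168.10.50"  # hypothetical appliance address
print(vami_url(host))
print("reachable" if vami_reachable(host) else "not reachable yet")
```

A freshly deployed appliance can take a few minutes to finish booting, so a "not reachable yet" right after power-on is not necessarily a problem.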
When you are logged into the appliance, go to the “Configuration” tab and enter your SSO Administrator username and password. The VRM host and other attributes should already be populated. Once the settings are correct, there will be a Save and Register/Restart Service button on the right hand side. My menu shows that it’s already registered and running because I have already deployed this appliance. This is the point where the appliance will be registered to the vCenter Server (using the SSO username and password):
Once you see “VRM service is running” you’re good to go!
If you are going to replicate between vCenter Servers then you need to repeat the above sequence on the other vCenter Server/vSphere environment. If you end up doing a lot of replication and find that you need more resources for replication you can deploy the alternate OVF package for additional vSphere Replication Appliances on each side.
If you go back to your vSphere Web Client and right-click a VM you should see a context menu for “All vSphere Replication Actions” which will lead to the ability to “Configure Replication…” If you do not see the “All vSphere Replication Actions” menu then you may need to reboot your vCenter Server.
Note: I am using the vCenter Server Appliance in my lab(s) and not the Windows-based vCenter Server. This should make no difference at all since both iterations of vCenter Server support vSphere Replication.
So now that we have our appliance(s) deployed, and can see the context menu for “All vSphere Replication Actions” when right-clicking a VM, let’s look at the meat of the replication features.
While logged into the vSphere Web Client select your vCenter Server while in the Hosts and Clusters view. Select the Monitor tab and you should see a sub-section called vSphere Replication below:
In the screenshot above you will notice that I have one “Incoming Replication” – it is a VM called ConwayLEMP1. You can see in the right-hand pane that the source is conwayvcsa.conway.local which is another vCenter Server. You can also see the name of the vSphere Replication server (VR server) as well as the status and some details below.
So, from the Monitor section of your vCenter Server you can see incoming and outgoing replication, reports, and cloud recovery settings. Remember that the options you see from this tab are relevant to the vCenter Server/vSphere environment you’re logged into – if I were to log into conwayvcsa.conway.local vSphere Web Client and go to the same area, I’d see no incoming replication but would see outgoing replication.
You can also go to the Manage tab and then choose vSphere Replication in order to see/configure target sites and replication servers:
The image above shows that we have a target site of conwayvcsa.conway.local and it also identifies the VR appliance (or cloud address) on that vSphere environment that is doing the replicating. It also shows the connection status as “connected“. The replication servers option will display all of the replication servers associated with the local vCenter/vSphere environment:
Note in the image above that the VR server is part of the vCenter/vSphere environment we’re logged into. You can see on the right-hand side that the number of replications per VR server is also presented in this view.
I won’t show you how to add a target site through the Manage window because I tend to do it when configuring replication for a VM.
The above steps cover deploying the appliance so now the next portion of this blog entry will talk about configuring replication and things to keep in mind.
Replicate all of the things!
For the next portion of this article we’re going to pretend like we have a single-host environment with one vCenter server and no cloud services. Let’s consider an instance where our host has a LUN/datastore residing on a fast RAID50 local array, but because we have a super-ultra-important SQL server, we want to replicate it to a Synology DS1513+ that is acting as an iSCSI target. Seems pretty reasonable, right? Without vSphere Replication we could use Veeam (free) and manually do VM backups or pay money for the paid version to schedule them. However, you get some pretty clever options when leveraging vSphere Replication that you wouldn’t have otherwise. Replicating to another array means that if for some reason the RAID controller in our host failed, or an unfortunate sequence of events happened that the array failed, we have the VM replicated over on another device.
To start, let’s right-click on our super-ultra-important SQL server and choose “All vSphere Replication Actions” and then select “Configure Replication…”:
You may notice above that our options are to replicate to a vCenter Server or to a cloud provider. For sake of this article, we’re going to ignore the ability to replicate to a cloud provider since that is not easy to demonstrate without a subscription. So, leave the default “Replicate to a vCenter Server” and click next to move on:
The next screen you’re presented with is asking us to identify which target site we want to replicate to. If this is a new installation you will not have any options other than the local vCenter Server in the list and will have to click “Add Remote Site” if you wish to replicate elsewhere. Note: If you have to add a new target site you must have a valid credential on that vCenter Server. Additionally, the credentials you provide must have permission to perform tasks related to replication configuration, etc. I am going to select the local krcvc1.krc.local vCenter Server so as to replicate to the local environment:
Next you’ll be asked to identify which Replication Server you want to use in order to perform the replication. As we only have one VR server in this lab, there isn’t much to choose from. However, it’s best to choose “Auto-assign vSphere Replication server” so that replications are distributed across the available VR servers:
After clicking next, you will have to decide where you’re going to store the replica. Note that if you are replicating across vCenter Servers then this will be a datastore on the far side (the receiving side). For a local replication like we’re doing, you just want to ensure that you’re picking a datastore other than the one that VM currently resides on. To do this, click the “edit” link on the right side of the pane:
If you try to replicate the VM to a datastore that it already exists on, you’ll get the following error:
Once you have a valid location selected you’ll be able to click “OK” and then click next at the following screen:
The next screen you are presented with is a good one:
In the image above you can see that you have a couple of options to choose from. The first option is for Guest OS quiescing. If you aren’t sure what quiescing does in vSphere, it basically relies on VMware Tools to flush buffers/processes running within the guest so that a snapshot (or replica) captures all of the information within the guest at the time of the action. In general, it is good practice to enable quiescing when available.
The other option you can choose to enable is Network Compression. This setting really only makes sense if you are going to be sending the replica over a WAN connection. If you are replicating locally to another datastore, compression saves you no meaningful bandwidth and just ties up CPU resources on the VR server as the replication occurs. For this example (a local vCenter Server replication), leave the network compression option off and click next.
The following screen is another really important one:
There’s a lot to talk about in the above image. The first option you see is the Recovery Point Objective (RPO) slider – this is basically a “how frequently you want to replicate the VM” slider. The shortest period of 15 minutes means that you will have captured the replica every 15 minutes. This means that if the VM were to explode and crash into the sea, then you’d have at worst case a 15 minute period of lost data. The longest period of time you can select is 24 hours. This means that you will only replicate the VM once a day. Think about this setting for a bit before choosing – 15 minutes sounds nice but it’ll tie up CPU and network/storage resources frequently while 24 hours may not give you enough frequency in data collection.
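To put numbers on that trade-off, here is a small Python sketch. It follows the simplification used above, that one sync happens per RPO interval, so treat the sync count as a rough figure and not as the actual schedule vSphere Replication will run. The function and key names are made up for illustration.

```python
MIN_RPO_MINUTES = 15       # shortest setting on the slider
MAX_RPO_MINUTES = 24 * 60  # longest setting on the slider (24 hours)


def rpo_summary(rpo_minutes: int) -> dict:
    """Worst-case data loss and approximate sync count for a given RPO."""
    if not MIN_RPO_MINUTES <= rpo_minutes <= MAX_RPO_MINUTES:
        raise ValueError("RPO must be between 15 minutes and 24 hours")
    return {
        "worst_case_loss_minutes": rpo_minutes,
        "approx_syncs_per_day": MAX_RPO_MINUTES // rpo_minutes,
    }


for rpo in (15, 240, 1440):
    print(rpo, rpo_summary(rpo))
```

A 15 minute RPO works out to roughly 96 syncs a day, a 4 hour RPO to 6, and a 24 hour RPO to 1, which makes the resource cost of the aggressive setting easier to picture.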
The next option is Point in time instances. This setting basically specifies how many instances of the replication you want to be able to refer back to. So, if you set your RPO to 4 hours and then set your Point in time instances to 2 instances per day for 3 days then you’ll have 6 restore points from the previous 3 days that you can choose from when performing a restore. This setting is also tricky because it will depend on how your RPO is set. If you have an RPO of 24 hours set and then try to do any more than 1 instance per day you will get an error: “With the current RPO setting, vSphere Replication cannot create automatically the specified number of replication instances per day. Decrease the RPO value or sync this replication manually.”
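The arithmetic behind those two settings can be sketched in a few lines of Python. One assumption to flag loudly: I am modelling the limit as "instances per day cannot exceed the number of RPO intervals in a day". That matches the 24 hour case described above, but it is my reading of the error message and not a documented formula. The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def retention_is_valid(rpo_minutes: int, instances_per_day: int) -> bool:
    """Assumed rule: you cannot keep more instances per day than syncs occur."""
    return instances_per_day <= MINUTES_PER_DAY // rpo_minutes


def restore_points(rpo_minutes: int, instances_per_day: int, days: int) -> int:
    """Total restore points retained, or ValueError if the RPO is too long."""
    if not retention_is_valid(rpo_minutes, instances_per_day):
        raise ValueError("RPO too long for that many instances per day")
    return instances_per_day * days


print(restore_points(240, 2, 3))    # 4 hour RPO, 2 per day, 3 days -> 6
print(retention_is_valid(1440, 2))  # 24 hour RPO, 2 per day -> False
```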
Point in time instances are snapshots of the VM: an initial sync is made, then at each RPO, a snapshot is created. The number of point in time instances you configure is basically how deep a snapshot tree is kept before the snapshots are consolidated. Be careful, because depending on the type of VM this can lead to huge amounts of disk space used on the replication target.
For general VM availability I set RPO to 8 hours and keep 1 instance per day for the last 2 days. This allows me to recover the VM as it was on either of the last 2 days, with one restore point from each day. There’s also a case where you won’t care about this at all: I sometimes use vSphere Replication to migrate a VM over to another site one time, at which point I don’t really care how many versions are kept because I am going to “recover” the VM as soon as it’s done, stop replication, and delete the VM from the source side. More on that later.
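For a rough feel of how much space that can take on the target datastore, here is a back-of-the-envelope estimate in Python. It assumes each retained instance holds about one average delta on top of the full replica, which is a simplification of my own; real usage depends on the VM's change rate and on when snapshots get consolidated. All names and numbers are illustrative.

```python
def replica_footprint_gb(vm_disk_gb: float, avg_delta_gb: float,
                         retained_instances: int) -> float:
    """Full replica plus one average-sized delta per retained instance."""
    return vm_disk_gb + avg_delta_gb * retained_instances


# Hypothetical 200GB SQL VM changing about 5GB between instances,
# keeping 2 instances per day for 3 days (6 instances).
print(replica_footprint_gb(200, 5, 6))  # 230
```

Compare the result against the free space on the target datastore before committing to a deep retention setting, especially for busy database or file servers.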
After setting your RPO and point in time instance settings you will have a summary page and can click finish to complete the replication configuration:
Once you click finish you may notice in your Recent Tasks pane (if you have it enabled) that a “Configure a virtual machine for replication” task occurred, but so did one or more “Create a virtual disk” tasks:
The reason you see the “Create virtual disk” tasks is because vCenter is creating a VM in a folder by the name of the original replicated VM on the datastore you specified and is creating the number of disks that the VM has. Once that is complete the Initial Full Sync will occur. You can click on the vCenter Server from Hosts and Clusters view and choose Monitor and then vSphere Replication and pick either Incoming or Outgoing replications – since this is a single vCenter Server replication event, the VM is in both the incoming and outgoing lists:
From the view above you can track the progress of the replication. The percentage bar is deceptive – it rips through the 0 – 40% range in no time even when replicating over WAN to another vCenter Server because some of the initial steps are to calculate checksums and such. The actual replication of the data takes much longer than the acceleration of the status bar would suggest. In this example I am replicating from an internal RAID50 LUN comprised of 8 SAS 2.5″ 10k disks over to a RAID50 LUN comprised of 15 SATA 3.5″ 7,200 RPM disks – replication is decently fast.
One thing to note is that if you’re replicating to another vCenter Server that you had to provide credentials for when configuring the replication, you may need to re-authenticate in order to see the updated status from time to time. This is because your session will eventually time out on the other vCenter Server. If you do everything from the local side (outgoing replications) then you should not need to re-authenticate.
Naturally the initial full sync takes a while especially if you’re replicating large VMs. Each subsequent RPO snapshot is a delta between the initial sync, whatever snapshots were consolidated, and the currently running RPO snapshot. As a result, RPO snapshots are a lot smaller, usually, than initial full sync operations.
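To see why the difference matters, you can estimate transfer times from the amount of data and the link speed. The sketch below is plain arithmetic in Python using decimal units (1GB = 8000 megabits); it ignores the checksum phase, compression, and protocol overhead, so treat the results as a best case. The sizes and link speed are made-up examples.

```python
def transfer_seconds(size_gb: float, link_mbps: float) -> float:
    """Seconds to move size_gb (decimal GB) over a link of link_mbps megabits/s."""
    return size_gb * 8000 / link_mbps


full_sync = transfer_seconds(500, 100)  # 500GB VM over a 100 Mbps WAN link
delta_sync = transfer_seconds(5, 100)   # 5GB of changed data, same link

print(f"initial full sync: {full_sync / 3600:.1f} hours")
print(f"delta sync: {delta_sync / 60:.1f} minutes")
```

With these numbers the initial sync takes around 11 hours while a 5GB delta takes under 7 minutes, which is why the first replication is the one worth scheduling around.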
Should your replication complete successfully (and it should), you’ll be greeted with a comforting green “OK” status:
What’s nice about vSphere Replication Monitor view is that you can see how long the last sync duration was, when it was, the size, and if you click on the Point in Time tab you’ll even see the different snapshot dates/times and sizes (showing a VM that has been replicating for a bit):
When you navigate to the datastore you replicated the VM to you can see what is going on:
Looking above, you’ll probably recognize the VMDK files but some of the other files you’d expect to see (VMX, etc.) are not really there – instead, there are some really long ID names in front of what looks like a VMX, etc. This is because vCenter/vSphere Replication is handling this “VM” as a replica and it’s not a true copied/cloned VM. Something to keep in mind.
One last item worth giving a (small) look at is the Reports aspect of the Monitor section for vSphere Replication. The reporting aspect isn’t hugely useful, but it does show you different statistics like number of VMs replicated, transferred bytes, RPO violations, and site connectivity status:
Let’s Recover Some Stuff!
You’re probably wondering, “OK, replication is cool, but what actually happens if things get bad?” Well, the first thing you need to keep in mind regarding a local vCenter Server replication/recovery is that the vCenter Server needs to be available. So, if your vCenter Server is on the same volume that your original replicated VM is on, and you lost that volume and need to recover it… you can see where this is going. vCenter must be available in order to do a recovery. That may sound scary but really just keep vCenter on something redundant and you should be fine.
Note: Technically you can recover VMs without vCenter Server being available – there are a few blog entries floating around the internet that describe this process but it’s not officially supported and obviously much more cumbersome.
If you navigate to one of the views that show you incoming/outgoing replications you can right-click one of the replicated VMs and choose Recovery:
When you click on Recovery you will be prompted with a new window that asks how you want to sync the recovered VM:
The first option above is only available if the source site is still accessible, and it requires the source VM to be powered off. That’s not super useful. The second option is more useful because it’s the more likely recovery option if a VM is corrupt or the source site is no longer available. Additionally, if Point in Time recovery points were available you will see a message like “There is 1 currently retained instance that will be converted to a snapshot during recovery.”
So, proceeding with the Use latest available data option, you’ll see the next couple screens which ask you to place the recovered VM in a folder and resource pool. If you choose a folder or resource pool that already contains the VM you’ll see “The selected folder already contains an entity with the same name”:
And, as usual, you’ll eventually be met with a summary screen asking if you wish to turn the VM on automatically or not:
Once reviewed just click Finish. Do note the message that the network devices of the recovered VM will be disconnected.
After only a few seconds the VM will go from being a replica to being recovered. The VM will now be sorted in both VM Folders and Resource Pool views. When you visit the Monitor tab on your vCenter Server and then go into the vSphere Replication section again, you’ll now see that the VM status shows up as Recovered:
And… that’s pretty much it from here. Oh, one more tidbit that isn’t totally apparent at first; the Point in Time stuff – when you recover the VM you will not get to pick which time to restore. Instead, you recover the VM and then you are able to right-click the VM, click Snapshots, then Manage Snapshots and you’ll see that there are snapshots associated with the VM that you can restore to:
Once you re-enable the network connections on the VM (disabled at recovery time to avoid IP conflicts) then you’re set to start using the VM (so long as the original is offline/network disconnected).
What are the pros and cons of vSphere Replication?
As with anything “free” there are some give-and-takes, right? vSphere Replication is not perfect but it sure provides some neat functionality that most people would not ordinarily have available to them. Here are a couple of the pros and cons to using vSphere Replication:
Pros:
- Replication from the vSphere/VM level
- No additional licenses necessary (beyond the base vSphere licenses)
- No requirement for SAN replication
- No requirement to have a DR or secondary site – just other datastores
- All configuration centrally located from vSphere Web Client
- Modest reporting to show status of replication and site availability
- Easy recovery process for failed/corrupt VMs
- “Point-in-time” snapshot availability for flexible recovery
- Very fast recovery since the replica is already registered in vSphere
Cons:
- No simple way to track replica space: replicas just eat into datastore usage on disk, so monitoring is important
- No replication throttling – you could dedicate a VMkernel to replication and use egress control on distributed virtual switches but nothing for standard switches or from the vSphere Replication function itself
- Difficult/unofficial methods for recovering VMs without vCenter Server being available
- Once recovered, there’s no way to roll the delta back into the original VM on the source site
- Once recovered, the only way to get the VM back to the source site is to re-configure replication to point to the original site as a target site – full sync needs to occur, etc.
- No automatic recovery/fail-over
Some of the cons above can be circumvented by deploying the full-fledged Site Recovery Manager, but that has associated licensing costs. For an included feature, I feel vSphere Replication offers a realistic, good value to any small to medium size business whether or not the environment has a DR site available. Not to mention, you can use vSphere Replication to send replicas to cloud providers as well.
I know a lot of people are running home labs that are becoming more and more “relied on” and many people are running variations of Veeam (free) with PowerShell scripts or rsync cron jobs or other somewhat makeshift solutions. vSphere Replication provides a real, supported, easy-to-use and easy-to-configure solution to making sure that your VMs are highly available even in single-host environments. I feel that vSphere Replication is probably one of the more overlooked features of the vSphere product line since it previously required separate licensing.
I hope this article was useful – if you have any comments or questions please do not hesitate to post below!