So I had a fun little project the other day which involved migrating VMs across hosts that were not able to access shared storage. We don’t need no stinkin’ clusters ’round here! Actually, it would have been really nice to have had a cluster and shared storage in this case but this ended up being a bit of fun.
Edit: Note that in this blog entry I am not only migrating the VM from one host to another, but also leveraging storage vMotion to move the data from one local datastore to another local datastore by creating a third, VM-backed NFS datastore. A couple of people mentioned the ability to do a “share nothing” migration of VMs within the same datacenter. I think there’s a misunderstanding between “migrating” and vMotion/svMotion. These VMs cannot afford any downtime. As such, there needs to be shared storage.
So somehow some VMs were handed over to developers for testing and, of course, those VMs became production even though they were on a test cluster. This happens all the time at every place I’ve worked. Some real fun occurs when you blow away the test cluster or break it testing things and then production goes down – sigh.
Right now, my group plays around on a Dell VRTX chassis with three blades. Our biggest (and only) issue with it is that it has redundant PERC controllers. While this would usually be a good thing, it’s a handicap in this case: Dell messed up the PERC8 firmware such that you get no write-back cache when a primary and a fail-over controller share one chassis, because the two controllers have no way of accessing a shared cache. Dell fixed this in a new firmware release, so we want to update our controllers, but we can’t because we have pseudo-production VMs on the test cluster (smh, I know, it pains me to say it). The problem is that two of those VMs need to go on a specific host that is dedicated to a project, and a third needs to be migrated to any host outside this VRTX chassis. Each test ESXi host has robust internal storage (we use a few Dell R720XDs as hosts, so we’ve got 8+ spindle RAID10 and RAID50 arrays on local storage), but alas the internal storage is not accessible outside of said host. So, we need to storage vMotion the data between hosts with no downtime and then vMotion the VMs onto other hosts. Here’s what the environment looks like:
So, the VMs we are concerned about currently live on esxtest4; two of them need to go on esxtest6 while another one needs to go on esxtest7. “Test Cluster” is the Dell VRTX chassis (two of the blades are already on ESXi 6.0 GA and thus not in the cluster, since esxtest4 is on ESXi 5.5 U2). Obviously the desired hosts are outside of the chassis and cluster. Here’s what the shared storage looks like in the VRTX:
And the whole reason for this adventure is what we’re ultimately going to fix in another post:
So anyway, to get around this issue a co-worker of mine came up with a pretty clever workaround. He proposed we create a Linux VM on the host that I want to move things to, give it adequate disk space, and create an NFS export out of the Linux VM. Then, from the ESXi hosts involved, add the NFS share and create a datastore on it, storage vMotion to this now shared datastore and then storage vMotion from the shared storage down to local! This might not seem very cool, but I thought it was super clever compared to downloading VMDKs and taking VMs offline and all. So, I did just that:
I created a CentOS 6.6 VM called zzzMDMNFS1 on the esxtest6 host that we need to move the stuff to. You can see above that the specifications are pretty meager, but I’ve added a 1TB secondary disk. Once it was up, I installed nfs-utils, etc., created the mount point, and exported it over NFS with loose permissions. I also created a second CentOS VM called yyyNFS1 on esxtest7, since I have a VM I need to storage vMotion over to there as well. Below are the exports and df results:
Easy peasy! Now, just hop on back into vSphere, add a datastore with the NFS type, and set up the necessary connection information on both hosts involved in the move. Remember, the reason we can’t storage vMotion the data directly is that we do not have shared storage. So, to satisfy that, we need to share the storage to both locations (esxtest4 and esxtest6/esxtest7). That’s also why in the screenshot above I exported the NFS share with a wildcard for the allowable clients. Ideally, you’d limit this to only the hosts you want to allow to connect, but this is only staying up for as long as it takes to vMotion the data. Once you’ve added the NFS datastore to both hosts, you’ll find it in the related objects/datastores view. Also remember that the datastore name needs to be the same at each end; in this case I called it MDMNFS1.
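If you did want to tighten that export up, here is a minimal sketch of how the `/etc/exports` line could be generated. The export path `/mnt/nfs` and the option list are my assumptions for illustration (the real values only appear in the screenshot), and the hostnames would need to resolve from the NFS VM. `no_root_squash` is in there because ESXi mounts NFS datastores as root.

```python
# Sketch: build an /etc/exports line for the temporary NFS datastore.
# The export path and option list are assumptions, not copied from the post.

def exports_line(path, clients=None, options=("rw", "sync", "no_root_squash")):
    """Return one /etc/exports entry.

    clients=None reproduces the wide-open wildcard export; a list of
    hostnames or IPs limits the export to just those machines.
    """
    opts = ",".join(options)
    allowed = clients if clients else ["*"]
    # exports(5) format: <path> <client>(<options>) [<client>(<options>) ...]
    return path + " " + " ".join(f"{c}({opts})" for c in allowed)


if __name__ == "__main__":
    # Wildcard export, as used for this short-lived move
    print(exports_line("/mnt/nfs"))
    # Restricted to the two ESXi hosts involved in the move
    print(exports_line("/mnt/nfs", ["esxtest4", "esxtest6"]))
```

After changing `/etc/exports` on the CentOS VM you would re-export with `exportfs -ra` for the change to take effect.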
Now it’s just a matter of migrating the storage from the “local storage” of the esxtest4 VRTX array to the “shared storage” served out by the VM we created. We’ll choose “Change storage only” and select the NFS datastore we created. Notice that you will only be able to thin provision the disks when moving them, because only NFS storage with hardware acceleration can provision thick. For this purpose thin is ideal anyway, since we are going to storage vMotion the data back down onto storage local to the host we ultimately want the VM on.
Once the VM has been migrated to the shared storage (it remained fully accessible and was being written to by client production processes the whole time), it needs to be vMotioned over to the host it will reside on. So, let’s try and do that:
Doh! You can’t vMotion across virtual datacenters… what the heck? Yeah I knew this, but forgot to remedy it. If you take a look at the first image in this post you’ll see that there are two datacenters defined and the esxtest6 and esxtest7 hosts reside in one not containing esxtest4. Shucks. It’s easy to resolve, though – just right-click the host and choose “Move to” and change the datacenter. Once you do that, you’re ready, right? Let’s see:
Alright… yeah, so I got sloppy around this point. What’s the main requirement for vMotion to work? You guessed it, a vMotion interface! Because the esxtest6 and esxtest7 hosts are not part of a cluster they were never configured for vMotion. Once I edited the networking on each host and allowed vMotion on the management interface we were good to go! We didn’t have this problem migrating the storage component of the VM because the datastore was added using NFS and you do not access NFS/iSCSI targets through the vMotion vmkernel. After you make the necessary changes for vMotion networking, you’re all set!
This is a great trick for getting VMs off isolated hosts (by isolated, I mean hosts without shared storage) with no downtime. Remember, for all of this to work, the hosts need to be managed by the same vCenter Server, have their vMotion vmkernels on the same subnet, have the NFS datastore mounted under exactly the same name, and be in the same virtual datacenter. Once you’ve met all of these criteria you can get that VM off of one host and onto another. Oh, you’ll also need licensing for vMotion and all, but we’ll assume you have that and all you’re lacking is a SAN/NAS.
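Since I tripped over two of those requirements myself, here is a rough preflight checklist sketched in Python. It does not talk to vCenter at all; you fill in the host facts by hand. The `Host` class, the `vmotion_blockers` function, the datacenter names, the vCenter name, and the IP addresses are all hypothetical and only there to illustrate the checks.

```python
from dataclasses import dataclass
from ipaddress import ip_interface
from typing import FrozenSet, List, Optional


@dataclass
class Host:
    """Hand-entered facts about one ESXi host (hypothetical structure)."""
    name: str
    vcenter: str
    datacenter: str
    vmotion_ip: Optional[str]   # CIDR form, or None if no vMotion vmkernel
    datastores: FrozenSet[str]


def vmotion_blockers(src: Host, dst: Host, shared_datastore: str) -> List[str]:
    """Return the reasons a vMotion from src to dst would be refused."""
    problems = []
    if src.vcenter != dst.vcenter:
        problems.append("hosts are managed by different vCenter Servers")
    if src.datacenter != dst.datacenter:
        problems.append("hosts are in different virtual datacenters")
    if src.vmotion_ip is None or dst.vmotion_ip is None:
        problems.append("a host has no vMotion-enabled vmkernel interface")
    elif ip_interface(src.vmotion_ip).network != ip_interface(dst.vmotion_ip).network:
        problems.append("vMotion vmkernels are on different subnets")
    for host in (src, dst):
        if shared_datastore not in host.datastores:
            problems.append(f"{host.name} has no datastore named {shared_datastore}")
    return problems


if __name__ == "__main__":
    # Roughly the state I started in: wrong datacenter, no vMotion interface.
    esxtest4 = Host("esxtest4", "vcenter1", "DC-A", "10.0.0.14/24",
                    frozenset({"MDMNFS1"}))
    esxtest6 = Host("esxtest6", "vcenter1", "DC-B", None,
                    frozenset({"MDMNFS1"}))
    for problem in vmotion_blockers(esxtest4, esxtest6, "MDMNFS1"):
        print("BLOCKER:", problem)
```

An empty list back means none of the requirements listed above are obviously violated; it says nothing about licensing or CPU compatibility, which vCenter validates on its own.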
Moving ~1TB in two VMs to esxtest6 and a single ~415GB VM to esxtest7 took about 45 minutes or so. It was plenty fast for what it was. It obviously wasn’t a super fast SAN storage vMotion, but if you had 10GbE connectivity it’d be really nice. Sure, you could have done this with a NAS or SAN that you might have lying around, but you’d need to provision shares and LUNs and, let’s be honest, this is a little more fun. If I had a NAS or SAN lying around that these hosts could reach, I would probably have defaulted to doing it conventionally. But with a little creativity, this turned out to be both an interesting concept and fun at the same time!
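For a ballpark on what those numbers imply, here is a quick back-of-the-envelope calculation. It assumes decimal units (1TB = 1000GB), that the 45 minutes covers both moves, and that the full quoted sizes were actually copied, none of which I measured precisely.

```python
def average_mb_per_s(total_gb, minutes):
    """Average rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return total_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    total_gb = 1000 + 415          # ~1TB to esxtest6 plus ~415GB to esxtest7
    rate = average_mb_per_s(total_gb, 45)
    print(f"{rate:.0f} MB/s, about {rate * 8 / 1000:.1f} Gbit/s")
```

Treat the result as an upper bound: the sizes are approximate, and if the disks were not full then less data than this actually moved.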