Sizing an ESXi host – a true balancing act!

One of the most common questions asked in VMware communities and forums revolves around the specifications for new virtual hosts.  It can be very daunting to click that “checkout” button for new vSphere hosts without fully understanding the pros and cons of every option you’ve selected.  This article won’t lay it out in full detail, but will try to touch on the key components and how to optimize your configuration.  Brace yourself, this is going to be a long one.  This is by no means a definitive authority on host configuration, but it’s how I generally approach the subject.  For a lot of readers of my blog this may be boring, obvious information, but hopefully you all find something useful buried within.

Knowing the components

What I find helpful is laying out the different “focuses” of the virtual host and trying to develop an understanding of what I expect from each component.  For instance, we know that there are four primary areas of interest:  compute, memory, storage, and network.  Taking a glance at each is simple – we know we need some of each, right?  We may even know how to optimize each component individually.  When combining components, the question is how much, what kind, and why?  Or, why not?  Remember this – everything has a trade-off.

It is extremely common in my experience for a client to approach a new virtual host replacement from only a single angle – “We need the fastest processors…” or “We need as much storage as possible…” – but at what cost?  Poor choices don’t just cost money; they can cost you the ability to select other components and lock you into a losing platform as well.  Don’t forget – most vSphere implementations exist as clusters.  If a server specification isn’t ideal from the start, you’ll likely build out the rest of the cluster to match so everything is similar/equal…  you can see where this is going!  The fastest, largest, most expensive, most feature-rich items may perform fantastically alone, but what about the package as a whole?  You need to know what your workload is going to consist of!

Picking the compute

There’s a sort of harmony between processor and RAM specification that can become dissonant quickly.  If you pick a “middle of the road” CPU you can almost never go wrong.  Putting a spin on the whole “No one has ever been fired for buying Cisco” quote: “No one has ever been fired for picking the middle-of-the-road CPU.”  If you were to spec a new server you may be looking at something in the 8-core or 10-core range operating between 2.3 – 2.6 GHz, which is completely reasonable.  If you venture to either end of the spectrum regarding either core count or frequency, you need to know more about what you’re doing.  You’ll notice that the high-frequency CPUs have fewer cores, and the high core-count CPUs have lower frequencies.  You need to know what your workload is to make this decision!  This is where things go bad fast.  Why?  Read on.

Optimizing vs. Maximizing

Let’s pretend for a second that you decided, “No!  I won’t settle for average, middle-of-the-road performance.  I must have speed!” and you picked a pair of E5-2643 v3 CPUs at 3.4 GHz.  Guess what?  You’ve picked 6-core CPUs, which will result in only 24 vCPU per host.  They’re very fast CPUs and will be nice for a vSphere setup if you have a small business and only need to support a dozen or two VMs.  But what if you have a need for 40 – 60 VMs?  There’s a really good Dell whitepaper by Scott Lowe on CPU allocation in vSphere.  The TL;DR of it is that when it comes down to vCPU:pCPU, “1:1 – 3:1 is no problem” but “3:1 – 5:1 may begin to cause performance degradation.”

So, if you were to stick with your selection of E5-2643 v3’s and had two dual-socket hosts (6 cores and 12 hyperthreaded threads per socket, 24 vCPU per host, 48 vCPU across two hosts), then you could feasibly over-allocate CPU to 96 vCPU without much issue.  One issue, though, is that if you start assigning 4, 8, or 16 cores to a VM then you can really start to hurt your cluster’s performance due to ready time and scaling.  So, even though you could afford 96 vCPUs allocated, the reality is that this depends on (as covered earlier) the workload type as well as the size of the VMs as a whole.
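To make the ratio math concrete, here’s a minimal Python sketch of the same accounting (it counts hyperthreads as schedulable vCPUs, as this article does, and treats the 3:1 figure as a rule of thumb rather than a hard limit):

```python
# Cluster-wide vCPU over-allocation check; thresholds follow the whitepaper
# guidance quoted above (1:1 - 3:1 fine, 3:1 - 5:1 risky).

def overallocation_ratio(vcpus_allocated, hosts, sockets_per_host,
                         cores_per_socket, hyperthreading=True):
    """Ratio of allocated vCPUs to logical processors across the cluster."""
    threads_per_core = 2 if hyperthreading else 1
    logical_cpus = hosts * sockets_per_host * cores_per_socket * threads_per_core
    return vcpus_allocated / logical_cpus

# Two dual-socket E5-2643 v3 hosts: 2 * 2 * 6 * 2 = 48 logical processors.
ratio = overallocation_ratio(96, hosts=2, sockets_per_host=2, cores_per_socket=6)
print(f"{ratio:.1f}:1")  # 2.0:1 -- comfortably inside the 3:1 guidance
```

Swap in your own host counts and VM inventory; the point is simply that the check is arithmetic, not guesswork.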

There exists another problem, though.  When you assign vCPU to a VM you really want to avoid assigning more cores than a socket has to provide.  With the E5-2643 v3 CPUs you only have 12 vCPU per socket – that means you will really start to ask a lot of the cluster if you create 16 vCPU VMs, as the scheduler will have to span two sockets, significantly increasing CPU Ready Time.

This is where higher core-count CPUs come into play.  Installing a pair of 10-core E5-2660 v3 CPUs at 2.6 GHz would result in 40 vCPU per host instead of only 24, or 80 vCPU in a 2-host cluster instead of 48!  That’s a pretty significant increase in core count at the expense of some sheer speed.  There are plenty of 12- and 14-core CPUs as well.

There’s another big difference between the E5-2643 v3 and E5-2660 v3 that we haven’t talked about – Thermal Design Power, or TDP.  The E5-2643 v3 has a 135W TDP while the E5-2660 v3 has 105W.  This is huge not just from the standpoint of power use but also thermal dissipation requirements.  As mentioned, you need to pair your components wisely.  There are some configuration combinations that will not work – for instance, if you use a 135W TDP processor in a Dell M620 blade then you can kiss some DIMMs goodbye:

E5-2667 v2 CPUs in a Dell M620 – the heatsinks block the DIMM slots above and below each socket

Pictured above are a pair of E5-2667 v2 CPUs (8-core, 3.3 GHz) in a Dell M620 blade – the customer wanted to fill it with RAM, except that proved to be a problem because the heatsinks for 130W TDP CPUs extend over 2 DIMM slots on each side, limiting the number of DIMMs to 20, not 24.  So, in this particular instance the choice to go with the (then) fastest CPUs cost each blade 4 DIMMs of capacity, which means its 768GB capacity is down to 640GB.  Sure, that’s still a lot of RAM, but when the customer intends for the cluster to hold 2.3TB of RAM and it can only hold 1.92TB, that is a problem.  This is why it is so important to plan ahead and check compatibility/configurations with the vendor before picking parts.
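The capacity hit is easy to tally.  A quick sketch, assuming the 32GB DIMMs implied by the 768GB/24-slot figures and a hypothetical 3-blade cluster to match the 2.3TB target:

```python
# DIMM slots lost to oversized heatsinks, and what that costs the cluster.
dimm_gb = 32                        # assumed DIMM size: 768GB across 24 slots
total_slots, blocked_slots = 24, 4  # 4 slots blocked in total, per the photo
blades = 3                          # hypothetical blade count for a 2.3TB goal

full_capacity = dimm_gb * total_slots                      # 768 GB per blade
actual_capacity = dimm_gb * (total_slots - blocked_slots)  # 640 GB per blade
print(actual_capacity, blades * actual_capacity)           # 640 1920
```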

Sizing up the RAM

Just like choosing compute, RAM can throw people for a loop as well.  With so many different configurations of memory and speeds to choose from it’s often assumed it’s best to buy as fast as possible.  This isn’t always true.

Firstly, the number of channels your processor and mainboard support will drive how your memory is configured in order to get the capacity you require.  If you have a triple-channel configuration then you will have a different layout than a quad-channel configuration, and so on.  Further, just because you can buy 1866 MHz (or faster!) RAM doesn’t mean you should – if your CPU only supports 1600 MHz modules then the extra money spent is simply wasted.

It can be a little more confusing yet – some systems are not capable of running memory at the maximum supported speed when every channel is fully populated.  One such example is the Dell R710, which has a triple-channel memory controller.  When you populate all 3 banks, the memory clocks down to 800 MHz.  For this reason, make sure you understand the memory system in full so you don’t buy 1333 MHz DIMMs for your R710 only to have them run at 800 MHz.

Sticking with our R710 example: while the R710 supports 288GB per host, you will have a faster performing machine by running only 12 DIMMs rather than a full 18.  That lowers the quantity of memory to 192GB at full speed vs. the 288GB you might budget the cluster for, though.  Something to consider!
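The R710 trade-off boils down to two configurations; a small sketch with 16GB DIMMs assumed (which is what the 288GB/18-slot maximum implies):

```python
# Speed vs. capacity on a triple-channel R710-style board (16GB DIMMs assumed).
dimm_gb = 16
configs = {
    "2 DIMMs per channel": {"dimms": 12, "mhz": 1333},  # full speed, 192GB
    "3 DIMMs per channel": {"dimms": 18, "mhz": 800},   # max capacity, clocked down
}
for name, cfg in configs.items():
    print(f'{name}: {cfg["dimms"] * dimm_gb}GB @ {cfg["mhz"]} MHz')
```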

That covers some physical gotchas but doesn’t talk about allocation.  Look at it this way: what good is 288GB of RAM in a Dell R710 if you only have a pair of 4-core CPUs in it?  Sure, you can build some VMs with healthy amounts of RAM allocated, which may be appropriate sometimes.  But, overall, if you’re going to make use of all of your RAM then you’re going to have some seriously high over-subscription ratios on your vCPU:pCPU, as covered earlier.

One awesome function of VMware’s vSphere environment is that you can over-allocate RAM without, usually, any performance issues.  Many 8GB Windows 2012 R2 servers only end up utilizing 20 – 50% of the RAM allocated (depending of course on the workload/server type).  So, a 125 – 150% over-allocation of RAM isn’t too irresponsible.  If you dedicate SSDs to Virtual Flash Host Swap Cache then you could probably over-allocate a little further, since host swapping will spill to SSD instead of whatever datastore the VM resides on.
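As a sketch of that guidance (the percentage ceilings are rules of thumb from this post; the 200% figure with SSD swap cache is my reading of “a little further”, so treat it as an assumption):

```python
# RAM over-allocation sanity check against the rough ceilings discussed above.

def ram_overallocation_ok(allocated_gb, physical_gb, ssd_swap_cache=False):
    """True if allocated RAM stays under the rule-of-thumb ceiling."""
    ceiling = 2.00 if ssd_swap_cache else 1.50  # 200% vs. 150% (assumptions)
    return allocated_gb / physical_gb <= ceiling

print(ram_overallocation_ok(280, 192))                       # True  (~146%)
print(ram_overallocation_ok(360, 192))                       # False (~188%)
print(ram_overallocation_ok(360, 192, ssd_swap_cache=True))  # True
```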

Conversely, wouldn’t it be silly to load up an R720 or R730 with dual 12- or 14-core CPUs (48 – 56 vCPU per host) and only 32GB of RAM installed?  There’s nothing wrong with starting small and upgrading over time, but this is where planning is required.  If you are going to order a system with only 32GB of RAM to start, then make sure it’s filled with 8GB DIMMs and not 2GB DIMMs.  It is cheaper to spec out 32GB of RAM with 2GB DIMMs than with 8GB DIMMs, which may be attractive at first, but sixteen 2GB DIMMs will fill your slots and keep you from a very high total capacity without scrapping all of the RAM already in the system.

Putting it all on top of some storage

By this point in the planning of a new virtual host I would hope to have decided what I need from a CPU and RAM perspective after checking my requirements.  The next big step is to select some storage to put the setup on.  Depending on your needs, you can choose internal storage, direct-attached storage, or SAN-based storage.  There is also VMware’s VSAN product, which is kind of a hybrid – internal storage on each host aggregated into SAN-like shared storage – I’ll be talking about this in a later blog post.  There is a place for each method, but let’s keep it simple and pretend we have a single host with internal storage.  Let’s also pretend we’re dealing with something like a Dell R730XD with a ton of internal 2.5″ bays.

If we spec’d our R730XD with dual 12-core CPUs and 384GB of RAM then we are probably hoping to hold a whole bunch of VMs on this host without even over-allocating CPU.  So, it would make sense that we need storage that can provide enough performance (IOPS) to support many VMs from the storage layer.  When it comes to storage, though, there are always two factors to consider: performance vs. capacity.  If we need 5TB of space (let’s ignore RAID for this example) then we could buy five 1TB SATA disks.  The problem with this plan is that 5 disks alone, even with the write-back cache of a RAID controller, do not provide the IOPS required by many, many VMs.

Instead of the five 1TB SATA drives, maybe we would want to pair our CPU and RAM selection with 12 – 14 2.5″ 450GB 15k SAS drives and/or some SSDs for a cache tier in order to provide enough IOPS for our setup.  Of course, this entire thought process assumes that the VMs that will reside on this host will require decent disk performance – some VMs do and some do not.  As you can probably imagine, the storage selection for your host all depends on the type of workload.
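A back-of-napkin comparison of the two layouts makes the gap obvious.  The per-drive figures below are common ballpark numbers for 7,200 RPM SATA and 15k SAS (not vendor specs), and RAID write penalties and controller cache are deliberately ignored, as in the example above:

```python
# Raw spindle IOPS for the two layouts; ballpark per-drive figures assumed.
IOPS_PER_DRIVE = {"7.2k SATA": 80, "15k SAS": 180}

def raw_iops(drive_type, count):
    """Aggregate raw IOPS, ignoring RAID penalty and cache."""
    return IOPS_PER_DRIVE[drive_type] * count

print(raw_iops("7.2k SATA", 5))   # 400  -- thin for "many, many VMs"
print(raw_iops("15k SAS", 14))    # 2520 -- far more headroom
```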

In my opinion, storage is an often-overlooked contributor to how a cluster or single host performs as a whole.  You can put the fastest CPU and memory combination in a host and then hang an array off of it that only delivers 600 IOPS and it will crawl.  Be careful when selecting your storage since, just like the memory situation earlier, once you start adding a type of disk to an array you may need to stick with it or dump the setup entirely, depending on the configuration.

Getting the data out of the box

The final aspect of the host specification will be determining how much network throughput is needed.  Again, if you’re building a 24 vCPU host with 64GB of RAM or a 72 vCPU host with 768GB of RAM, you will require different network setups, as you’ll presumably have a different number of VMs on the host.  But, like anything else, it is important to have an idea of what you need up front.  Sometimes a vSphere cluster will have 80 VMs made up of small talkers with the occasional high-throughput VM.  Other times, depending on the cluster and company, every VM is manipulating data, churning it into something else, and sending it back out to another process, pegging 10 GbE interfaces all day.

For the most part, 1 GbE interfaces are not obsolete yet – you can still get a very good performing setup out of 1 GbE NICs, even with 20 – 30 VMs per host.  If you find that the network layer tends to become over-utilized with your VM setup then you’ll have to consider adding more NICs or faster NICs.  With Distributed Virtual Switches you can set up the uplinks from the host in an LACP bundle and get more bandwidth out of multiple interfaces.  Or, you can simply jump ship and go straight to 10 GbE interfaces, switching, etc.

A word on LACP, though:  many people fail to understand that LACP does not turn four 1 GbE NICs into one 4 GbE NIC.  The analogy I like to use is that of a highway and lanes.  Consider a single 1 GbE NIC as a highway with a lane going in one direction – the speed limit may be 65 MPH.  Each car is behind the next, and no car can go a different direction from the one in front of it until it reaches the switch.  Now consider a highway with 4 lanes of traffic.  The speed limit didn’t change!  The cars can still only travel at 65 MPH.  However, cars can go ahead of or behind other cars, and can travel 4-wide down the highway.  That’s what LACP is.  The point is that the server (or host in this case) can see an aggregate throughput of 4 Gb/s, but only because the traffic is comprised of four individual 1 GbE streams.  If a vSphere host has LACP configured and the VMs on that Distributed Virtual Switch serve data to only 4 end users whose workstations are connected with 1 GbE links, then in a best-case scenario each end user may be able to saturate their 1 GbE connection to the server.  But no single user is going to get more than 1 Gb/s of throughput.  If you are looking to transfer files at the fastest rate possible, 1 GbE LACP is not for you.  In fact, you might even consider 10 GbE LACP instead.
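The highway analogy can be sketched as code: LACP places each flow onto exactly one member link, so a single flow never exceeds one link’s speed no matter how many links are in the bundle.  (The src/dst-IP hash below is a simplification of real LACP load-balancing policies, which may also hash MACs and ports.)

```python
# Toy model of LACP flow placement: one flow -> one member link.
LINK_SPEED_GBPS = 1
MEMBER_LINKS = 4

def link_for_flow(src_ip, dst_ip):
    # Real hash policies use MACs/IPs/ports; this keeps the idea only.
    return hash((src_ip, dst_ip)) % MEMBER_LINKS

def peak_throughput_gbps(flows):
    # Each flow saturates at most the one link it hashes onto, so the
    # aggregate is the number of *distinct* links in use, never more.
    links_in_use = {link_for_flow(src, dst) for src, dst in flows}
    return len(links_in_use) * LINK_SPEED_GBPS

print(peak_throughput_gbps([("10.0.0.5", "10.0.0.9")]))  # 1 -- one lane only
```

Many concurrent flows can fill all four lanes; one big file copy cannot.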

It’s important to assess network utilization because tearing apart the core infrastructure of your virtual environment in order to go to 10 GbE is a pain.  It will require downtime unless your hosts came with 10 GbE interfaces already.  Additionally, you may have CAT5E cabling in your rack and have to completely re-cable with CAT6/6A/7 to get where you need to be.  It’s very frustrating – save yourself the hassle and do a couple hours’ worth of research on your existing setup, or calculate a hypothetical one, and decide what you need up front if you can.  And no one in management likes to see Cisco 2960-X switches show up for a deployment only to be replaced with Nexus units just months later.

Achieving that sweet spot and planning ahead

The best thing you can do when planning to purchase hosts for a vSphere environment is to assess your resources today.  Sometimes that’s very difficult, but most of the time you can get a 90%-accurate understanding of what you’re working with.  You can use monitoring utilities like Microsoft System Center Operations Manager (SCOM), SolarWinds Server & Application Monitor (SAM), or similar to gather statistics on current server resource utilization (you should really be using something like this to monitor even if you’re not trying to assess a situation).  With these systems you will be able to find which servers/applications are heavy-hitters in terms of CPU, RAM, storage, and network performance.  You will be able to anticipate what you need or don’t need in a cluster as a whole and aren’t just “guessing”.  If you don’t want to pay for those specific applications, I have had good luck using LibreNMS and SNMP to monitor switch and server performance and just looking at overall trends.

In my experience, people skip this and just build a cluster or host with “good specs” and go for it.  This can work, or it can leave you really frustrated and have management questioning why they need to purchase more hardware 3 months into a deployment.  I’ve seen it go the other way as well – I’ve come across 10-host clusters with 5% CPU utilization and 15% RAM utilization on each host.  Sure, the growth potential is huge, but so are the annual licensing costs!  Can you imagine explaining why you need to maintain 10 vSphere Enterprise Plus licenses when you could fit your entire infrastructure within 3 – 4 hosts!?  It all goes hand in hand.  vSphere offers a great feature called Distributed Power Management which will put hosts to sleep when the cluster is under-utilized.  This is cool, except that the hosts you’re turning off are still costing you licensing!  It makes sense in some instances, but in general, most companies will want to size things so as to not have idle hosts.

The quick and dirty numbers

Taking the measurements is the best way to get yourself as close as possible to a properly sized cluster.  But, if you just want to rough it in, here’s a quick list of figures you can build around (this assumes you’re sizing an N+1 production cluster based on existing infrastructure):

Required:

  • Support for up to 300% CPU over-allocation (this will put you at the top of the 3:1 recommendation cited in the white paper quoted earlier) for average VMs, 200% for critical application VMs
  • Support for up to 125-150% RAM over-allocation without Virtual Flash Host Swap Cache, and 200% with
  • Support for up to 16 vCPU VMs (without spanning two sockets)
  • Enough hosts to support for an N+1, maybe even N+2 redundancy factor
  • Enough NIC interfaces to be cabled in a highly available configuration (account for NIC failure and switch failure)
  • SAN/Storage performance that will provide 125% of desired/estimated IOPS (to account for business functionality while backup windows occur)

Assumed:

  • vSphere licensing that provides/satisfies all Network I/O expectations – (hint:  only vSphere Enterprise Plus enables Distributed Virtual Switches and LACP with ingress and egress I/O control)
  • vSphere licensing that provides cluster HA (high-availability) support
  • vSphere licensing that provides Distributed Resource Scheduler (DRS) support
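For a rough starting point, the “Required” figures above can be folded into a little host-count calculator.  All inputs are placeholders to be replaced by your own measurements, and the ratios mirror the rules of thumb in this post:

```python
import math

def hosts_needed(total_vcpu, total_ram_gb, vcpu_per_host, ram_gb_per_host,
                 cpu_ratio=3.0, ram_ratio=1.5, spare_hosts=1):
    """Hosts needed to stay under the CPU/RAM ceilings, plus N+spare_hosts."""
    by_cpu = math.ceil(total_vcpu / (vcpu_per_host * cpu_ratio))
    by_ram = math.ceil(total_ram_gb / (ram_gb_per_host * ram_ratio))
    return max(by_cpu, by_ram) + spare_hosts

# e.g. 120 vCPU and 512GB demanded; dual 10-core (40 logical CPU), 256GB hosts
print(hosts_needed(120, 512, vcpu_per_host=40, ram_gb_per_host=256))  # 3
```

Note that RAM, not CPU, is the binding constraint in this example – which is a common outcome in real clusters, too.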

Testing and labs

Finally, none of this applies to test environments and labs.  I know, I can hear it now, “We read all the way to the end of this huge post and that’s it?!” – basically!  Let me explain my personal lab environment from a host level.  I have two primary hosts in my lab (two labs, really) – one is a Lenovo TS140 that I kind of spun out of the Lenovo tower case.  The other host is a Dell R710.  The specifications are as follows:

 Lenovo ThinkServer TS140

  • Intel Xeon E3-1246 v3 CPU (3.5 GHz, 4-core with hyperthreading, 8 vCPU)
  • 32GB DDR3 ECC RAM
  • LSI 9260-8i RAID controller
  • 8 Western Digital 4TB Red (NAS) 5400 RPM drives in RAID50
  • 2 Samsung Evo 850 250GB SSDs
  • Intel Pro/1000 VT Quad-Port NIC

 Dell PowerEdge R710

  • Dual Intel Xeon E5649 CPU (each is 2.53 GHz, 6-core with hyperthreading, 24 vCPU total)
  • 144 GB of DDR3 ECC RAM
  • Dell PERC 6i
  • 8 Dell 146GB 2.5″ 10K RPM SAS drives in RAID50 (internal)
  • Dell PERC6E connected to an MD1000 with 15 Dell Enterprise 1TB SATA 7200 RPM drives in RAID50
  • Intel Pro/1000 VT Quad-Port NIC

Both are great for what they do.  Originally I only had the Lenovo TS140 setup, and it did well for what I test.  But after adding the E3-1246 v3 with 8 vCPU, I realized I could lean hard on CPU over-allocation since this is a lab – and I ended up not having a whole lot of memory left, as 32GB of RAM is the maximum the system supports.  Even with 8 vCPU, I never encountered any appreciable CPU Ready Time readings.  Considering a 4:1 or 5:1 over-allocation of CPU meant I really needed to be thrifty with RAM.  Running a mixture of Windows and Linux meant I’d start with 2GB of RAM for a Windows VM and 512MB for a Linux VM and see how badly it performed.  I am able to lab a ton of stuff like this, but it was clear that if I was going to start testing Exchange DAGs and SQL clusters I would need a little more RAM.  The RAID50 on the Lenovo TS140 performs great despite being WD Red drives.  I also have some SSDs for virtual read cache in vSphere.

I picked up my R710 with 4-core CPUs and added 72GB of RAM.  It was actually decent, but then I came across 144GB of RAM, and when I threw that in I realized that I would never be able to run as many VMs as the RAM would support unless I added more cores to the single host.  So, that’s when I picked up the 6-core CPUs.  Now, it’s perfect – I allocate RAM at reasonable levels, and with 20 – 25 VMs running I am still under 50% allocation on the RAM and only 10 – 15% on the CPU, with almost all 2 and 4 vCPU VMs.  My R710 is fully populated with RAM, so it does run at 800 MHz, but again, this is a lab and entirely acceptable.

These two examples, though, show you that on one server (Lenovo TS140) I put in more CPU than I could really spare RAM for.  I would have to settle for fewer VMs but could allocate more CPU.  On the other system (R710), I had a ton of RAM but not enough CPU to populate the RAM without being silly-generous with allocations.  Both work great, neither cost me a fortune, and I knew what I was getting into.  But pretend for a second that I was my own client: if someone told me to build a 3.5 GHz, 8 vCPU/32GB RAM host for a small 8 – 12 VM environment, I’d quickly realize I was riding at 90 – 95% RAM utilization just to give my guests standard RAM allocations.

The take away

When I see people ask about server specification suggestions for production VMs I almost never answer with anything specific.  It’s too difficult.  If it were a 12-week ordeal to figure out what your physical environment or current virtual environment is doing, I could understand the frustration and the desire to shortcut the work.  But it really does not take much effort to configure a monitoring solution, be it SCOM, SAM, or something like LibreNMS, and simply look at the data.  Have 12 physical servers with 4-core CPUs all constantly at 80% between production hours?  Well, simply do the math!  If you have bursts of load throughout the day, then use one of the monitoring tools to plot 48 hours of usage and draw a line through the peaks.
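To show how simple that math really is, here’s the 12-server example worked through (the 30% headroom factor is my own assumption, not a rule):

```python
import math

# 12 physical servers, 4 cores each, steady ~80% utilization in production.
servers, cores_each, avg_utilization = 12, 4, 0.80
headroom = 1.3  # assumed ~30% growth/burst allowance

busy_cores = servers * cores_each * avg_utilization  # ~38.4 cores of real work
cores_to_buy = math.ceil(busy_cores * headroom)      # 50 physical cores
print(busy_cores, cores_to_buy)
```

From there, divide by cores-per-socket to pick a host count – and you have a defensible number instead of a guess.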

A lot of this comes from experience and knowing what to expect – to put it plainly, if you can tell that a VM is a VM by using it, your configuration isn’t right.  The only way to get comfortable with setting things up is to do it.  I hope someone finds this article useful!  Let me know if you have any questions or suggestions on improving it!

Author: Jon
