Hey everyone! The point of this blog post is to document upgrading my home server with enough storage that I don’t have to prune stuff out constantly. Between my wife and me, we had our Synology DS214+ up to around 87% full, which left little space for backups or images of our desktops and laptops, and I had to switch my security camera to “motion only” recording. We are creative people and create a lot of files, so I wanted to revamp the storage in my ESXi server enough that I never again have to delete a RAW image file and regret it.
My last entry regarding my home server (my Lenovo TS140 build with 8 drives on an LSI 9260-8i controller) was about general VM performance on various types of storage: local SSD, local RAID10, and NAS-based iSCSI RAID5. For those couple of articles I was using 8 Western Digital Caviar Black 1TB disks in an internal RAID10 which, after formatting, provided 3.64TB or so of usable space. I chose RAID10 because I was not comfortable putting those disks in any sort of distributed-parity configuration: the Black drives are known not to support TLER (Time-Limited Error Recovery), so a drive that takes longer than expected to answer the controller’s request can get kicked out of the array.
So, my intention was to upgrade from the Black drives to Reds, building out my capacity while going for a better speed:space:cost compromise. Because you lose half of your raw storage when building an array with RAID10, it is by no means cost- or space-effective. A more space-efficient option is RAID5, where the usable space is N-1 disks’ worth, N being the number of disks in the array. So, 8 WD 1TB disks would yield 7TB before formatting, which sounds like a better option until you consider the details.
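If you want to sanity-check the capacity math for your own disk count, here is a minimal Python sketch. The function name `usable_bytes` and its `spans` parameter are just names I made up for illustration, and the numbers ignore filesystem overhead, so treat the result as an estimate rather than what your controller will report to the byte.

```python
TB = 10**12   # drive vendors advertise decimal terabytes
TIB = 2**40   # most operating systems report binary tebibytes


def usable_bytes(level: str, disks: int, disk_bytes: int, spans: int = 2) -> int:
    """Estimate usable capacity before formatting (hypothetical helper)."""
    if level == "raid10":
        data_disks = disks // 2      # every disk has a mirror partner
    elif level == "raid5":
        data_disks = disks - 1       # one disk's worth of parity
    elif level == "raid6":
        data_disks = disks - 2       # two disks' worth of parity
    elif level == "raid50":
        data_disks = disks - spans   # one disk's worth of parity per RAID5 span
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_disks * disk_bytes


if __name__ == "__main__":
    for level in ("raid10", "raid5", "raid6"):
        size = usable_bytes(level, disks=8, disk_bytes=1 * TB)
        print(f"{level:>6}: {size / TB:.0f} TB = {size / TIB:.2f} TiB")
```

For 8 × 1TB disks in RAID10 this gives 4 TB, which is about 3.64 TiB. That lines up with the "3.64TB or so" I saw after formatting: the gap is mostly decimal versus binary units, not space lost to the RAID itself.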
RAID5 can survive a single physical disk failure because it rotates parity across all disks in the array. That part is great. Read performance also improves because every member disk participates in reads, so that’s great too. However, write performance suffers: instead of simply striping new data across the disks, each small write requires reading the old data, reading the old parity, writing the new data, and writing the new parity before the operation is complete. So, in a RAID5 you will usually get (roughly) the combined IOPS of all disks when reading, but only a fraction of that when writing, and on a small array not much more than one disk’s worth of performance. Lame! But, nothing is free, right?
A lot of people these days are looking at RAID6 or the ZFS equivalent (RAID-Z2 in FreeNAS/ZFS), which can lose two disks before any data is lost. Great. But guess what? The write penalty is even higher than with RAID5! Why does it even exist then? Well, consider this table from Backblaze:
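The read-versus-write gap is often approximated with a "write penalty" rule of thumb: each small random write costs 2 disk I/Os on RAID10, 4 on RAID5 (read data, read parity, write data, write parity), and 6 on RAID6. Here is a rough Python sketch of that estimate. The 75 IOPS per disk figure is purely an assumed number for a 7200 RPM drive, not something I measured, and controller cache or large sequential writes can change the picture a lot.

```python
# Commonly quoted disk I/Os per small random write, per RAID level.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def estimated_iops(level: str, disks: int, iops_per_disk: float,
                   write_fraction: float) -> float:
    """Rule-of-thumb array IOPS for a given read/write mix (hypothetical helper)."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_iops = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    # Reads cost one disk I/O each; writes cost `penalty` disk I/Os each.
    return raw_iops / ((1.0 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    ASSUMED_IOPS_PER_DISK = 75  # assumption for illustration only
    for level in ("raid10", "raid5", "raid6"):
        reads = estimated_iops(level, 8, ASSUMED_IOPS_PER_DISK, write_fraction=0.0)
        writes = estimated_iops(level, 8, ASSUMED_IOPS_PER_DISK, write_fraction=1.0)
        print(f"{level:>6}: ~{reads:.0f} read IOPS, ~{writes:.0f} write IOPS")
```

With 8 disks at the assumed 75 IOPS each, this estimates about 600 IOPS for pure reads on any level, but only about 150 for pure writes on RAID5 versus 300 on RAID10.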
Backblaze Hard Drive Failure Rates Through December 31, 2014

| Name/Model | Size | Number of Drives | Average Age (years) | Annual Failure Rate | 95% Confidence Interval |
|---|---|---|---|---|---|
| HGST Deskstar 7K2000 | 2.0 TB | 4,641 | 3.9 | 1.1% | 0.8% – 1.4% |
| HGST Deskstar 5K3000 | 3.0 TB | 4,595 | 2.6 | 0.6% | 0.4% – 0.9% |
| HGST Deskstar 7K3000 | 3.0 TB | 1,016 | 3.1 | 2.3% | 1.4% – 3.4% |
| HGST Deskstar 5K4000 | 4.0 TB | 2,598 | 1.8 | 0.9% | 0.6% – 1.4% |
| HGST Megascale 4000 | 4.0 TB | 6,949 | 0.4 | 1.4% | 1.0% – 2.0% |
| HGST Megascale 4000.B | 4.0 TB | 3,103 | 0.7 | 0.5% | 0.2% – 1.0% |
| Seagate Barracuda 7200.11 | 1.5 TB | 306 | 4.7 | 23.5% | 18.9% – 28.9% |
| Seagate Barracuda LP | 1.5 TB | 1,505 | 4.9 | 9.5% | 8.1% – 11.1% |
| Seagate Barracuda 7200.14 | 3.0 TB | 1,163 | 2.2 | 43.1% | 40.8% – 45.4% |
| Seagate Barracuda XT | 3.0 TB | 279 | 2.9 | 4.8% | 2.6% – 8.0% |
| Seagate Barracuda XT | 4.0 TB | 177 | 1.7 | 1.1% | 0.1% – 4.1% |
| Seagate Desktop HDD.15 | 4.0 TB | 12,098 | 0.9 | 2.6% | 2.3% – 2.9% |
| Seagate 6 TB SATA 3.5 | 6.0 TB | 45 | 0.4 | 0.0% | 0.0% – 21.1% |
| Toshiba DT01ACA Series | 3.0 TB | 47 | 1.7 | 3.7% | 0.4% – 13.3% |
| Western Digital Red 3 TB | 3.0 TB | 859 | 0.9 | 6.9% | 5.0% – 9.3% |
| Western Digital 4 TB | 4.0 TB | 45 | 0.8 | 0.0% | 0.0% – 10.0% |
| Western Digital Red 6 TB | 6.0 TB | 270 | 0.1 | 3.1% | 0.1% – 17.1% |
Pretend for a minute you were unlucky enough to have purchased 18 Seagate Barracuda 7200.14 3.0TB disks. Of the 1,163 drives of that model Backblaze had in use, they reported a 43.1% annual failure rate for 2014. Oh man, that would be terrible! If you had implemented a standard RAID5 across those disks (and personally I wouldn’t have), you could survive one disk failure; if a second disk fails, you’re hosed. So pretend for a moment that one of your 18 disks is a hot spare. As soon as the first drive fails, the array starts rebuilding onto the spare. Rebuilding is one of the hardest, longest processes that can occur on an array, as the controller has to read ALL of the data and parity on the surviving disks to reconstruct the contents of the lost one. What do you think the likelihood is that you lose another hard drive during the rebuild? With drives like these, I’d say it’s uncomfortably high! Enter RAID6. Now you have two disks’ worth of parity, so you can actually lose two disks before you’re out to lunch. Again, a long rebuild time, but at least you’ve got more resilience.
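To put a rough number on that rebuild risk, here is a small Python sketch. It assumes drive failures are independent and happen at a constant rate equal to the annual failure rate from the table, which is a simplification: drives from the same batch under heavy rebuild load probably fail in a more correlated way, so treat the result as a floor. The function name, the 17-member array (18 disks minus the spare, leaving 16 survivors after the first failure), and the rebuild durations are all assumptions for illustration.

```python
import math


def second_failure_probability(annual_failure_rate: float,
                               surviving_disks: int,
                               rebuild_days: float) -> float:
    """Chance that at least one surviving disk fails during the rebuild.

    Assumes independent failures at a constant rate (hypothetical model).
    """
    expected_failures = surviving_disks * annual_failure_rate * rebuild_days / 365.0
    # 1 - exp(-x), computed via expm1 for accuracy when x is small.
    return -math.expm1(-expected_failures)


if __name__ == "__main__":
    AFR = 0.431       # Seagate Barracuda 7200.14 3.0 TB, from the table
    SURVIVORS = 16    # assumed: 17-disk RAID5 after its first failure
    for days in (1, 3, 7):  # assumed rebuild durations
        p = second_failure_probability(AFR, SURVIVORS, days)
        print(f"{days}-day rebuild: {p:.1%} chance of a second failure")
```

Even under these optimistic assumptions, you are rolling the dice at a bit under 2% for every day the rebuild takes, and a RAID5 gets no second chance.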
OK, so this is all boring stuff. What about my home server? Well, I wanted as much usable space as possible with good resilience, without wasting a whole ton of disk space. But I also wanted good performance. I specifically picked my RAID controller (the LSI 9260-8i) with this goal in mind, as it supports RAID50. RAID50 is basically two or more RAID5 virtual disks striped together with RAID0 on top. In my case, I have 8 WD Red 4TB disks arranged as two 4-drive RAID5 groups striped together. The benefit is that I get great read speed, not-so-bad write speed, and on top of that I can lose two disks so long as they are in separate RAID5 groups. The chart below may make more sense:
The only downside to my configuration is that I am using large disks, so rebuilds will take a while. That leaves me hanging out there a bit, since a second disk could fail during the long rebuild. However, like I said, my luck would have to be really bad: I can lose one disk from each half of my RAID50 before having issues, that is, two disks so long as one is in RAID5 group A and the other is in RAID5 group B. Because I only have 8 connections on my controller, I also have no hot spares. So, if one drive did call it quits, I would have to open the server, identify and remove the bad drive, and replace it before any rebuild could start. I am OK with that. Remember, this is a home server, not an enterprise solution (even if we are using enterprise-grade controllers, etc.).
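How lucky do you have to be for a second failure to land in the "right" half? Here is a quick Python sketch that enumerates every possible pair of failed disks in an 8-disk RAID50 built from two 4-disk RAID5 groups. The slot numbering (0 to 3 in one group, 4 to 7 in the other) is an assumption for illustration, and the sketch treats every pair of failures as equally likely.

```python
from itertools import combinations

# Assumed layout: two 4-disk RAID5 groups, identified by slot number.
SPANS = [(0, 1, 2, 3), (4, 5, 6, 7)]


def survives(failed_disks, spans=SPANS) -> bool:
    """A RAID50 survives as long as no RAID5 group loses more than one disk."""
    failed = set(failed_disks)
    return all(len(failed & set(span)) <= 1 for span in spans)


all_disks = [disk for span in SPANS for disk in span]
pairs = list(combinations(all_disks, 2))
survivable = sum(survives(pair) for pair in pairs)

print(f"{survivable} of {len(pairs)} two-disk failures are survivable "
      f"({survivable / len(pairs):.0%})")
# → 16 of 28 two-disk failures are survivable (57%)
```

So a second failure is not automatically fatal like it is with plain RAID5, but it is closer to a coin flip than a guarantee, which is worth keeping in mind when there is no hot spare.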
So now what? Well, let me describe what I did to accomplish all of this. The first thing I did was make sure all of my VMs were migrated off of the RAID10 array that I had in place. This is as simple as migrating the data from one datastore to another, which was only possible because of the Synology DS1513+ I have in place with 5 WD Red 4TBs in RAID5 as an iSCSI target. If you do not have somewhere to put your data while you change the array, then you are stuck, so that is something to consider up front. My total VM storage is around 4.3TB, so my only options were finding a big enough external disk or setting up a NAS for iSCSI.
Once the VMs were migrated off of the local RAID10, I went into MegaRAID (running in one of my VMs, no less) and managed the controller from there. I unmounted and detached the RAID10 datastore from my ESXi host. Next, I simply broke the RAID10 VD (Virtual Disk) by deleting it under the Logical tab in MegaRAID. After that, I went to the Physical tab and set each disk for removal. The controller shuts each disk down, and you are then able to unplug it. This was a little tricky because my ESXi server is in a generic 4U case without hot swap bays. The drives are hot swap capable; you just need to be careful not to mess anything up while you’re poking around inside.
Once that’s done, I installed these babies:
Installing the drives was not a super easy task due to the case I am using, but I didn’t want to pay a ton for hot swap bays that I might use once, if ever. You can see the drive placement below and get a general idea of how I went about swapping the drives out:
You’ll also see that you have “unconfigured” drives from the physical tab:
So long as the number of unconfigured drives and their capacity match what you’d expect, you’re ready to create the Virtual Disk!
Creating the RAID50 in MegaRAID isn’t as intuitive as it is in some other programs or even in some other controller BIOS settings. I’ve attached screenshots here with brief descriptions walking you through this in case you are using LSI controllers (which you likely are):
- Launch MegaRAID, connect to the controller, and select “Create Virtual Drive”. You will be prompted for Simple or Advanced; choose Advanced:
- You’ll want to select RAID 50 as your RAID level. You’ll also notice that there’s a Span 0 on the right hand side – this is the first “RAID5” array:
- Add half of your array – so for instance, Span 0 will use Slot 0, 1, 2, and 3 on my setup:
- Next, you will want to click Create Span so that the software creates the second half of the RAID50 in Span 1:
- Only once you have all the disks added and both Span 0 and 1 populated will you be able to hit Next:
- Next up, you set the virtual drive options, such as the VD name. You can set the size however you’d like, but I am using one VD on this setup. If you have a Battery Backup Unit (BBU), make sure you set Write Policy to Write Back with BBU, or else your write speeds will be horrible. I use Fast Initialization and the default 256KB strip size. The strip size can be adjusted if your array will be mostly reads or mostly writes; mine is used for VMs, so I want a good balance, and 256KB fits that. Set it up as needed and then click Create Virtual Drive at the bottom:
- Once you’ve clicked Create Virtual Drive, you should see this summary:
- If everything looks good, click Finish and it should successfully create the virtual drive as specified:
- Now you should see the two Spans and the disk membership status under the Physical or Logical tabs:
- You should also see the virtual disk with its total capacity (after RAID overhead and unit conversion) on the main screen:
- And finally, in my case since this is an ESXi host, if I jump into vSphere and rescan all the storage for my host, I see the new space pop up:
All done! The cool part about my capacity addition is that my ESXi host never went down, and my MegaRAID software runs from a VM within that same host. Kind of funny when you think about it. The LSI 9260-8i is the original, non-rebranded version of the controller that Dell uses in their Rx10 series (R410, R610, R710, etc.); Dell calls their rebranded card the PERC H700. These controllers all support hot swapping of disks as well as hot creation of VDs. You will need to use Dell OpenManage or LSI MegaRAID to create VDs from within an OS while the server is up and running (and the OS must not live on the RAID!), but you can replace a failed drive by simply removing it and installing a replacement. You can bet that any SAS2108-based controller is an LSI 9260 underneath. In fact, you can even flash LSI firmware to Dell PERC or IBM cards, but that’s a story for another day.
So, after all is said and done, we started with a Lenovo TS140 with no RAID. Then we added a hardware controller and 8 WD Black 1TB drives in RAID10. Later, we migrated from those to 8 WD Red 4TB drives in RAID50, using MegaRAID within a VM living on the very same host we were swapping disks on, all without ever taking the host down. I performed this upgrade on February 10, 2015, and here you can see my ESXi host showing 45 days of uptime:
Success! I don’t think I’ll be making any entries about expanding storage anytime soon 🙂 Well, so long as I don’t back up Blu-ray discs without compression…