How to add enough storage to not care anymore (with no downtime)!

Hey everyone!  The point of this blog post is to document upgrading my home server with enough storage that I don’t have to constantly prune things out.  Between my wife and me, we had our Synology DS214+ up to around 87% full, which left little space for any backups or imaging of our desktops/laptops, and I had to switch my security camera to “motion only” recording mode.  We are creative people and create a lot of files – I wanted to revamp the storage in my ESXi server so that I never again had to delete a RAW image file and regret it.

My last entry about my home server (my Lenovo TS140 build with 8 drives on an LSI 9260-8i controller) covered general VM performance across various types of storage, from local SSD and local RAID10 to NAS-based iSCSI RAID5.  For those articles I was using 8 Western Digital Caviar Black 1TB disks in an internal RAID10 which, after formatting, provided roughly 3.64TB of usable space.  I chose RAID10 because I was not entirely comfortable putting those disks in any sort of distributed-parity configuration: the drives are known not to support TLER, so if a drive spends too long on error recovery before answering the controller’s request, the controller may kick it out of the array.

So, my intention was to upgrade from the Black drives to Reds, building out my capacity while striking a better speed/space/cost compromise.  Because you lose half of your raw storage when building a RAID10 array, it is by no means cost- or space-effective.  A more mindful solution would be RAID5, where the usable space is N-1 disks, N being the number of disks in the array.  So, 8 WD 1TB disks would yield 7TB before formatting, which sounds like a better option until you consider the details.
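To put some numbers on that, here’s a quick sketch of the usable-space math (my own illustration – “TB” here means raw drive capacity, before formatting overhead):

```python
# Usable capacity for 8 x 1TB drives under RAID10 vs RAID5.
drives = 8
drive_tb = 1

raid10_tb = (drives // 2) * drive_tb   # mirrored pairs: half the raw capacity
raid5_tb = (drives - 1) * drive_tb     # N-1: one drive's worth of space goes to parity

print(f"RAID10 usable: {raid10_tb} TB of {drives * drive_tb} TB raw")
print(f"RAID5 usable:  {raid5_tb} TB of {drives * drive_tb} TB raw")
# RAID10 gives 4 TB raw, which after formatting and TB->TiB conversion is the
# ~3.64TB figure mentioned above; RAID5 would give 7 TB raw from the same disks.
```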

RAID5 can survive a single physical disk failure because one disk’s worth of parity is rotated across the array as a whole.  That part is great.  Read performance is increased because every member disk participates in serving reads, so that’s great too.  However, write performance suffers – instead of simply striping new data across the disks, each small write requires reading the existing data, reading the parity, writing the new data, and writing the new parity before the operation is complete.  So, in a RAID5 you will usually get (roughly) the combined IOPS of all disks when reading, but when writing you’ll get roughly the combined IOPS divided by four – in a small array that’s only a disk or two worth of write performance – lame!  But, nothing is free, right?
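Here’s a rough IOPS model of that penalty (a sketch with assumed numbers – ~150 random IOPS per 7200rpm SATA disk is just a ballpark, and controller cache is ignored):

```python
# Rough random-I/O model for an 8-disk RAID5: reads fan out across all members,
# but every small write costs 4 back-end I/Os (read data, read parity,
# write new data, write new parity).
disks = 8
iops_per_disk = 150        # assumed ballpark for a 7200rpm SATA drive
write_penalty = 4          # RAID5 read-modify-write cost per small write

read_iops = disks * iops_per_disk
write_iops = disks * iops_per_disk / write_penalty

print(f"Aggregate read IOPS:  ~{read_iops}")        # ~1200
print(f"Aggregate write IOPS: ~{write_iops:.0f}")   # ~300 -- a quarter of the reads
```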

A lot of people these days are looking at RAID6, or its ZFS equivalent (FreeNAS/ZFS call it RAIDZ2), which tolerates a 2-disk failure before the data is lost.  Great.  But guess what?  The write penalty is even higher than with RAID5!  Why does it even exist then?  Well, consider this table from Backblaze:

Backblaze Hard Drive Failure Rates Through December 31, 2014

| Name/Model | Size | Number of Drives | Average Age (years) | Annual Failure Rate | 95% Confidence Interval |
|---|---|---|---|---|---|
| HGST Deskstar 7K2000 (HDS722020ALA330) | 2.0 TB | 4,641 | 3.9 | 1.1% | 0.8% – 1.4% |
| HGST Deskstar 5K3000 (HDS5C3030ALA630) | 3.0 TB | 4,595 | 2.6 | 0.6% | 0.4% – 0.9% |
| HGST Deskstar 7K3000 (HDS723030ALA640) | 3.0 TB | 1,016 | 3.1 | 2.3% | 1.4% – 3.4% |
| HGST Deskstar 5K4000 (HDS5C4040ALE630) | 4.0 TB | 2,598 | 1.8 | 0.9% | 0.6% – 1.4% |
| HGST Megascale 4000 (HGST HMS5C4040ALE640) | 4.0 TB | 6,949 | 0.4 | 1.4% | 1.0% – 2.0% |
| HGST Megascale 4000.B (HGST HMS5C4040BLE640) | 4.0 TB | 3,103 | 0.7 | 0.5% | 0.2% – 1.0% |
| Seagate Barracuda 7200.11 (ST31500341AS) | 1.5 TB | 306 | 4.7 | 23.5% | 18.9% – 28.9% |
| Seagate Barracuda LP (ST31500541AS) | 1.5 TB | 1,505 | 4.9 | 9.5% | 8.1% – 11.1% |
| Seagate Barracuda 7200.14 (ST3000DM001) | 3.0 TB | 1,163 | 2.2 | 43.1% | 40.8% – 45.4% |
| Seagate Barracuda XT (ST33000651AS) | 3.0 TB | 279 | 2.9 | 4.8% | 2.6% – 8.0% |
| Seagate Barracuda XT (ST4000DX000) | 4.0 TB | 177 | 1.7 | 1.1% | 0.1% – 4.1% |
| Seagate Desktop HDD.15 (ST4000DM000) | 4.0 TB | 12,098 | 0.9 | 2.6% | 2.3% – 2.9% |
| Seagate 6 TB SATA 3.5 (ST6000DX000) | 6.0 TB | 45 | 0.4 | 0.0% | 0.0% – 21.1% |
| Toshiba DT01ACA Series (TOSHIBA DT01ACA300) | 3.0 TB | 47 | 1.7 | 3.7% | 0.4% – 13.3% |
| Western Digital Red 3 TB (WDC WD30EFRX) | 3.0 TB | 859 | 0.9 | 6.9% | 5.0% – 9.3% |
| Western Digital 4 TB (WDC WD40EFRX) | 4.0 TB | 45 | 0.8 | 0.0% | 0.0% – 10.0% |
| Western Digital Red 6 TB (WDC WD60EFRX) | 6.0 TB | 270 | 0.1 | 3.1% | 0.1% – 17.1% |

Pretend for a minute you were unlucky enough to have purchased 18 Seagate Barracuda 7200.14 3.0TB disks with an annual failure rate of 43.1%!  Oh man, that would be terrible!  Of the 1,163 drives Backblaze has in use, they’re reporting a 43.1% failure rate for 2014 alone.  Not only that, but you’ve got 18 of them!  So if you had implemented a standard RAID5 across those disks (and personally I wouldn’t have), you’d be able to survive one disk failure before the array is lost.  So, if two disks fail you’re hosed.  Now pretend for a moment that you have 18 disks in the array and you made one of them a hot spare.  As soon as the first drive fails, the array starts rebuilding.  Rebuilding is one of the hardest, longest processes that can occur on an array, as it has to read the data and parity from every surviving disk and reconstruct the missing disk’s contents onto the spare.  What do you think the likelihood is that you lose another hard drive during the rebuild process?  I’d say it’s pretty darned high!  Enter RAID6.  Now you have two disks’ worth of parity, so you can actually lose two disks before you’re out to lunch.  Again, a long rebuild time, but at least you’ve got more resilience.
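To put a rough number on that intuition, here’s a back-of-the-envelope sketch (my own arithmetic, not Backblaze’s – it assumes independent failures at the 43.1% annual rate and an optimistic 24-hour rebuild window):

```python
# Odds that a second drive dies while the degraded 18-disk example above rebuilds.
afr = 0.431              # annual failure rate per drive (Backblaze figure above)
rebuild_days = 1.0       # assumed rebuild window -- optimistic for large drives
survivors = 16           # 17 active members minus the one that already failed

# Convert the annual rate into a failure probability over the rebuild window.
p_drive = 1 - (1 - afr) ** (rebuild_days / 365)

# Probability that at least one surviving member fails during the rebuild.
p_second = 1 - (1 - p_drive) ** survivors

print(f"Per-drive failure chance during rebuild: {p_drive:.3%}")
print(f"Chance of losing another drive mid-rebuild: {p_second:.1%}")
# With these assumptions it comes out around 2-3% per rebuild, and that ignores
# the extra stress a rebuild puts on already-worn drives.
```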

Ok, so this is all boring stuff – what about my home server?  Well, I wanted as much usable space as possible, with good resilience, without wasting a whole ton of disk space on redundancy.  But I also wanted good performance.  I specifically picked the RAID controller that I am using (LSI 9260-8i) with this goal in mind, as it supports RAID50.  RAID50 is basically two or more RAID5 virtual disks striped together with RAID0 on top.  So, in my case, I have 8 WD Red 4TB disks arranged as two 4-drive RAID5 spans striped together.  The benefit here is that I get great read speed, not-so-bad write speed, and on top of that I can lose two disks so long as they are in separate RAID5 groups.  The diagram below may make this clearer:

[Diagram: RAID50 layout – two RAID5 spans striped with RAID0]

The only downside to my configuration is that I am using large disks, so rebuilds will take a while, which leaves me exposed for longer to a second disk failure.  However, like I said, my luck would have to be real bad: I can lose one disk in each half of my RAID50 before having issues.  In other words, I can lose two disks so long as one is in RAID5 span A and the other is in RAID5 span B.  Because I only have 8 connections on my controller, I also have no hot spares.  So, if one drive did call it quits, I would have to open the server, identify and remove the bad drive, and replace it before any rebuild would start.  I am OK with that – remember, this is a home server, not an enterprise solution (even if we are using enterprise-grade controllers, etc.).
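To put some rough numbers on this layout (a sketch using my own assumptions – the 100 MB/s rebuild rate in particular is a guess; real rates depend on the controller, rebuild priority, and how busy the array is):

```python
from itertools import combinations

# 8 x 4TB WD Reds arranged as two 4-disk RAID5 spans striped into a RAID50.
drives, drive_tb, spans = 8, 4, 2
per_span = drives // spans

# Each RAID5 span gives one drive's worth of space to parity.
usable_tb = spans * (per_span - 1) * drive_tb
print(f"Usable capacity: {usable_tb} TB raw (before formatting)")     # 24 TB

# Best-case rebuild of a single 4TB drive at an assumed 100 MB/s sustained rate.
rebuild_hours = (drive_tb * 1_000_000) / 100 / 3600
print(f"Best-case single-drive rebuild: ~{rebuild_hours:.0f} hours")  # ~11 hours

# A two-drive failure is survivable only if the failed drives sit in different spans.
span_of = lambda d: d // per_span
pairs = list(combinations(range(drives), 2))
ok = sum(1 for a, b in pairs if span_of(a) != span_of(b))
print(f"Two-drive failures survived: {ok}/{len(pairs)} ({ok / len(pairs):.0%})")  # 16/28, ~57%
```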

So now what?  Well, let me describe what I did to accomplish all of this.  The first thing I did was make sure all of my VMs were migrated off of the RAID10 array that I had in place.  This is as simple as migrating the data from one datastore to another – which was only possible because of the Synology DS1513+ I have in place with 5 WD Red 4TB drives in RAID5 as an iSCSI target.  If you do not have somewhere to put your data while you change the array, then you are stuck, so that is something to consider up front.  My total VM storage is around 4.3TB, so finding an external disk or setting up a NAS as an iSCSI target is really the only option there.

Once the VMs were migrated off of the local RAID10, I went into MegaRAID (from within one of my VMs, no less) and managed the controller from there.  I unmounted and detached the RAID10 datastore from my ESXi host.  Next, I broke the RAID10 VD (Virtual Disk) by deleting it under the Logical tab in MegaRAID.  After that, go to the Physical tab and mark each disk for removal – the controller will spin the disk down and you’ll be able to unplug it.  This was a little tricky because my ESXi server is in a generic 4U case without hot-swap bays.  The drives are hot-swap capable; you just need to be careful not to mess anything up while you’re poking around inside.

Once that’s done, I installed these babies:

[Photo: 8 Western Digital Red 4TB disks for my storage]

Installing the drives was not a super easy task due to the case I am using, but I didn’t want to pay a ton for hot-swap bays that I might use once, if ever.  You can see the drive placement below and get a general idea of how I went about swapping the drives out:

[Photo: TS140 board transplanted, showing drive placement]

After inserting the new drives and opening MegaRAID, you’ll see from the main screen that you have “Unconfigured Capacity” – in this case 29.1TB:

[Screenshot: MegaRAID main screen showing 29.1TB of unconfigured capacity]

You’ll also see the “unconfigured” drives on the Physical tab:

[Screenshot: Physical tab listing the unconfigured drives]

So long as the unconfigured drives and capacity match what you’d expect, you’re ready to create the Virtual Disk!

Creating a RAID50 in MegaRAID isn’t as intuitive as it is in some other programs, or even in some other controllers’ BIOS utilities.  I’ve attached screenshots with brief descriptions walking you through the process in case you are using an LSI controller (which you likely are):

  1. Launch MegaRAID, connect to the controller, and select “Create Virtual Drive”.  You will be prompted for Simple or Advanced – choose Advanced.
  2. Select RAID 50 as your RAID level.  You’ll also notice there’s a Span 0 on the right-hand side – this is the first “RAID5” array.
  3. Add half of your drives – for instance, Span 0 uses Slots 0, 1, 2, and 3 on my setup.
  4. Next, click Create Span so that the software creates the second half of the RAID50 in Span 1.
  5. Only once you have all the disks added and both Span 0 and Span 1 populated will you be able to hit Next.
  6. Next you set the virtual drive options – here you will name the VD, etc.  You can carve the space up however you’d like, but I am using one VD on this setup.  If you have a Battery Backup Unit (BBU), make sure you set the Write Policy to Write Back with BBU, else your write speeds will be horrible.  I use Fast Initialization and the default 256KB strip size.  The strip size can be adjusted if your array will be mostly reads or mostly writes – mine is used for VMs so I want a good balance, and 256KB fits that (there’s a quick full-stripe calculation just after this list).  Set it up as needed and then click Create Virtual Drive at the bottom.
  7. Once you’ve clicked Create Virtual Drive, you should see a summary of the configuration.
  8. If everything looks good, click Finish and it should successfully create the virtual drive as specified.
  9. You should now see the two spans and the disk membership status under the Physical and Logical tabs.
  10. You should also see the virtual drive with its total capacity (after the RAID-level overhead) on the main screen.
  11. And finally, since this is an ESXi host in my case, if I jump into vSphere and rescan the storage for the host, the new space pops up.
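As mentioned in step 6, here’s a quick look at what the 256KB strip size works out to across this layout (my own arithmetic, just to make the trade-off concrete):

```python
# Full-stripe size for the RAID50 above: each RAID5 span contributes
# (drives_per_span - 1) data strips, and the RAID0 layer stripes across both spans.
strip_kb = 256
spans = 2
drives_per_span = 4

data_strips = spans * (drives_per_span - 1)   # 6 data strips per full stripe
full_stripe_kb = data_strips * strip_kb       # 1536 KB = 1.5 MB
print(f"Full stripe: {data_strips} x {strip_kb}KB = {full_stripe_kb}KB")

# Writes of at least a full stripe can skip the read-modify-write penalty, so a
# larger strip favors big sequential writes while a smaller strip favors small
# random I/O -- 256KB is a middle-of-the-road choice for a mixed VM workload.
```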

All done!  The cool part about this capacity addition is that my ESXi host never went down, and my MegaRAID software runs from a VM within that same host – kind of funny when you think about it.  The LSI 9260-8i is the original, non-rebranded version of the controller that Dell uses in their Rx10 series (R410, R610, R710, etc.); Dell calls their rebranded card the PERC H700.  These controllers all support hot-swapping of disks as well as hot creation of VDs.  You will need Dell OpenManage or LSI MegaRAID to create the VDs from within a running OS (as long as that OS isn’t living on the array you’re changing!), but you can replace a failed drive by simply removing it and installing a replacement.  You can bet that any SAS2108-based controller is an LSI 9260 underneath.  In fact, you can even flash LSI firmware to Dell PERC or IBM cards, but that’s a story for another day.

So, after all is said and done: we started with a Lenovo TS140 with no RAID at all, then added a hardware controller and 8 WD Black 1TB drives, and finally migrated from that RAID10 to 8 WD Red 4TB drives in RAID50 – using MegaRAID from within a VM living on the very host we were swapping disks on, all without ever taking the host down.  I performed this upgrade on February 10, 2015, and here you can see my ESXi host showing 45 days of uptime:

[Screenshot: ESXi host summary showing 45 days of uptime]

Success!  I don’t think I’ll be making any entries about expanding storage anytime soon 🙂  Well, so long as I don’t back up Blu-ray discs without compression…

Author: Jon


7 Comments

  1. Thank you so much for this – I now want to build the TS140 and get a RAID card. Thank you for taking the time to type this up!

  2. Nice article with a thorough explanation of why this setup works best for you.

    I was just curious as to how you are planning on backing up 22TB of data? Or will you just cross your fingers that RAID will be enough and that you don’t have a controller error which corrupts your data, or lose more than one drive in a single RAID5 subset?

    • Thanks Hutsy – I have a DS1513+ with 16TB usable in it. I also have a DS214+ with 3TB usable. I actually rsync a lot of content (the db and content of this site for instance) over to the DS214+ which is a RAID1 – that happens daily at midnight. Then once a month it’s uploaded to a cloud server I have. Anything “critical” is rsync’d from the internal 22TB to the DS1513+ 16TB. Critical meaning my personal photography, videos, programs, content, etc. Anything else, for the most part, can be rebuilt or re-assembled.

  3. Wouldn’t RAID 6 have been a better choice?

    • The word “better” is very tough. Better at having parity across more drives? Yes. Better at performance? No. Remember, RAID50 is not as popular amongst most users because it requires higher level cards that support it. RAID50 is essentially two RAID5 arrays. I am sure there is some math that could be done, but the chance of 1 out of 4 drives failing, twice (for each RAID5 array), is rare. The chance of 2 out of 8 drives failing and them both being in the same subset of 4 disks? Rarer yet, I am sure. RAID6 is great for mass storage but the rebuild times are longer and the performance is much lower than that of RAID50. I’ll be doing a comparison performance-wise soon though!

      • I was curious about the chances of losing 2 disks in the same stripe myself and found a “nifty” site that can calculate your probability of survival on your RAID 50 array… in short, you have a 57% chance of surviving a 2-disk failure in a RAID 50 (based on your 2 groups of 4 disks each). So that would imply that statistically there is a 43% chance that the 2 failed disks land in the same 4-drive stripe. Losing any 2 disks in a RAID 6 still guarantees 100% survival.

        So, you gain performance with your RAID 50, but in a 2-disk loss you take on a 43% chance of total failure.

        Site: http://www.raid-failure.com/raid10-50-60-failure.aspx

          • Cool link – I think I’ve seen that site before. Yeah, you definitely do run more risk with RAID50 than with RAID6, no doubt. But I don’t feel those calculators usually factor in the rebuild time of RAID6. For instance, RAID6 carries a 6x write penalty, which can make rebuild times insanely long – I’ve seen 6–8 disk RAID6 arrays take a week or more to rebuild! That’s pretty intense and a lot of time to leave yourself open to failure, IMHO!
