I recently (seriously, less than 36 hours ago) experienced my first disk failure in an array running ZFS (FreeNAS/FreeBSD). The pool consists of four 3-disk RAIDZ1 vdevs built from 1TB disks. Many people on the internet will tell you RAID5/RAIDZ1 is dead, but they claim this with no context. When I’ve mentioned running RAIDZ1 to people, their first reaction is to tell me how RAIDZ2 is better. Sure, with 3 – 6TB disks I’d probably run RAIDZ2. However, when dealing with 3-disk vdevs of “small” (by today’s standards) disks, is it worth running double or triple parity, or are you risking your data with single-disk redundancy? Well, read on.
The volume StoragePool (ZFS) state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The message above is not something you want to see on a Monday as you come into work. Catching this disk failure was actually a bit of a lucky situation. FreeNAS sent me the failure message above at 2:15AM November 7 and I had missed it when cruising my email. Coincidentally, a co-worker and I put together a brief shell script (that I will share in a future post) that emails me drive status and I had been tinkering with the script the day prior. When the script sent me a daily email (at 9:00AM), I noticed that the output was showing bad sectors – lots of them. I thought for sure that I must have pulled the wrong column from the smartctl output or something:
/dev/da9 status is Passed with 0 bad sectors. Disk temperature is 41.
/dev/da10 status is Passed with 0 bad sectors. Disk temperature is 39.
/dev/da11 status is Passed with 65535 bad sectors. Disk temperature is 38.
/dev/da12 status is Passed with bad sectors. Disk temperature is 31.
The “Passed” figure comes right out of smartctl which is somewhat concerning. But, wow – 65,535 bad sectors… it filled the counter up. That’s not good.
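Until I share the real script, here is a rough Python sketch of the same idea (not the shell script my co-worker and I wrote). It assumes the "bad sectors" figure is the raw value of Reallocated_Sector_Ct in the attribute table printed by smartctl -A, and that the temperature comes from Temperature_Celsius. The sample rows are made up for illustration except for the 65535 and 38 figures from the report above; parse_raw_values and status_line are names I picked for this example.

```python
import re

# Illustrative `smartctl -A` rows. Only the raw values 65535 and 38 come from
# the report above; the other columns are placeholders, not real drive output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       65535
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 20/50)
"""


def parse_raw_values(attr_table):
    """Return {attribute_name: raw_value} for every attribute row."""
    raw = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                raw[fields[1]] = int(match.group())
    return raw


def status_line(device, health, raw):
    """Build one report line in the same shape as the daily email."""
    sectors = raw.get("Reallocated_Sector_Ct")
    temp = raw.get("Temperature_Celsius")
    line = (f"{device} status is {health} with {sectors} bad sectors. "
            f"Disk temperature is {temp}.")
    # 65535 is the largest 16-bit unsigned value, hence "filled the counter up".
    if sectors is not None and sectors >= 0xFFFF:
        line += " (counter at its 16-bit maximum)"
    return line


raw = parse_raw_values(SAMPLE)
print(status_line("/dev/da11", "Passed", raw))
```

Flagging the 65535 case explicitly is the one thing I would add to any version of this report, since the overall health check still said "Passed".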
I checked the report from the day prior and it was 0. I then manually ran smartctl against the device and it was in fact reporting 65,535 bad sectors and the volume was degraded. Crap. I don’t have any hot spares in the array because someone is there 99.9% of the time to swap a drive in but of course no one would be available for over 2 weeks. I used Storage vMotion to evacuate data I cared about and made sure my Veeam backups had been completing successfully… just in case.
Because I am running FreeNAS 9.10 on my Dell R510, I knew I’d be able to hot-swap the drive with no down time. It was simply a matter of making the ~1 hour drive and swapping the hardware. I posted on the FreeNAS forums just to double check the process. When I built this array I used sas2ircu to map out the serial numbers of the drives to the slot number on the backplane – this is critical to successfully pulling the correct disk:
# sas2ircu 0 DISPLAY
…
Device is a Hard disk
  Enclosure # : 2
  Slot # : 9
  SAS Address : 500065b-3-6789-abf1
  State : Ready (RDY)
  Size (in MB)/(in sectors) : 953869/1953525167
  Manufacturer : ATA
  Model Number : Hitachi HUA72101
  Firmware Revision : A74A
  Serial No : GTE000PAJX8NKE
  GUID : N/A
  Protocol : SATA
  Drive Type : SATA_HDD
Note: If you do run RAIDZ1 or any combination of single-disk redundancy per vdev or span, do realize that pulling the wrong disk out during replacement could result in total pool failure. Don’t mess this up!
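If you would rather not eyeball that output, the serial-to-slot mapping can be pulled out with a few lines of Python. This is a minimal sketch that assumes the output has already been captured as text and that each drive record starts with a "Device is a" line followed by "key : value" lines, as in the record above. The function name serial_to_slot is just something I made up for the example.

```python
# One device record, using the fields shown in the sas2ircu output above.
SAMPLE = """\
Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 9
  State                                   : Ready (RDY)
  Model Number                            : Hitachi HUA72101
  Serial No                               : GTE000PAJX8NKE
  Drive Type                              : SATA_HDD
"""


def serial_to_slot(display_output):
    """Map each drive serial number to its (enclosure, slot) pair."""
    mapping = {}
    record = None  # stays None until the first device record begins
    for line in display_output.splitlines():
        line = line.strip()
        if line.startswith("Device is a"):
            record = {}  # start collecting fields for a new device
            continue
        if record is None:
            continue  # skip controller/header lines before the first device
        key, sep, value = line.partition(":")
        if not sep:
            continue
        record[key.strip()] = value.strip()
        # Store the mapping once all three fields have been seen.
        if all(k in record for k in ("Serial No", "Enclosure #", "Slot #")):
            mapping[record["Serial No"]] = (
                int(record["Enclosure #"]),
                int(record["Slot #"]),
            )
    return mapping


print(serial_to_slot(SAMPLE))
```

With a lookup like this you can go straight from the serial number ZFS reports for the failed disk to the tray you need to pull.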
After correlating the serial number of the failed drive to the sas2ircu output (it was also convenient that slot #9 was not flashing any activity LEDs) I pulled the tray out of slot 9 and none of my VMs on the array exploded. I then slid a new Dell Enterprise 1TB 7.2k RPM SATA disk into position and clicked the “Replace” button:
This kicked off the ZFS resilvering process and changed the alert from the “A volume is degraded, fix your stuff” message to:
I could see in the disk view that the serial number of the disk being “replaced” now had no description which I use to identify what slot it is in. So, I checked sas2ircu again and updated that:
Then I waited. I made sure the resilvering process hit the 50 – 75% mark before heading home. In all, the resilvering process took 1 hour 18 minutes to scan 1.91TB and resilver 162GB of data:
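For a sense of scale, here is the back-of-the-envelope arithmetic on those numbers. I am treating TB and GB as decimal units (ZFS may be reporting binary units, which would shift the results by a few percent) and assuming data is spread evenly across all 12 disks, so take this as a rough sanity check rather than a benchmark.

```python
def rate_mb_per_s(total_bytes, seconds):
    """Average throughput in decimal megabytes per second."""
    return total_bytes / seconds / 1e6


elapsed_s = 1 * 3600 + 18 * 60   # 1 hour 18 minutes = 4680 seconds
scanned_bytes = 1.91e12          # 1.91TB scanned (decimal units assumed)
resilvered_bytes = 162e9         # 162GB written to the new disk

scan_rate = rate_mb_per_s(scanned_bytes, elapsed_s)
resilver_rate = rate_mb_per_s(resilvered_bytes, elapsed_s)

# If data is spread evenly over 12 disks, each disk holds about 1/12 of it.
per_disk_gb = scanned_bytes / 12 / 1e9

print(f"scan: {scan_rate:.0f} MB/s, resilver: {resilver_rate:.1f} MB/s")
print(f"expected data per disk: {per_disk_gb:.0f} GB")
```

The per-disk estimate lands close to the 162GB that was actually resilvered, which is what you would expect when only one of twelve disks has to be rebuilt.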
Success! Though, honestly, I didn’t expect this to fail. The drives I am using are mostly enterprise units and since they’re 1TB in size there’s not a ton of thrashing during the rebuild. However, people on the internet would have you believe that this was destined for disaster. I do need to pick up a couple more enterprise class SATA disks since I only have one or two spares now but that’s my own problem. In a 12-disk configuration, I do think that a pool made up of 4 RAIDZ1 vdevs with 3 disks each is the best compromise for usable space vs. performance.
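To put some numbers behind that last claim, here is a quick comparison of a few ways to lay out twelve 1TB disks. It only counts data disks versus parity disks and ignores ZFS metadata, padding, and free-space headroom, so real usable space will be lower. The layouts other than mine are just alternatives I picked for comparison.

```python
def usable_tb(vdevs, disks_per_vdev, redundant_disks, disk_tb=1.0):
    """Raw usable capacity: data disks per vdev times the number of vdevs."""
    return vdevs * (disks_per_vdev - redundant_disks) * disk_tb


# name: (vdev count, disks per vdev, redundant disks per vdev)
layouts = {
    "4 x 3-disk RAIDZ1": (4, 3, 1),
    "2 x 6-disk RAIDZ2": (2, 6, 2),
    "6 x 2-way mirror": (6, 2, 1),
}

for name, (vdevs, width, redundant) in layouts.items():
    print(f"{name}: {usable_tb(vdevs, width, redundant):.0f} TB usable, "
          f"{vdevs} vdevs")
```

Two 6-disk RAIDZ2 vdevs give the same raw usable space as my layout but with half as many vdevs, and since random I/O performance tends to scale with the number of vdevs (a common rule of thumb, not something I measured here), that is the trade-off I was weighing.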
Anyway, I am very content with my decision to choose FreeNAS as the solution to my shared storage dilemma in my vSphere cluster. The product is built on a very reliable, resilient filesystem and offers tons of flexibility (like supporting my switchless 10 GbE configuration). I’ll post up a couple video clips I have from replacing the disks to share just how simple this system is to use. It’s always nice when technology works as intended! I am giving FreeNAS the credit here, but the real hero is ZFS. It just works.
Thanks for reading and as always please subscribe and feel free to comment!