A few of you may have noticed the site was down for an extended period this week. I learned a few things in the process.
I set up my FC system and was so excited to get it moving that I neglected to adequately test my equipment. I bought used gear, with used drives, and put real data on them after a whopping 2 days of light testing. I never stress-tested the drives and didn’t do any kind of exercising to validate that they were worthy of production data.
I also neglected to functionally test the array. While it did offer the ability to configure a hot-spare, I didn’t check whether the hot-spare actually worked before I moved data over to it. (Seeing that it was configured was enough for me.)
So what happened was this: I was running on the system and all was well until a drive failed. The hot-spare didn’t kick in on its own, and while that drive was in a failed state, a second drive failed. Needless to say, I lost half my LUNs, and three of them were corrupted beyond repair.
Luckily I’m one of the old hold-outs: I still run a tape backup system, a Veritas 6.0 environment with an ATL tape library. I was able to restore to within 48 hours of the failure from tape.
My *NEW* storage back-end consists of a Dell 2650 with 5x 146G drives. I installed CentOS 5, carved out a 512GB partition, exported it over NFS, and mounted it on my VMware servers. The most interesting part is that by bonding the network interfaces I’m getting the same bandwidth I got out of the 2x 1Gig FibreChannel ports.
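For anyone curious what that looks like, here’s a rough sketch of the bonding and NFS export config on the CentOS 5 box. The interface names, IP addresses, export path, and bonding mode are placeholders from my notes rather than gospel, so adjust for your own environment:

```
# /etc/modprobe.conf -- load the bonding driver for bond0
# (balance-alb spreads traffic without needing switch-side config;
#  802.3ad is an option if your switch supports LACP)
alias bond0 bonding
options bond0 mode=balance-alb miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10        # placeholder storage-network address
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/exports -- export the VM datastore to the ESX hosts' subnet;
# no_root_squash is needed for VMware to write as root
/vmstore 192.168.1.0/24(rw,no_root_squash,async)

# Quick sanity check from another Linux box (ESX adds the datastore
# through its own NFS storage configuration instead):
mount -t nfs -o rw,hard,intr,rsize=32768,wsize=32768 \
    192.168.1.10:/vmstore /mnt/vmstore
```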
I’m not a network guy, though, so does anyone have any suggestions for optimising NFS for storage applications?