50Micron.com

So much fun, so little time.

by on Apr.11, 2008, under DAS, Downtime, Duck, VMWare, Vmware-NFS

A few of you have noticed that the site was down for an extended period this week.  I learned a few things in the process.

I set up my FC system and was so excited to get it moving that I neglected to adequately test my equipment.  I bought used equipment, with used drives, and put real data on them after a whopping 2 days of light testing.  I never stress-tested the drives and didn’t exercise them in any way to validate that they were worthy of production data.
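For anyone wondering what even a crude stress test might look like, here is a minimal sketch in Python. It is not what I ran (I ran nothing, which is the point), and the names `pattern_block` and `burn_in` are made up for illustration. It writes a known pseudo-random pattern to a file on the volume under test, reads it back, and reports any blocks that don't match. It assumes the file is large enough, or the cache small enough, that reads actually hit the disks; dedicated tools such as `badblocks` are far more thorough.

```python
import hashlib
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block


def pattern_block(pass_no, block_no, size=BLOCK_SIZE):
    """Return a deterministic pseudo-random block for a given pass and block number."""
    seed = f"{pass_no}:{block_no}".encode()
    # SHAKE-256 can emit a digest of any requested length from one seed.
    return hashlib.shake_256(seed).digest(size)


def burn_in(path, total_blocks, passes=2, block_size=BLOCK_SIZE):
    """Write a known pattern to `path`, read it back, and return mismatched (pass, block) pairs.

    WARNING: this overwrites whatever is at `path`.
    """
    bad = []
    for p in range(passes):
        # Write phase: fill the file and force it out of the OS write buffers.
        with open(path, "wb") as f:
            for b in range(total_blocks):
                f.write(pattern_block(p, b, block_size))
            f.flush()
            os.fsync(f.fileno())
        # Verify phase: regenerate each block and compare it with what was stored.
        with open(path, "rb") as f:
            for b in range(total_blocks):
                if f.read(block_size) != pattern_block(p, b, block_size):
                    bad.append((p, b))
    return bad
```

Calling something like `burn_in("/mnt/newarray/burnin.dat", total_blocks=100_000, passes=4)` (the path is hypothetical) and getting back an empty list would mean every block read back intact. Each pass uses a different pattern, so stale data left over from an earlier pass shows up as a mismatch.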

I also neglected to functionally test the array.  While it did offer the ability to configure a hot-spare, I didn’t check to see if the hot-spare was functional before I moved data over to it.  (Seeing that it was configured was enough for me.)

So what happened was this.  I was running on the system and all was well until a drive failed.  The hot-spare didn’t invoke on its own, and while one drive was in a failed state, a second drive failed.  Needless to say, I lost half my luns, and three of them were corrupted beyond repair.

Luckily I’m one of the old hold-outs.  I have a tape backup system consisting of a Veritas 6.0 environment with an ATL tape library.  I was able to restore to within 48 hours of the failure using tape.

My *NEW* storage back-end consists of a Dell 2650 with 5x 146GB drives.  I installed CentOS5 with a 512GB partition exported over NFS and mounted it on my VMWare servers.  The most interesting part is I realized that by bonding the network interfaces I’m getting the same bandwidth I got out of the 2x 1Gig Fibre Channel ports.
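The back-of-the-envelope arithmetic behind that comparison can be sketched in a few lines of Python. The function names are made up for illustration, and the count of two bonded gigabit NICs is an assumption, since the exact number isn't stated here. The figures are line-rate ceilings before any framing or protocol overhead, not measurements.

```python
# 1 Gbit Fibre Channel signals at 1.0625 Gbaud with 8b/10b encoding,
# so only 8 of every 10 bits on the wire carry data.
FC_1G_BAUD = 1_062_500_000

# Gigabit Ethernet delivers 1 Gbit/s of data after its own line encoding.
GBE_DATA_BITS_PER_SEC = 1_000_000_000


def fc_bytes_per_sec(ports):
    """Upper bound on data bytes/s across `ports` 1 Gbit FC ports."""
    return ports * FC_1G_BAUD * 8 // 10 // 8


def gbe_bytes_per_sec(nics):
    """Upper bound on data bytes/s across `nics` bonded gigabit NICs."""
    return nics * GBE_DATA_BITS_PER_SEC // 8


if __name__ == "__main__":
    # Assumption: two bonded NICs, to match the two FC ports.
    print("2x 1G FC :", fc_bytes_per_sec(2) / 1e6, "MB/s")
    print("2x GbE   :", gbe_bytes_per_sec(2) / 1e6, "MB/s")
```

On paper the two come out in the same ballpark, which matches what I'm seeing. One caveat: depending on the bonding mode, a single NFS connection may be pinned to one link, so the aggregate number may only show up with several clients or mounts in play.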

Not being a network guy though, does anyone have any suggestions for optimising NFS for storage applications?


2 Comments for this entry

  • williamwbishop

    Geez, how much data is on the website?!? Or was it a loss of a lot of data, some being the website?

    If your hardware supports it you can use jumbo frames, but short of a database, it’s rarely worth the effort. NFS by itself should treat you right; I get really good performance out of it.

    W.

  • SanGod

    It’s not just this site, for one. I run about 12-15 blog sites for different people: friends, family, etc. But this was only one of 10 luns on the box, and those 10 luns held my finance sql server, my exchange, my blackberry, the whole nine yards.

    Exchange came out of it working, but with the vmdk file so corrupted I couldn’t migrate off the lun. I ended up having to build a new exchange server, move all of the mailboxes to it, and decommission the old one. The blackberry server, my front-end exchange server, and this webserver were all so totally destroyed I had to restore from backups. The blackberry and web-mail servers were windows, so even restoring from the backup was tricky and ended up being a rebuild instead of a recovery.

    This one was actually the easiest. Once I rebuilt the server, I recovered /etc/http, /www and /var/lib/mysql from backup, rebooted and it all just sort of fell together.
