50Micron.com

General

When the links go bouncing…

by Jesse on Jul.07, 2010, under General

In a true DR environment where Synchronous replication is used, it’s best to have two routes from source to target, or at the very least a switched route that can dynamically re-route in semi-real-time.

Everyone knows the story.  The link is up, everything is good, and the source acks a write to the host when the target acks it.  The link is down, replication is halted, and the source acks to the host when the write is committed to cache on the source.

(Or, in this case, you have two optical routes but somehow managed to put it all through the same DWDM tray, which then failed, taking out both routes)

But I’ve seen it happen more often than not: the “bouncing” link.  Up, down, up, down, up, down, etc.

Very few storage systems handle that well.  Mostly because when the link is half-way there, the system gets torn between the requirement (in synchronous replication) to wait for the target to acknowledge each write and the need to keep acknowledging writes to the host.
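The acknowledgment rule above, and why a bouncing link is awkward for it, can be sketched as a toy model.  This is only an illustration: `LinkState`, `ack_point` and `replay` are made-up names, not any array's actual API, and real arrays have timeouts and queueing that this ignores.

```python
from enum import Enum


class LinkState(Enum):
    UP = "up"
    DOWN = "down"


def ack_point(link: LinkState) -> str:
    """Return where a write must land before the source acks the host."""
    if link is LinkState.UP:
        # Synchronous mode: the target must ack before the host hears back.
        return "target"
    # Link down: replication is halted, so the source's own cache is enough.
    return "source-cache"


def replay(link_events):
    """Map a sequence of link states to the ack point used for each write."""
    return [ack_point(state) for state in link_events]


if __name__ == "__main__":
    # Hypothetical bouncing link: the ack point flips on every bounce.
    bouncing = [LinkState.UP, LinkState.DOWN, LinkState.UP, LinkState.DOWN]
    print(replay(bouncing))
```

Every flip means the array has to change its mind about what "committed" means, which is the part few systems do gracefully.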

The good news is most host operating systems handle it wonderfully.  Sun records such events as “Retryable disk errors”, and I don’t think Windows and AIX even report it.

Enter RedHat Linux, or in this case, RHEV.  RHEV uses a standard lvm2 volume group with virtual disks as logical volumes within the volume group.  Simple enough, right?

Well, what if you have disks from different disk subsystems?  What if you have some mirrored and some not?  (The usual reason for that would be test/dev and production in the same environment, though putting dev/test and production in the same cluster is kinda nutty.)

The situation I just saw was this.  4x 500G volumes, only ONE of them mirrored.   RHEV apparently put them all in the same volume group.

You *NEVER* put mirrored and non-mirrored volumes in the same volume group.  If for no other reason than the disk on the target array is USELESS without its partner disks.
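A minimal sketch of the kind of sanity check that would have caught this, assuming you already have an inventory of which disks sit in which volume group and which of them are replicated.  The `mixed_volume_groups` helper, the layout dictionary and the LUN names are all hypothetical; nothing here queries LVM or the array.

```python
def mixed_volume_groups(vg_layout):
    """Return names of volume groups that mix mirrored and non-mirrored disks.

    vg_layout maps a volume group name to a dict of {disk_name: is_mirrored}.
    """
    mixed = []
    for vg_name, disks in vg_layout.items():
        flags = set(disks.values())
        # A group is a problem only when both True and False are present.
        if flags == {True, False}:
            mixed.append(vg_name)
    return sorted(mixed)


if __name__ == "__main__":
    # Hypothetical layout: four 500G volumes, only one of them mirrored.
    layout = {
        "vg_rhev": {"lun0": True, "lun1": False, "lun2": False, "lun3": False},
        "vg_prod": {"lun4": True, "lun5": True},
    }
    print(mixed_volume_groups(layout))
```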

In this case we had one disk out of 4 that was dropping on and off-line, and some admin gets the idea to reboot the host – which of course attempts to close the volume group.  When it can’t flush those writes to disk, the behavior gets a little unpredictable.  Most likely the shutdown will hang, causing some overzealous admin to go hit the power-switch…

Data loss ensues because there are cached-writes that haven’t been committed.

And they call me for help with it.  Meanwhile, the freeware VMware ESXi environment, which is also replicated, and which *I* have been pushing hard for enterprise-wide adoption of, blows right through the 36 hours of random problems with not even a sigh.

The problem with calling me for help with it, is I can just SMELL someone trying to blame the data-loss on EMC, and I want NOTHING to do with it.  So I tell them to open up a support ticket with RedHat.

Oops, they didn’t buy support.  Apparently, when you throw in support, the cost-benefit analysis vs. VMware makes it too expensive.

FML

I worked for 18 straight hours on Friday.


On Manners…

by Jesse on Jun.15, 2010, under General

So if consulting firm A talks to consultant B about an engagement, and then decides to go with consultant C because he’s cheaper it’s totally understandable.

If consultant C then calls consultant B for help with that engagement, consulting firm A should expect to get a bill for the time.

I mean, first off, it’s rude.

Secondly, none of us work for free.

Now I’m glad to help when I can, but when it goes from “I’ve got a few questions” to “give me the step-by-step on how to do this” I have to draw the line.

</rant>


By request…

by Jesse on May.18, 2010, under General

Ok, believe it or not there are a number of people who would like me to stay online.

I can’t commit the kind of time I’d like to, though, so I’d like to request guest-contributors if possible.

Just email – jg(at)50micron(dot)com


What I run… (And why boot-from-san is a good thing.)

by Jesse on Mar.03, 2010, under General

It amazes me how much power you can get in a desktop today.

From the Quad-core “Extreme” desktop processors, to 64bit Operating Systems that are almost ready for prime-time, the options are limitless.

I recently ‘came into’ some hardware and decided to build a server workstation around it.

The ‘found’ hardware was 16G of Registered, ECC memory in a 4×4 configuration.  (From a client for whom I upgraded to 4×8 who told me to…and I quote…”Keep it, I’ve got no use for it.”  – Why thank you, don’t mind if I do.)

First step was to put a motherboard under it.  Most people know that Registered, Fully buffered, ECC memory won’t work in just any motherboard; it requires server hardware.  So I go to my favorite computer shop, Affinity Computer Technologies (Sterling, Virginia area), and I ask Bill what I should buy.

My requirements:

  • Must take the memory I have on hand.
  • 2x PCIe x16 slots (for the dual/dual-port video cards I run)
  • at least 1x PCI-X slot for the Emulex LP9802-DC.

What he comes up with, after about 15 minutes of careful research (he’s that good), is a Supermicro X7DAL-E+ motherboard.

This board rocks.  Dual Xeon, supports up to 24G of RAM, and meets the rest of my requirements.

Next buy was the processors, because I certainly don’t have Xeon processors lying around.  (Well, point of fact, I do, but not THOSE Xeon processors.)  I opted for a pair of Quad-Core 2.0GHz processors.  They weren’t the best processors I could buy, but they were in my price range.

Thanks to the boys at Affinity, the whole thing was had for under $1,250.

And I bought a new case, the Antec 300, because the case I was running wasn’t fully ATX compliant, and required that I choose between having a CD-ROM and the second processor.  (Not gonna happen.)  Antec makes some pretty decent looking / sized cases for under $100.

No hard-drives please.  I’m booting from SAN.

First thing I learn, when I first install Win7 on my old workstation, is that PowerPath does NOT work correctly at all on Windows 7.

Second thing I learn is that Windows2008-R2 64Bit almost perfectly emulates Windows7 when you put it in desktop mode and enable all the bells/whistles.

Third thing I learn is that 8 CPU cores, 16G of RAM, and two 1GB video cards make World of Warcraft SCREAM. ;)   Even when there are four VMs running in the background. :)   Yes, I’m *THAT* nerd. :)

But the best part of it was the migration.  Now as I’ve said in the past I’ve been running the boot-from-san for some time (most recently with the Win2k8R2/Powerpath up and running), so of course the Emulex drivers were already a part of the operating system. So this is how it went:

1. Build new system.

2. Shut down old system.

3. Move Video/Emulex cards to new system.

4. Connect and Power On.

5. Reboot twice as motherboard/CPU specific drivers are loaded.

6. Done.

Total migration time – about 45 minutes, including hardware swap.

Now *THIS* is the reason I strongly support and encourage boot-from-san in a datacenter.  Not only does it make it amazingly easy to protect your data (SnapView, MirrorView, etc.), but you also have the option of upgrading hardware and keeping your disks/OS intact.

So when the G3 HP’s go out of fashion,  you shut it down, make a simple zoning/masking change, and power the new box on.

If it’s Linux, you don’t even need a reboot most of the time… (however, your ifconfig settings will need to be updated – they’ll get hashed when the MAC of the network card is changed.)
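For the MAC fix-up, here is a rough sketch, assuming a Red Hat-style `ifcfg-ethX` file that pins the interface with a `HWADDR=` line.  That file format is my assumption (other distros keep this elsewhere), and `update_hwaddr` is a made-up helper that only rewrites text; it doesn't touch the system.

```python
import re


def update_hwaddr(ifcfg_text: str, new_mac: str) -> str:
    """Replace the HWADDR line in ifcfg-style text with new_mac."""
    new_line = f"HWADDR={new_mac.upper()}"
    updated, count = re.subn(r"(?m)^HWADDR=.*$", new_line, ifcfg_text)
    if count == 0:
        # No HWADDR line present: append one rather than silently doing nothing.
        updated = ifcfg_text.rstrip("\n") + "\n" + new_line + "\n"
    return updated


if __name__ == "__main__":
    # Hypothetical config carried over from the old hardware.
    old_cfg = "DEVICE=eth0\nHWADDR=00:11:22:33:44:55\nONBOOT=yes\n"
    print(update_hwaddr(old_cfg, "aa:bb:cc:dd:ee:ff"))
```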

This is what I do for fun. ;-)


On Symantec/Norton Technical Support….

by Jesse on Feb.24, 2010, under General

Ok – I just had an interaction with a “technical” support rep from Symantec that is quite simply driving me insane.

I want to thank “Shajeewin” for renewing my objection to outsourcing jobs overseas.  Yes people, it’s cheaper.  But then again you do get what you pay for.

Background:  I am trying to set up a customer who wants to push a DR image of ONE system across the internet to my storage.  Initial push about 30G, daily updates in the megabytes range.  The hard part is this customer isn’t the type to spend a lot of money on bandwidth, so they went with Verizon DSL, with its whopping 128K upstream speed.

My solution for this was to sneaker-net the initial recovery point, and then push the incremental updates over the wire.  Simple, right?
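To put rough numbers on why sneaker-net wins here, a back-of-the-envelope sketch.  It assumes “128K” means 128 kilobits per second, treats 30G as decimal gigabytes, ignores protocol overhead, and uses a made-up 50 MB figure for the daily update since the real size is only “in the megabytes range”.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float) -> float:
    """Days needed to push size_bytes over a link, ignoring protocol overhead."""
    seconds = (size_bytes * 8) / link_bits_per_sec
    return seconds / 86_400


if __name__ == "__main__":
    initial_image = 30 * 10**9   # ~30G initial push (decimal gigabytes assumed)
    upstream = 128 * 10**3       # 128K upstream, assumed to mean kilobits/sec
    daily_delta = 50 * 10**6     # hypothetical 50 MB daily update

    print(f"initial push: {transfer_days(initial_image, upstream):.1f} days")
    print(f"daily update: {transfer_days(daily_delta, upstream) * 24:.1f} hours")
```

Under those assumptions the initial push alone would tie up the line for roughly three weeks, while a daily update of that size fits in under an hour.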

So I look at Norton/Symantec Ghost.  First option, I’ve always liked Norton.

I’m changing my mind about that QUICKLY.

Here is the chat that ensued (with my comments thrown in)

(continue reading…)


Hiatus….

by Jesse on Feb.21, 2010, under General

Well – thanks to a contract glitch I’m spending a few weeks off work. :) No biggie, and I’ll be back before you know it.  It’s been a great opportunity to get some stuff done around the house, build my new workstation, get backups under control, etc. :)

My new workstation is a riot.  I started out building an updated workstation and ended up with something completely…well….other.

The particulars:

  • SuperMicro Serverboard
  • Dual Xeon Quad-Core processors
  • 16GB of Buffered/Registered/ECC memory
  • Dual NVidia dual-head video cards.  (4 heads total)
  • Emulex LP9002-DC Fibre Card
  • Generic BDRAM drive.

It’s nuts.   First thing you’ll note: no hard-drives.  I’m booting from the Clariion.  Set up a dedicated Raid-Group and built a 128G Raid-1/0 lun for the boot volume.

Lastly, the Operating System.  Well, I couldn’t use Windows7 for the OS, as much as I wanted to, because PowerPath doesn’t support Windows7 (yet?), so I went with Windows 2008 R2, x64.  Nicely, you can put it in desktop mode (by adding the “Desktop Experience” feature) and it does basically the same thing.  Add a copy of VMware Server and I run anything that requires XP in a VM.  (Like my work VPN client, which *HATES* 2008, or 64bit, or both. :) )

The greatest part is that, booting from the SAN, I have nightly snapshots taken of the OS volume, which makes life easier in case I blow something up.

It even runs World of Warcraft, which of course was also a requirement. :)


Government purchasing….

by Jesse on Feb.11, 2010, under General

Ok, the way government budgeting/purchasing works scares the hell out of me as an engineer, and as a tax-payer it gets me absolutely barking.

It starts off in the budget.  Department heads go around and ask all of their people how much money they’re going to need to do their job.  When supplied with this information, Department head then goes and multiplies that number by 2 and adds about 10 million for good measure just to be sure.

Those budgets are then fed up the food chain.  Every person who handles the budget adds 10-20% for good measure, their little pet project, or anything else they can come up with.

This number goes in front of Congress.  Congress immediately approves the budget because they don’t know a SAN box from a NAS box from a Kleenex box.  (Seriously, like 16 brain cells between them.)

Budget approved, spending starts.

About six months from the end of the fiscal year, someone realizes that “hey – we have all this money left over for some reason.”

Here’s the painful part.  They start making stuff up to spend it on.  Absolutely ridiculous stuff when it comes down to it – like storage rooms full of Sun thin-workstations or uselessly huge laptops that are barely deserving of the name “laptop” (and don’t have a serial port I might add, annoying for people who have to do ground-up switch configurations)

Why?  Because:

If they don’t spend it this year they won’t be able to justify requesting a budget increase next year.

If they don’t spend it they will get their hands slapped for requesting too much money this year.

(The greatest reason EVER given to me by a government employee as to why they spend money like this:  “If we don’t spend it now we won’t have it next year.”)

My brain hurts.  And as a taxpayer, this is the kind of stuff that gets me positively barking.

Spend money – it keeps the economy moving and allows people like me to keep working.

HOWEVER – spend it smartly.  Don’t throw $1.2 million away on a solution when a $600K solution will do the same job with fewer moving parts.

In fact, it’s funny – when you start spending money smartly, the side effect is you can usually get more.  So instead of buying $1,200,000 worth of storage for a Data-center, you can get $1,200,000 worth of storage and fully equip both a data-center *AND* a Disaster Recovery site.

You know – actually protecting the data you’re entrusted to protect might be a neat idea…

By the way, all examples here are hypothetical…right? ;-)


Celerra fun…

by Jesse on Jan.25, 2010, under General

Just a bit of Celerra fun.

The content on this site is on a Celerra NS500 back-end.  It’s protected in a show of great trust and support by nothing but Celerra SnapSure.  (30 daily checkpoints)

In the six or so months I’ve been running this there are now more than two dozen sites here, including this one.

Of all of them, the first person to actually need the SnapSure backup was…

Drum roll please.

Me.

(As if you didn’t see that coming)

I “somehow” managed to clobber the header.php file, which resulted in this blog turning a bright shade of nothing but white for a few minutes.  Luckily, it was *THAT* simple to type

cp .ckpt/<datestamp>/header.php ./header.php

and the problem was instantly solved.
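If you’d rather script that restore than type it, here is a minimal sketch of the same copy.  It assumes only what the command above shows, namely a `.ckpt/<datestamp>/` directory sitting inside the file system; `restore_from_checkpoint` and the `day1` checkpoint name in the demo are invented for illustration and are not part of any Celerra tooling.

```python
import shutil
import tempfile
from pathlib import Path


def restore_from_checkpoint(fs_root, datestamp, filename):
    """Copy filename out of a checkpoint directory back into the live tree."""
    root = Path(fs_root)
    source = root / ".ckpt" / datestamp / filename
    if not source.is_file():
        raise FileNotFoundError(f"{filename} not found in checkpoint {datestamp}")
    target = root / filename
    # copy2 keeps timestamps along with the file contents.
    shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for the real file system.
    with tempfile.TemporaryDirectory() as demo_root:
        ckpt = Path(demo_root) / ".ckpt" / "day1"  # hypothetical checkpoint name
        ckpt.mkdir(parents=True)
        (ckpt / "header.php").write_text("<?php /* good copy */ ?>")
        (Path(demo_root) / "header.php").write_text("")  # the clobbered file
        restored = restore_from_checkpoint(demo_root, "day1", "header.php")
        print(restored.read_text())
```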

Thanks EMC. <grin>


Microsoft loses data, but no backups?

by Jesse on Oct.12, 2009, under General

Microsoft, T-Mobile Apologize For Data Loss, Offer Month Credit

File this under whups.  Microsoft loses data.  That’s not a big surprise.

But I’m in a situation where Microsoft is recommending to a customer that they use almost exactly the same technology to protect their new Exchange environment – there is a HUGE part of me that wants to stand up and scream that this is *NOT* a good idea.

Never mind that in the past I’ve tried a number of times explaining to them some of the shortcomings of their design.

1. That using DAS in an enterprise environment when there is a multi-million-dollar replicated SAN already at their disposal is foolish.

2. That replicating over a 50% saturated gigabit IP network when there is an 8Gb DWDM Fibrechannel connection available might leave their production and DR environments out of sync.

3. That setting all of this up on Hyper-V when VMware offers load-balancing, HA and an amazingly scalable environment is a bit short-sighted.

It’s obvious to me that the genius who designed this cluster____ pulled the design directly from a Microsoft white-paper.

But look at the Microsoft/T-Mobile debacle and ask yourself…  Is the Microsoft way always the right way?

My answer would be quite solidly…no.


I’ve got the power….

by Jesse on Sep.10, 2009, under General

But apparently the brain-trust that runs the facilities for my customer doesn’t.

If you’ve ever seen a datacenter go completely black, it’s a scary thought.

If you’ve ever seen someone actually manage to take out the power to *JUST* the servers (leaving the lights and AC functional)

Well that’s just a work of art…

That we’re *STILL* recovering from. ;-)

