Archive for the 'Data Migration' Category

Best Nerd Movies of all time.

Wednesday, December 31st, 2008

Yeah – one of those posts.  But I’m not talking about the latest blockbuster, and I’m not even talking about anything even remotely “Matrix-Like” (though it does make the list)

I’m talking about the movies us techie-types hate to admit that we loved.

Here they are:

Honorable Mention: Short Circuit – campy but loveable.  Good movie for the kids.

10. Weird Science – John Hughes at the height of his mediocrity.

9. The Net – Love me some Sandra Bullock, but who believes that hackers are going to take over the world with Macs? (or that Ms. Bullock is going to be the one to stop them?)

8. Independence Day – See “The Net”: Jeff Goldblum isn’t believable as a hacker/nerd/geek in the first place.  Hacking an alien species with a Macbook is beyond bad.

7. Hackers – Angelina Jolie before she really was anyone, and yes, we get to see her websites.  Plot-line was fun, dialog was horrible, tech was sadly misrepresented – the scary part though is I’ve known people like Cereal Killer….

6. Tron – So nerdy even the nerds won’t own it.  Stay tuned – there are rumors of a Tron 2 running around, and I saw a pirated preview on YouTube.

5. Real Genius – Val Kilmer plays a slightly left-of-center genius.  Great laughs, good teen/college movie. :)

4. Stargate – ok, not so much a nerd/geek show but the scientist saves the day every time. (and gets the cute slave-chick in the process)

3. The Matrix (Any one of them) – There is something to be said for the ‘willful suspension of disbelief’ – I thoroughly enjoyed the movies and one day want to learn kung-fu by ramming a needle into the back of my head.

3. Blade Runner – Can’t hate anything with Harrison Ford in it, but it took a lot of hemming and hawing over whether this went above or below The Matrix.  It is a serious marvel.  Just got it on Blu-Ray and it’s SPECTACULAR. :)

2. Wargames – My inspiration.  That and when I was growing up I was about as nerdy as Matthew Broderick…well…..is.  (And he still managed to get Sarah Jessica Parker)

1. Star Trek – Didn’t think I’d forget this, did you?  Most of the movies were great, once you got past the ones that sucked (Voyage Home, anyone?).  The movies are here as an afterthought because they were pure marketing: an attempt to capitalize on the popularity instead of furthering the idea.  The series rocked though, and I will always have a warm spot in my heart for Gene Roddenberry.

AS400 -

Sunday, August 10th, 2008

It’s 4am – Just finished an AS400 migration from 8830 to DMX – Went swimmingly well.  Shut down the system, swing the cables, split the RDF, and boot it back up.  Since it’s a direct-attach there is no pesky zoning or masking to deal with. :)

AS400s are an odd beast.  The only way to boot them from the Symm is to use a Load Source Emulator, which is to say, a customized Fibre-SCSI bridge that is about the size and shape of a 3.5″ SCSI drive.  It slides into the SCSI slot on the AS400 and has a cable sticking out the front of it.

If you ask me, it’s not the most efficient way of doing it, but interesting none-the-less.  Then again I’ve never been a big fan of some of the intricacies of IBM hardware, though I *LOVE* AIX as an operating system. :)

Got a VMS host to do next weekend.  *THOSE* are difficult to design around, because they actually depend on the Symmetrix device (Hypervolume or Meta-Head) number being the same.  Not just the LUN number.  So any binfile has to have the VMS devices in the same locations. –Major pain.
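One way to catch this before the cutover is to compare the device numbers each VMS volume has today against the ones planned for the new frame.  A rough sketch, assuming you can write both layouts down as simple volume-to-device tables; the function name and the dict format are invented for the example and have nothing to do with the real binfile format.

```python
def moved_vms_devices(old_layout, new_layout):
    """Return {volume: (old_device, new_device)} for every VMS volume whose
    Symmetrix device number would change. new_device is None if the volume
    is missing from the new layout.

    Both layouts are plain dicts of volume label -> device number; that
    format is made up for this example and is not a binfile format.
    """
    return {
        volume: (device, new_layout.get(volume))
        for volume, device in old_layout.items()
        if new_layout.get(volume) != device
    }
```

An empty result means every VMS volume keeps its device number; anything else is a volume to fix in the design before the weekend.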

VMWare disk problems…

Friday, February 15th, 2008

Ok, I’ve been playing more and more with VMware lately.  All of it personal, because the work opportunities just haven’t really presented themselves.

In relocating one of my “production” servers to the Fibre array I purchased recently, I ran into a problem.  I realized that I was doing it wrong and tried to cancel out of a disk move.

Every VMware VMFS disk is made up of two parts.  The actual virtual disk data is contained in a file ending in “-flat.vmdk”, and there is a header file that is named the same way, minus the “-flat”.

In my particular mistake somehow the -flat file got moved but the header file didn’t.  So when I went to re-mount the disk under the VM, it was just gone.
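A quick way to spot this situation is to pair the two files up by name.  The sketch below is only an illustration: it assumes the naming rule described above (data in “name-flat.vmdk”, header in “name.vmdk”, both in the same directory), and the function name is made up for the example.

```python
from pathlib import Path


def find_orphaned_flat_files(vm_dir):
    """Return the names of -flat.vmdk files in vm_dir with no header file.

    Assumes the naming rule from the post: data lives in "<name>-flat.vmdk"
    and the header lives in "<name>.vmdk" in the same directory.
    """
    orphans = []
    for flat in sorted(Path(vm_dir).glob("*-flat.vmdk")):
        # "finance-flat.vmdk" -> "finance.vmdk"
        header = flat.with_name(flat.name[: -len("-flat.vmdk")] + ".vmdk")
        if not header.exists():
            orphans.append(flat.name)
    return orphans
```

Pointed at the VM's directory, this would have listed “finance-flat.vmdk” as a data file with no header next to it.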

To give you an idea of the level of panic that was going on, the name of the disk that was lost was “finance.vmdk”.  Yes, this is the root disk of the server that runs my accounting package for work.  Not a happy time for me.

I played with it, I scoured the VMFS volumes to ensure that it didn’t get redirected to the wrong LUN, I searched VMware’s knowledge base (a useless endeavor), and I was getting ready to rebuild the server when I had an idea.

I renamed the remaining flat file to “finance-temp-flat.vmdk” and went into the console and created a new disk of exactly the same size.  I then deleted the -flat file that was created, and renamed “finance-temp-flat.vmdk” to “finance-flat.vmdk”.

I restarted the virtual machine, and lo and behold, it booted without effort.

I then immediately shut it down and backed it up.

I then exhaled.

-SG

Overcomplicating the world

Thursday, December 6th, 2007

Ok, I’ve seen it happen over and over again.  Customers who think they know better.

Now I absolutely applaud a customer who wants to take the time to learn the ins and outs of the storage they’ve spent probably  hundreds of thousands of dollars on.

But when you pay a consultant to come in and do work for you, please please PLEASE don’t handicap him by telling him to do something he knows is wrong.

Simple things like zoning, pathing, masking, failover, and even the naming conventions we use come from years of experience with the best way to put something together.  More damage is done by people thinking they know better than the years and years of developer-hours that went into the system.

For instance: single initiator, single target.  There is no need to zone an HBA to multiple targets for redundancy unless it’s the only HBA in the system, in which case you have your first mistake right there.

HBA_1 –> Switch_1 –> FA_1

HBA_2 –> Switch_2 –> FA_2

These should be two completely separate paths, completely isolated from each other.  It’s so simple it’s not funny, yet I’ve seen more “unique” ways of zoning than I can count.

For instance:

HBA_1 –> Switch_1 –> FA_1A
                  –> FA_1B

The inherent problem with this is that PowerPath (or DMP, or whatever flavor of multipath software you use) is going to spend more time managing two paths, and the weak/slow point in the link is still the HBA.  You can’t get beyond the fact that it is a serial interface.

Plus, on a Symmetrix, FA_1A and FA_1B share a processor, so you aren’t even gaining anything from spreading the IO across the Symmetrix front-end.  (Not that you would even if you used FA_1A and FA_2A, because the processors are still writing to cache, with minimal lag.)
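The single-initiator, single-target rule is easy to check mechanically.  Here is a minimal sketch, assuming the zone set has been written down as a plain dictionary; that layout and the function name are invented for the example and are not the output of any switch vendor’s tooling.

```python
from collections import defaultdict


def check_zoning(zones):
    """Return a list of problems found in a zone set.

    zones maps a zone name to {"initiators": [...], "targets": [...]}.
    A clean zone set has one initiator and one target per zone, and no
    initiator that reaches more than one target.
    """
    problems = []
    targets_by_hba = defaultdict(set)
    for name, zone in sorted(zones.items()):
        initiators, targets = zone["initiators"], zone["targets"]
        if len(initiators) != 1 or len(targets) != 1:
            problems.append(
                f"{name}: {len(initiators)} initiator(s), {len(targets)} target(s)"
            )
        for hba in initiators:
            targets_by_hba[hba].update(targets)
    for hba, targets in sorted(targets_by_hba.items()):
        if len(targets) > 1:
            problems.append(f"{hba} reaches {len(targets)} targets: {sorted(targets)}")
    return problems


good = {
    "zone_a": {"initiators": ["HBA_1"], "targets": ["FA_1"]},
    "zone_b": {"initiators": ["HBA_2"], "targets": ["FA_2"]},
}
bad = {
    "zone_a": {"initiators": ["HBA_1"], "targets": ["FA_1A", "FA_1B"]},
}
```

`check_zoning(good)` comes back empty, while `check_zoning(bad)` flags both the crowded zone and the HBA that sees two FAs.  A host with only one HBA would trip the second check too, which is the exception mentioned above.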

Then you get into management.  You spend more time and effort managing a complex solution than it’s worth.  Simplify it and you’ll find you spend much more time at your local pub.  ;-)

/jg

Migration complete -

Sunday, November 25th, 2007

We did it.   Migrated the hosts/data.   Production is now running in Kansas, DR in Georgia, and the old datacenters in NY/NJ are one step closer to being shut down.

Interesting couple of things I learned today.

SRDF/A is a great technology for replicating over long distances while maintaining what they call a “dependent-write-consistent” state.  It means that even though the replication is handled asynchronously, with minimal performance impact to the host, in the event of a failure you’re going to lose only a minimal amount of data.  (In our case, when it was running the R2 disks were about 45-60 seconds behind the R1.)

We also performed a “failure” (disconnected both Gig-E ports to simulate the Kansas site dropping out) and brought the DR hardware up as primary, then reconnected, unmounted, and restarted the SRDF/A session.

The only downside I’ve found with SRDF/A is that it’s a royal pain to stop and restart the replication.  In cases like this one, where once a week they take the R2s offline to run a 20-hour backup off them, they are putting themselves at unneeded risk.  It’s a situation where TimeFinder/SNAP would be a great benefit.  You snap the R2s at midnight and back them up, thereby leaving your R2s in sync with your R1s for the duration.  You can also then mount the SNAP volumes to a separate media server, thereby avoiding having to re-configure the DR server as a temporary media server.

It’s just a thought.

It’s always a great feeling when you hit the deadline dead-on, especially when you’re dealing with a situation where the requirements kept changing throughout the project, even to the point of having to add new devices at the last minute.

Oh well, on to the next.  At least the next is going to keep me closer to home.   Small-scale data migration from DMX2 to DMX2 within the same room, this should be a cake-walk. :)

Binfile changes

Thursday, November 8th, 2007

The joys of data migrations. 

One of the most common problems is the standard practice of most companies to avoid upgrading whenever possible.  The “if it ain’t broke, don’t fix it” mentality.

I could spend days and days on that particular brand of suicide.  For now I’ll just replace that adage with a new one.

If you don’t upgrade it now when you can do it in a controlled fashion, you will end up doing it when your life depends on it with very little planning.

So on the 17th, a customer is going to have to take an application down on an *OLD* Symmetrix 4.8 system to upgrade from 5265 code to 5267 code.  (two major code revs up, from 5x65 to 5x66, then from 5x66 to 5x67, and neither can be loaded on-line)

All of this has to happen *JUST* so we can move the data off this symm and onto a “not-so” old Symm 5.0 that will then be packed up to be moved out of state.

First off, the idea that you can simply turn off a Symm and ship it across the country is nuts.  Anytime you get a system with that many moving parts (hard drives) that have been spinning for that length of time and simply “turn it off” you run the risk of multiple hard-disk failures.  And as we all know, any time you have multiple hard-disk failures in an array, you run the risk of losing both halves of a mirror.  Hell, I cringe at turning off my desktop PC because I know that there is always the chance it’s not going to come back up, and I’ve got Raid-1 (160G) on my boot devices and Raid-5 (500G) on my data volumes, so I’m reasonably protected.
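To put a rough number on the “both halves of a mirror” risk, here is a back-of-the-envelope sketch.  It assumes the array is laid out as simple two-way mirrors and that every drive is equally likely to be one of the casualties; both are simplifications, and the function name is made up for the example.

```python
from math import comb


def p_lose_a_mirror(drives, failures):
    """Probability that `failures` random drive failures take out both halves
    of at least one mirrored pair, in an array of `drives` drives arranged
    as drives/2 two-way mirrors.

    Assumes every drive is equally likely to fail, which is a simplification.
    """
    if drives % 2 or not 0 <= failures <= drives:
        raise ValueError("need an even drive count and 0 <= failures <= drives")
    pairs = drives // 2
    # Ways to pick the failed drives so that no pair loses both members:
    # choose which pairs are hit, then which half of each one fails.
    safe = comb(pairs, failures) * 2 ** failures
    return 1 - safe / comb(drives, failures)
```

With four drives and two failures the answer is one in three.  The odds per failure drop as the array gets bigger, but they climb quickly with every additional drive that doesn’t spin back up.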

Secondly, why are we moving a Symm that is going to hit EOSL before too long?  Doesn’t it make sense to go ahead and upgrade to the latest and greatest hardware, get a free support renewal (included with the purchase of new hardware) and get the latest and greatest features/functionality?  Of course we’re moving a bunch of Sun E3000/E4000/E6000 class hardware.  These are the systems I cut my admin teeth on back in 1996 when I first started out in datacenter operations.  They were old 6 years ago.

Next time someone asks you the correct way to move a datacenter, the correct answer is “twin the hardware and replicate” followed by “trade-in.”

Bekins Moving should never be an option.

The beatings will continue…

Tuesday, October 23rd, 2007

…Until morale improves.

Trying to run 4 FCIP trunks over half a DS3 is a lot like raising a teenager.

Sometimes it looks like it’s working, but in reality it’s just screwing around playing video games.

Actually, my favorite is that “Raising a teenager is like trying to nail JELL-O to a tree.”  I’m feeling about the same level of frustration.

What’s basically happening is that the link is fine, as long as we’re not doing anything silly like, oh, PASSING DATA over it.  The minute we start moving data the link gives up and goes to Palm Beach for the holiday.

I tried to explain to both EMC and the customer at the start of this engagement that replicating four Symms over even a full DS3 is very…optimistic.

So I’ve spent the last three days solid beating my head over this, more than 18 hours a day (except for yesterday, which involved 7 hours + 8 hours travel time).

Cheaper isn’t necessarily better.

Sunday, October 21st, 2007

This is a large part of what drives me nuts about customers.   (If they didn’t pay the bills, I’d be for letting them all drown in the sea of bad decisions they make)  The unwillingness to spend a little extra to do it right.

Let me give you an example.  I’m working now on a data-center consolidation.  Two datacenters in the New York area that have been around since the 80s are being consolidated into other datacenters further south.

One is being closed, and the other is staying online, presumably long enough for them to realize that it’s also out of date.  About a dozen hosts are SAN attached and are being moved.

Now here is the scary part – We’re talking about a collection of Sun E3500s; I think the most powerful unit they have is an E5000.  All are running Sybase and are still on Solaris 2.7.  The real kicker is that the hosts are using JNI SBus cards – these have not been supported in over 5 years (the company is long since a memory).  It was a real challenge even to FIND a batch of old Emulex LP9002-S cards to replace their JNI cards with.

The Symms we’re migrating off of are a collection of 4.0s, 4.8s, and a few 5.0s – and they’re keeping most of the 4.0 and 4.8 symms and retiring the 5.0s….  Huh?

So what would be the best idea?

If you said “Buy a pair of UE10Ks to go with the new DMX3s and move all of the hosts onto them” you’d be right. :)    The wonderful thing about VM based hosts, like the Sun Ultra Enterprise series and the AIX p-series, is the ability to consolidate multiple smaller, older hosts onto them.  You save a mint just in floor-space and power/cooling bills.  Not to mention the HBAs you don’t have to buy.  Figure this company bought in the neighborhood of 30 Emulex LP9002s; even conservatively priced out at $500/ea, that’s about $15,000, a chunk of change, when one UE10K with 6 or 8 LP10000 HBAs could have done the same work, and at 4gig no less.

They spent the money on a pair of DMX3s, source and target, presumably because they had to (I don’t think you can buy a Symm5 outside of Ebay anymore).  But they are actually going to spend the next three months MOVING antique hosts 1500 miles and hoping they survive the trip.  The good news is that I get overtime, and I’m averaging 60-70 hours a week right now trying to play into their little psychosis.

And where does it all end?   Quite simply they are going to find themselves upgrading the hardware anyway, probably after a catastrophic failure, so they just wasted a million dollars moving stuff they are going to throw in the trash in a year or two.

Cisco FCIP and SRDF

Friday, October 19th, 2007

Been a while since I’ve written anything – I’m not even sure if I still have a readership.

I’ve been working an average of 60 hours a week on a single project these days.  Doing a datacenter migration and consolidation.  Basically moving 4 Symm-5 generation systems into a single DMX-3.

The funniest part of this has been learning the DMX-3, which I’ve not had a lot of stick-time with.  It seems like a great machine, a good hybrid of the Clariion and the Symmetrix.  I don’t much care for the DAE back-end, too many major points of failure, too many cables.  (Though when you do your first code-load on one, it sure gives you a work-out as far as learning what plugs in where.)

Anyway, as the title suggests, we’re doing a large part of this migration using temporary hardware, in the form of the Cisco MDS9216i.  This is a normal MDS 92xx chassis (2-slot) with a 14/2 FCIP blade in it.  Simply 14x4gbit FC ports and 2xGig-E ports on the same blade.  So far it’s been one challenge after another, and as of this posting we still don’t have the Georgia and New York datacenters talking to each other.

Part of the problem is the customer’s network infrastructure.  Namely, it sucks.  For those who don’t know, Gig-E ports on the Cisco don’t negotiate down; they are essentially 1000-SX ports.  The customer, who makes a substantial part of their income off network traffic, doesn’t have a single Gig-E port in the entire datacenter.  That was problem number one.

Problem #2 was, in the datacenter that does have Gigabit available, namely the new one in Georgia, there is no optical available.  So we have to go through the painful process of getting an RPQ (an inexact definition is “Request for Price Quote” – what it really means is getting engineering to bless the configuration) to use copper SFPs on the MDS switches.

*THEN* we find out we’re replicating over a DS3 circuit, and even at that we have to “nice” our hardware down to 12.5Meg/Sec so as to not affect their production traffic, which is (of course) running on the same network.  (SRDF has a nasty habit of sucking up all available bandwidth.)

Do you know how long it takes to replicate terabytes of data at 12Meg?  LOL
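For anyone who wants the actual arithmetic, here is a small sketch.  The throttle figure quoted above doesn’t say whether “Meg” means megabytes or megabits, so the function takes the unit as a parameter; the function name is made up, and it assumes the link runs flat out with no protocol overhead, which is generous.

```python
def replication_days(terabytes, rate, unit="MB/s"):
    """Days needed to push `terabytes` (decimal TB) through a link running
    flat out at `rate`, given in "MB/s" (megabytes) or "Mb/s" (megabits).

    Ignores protocol overhead, compression, and re-sent changed tracks.
    """
    bytes_per_second = {"MB/s": rate * 1e6, "Mb/s": rate * 1e6 / 8}[unit]
    seconds = terabytes * 1e12 / bytes_per_second
    return seconds / 86400
```

At 12.5 megabytes per second, one terabyte takes 80,000 seconds, a little under a day.  At 12.5 megabits per second it is about 7.4 days per terabyte.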

This is going to be fun.  I’ll keep you posted.

SRDF over what media?

Tuesday, February 6th, 2007

Well – Tomorrow I should have data replication going between the two Symms.  And it gives me pause.

We’re going to be using SRDF/A for our replication.  To those who are not familiar with the EMC terminology, SRDF/A is a “Semi-Asynchronous” form of SRDF that provides consistency points in the data being transmitted without affecting production performance.

SRDF inserts a “Checkpoint” periodically into asynchronous traffic.  The target frame will only write a block of changes, called a “Delta-Set”, when the ending checkpoint has been received.  If a link fails before the checkpoint is received, the partially received block of data is considered to be invalid and discarded.

This allows a recovery-point of 10-15 minutes, with guaranteed consistency, over a longer distance.  (Our planned replication distance is approximately 1500 miles)
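The checkpoint behavior is easier to see in a toy model.  This is only a sketch of the idea as described above, not how the Symmetrix actually implements it; the class and method names are invented for the example.

```python
class DeltaSetTarget:
    """Toy model of the target side: writes are buffered per delta set and
    only applied once that set's closing checkpoint arrives."""

    def __init__(self):
        self.disk = {}      # committed data: block -> value
        self.pending = {}   # writes received for the current, open delta set

    def receive_write(self, block, value):
        self.pending[block] = value

    def receive_checkpoint(self):
        # The delta set is complete, so it is safe to apply it as a unit.
        self.disk.update(self.pending)
        self.pending = {}

    def link_failed(self):
        # An incomplete delta set is thrown away; the disk stays at the
        # last complete checkpoint.
        self.pending = {}
```

If the link drops halfway through a delta set, the target’s disk still holds exactly what it held at the last checkpoint, which is where the consistency guarantee comes from.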

The other option, albeit too expensive for the bean-counters who manage our money, is multi-hop SRDF, which allows you to replicate to a bunker site 10-15km away from the primary site in full synchronous mode, and then from the bunker site to the DR site in Async. or SRDF/A mode.  This allows a recovery up to the point of failure in the event the primary site is lost, and recovery to the last delta-set in the event of both a primary and bunker site loss.  (nuclear explosion?)

So the options for distance are:  Ethernet, and ethernet.  The longest piece of dark fibre I’ve ever seen covers the 35km or so between Capitol Hill and the congressional DR facility.  They ran full Synchronous mode but the users never noticed because they never saw what performance was like without the 30ms round-trip.

The Symmetrix supports three protocols for SRDF

*  IP (current max 1gb per link)
*  FibreChannel (current max 4gb per link with DMX-3 and 5772 code)
*  Escon (The original standard for SRDF going all the way back to the Symm3)

FC and Escon are good for limited distances.  With long-wave (1300nm) optics you can do about 10km reliably.  With a good DWDM set you can stretch that out significantly (not native; the DWDM hardware acts as a repeater of sorts), plus it will allow you to put multiple links down the same fibre-pair.

Ethernet seems to be the most often implemented version these days.  I’ve seen a few, though not many, “Symmetrix RFA (RDF over Fibre) –> Nishan IPS3300 –> Ethernet –> Nishan IPS3300 –> Symmetrix RFA” types of implementations, but it seems to me that you’re just throwing that many more potential breaks in the transmission line, plus every time you have to decode a signal and re-encode it in another format you’re losing a step. 

(Even the fastest computer hardware takes time to process data, nothing hands it straight across.)

Of course, then the bean-counters get into it….the RE (RDF/Ethernet) adapters are more expensive than simply dedicating two ports of your existing FA (Fibre Host Adapters) to the RDF functionality.

Just ranting. :)