Archive for the 'DataCenter Move' Category

Migration complete -

Sunday, November 25th, 2007

We did it.   Migrated the hosts/data.   Production is now running in Kansas, DR in Georgia, and the old datacenters in NY/NJ are one step closer to being shut down.

Interesting couple of things I learned today.

SRDF/A is a great technology for replicating over long distances while maintaining what they call a “dependent-write-consistent” state.  It means that even though the replication is handled asynchronously, with minimal performance impact to the host, in the event of a failure you’re going to lose only a minimal amount of data.  (In our case, when it was running the R2 disks were about 45-60 seconds behind the R1.)

We also performed a “failure” (disconnected both Gig-E ports to simulate the Kansas site dropping out) and brought the DR hardware up as primary, then reconnected, unmounted, and restarted the SRDF/A session.

The only downside I’ve found with SRDF/A is that it’s a royal pain to stop and restart the replication.  In cases like this one, where once a week they take the R2s offline to run a 20-hour backup off them, they are putting themselves at unneeded risk.  It’s a situation where TimeFinder/SNAP would be a great benefit.  You snap the R2s at midnight and back up the snaps, thereby leaving your R2s in sync with your R1s for the duration.  You can also then mount the SNAP volumes to a separate media server, thereby avoiding having to re-configure the DR server as a temporary media server.
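
To put a rough number on that risk, here is a back-of-the-envelope sketch in Python.  It assumes the 20-hour weekly backup is the only time the R2s are split off, and that backing up a snap never interrupts the SRDF/A session.  The function name is mine, purely for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def unprotected_fraction(offline_hours_per_week: float) -> float:
    """Fraction of each week the R2s are not receiving updates from the R1s."""
    if not 0 <= offline_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("offline hours must fit within one week")
    return offline_hours_per_week / HOURS_PER_WEEK


if __name__ == "__main__":
    split_backup = unprotected_fraction(20)  # R2s offline for the 20-hour backup
    snap_backup = unprotected_fraction(0)    # assumption: snapping never splits the R2s
    print(f"Split-and-backup: {split_backup:.1%} of the week without a current DR copy")
    print(f"Snap-and-backup:  {snap_backup:.1%}")
```

Twenty hours out of 168 is roughly 12% of every week during which a failure at the primary site would leave you with a stale DR copy.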

It’s just a thought.

It’s always a great feeling when you hit the deadline dead-on, especially when you’re dealing with a situation where the requirements kept changing throughout the project, even to the point of having to add new devices at the last minute.

Oh well, on to the next.  At least the next is going to keep me closer to home.   Small-scale data migration from DMX2 to DMX2 within the same room, this should be a cake-walk. :)

Binfile changes

Thursday, November 8th, 2007

The joys of data migrations. 

One of the most common problems is the standard practice of most companies to avoid upgrading whenever possible.  The “if it ain’t broke, don’t fix it” mentality.

I could spend days and days on that particular brand of suicide.  For now I’ll just replace that adage with a new one.

If you don’t upgrade it now when you can do it in a controlled fashion, you will end up doing it when your life depends on it with very little planning.

So on the 17th, a customer is going to have to take an application down on an *OLD* Symmetrix 4.8 system to upgrade from 5265 code to 5267 code.  (Two major code revs up, from 5x65 to 5x66, then from 5x66 to 5x67, and neither can be loaded on-line.)

All of this has to happen *JUST* so we can move the data off this Symm and onto a “not-so” old Symm 5.0 that will then be packed up to be moved out of state.

First off, the idea that you can simply turn off a Symm and ship it across the country is nuts.  Anytime you take a system with that many moving parts (hard drives) that have been spinning for that length of time and simply “turn it off,” you run the risk of multiple hard-disk failures.  And as we all know, any time you have multiple hard-disk failures in an array, you run the risk of losing both halves of a mirror.  Hell, I cringe at turning off my desktop PC because I know that there is always the chance it’s not going to come back up, and I’ve got Raid-1 (160G) on my boot devices and Raid-5 (500G) on my data volumes, so I’m reasonably protected.

Secondly, why are we moving a Symm that is going to hit EOSL before too long?  Doesn’t it make sense to go ahead and upgrade to the latest and greatest hardware, get a free support renewal (included with the purchase of new hardware) and get the latest and greatest features/functionality?  Of course we’re moving a bunch of Sun E3000/E4000/E6000 class hardware.  These are the systems I cut my admin teeth on back in 1996 when I first started out in datacenter operations.  They were old 6 years ago.

Next time someone asks you the correct way to move a datacenter, the correct answer is “twin the hardware and replicate” followed by “trade-in.”

Bekins Moving should never be an option.

The beatings will continue…

Tuesday, October 23rd, 2007

…Until morale improves.

Trying to run 4 FCIP trunks over half a DS3 is a lot like raising a teenager.

Sometimes it looks like it’s working, but in reality it’s just screwing around playing video games.

Actually, my favorite is that “Raising a teenager is like trying to nail JELL-O to a tree.”  I’m feeling about the same level of frustration.

What’s basically happening is that the link is fine, as long as we’re not doing anything silly like, oh, PASSING DATA over it.  The minute we start moving data the link gives up and goes to Palm Beach for the holiday.

I tried to explain to both EMC and the customer at the start of this engagement that replicating four Symms over even a full DS3 is very…optimistic.
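
For anyone who wants the arithmetic behind “optimistic,” here is a rough sketch.  It assumes the nominal DS3 line rate of 44.736 Mbit/s, an even split across the four trunks, and no protocol overhead at all, so real throughput would be lower.  The function name is made up for the example.

```python
DS3_MBIT_PER_SEC = 44.736  # nominal DS3 line rate; usable payload is somewhat lower


def per_trunk_mbit(link_fraction: float, trunks: int) -> float:
    """Even share of the circuit per FCIP trunk, in Mbit/s, ignoring overhead."""
    if trunks <= 0:
        raise ValueError("need at least one trunk")
    return DS3_MBIT_PER_SEC * link_fraction / trunks


if __name__ == "__main__":
    for label, fraction in (("half a DS3", 0.5), ("full DS3", 1.0)):
        share = per_trunk_mbit(fraction, 4)
        print(f"{label}: {share:.2f} Mbit/s per trunk (~{share / 8:.2f} MB/s)")
```

On half a DS3 that works out to about 5.6 Mbit/s per trunk, well under a megabyte per second for each Symm, before any overhead.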

So I’ve spent the last three days solid beating my head over this, more than 18 hours a day (except for yesterday, which involved 7 hours + 8 hours travel time).

Cisco FCIP and SRDF

Friday, October 19th, 2007

Been a while since I’ve written anything – I’m not even sure if I still have a readership.

I’ve been working an average of 60 hours a week on a single project these days.  Doing a datacenter migration and consolidation.  Basically moving 4 Symm-5 generation systems into a single DMX-3.

The funniest part of this has been learning the DMX-3, which I’ve not had a lot of stick-time with.  It seems like a great machine, a good hybrid of the Clariion and the Symmetrix.  I don’t much care for the DAE back-end, too many major points of failure, too many cables.  (Though when you do your first code-load on one, it sure gives you a work-out as far as learning what plugs in where.)

Anyway, as the title suggests, we’re doing a large part of this migration using temporary hardware, in the form of the Cisco MDS9216i.  This is a normal MDS 92xx chassis (2-slot) with a 14/2 FCIP blade in it.  Simply 14x4gbit FC ports and 2xGig-E ports on the same blade.  So far it’s been one challenge after another, and as of this posting we still don’t have the Georgia and New York datacenters talking to each other.

Part of the problem is the customer’s network infrastructure.  Namely, it sucks.  For those who don’t know, Gig-E ports on the Cisco don’t negotiate down; they are essentially 1000-SX ports.  The customer, who makes a substantial part of their income off network traffic, doesn’t have a single Gig-E port in the entire datacenter.  That was problem number one.

Problem #2 was that in the datacenter that does have Gigabit available, namely the new one in Georgia, there is no optical available.  So we have to go through the painful process of getting an RPQ (an inexact definition is “Request for Price Quote”; what it really means is getting engineering to bless the configuration) to use copper SFPs on the MDS switches.

*THEN* we find out we’re replicating over a DS3 circuit, and even at that we have to “nice” our hardware down to 12.5Meg/Sec so as not to affect their production traffic, which is (of course) running on the same network.  (SRDF has a nasty habit of sucking up all available bandwidth.)

Do you know how long it takes to replicate terabytes of data at 12Meg?  LOL
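
For the curious, a quick sketch of that math.  “12.5Meg/Sec” could be read as megabytes or megabits per second, so the sketch works out both.  The 1, 5, and 10 TB sizes are illustrative, not the actual amount of data in this migration, and the numbers ignore protocol overhead and compression.  The function name is made up for the example.

```python
def transfer_hours(terabytes: float, rate: float, rate_in_bits: bool) -> float:
    """Hours to move `terabytes` (decimal TB) at `rate` mega-units per second."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Convert the throttle to bytes per second under either reading of "Meg".
    bytes_per_sec = rate * 1_000_000 / (8 if rate_in_bits else 1)
    return terabytes * 1_000_000_000_000 / bytes_per_sec / 3600


if __name__ == "__main__":
    for tb in (1, 5, 10):  # illustrative sizes only
        as_bytes = transfer_hours(tb, 12.5, rate_in_bits=False)
        as_bits = transfer_hours(tb, 12.5, rate_in_bits=True)
        print(f"{tb:>2} TB: {as_bytes:6.1f} h at 12.5 MB/s, "
              f"{as_bits / 24:5.1f} days at 12.5 Mbit/s")
```

At 12.5 MB/s a single terabyte takes about 22 hours; at 12.5 Mbit/s it takes a bit over a week.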

This is going to be fun.  I’ll keep you posted.

 

The Datacenter move is COMPLETE

Sunday, January 21st, 2007

Now I just have to go in and pull about a thousand cables out of the old datacenter and tidy up a bit.  We were there until about 4am.  My goal from 9pm to 3am was to shut down and move the Clariion to its new home in the primary data center.  I got it done minutes before midnight, but was plagued with Veritas issues for the rest of the evening.

When we made the move, we moved to a new, more robust network.  This allowed us to do away with the dedicated gig network we were using with Veritas, because between the more robust production network and the fact that most of the database backups will now be done with TimeFinder, the need for the extra network is nil.

The issue is that Microsoft doesn’t like dual-networks – getting it configured for DNS and a dual-homed network was hard enough – getting the dual-homed config removed was even harder.