50Micron.com

Best Practices

Storage is as Storage does

by Jesse on Sep.18, 2008, under Best Practices, Business, Career, Consulting

Sitting here running RDF create scripts for a data push this weekend and going over the day’s events in my head.

One of the things you get as a consultant is the ability to get a glimpse of the political machinations of many different companies and to get a first-hand view of what does and doesn’t work.

One thing I’ve seen is about a million different attempts at integrating storage into various systems departments. It never works. It always ends up in departmental pissing contests over who owns what, and usually results in a company or organization buying more storage than it needs in order to pacify the different warring factions.

Storage belongs by itself. Pure and simple, the only way I’ve ever seen it work is when storage is a department in and of itself, with its own staff, its own budget, and a little autonomy and freedom to make decisions and to act with the peace of mind that you’re not having to work around someone else’s changes.

The main reason for this is that server people don’t have the time to understand the dynamics of a truly heterogeneous storage environment. Network people understand firewalls and routing (something that *STILL* puzzles me to a certain extent), etc.

A good storage person knows the basics of as many operating systems as they can.

For instance – the current environment I’m working in has the following systems:

  • AIX
  • Mainframe
  • VMS
  • AS/400
  • Windows
  • Linux
  • VMware

A good storage person knows the gotchas of each server, but may not even know how to log into it.

For instance – for each of the systems listed above:

  • AIX – Mount the pseudo device PowerPath creates (hdiskpowerX). AIX is sensitive to D_ID changes (switch port changes), but if you’re using the LVM there are no real worries; just be careful.
  • Mainframe – Three words: Long Wave SFPs.
  • VMS – Actually sensitive to the SYMDEV number. If you’re doing a data migration, you have to move the data to the same SYMDEV number.
  • AS/400 – Boot from SAN using a Load Source Emulator, and use the serial cable included with it to configure the boot device. The boot device has to be on a separate port from the data devices. Make sure emulation is set correctly.
  • Windows – Dynamic disks cause hell with replication and TimeFinder – don’t use them.
  • Linux – Use disk/partition labels so you avoid issues if the LUN order changes.
  • VMware – The SPC2 bit needs to be set on the FAs for DRS/HA clustered hosts. Your best bet is to do this with symmask, to avoid conflicts with other hosts sharing these ports.

A good storage department would include:

  • Tier-1 (Symmetrix) expert
  • Tier-2 (Clariion) expert
  • Backup person
  • NAS person

Just my thoughts.


101 uses for a Clariion…

by Jesse on Sep.16, 2008, under Backup, Best Practices, Clariion

You know, it floored me recently when I heard that someone had said a CX3-20 couldn’t be used as a dump area for Tivoli.

What floored me even more is when they said this would not perform as well as a Symmetrix 8430.

8430.

Symm4.

Almost 10-year-old hardware.

Huh?

Back when I was working for the student loan company in Sterling, we ran all of our backups to a single CX500 with 15 terabytes behind it. It worked without any issue and was screaming fast, to the point that 18 hours of backups (to tape) were compressed into 8 hours (to disk).

{sarcasm}Now correct me if I’m wrong, the CX3-20 is a tad faster than the older CX500, right?{/sarcasm}

As with any array, the bulk of the issue comes down to whether or not the disks are laid out appropriately. If you try to run single-disk LUNs, you’re probably going to die of old age before a backup finishes. But if you stripe it appropriately and make sure the LUN ownership is correct, you can do wonders with EMC’s “Tier-2” array.
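To put rough numbers on the layout argument, here is a back-of-the-envelope Python model. It is a sketch under stated assumptions, not a CLARiiON sizing tool: the per-disk write rate is a hypothetical placeholder, throughput is assumed to scale linearly with the number of spindles written in parallel, and "correct ownership" is modeled simply as alternating LUNs between SP A and SP B.

```python
def backup_hours(data_tb, disks_per_lun, luns, mb_per_sec_per_disk=20.0):
    """Rough backup-window estimate when backups land on striped LUNs.

    Hypothetical model: every spindle sustains the same sequential write rate
    (mb_per_sec_per_disk is a placeholder, not a CLARiiON spec), and aggregate
    throughput scales linearly with the number of spindles written in parallel.
    """
    spindles = disks_per_lun * luns
    aggregate_mb_per_sec = spindles * mb_per_sec_per_disk
    data_mb = data_tb * 1024 * 1024  # binary TB -> MB
    return data_mb / aggregate_mb_per_sec / 3600


def assign_sp_owners(lun_ids):
    """Alternate default LUN ownership between the two storage processors."""
    return {lun: ("SP A" if i % 2 == 0 else "SP B") for i, lun in enumerate(lun_ids)}


# Same 1 TB of backup data: one single-disk LUN vs. four 5-disk striped LUNs.
print(round(backup_hours(1, disks_per_lun=1, luns=1), 1))
print(round(backup_hours(1, disks_per_lun=5, luns=4), 1))
print(assign_sp_owners([10, 11, 12, 13]))
```

Under these assumptions the striped layout finishes twenty times faster simply because twenty spindles share the writes; real arrays flatten out well before that, so plug in measured numbers before trusting the result.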


A Cabling Before/After:

by Jesse on Sep.04, 2008, under Best Practices, CableManagement

Anyone who knows me knows I’m a little insane when it comes to cabling.

Keeping cabling sane makes management and troubleshooting easier.

The trouble, of course, is that usually when you’re called in to do a clean-up, you’re not given the luxury of taking the datacenter offline to do the work. Most of this was done online, though I had to shut down a couple of single-attached Linux servers in order to move their cables.

You don’t get the opportunity to truly clean up the environment when you can’t disconnect everything; however, you can at least get the mess out of the way.


[Photo: Before]

[Photo: After]

[Photo: Hidden cabling to the right of the switch]


Overcomplicating the world

by Jesse on Dec.06, 2007, under Best Practices, Data Migration, Fibrechannel

OK, I’ve seen it happen over and over again: customers who think they know better.

Now I absolutely applaud a customer who wants to take the time to learn the ins and outs of the storage they’ve spent probably hundreds of thousands of dollars on.

But when you pay a consultant to come in and do work for you, please please PLEASE don’t handicap him by telling him to do something he knows is wrong.

Simple things like zoning, pathing, masking, failover, even the naming conventions we use, come from years of experience with the best way to put something together. More damage is done by people who think they know better than the years and years of developer-hours that went into the system.

For instance: single initiator, single target. There is no need to zone an HBA to multiple targets for redundancy unless it’s the only HBA in the system, in which case you have your first mistake right there.

HBA_1 –> Switch_1 –> FA_1

HBA_2 –> Switch_2 –> FA_2

These should be two completely separate paths, completely isolated from each other.  It’s so simple it’s not funny, yet I’ve seen more “unique” ways of zoning than I can count.
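The two-path layout is mechanical enough to generate. Below is a minimal Python sketch, assuming a hypothetical list of (HBA, switch, FA) paths and a made-up `z_<hba>_<fa>` naming convention; it builds one single-initiator, single-target zone per path and checks that no port shows up on both fabrics. It does not produce Brocade or Cisco commands, and real zones would use WWPNs or aliases.

```python
def single_initiator_zones(paths):
    """One zone per (HBA, switch, FA) path: exactly one initiator and one target."""
    zones = {}
    for hba, switch, fa in paths:
        zones[f"z_{hba}_{fa}"] = {"fabric": switch, "members": [hba, fa]}
    return zones


def fabrics_isolated(zones):
    """True if no port appears in zones on more than one fabric."""
    fabric_of = {}
    for zone in zones.values():
        for member in zone["members"]:
            if fabric_of.setdefault(member, zone["fabric"]) != zone["fabric"]:
                return False
    return True


paths = [("HBA_1", "Switch_1", "FA_1"), ("HBA_2", "Switch_2", "FA_2")]
zones = single_initiator_zones(paths)
for name, zone in zones.items():
    print(name, zone["fabric"], zone["members"])
print(fabrics_isolated(zones))  # True
```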

For instance:

HBA_1 –> Switch_1 –> FA_1A
                  –> FA_1B

The inherent problem with this is that PowerPath (or DMP, or whatever flavor of multipath software you use) is going to spend more time managing two paths, and the weak/slow point in the link is still the HBA. You can’t get beyond the fact that it is a serial interface.

Plus, on a Symmetrix, FA1A and FA1B share a processor, so you aren’t even gaining anything from spreading the IO across the Symmetrix front-end. (Not that you would gain much even if you used FA1A and FA2A, because either way the processors are writing to cache, with minimal lag.)
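A quick way to catch the "one HBA, two FAs" pattern, and to see why it buys nothing, is sketched below in Python. The zone dictionary and the 2 Gb/s link speeds are hypothetical; the model only captures the HBA-bottleneck point from the text, not the shared FA processor.

```python
def multi_target_zones(zones, targets):
    """Names of zones that put one initiator in front of more than one target port."""
    return [name for name, members in zones.items()
            if sum(1 for m in members if m in targets) > 1]


def effective_gbps(hba_gbps, fa_gbps):
    """One HBA fanned out to several FA ports is still capped by the HBA link."""
    return min(hba_gbps, sum(fa_gbps))


targets = {"FA_1A", "FA_1B", "FA_2A"}
zones = {
    "z_HBA_1": ["HBA_1", "FA_1A", "FA_1B"],  # the "unique" layout above
    "z_HBA_2": ["HBA_2", "FA_2A"],           # single initiator, single target
}
print(multi_target_zones(zones, targets))   # ['z_HBA_1']
print(effective_gbps(2.0, [2.0, 2.0]))      # 2.0
```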

Then you get into management.  You spend more time and effort managing a complex solution than it’s worth.  Simplify it and you’ll find you spend much more time at your local pub.  ;-)

/jg


Binfile changes

by Jesse on Nov.08, 2007, under Best Practices, Consulting, Data Migration, DataCenter Move, Symmetrix

The joys of data migrations. 

One of the most common problems is the standard practice of most companies to avoid upgrading whenever possible.  The “if it ain’t broke, don’t fix it” mentality.

I could spend days and days on that particular brand of suicide. For now I’ll just replace that adage with a new one.

If you don’t upgrade it now when you can do it in a controlled fashion, you will end up doing it when your life depends on it with very little planning.

So on the 17th, a customer is going to have to take an application down on an *OLD* Symmetrix 4.8 system to upgrade from 5265 code to 5267 code (two major code revs up, from 5x65 to 5x66, then from 5x66 to 5x67, and neither can be loaded online).
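When revs have to be walked one at a time, it helps to list every stop explicitly before scheduling outages. The Python helper below is a hypothetical sketch: it assumes, for illustration only, that the last two digits of the code level identify the family and that no family can be skipped, as in this case.

```python
def upgrade_steps(current, target):
    """Every code level to load, one family at a time, from current to target.

    Illustrative assumption: a level like 5265 splits into a prefix (52) and a
    family (65), and families cannot be skipped.
    """
    prefix, cur_family = divmod(current, 100)
    target_prefix, target_family = divmod(target, 100)
    if prefix != target_prefix or target_family < cur_family:
        raise ValueError("only forward upgrades within one prefix are modeled")
    return [prefix * 100 + fam for fam in range(cur_family + 1, target_family + 1)]


print(upgrade_steps(5265, 5267))  # [5266, 5267]
```

Each entry in the result is a separate load, and in this customer’s case each one is an offline load.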

All of this has to happen *JUST* so we can move the data off this Symm and onto a “not-so” old Symm 5.0 that will then be packed up to be moved out of state.

First off, the idea that you can simply turn off a Symm and ship it across the country is nuts. Any time you take a system with that many moving parts (hard drives) that have been spinning for that length of time and simply “turn it off,” you run the risk of multiple hard-disk failures. And as we all know, any time you have multiple hard-disk failures in an array, you run the risk of losing both halves of a mirror. Hell, I cringe at turning off my desktop PC because I know there is always the chance it’s not going to come back up, and I’ve got RAID-1 (160G) on my boot devices and RAID-5 (500G) on my data volumes, so I’m reasonably protected.

Secondly, why are we moving a Symm that is going to hit EOSL before too long?  Doesn’t it make sense to go ahead and upgrade to the latest and greatest hardware, get a free support renewal (included with the purchase of new hardware) and get the latest and greatest features/functionality?  Of course we’re moving a bunch of Sun E3000/E4000/E6000 class hardware.  These are the systems I cut my admin teeth on back in 1996 when I first started out in datacenter operations.  They were old 6 years ago.

Next time someone asks you the correct way to move a datacenter, the correct answer is “twin the hardware and replicate” followed by “trade-in.”

Bekins Moving should never be an option.


Upgrades complete

by Jesse on Nov.08, 2007, under Best Practices, Symmetrix

With the exception of a small problem I had with sendmail, everything seems to be working.

Love it when it works this smoothly. ;-)

/jg


Cheaper isn’t necessarily better.

by Jesse on Oct.21, 2007, under Best Practices, Consulting, Data Migration, General

This is a large part of what drives me nuts about customers (if they didn’t pay the bills, I’d be all for letting them drown in the sea of bad decisions they make): the unwillingness to spend a little extra to do it right.

Let me give you an example. I’m working now on a datacenter consolidation. Two datacenters in the New York area that have been around since the ’80s are being consolidated into other datacenters further south.

One is being closed, and the other is staying online, presumably long enough for them to realize that it’s also out of date. About a dozen hosts are SAN-attached and are being moved.

Now here is the scary part: we’re talking about a collection of Sun E3500s, and I think the most powerful unit they have is an E5000. All of them run Sybase, and all are still on Solaris 2.7. The real kicker is that the hosts are using JNI SBus cards, which have not been supported in over 5 years (the company is long since a memory). It was a real challenge even to FIND a batch of old Emulex LP9002-S cards to replace their JNI cards with.

The Symms we’re migrating off of are a collection of 4.0s, 4.8s, and a few 5.0s, and they’re keeping most of the 4.0 and 4.8 Symms and retiring the 5.0s…. Huh?

So what would be the best idea?

If you said “Buy a pair of UE10Ks to go with the new DMX3s and move all of the hosts onto them,” you’d be right. :) The wonderful thing about VM-based hosts, like the Sun Ultra Enterprise series and the AIX p-series, is the ability to consolidate multiple smaller, older hosts onto them. You save a mint just in floor space and power/cooling bills, not to mention the HBAs you don’t have to buy. Figure this company bought in the neighborhood of 30 Emulex LP9002s; even conservatively priced at $500 each, that’s $15,000, a chunk of change. One UE10K with 6 or 8 LP10000 HBAs could have done the same work, and at 4gig no less.

They spent the money on a pair of DMX3s, source and target, presumably because they had to (I don’t think you can buy a Symm5 outside of eBay anymore). But they are actually going to spend the next three months MOVING antique hosts 1500 miles and hoping they survive the trip. The good news is that I get overtime, and I’m averaging 60-70 hours a week right now trying to play into their little psychosis.

And where does it all end?   Quite simply they are going to find themselves upgrading the hardware anyway, probably after a catastrophic failure, so they just wasted a million dollars moving stuff they are going to throw in the trash in a year or two.


Be the Packet….

by Jesse on Mar.07, 2007, under Best Practices, Cisco, Switches

I was whiteboarding a switch migration today with one of the DCR people, and it occurred to me:

If you can visualize the data’s path through the system, your life gets a lot easier. 

If you can see the path the data is going to take through the SAN, i.e. from Switch1, Blade 1, Port 4, through an ISL to Switch2, Blade 2, Port 16, you put yourself in a better position to shorten that path.

First, cut the ISLs. If you can possibly avoid it, don’t pass data down an ISL link. Pushing data through an ISL creates a bottleneck where there may not be enough bandwidth to handle multiple hosts.
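Once the fabric is written down, counting ISL hops between a host’s switch and its storage’s switch is a plain breadth-first search. A minimal Python sketch, assuming a hypothetical fabric described as a dictionary of switch-to-switch ISLs (not pulled from any switch CLI):

```python
from collections import deque


def isl_hops(isls, src, dst):
    """Breadth-first search over switches; each edge crossed is one ISL hop."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        switch, hops = queue.popleft()
        if switch == dst:
            return hops
        for nxt in isls.get(switch, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # no path: the two switches are not connected by any ISL chain


fabric = {
    "Switch1": ["Switch2"],
    "Switch2": ["Switch1", "Switch3"],
    "Switch3": ["Switch2"],
}
print(isl_hops(fabric, "Switch1", "Switch2"))  # 1
print(isl_hops(fabric, "Switch2", "Switch2"))  # 0
```

Anything above zero for a high-performance host is a candidate for re-cabling so host and storage share a switch.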

Second, keep the intra-switch hops to a minimum. If you can, plug the storage and the highest-performing hosts into the same grouping of ports. Most switches use 4-port ASICs, and if you look at the switch you’ll see that the ports are grouped by ASIC.

However, the 32-port Cisco blade is an exception: it uses basically the same ASIC but shares the bandwidth over 8 ports. Keeping that in mind, it’s best to connect the lower-performing hosts (Windows?) to those ports, and definitely *DO NOT* connect ISLs to the 32-port blades.
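The port-grouping advice can be checked before anything gets plugged in. This Python sketch assumes zero-based, contiguous port numbering and a fixed group size per blade (4 ports per ASIC by default, 8 for the 32-port blade as described in the paragraph on Cisco blades); real numbering varies by vendor and blade, so treat the group size as a parameter to verify against the hardware documentation.

```python
def asic_group(port, ports_per_asic=4):
    """Index of the port group (ASIC) a zero-based port number belongs to."""
    return port // ports_per_asic


def same_asic(port_a, port_b, ports_per_asic=4):
    """True when two ports on the same blade are served by the same port group."""
    return asic_group(port_a, ports_per_asic) == asic_group(port_b, ports_per_asic)


# Storage on port 0, busy database host on port 3: same 4-port group.
print(same_asic(0, 3))                    # True
# Port 4 starts the next group on a blade with 4-port ASICs.
print(same_asic(0, 4))                    # False
# With bandwidth shared across 8 ports, ports 0 and 4 land in the same group.
print(same_asic(0, 4, ports_per_asic=8))  # True
```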


SRDF over what media?

by Jesse on Feb.06, 2007, under Best Practices, Data Migration, Replication

Well – Tomorrow I should have data replication going between the two Symms.  And it gives me pause.

We’re going to be using SRDF/A for our replication. For those who are not familiar with the EMC terminology, SRDF/A is an asynchronous form of SRDF that provides consistency points in the data being transmitted without affecting production performance.

SRDF/A periodically inserts a “checkpoint” into the asynchronous traffic. The target frame will only write a block of changes, called a “delta set,” once that block’s ending checkpoint has been received. If a link fails before the checkpoint is received, the incomplete block is considered invalid and discarded.
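The checkpoint rule is what keeps the remote copy consistent. The following Python sketch is a deliberately simplified model, not EMC’s implementation: writes are buffered into a pending delta set, a checkpoint commits the whole set to the target at once, and anything still pending when the stream ends (standing in for a link failure) is discarded. The event format and function name are hypothetical.

```python
def apply_delta_sets(events):
    """Replay a replication stream onto a target copy, one whole delta set at a time.

    Each event is ("write", block, data) or ("checkpoint",). Writes only reach
    the target when the checkpoint that closes their delta set arrives.
    """
    target = {}   # block number -> data on the remote (target) frame
    pending = {}  # the delta set currently in flight
    for event in events:
        if event[0] == "write":
            _, block, data = event
            pending[block] = data   # a later write to the same block replaces the earlier one
        elif event[0] == "checkpoint":
            target.update(pending)  # commit the complete delta set in one step
            pending = {}
    # Anything still in `pending` never saw its closing checkpoint (e.g. the
    # link failed), so it is dropped and the target stays at the last
    # consistent point.
    return target


stream = [
    ("write", 1, "a"), ("write", 2, "b"), ("checkpoint",),
    ("write", 1, "c"), ("write", 3, "d"),  # link fails before the next checkpoint
]
print(apply_delta_sets(stream))  # {1: 'a', 2: 'b'}
```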

This allows a recovery-point of 10-15 minutes, with guaranteed consistency, over a longer distance.  (Our planned replication distance is approximately 1500 miles)

The other option, albeit too expensive for the bean-counters who manage our money, is multi-hop SRDF, which allows you to replicate to a bunker site 10-15km away from the primary site in full synchronous mode, and then from the bunker site to the DR site in Async. or SRDF/A mode.  This allows a recovery up to the point of failure in the event the primary site is lost, and recovery to the last delta-set in the event of both a primary and bunker site loss.  (nuclear explosion?)
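The multi-hop trade-off boils down to a small decision table. The Python function below is a hypothetical restatement of that paragraph, nothing more: it reports the recovery point you can expect depending on which sites survive.

```python
def recovery_point(primary_lost, bunker_lost):
    """Recovery point for a primary -> bunker (sync) -> DR (async) SRDF chain."""
    if not primary_lost:
        return "no failover needed: production still running on the primary"
    if not bunker_lost:
        return "point of failure (the bunker's synchronous copy is current)"
    return "last complete delta set received at the DR site (SRDF/A copy)"


for primary_lost, bunker_lost in [(True, False), (True, True)]:
    print(primary_lost, bunker_lost, "->", recovery_point(primary_lost, bunker_lost))
```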

So the options for distance are: Ethernet, and Ethernet. The longest piece of dark fibre I’ve ever seen covers the 35km or so between Capitol Hill and the congressional DR facility. They ran full synchronous mode, but the users never noticed because they never saw what performance was like without the 30ms round-trip.

The Symmetrix supports three protocols for SRDF:

  • IP (current max 1Gb/s per link)
  • Fibre Channel (current max 4Gb/s per link with DMX-3 and 5772 code)
  • ESCON (the original standard for SRDF, going all the way back to the Symm3)

FC and ESCON are good for limited distances. With long-wave (1300nm) optics you can do about 10km reliably. With a good DWDM set you can stretch that out significantly (not natively; the DWDM hardware acts as a repeater of sorts), and it will also let you put multiple links down the same fibre pair.
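The distance guidance reads naturally as a lookup. In this Python sketch the 10 km native limit comes from the paragraph on long-wave optics, while `dwdm_limit_km` is a hypothetical placeholder, since how far DWDM stretches FC or ESCON depends entirely on the gear in use.

```python
def srdf_transport(distance_km, dwdm_limit_km=100.0):
    """Pick an SRDF link type from distance, per the rules of thumb in this post.

    dwdm_limit_km is a placeholder; check the actual DWDM vendor's limits.
    """
    if distance_km <= 10:
        return "native FC or ESCON with long-wave optics"
    if distance_km <= dwdm_limit_km:
        return "FC or ESCON over DWDM"
    return "IP (Ethernet)"


# The ~1500-mile replication in this post, converted to km.
print(srdf_transport(1500 * 1.609))  # IP (Ethernet)
```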

Ethernet seems to be the most often implemented version these days.  I’ve seen a few, though not many, “Symmetrix RFA (RDF over Fibre) –> Nishan IPS3300 –> Ethernet –> Nishan IPS3300 –> Symmetrix RFA” types of implementations, but it seems to me that you’re just throwing that many more potential breaks in the transmission line, plus every time you have to decode a signal and re-encode it in another format you’re losing a step. 

(Even the fastest computer hardware takes time to process data, nothing hands it straight across.)

Of course, then the bean-counters get into it….the RE (RDF/Ethernet) adapters are more expensive than simply dedicating two ports of your existing FA (Fibre Host Adapters) to the RDF functionality.

Just ranting. :)

