Overcomplicating the world
by Jesse on Dec.06, 2007, under Best Practices, Data Migration, Fibrechannel
OK, I’ve seen it happen over and over again: customers who think they know better.
Now I absolutely applaud a customer who wants to take the time to learn the ins and outs of the storage they’ve spent probably hundreds of thousands of dollars on.
But when you pay a consultant to come in and do work for you, please please PLEASE don’t handicap him by telling him to do something he knows is wrong.
Simple things like zoning, pathing, masking, failover, and even the naming conventions we use come from years of experience with the best way to put something together. More damage is done by people who think they know better than the years and years of developer-hours that went into the system.
Take single initiator, single target zoning, for instance. There is no need to zone an HBA to multiple targets for redundancy unless it’s the only HBA in the system, in which case you’ve made your first mistake right there.
HBA_1 –> Switch_1 –> FA_1
HBA_2 –> Switch_2 –> FA_2
These should be two completely separate paths, completely isolated from each other. It’s so simple it’s not funny, yet I’ve seen more “unique” ways of zoning than I can count.
For instance:
HBA_1 –> Switch_1 –> FA_1A
                  –> FA_1B
The inherent problem with this is that PowerPath (or DMP, or whatever flavor of multipath software you use) is going to spend more time managing two paths, and the weak/slow point in the link is still the HBA. You can’t get beyond the fact that it is a serial interface.
Plus, on a Symmetrix, FA1A and FA1B share a processor, so you aren’t even gaining anything from spreading the I/O across the Symmetrix front end. (Not that you would even if you used FA1A and FA2A, because the processors are still writing to cache with minimal lag.)
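To make the rule checkable, here is a minimal Python sketch that models both layouts above. It assumes a hypothetical data model (the `Zone` record, `build_zones`, and `check_zones` are illustrative names, not any switch vendor's tooling): one zone per HBA-to-FA path, then a check that every zone holds exactly one initiator and one target, that each HBA stays on one fabric, and that no HBA is zoned to more targets than the site allows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical zone record: the fabric it lives on plus its members."""
    fabric: str
    initiators: tuple
    targets: tuple


def build_zones(paths):
    """Create one single-initiator, single-target zone per (hba, fabric, fa) path."""
    return [Zone(fabric, (hba,), (fa,)) for hba, fabric, fa in paths]


def check_zones(zones, max_targets_per_hba=1):
    """Return a list of problems; an empty list means the layout passes."""
    problems = []
    fabrics = defaultdict(set)  # hba -> fabrics it appears on
    targets = defaultdict(set)  # hba -> target ports it is zoned to
    for z in zones:
        if len(z.initiators) != 1 or len(z.targets) != 1:
            problems.append(f"zone on {z.fabric} is not single initiator/single target")
        for hba in z.initiators:
            fabrics[hba].add(z.fabric)
            targets[hba].update(z.targets)
    for hba in sorted(fabrics):
        if len(fabrics[hba]) > 1:
            problems.append(f"{hba} spans fabrics {sorted(fabrics[hba])}")
        if len(targets[hba]) > max_targets_per_hba:
            problems.append(f"{hba} is zoned to {len(targets[hba])} targets")
    return problems


# Two isolated paths, one per fabric, as in the first layout.
good = build_zones([("HBA_1", "Switch_1", "FA_1"), ("HBA_2", "Switch_2", "FA_2")])
# One HBA zoned to FA_1A and FA_1B through the same switch, as in the second layout.
bad = build_zones([("HBA_1", "Switch_1", "FA_1A"), ("HBA_1", "Switch_1", "FA_1B")])

print(check_zones(good))  # []
print(check_zones(bad))   # ['HBA_1 is zoned to 2 targets']
```

The `max_targets_per_hba=1` default encodes the one-HBA, one-FA rule argued for here; a shop that deliberately zones each HBA to two FA ports would raise it to 2 and still get the other checks.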
Then you get into management. You spend more time and effort managing a complex solution than it’s worth. Simplify it and you’ll find you spend much more time at your local pub.
/jg
December 6th, 2007 on 11:59 am
Since my experience has been servers/OS/app stuff, I’d love to hear more. I find it hard to locate best-practice reference resources on SAN setup. On your example above, people have always told me it was best to do example 2 instead of single initiator, single target, for an additional layer of redundancy in the path (e.g. if SP/FA1 goes down, you lose your path redundancy with other components).
I’ve also gotten into arguments about port vs. soft zoning with people who have been configuring arrays for years and who tell me port zoning is bad (ignoring the security aspect).
December 6th, 2007 on 12:17 pm
The example is best applied to Symmetrix, because everything is completely redundant on the back end. You’re absolutely not gaining anything by doing the “example 2” method. If an HBA fails you’re losing two paths instead of one; granted, if an FA fails you’re losing half a path instead of a whole one, but you’re not going to see any real performance gain.
In 10 years of working with the Symmetrix, I’ve *NEVER* known EMC to recommend anything other than my first example.
Now, a Clariion is a completely different story: you HAVE to zone a single HBA to both SPs, because it’s the only way to take advantage of the load balancing as well as the peer redundancy.
HBA_0 –> SPA0/SPB0
HBA_1 –> SPA1/SPB1
So when a LUN is owned by SPA, load can be effectively balanced across SPA0 and SPA1, while SPB0 and SPB1 sit idle waiting for a trespass.
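To make the ownership behavior concrete, here is a small Python model. It assumes the zoning shown (HBA_0 to SPA0/SPB0, HBA_1 to SPA1/SPB1) and a simplified rule that only ports on the owning SP carry I/O. The `trespass` helper is a hypothetical stand-in for the array moving LUN ownership to the peer SP; it is not a Navisphere or PowerPath call.

```python
# Zoning from the example: each HBA sees one port on each storage processor.
ZONING = {
    "HBA_0": ["SPA0", "SPB0"],
    "HBA_1": ["SPA1", "SPB1"],
}


def active_paths(owner):
    """Paths that carry I/O when the LUN is owned by `owner` ("SPA" or "SPB")."""
    return [(hba, port) for hba, ports in ZONING.items()
            for port in ports if port.startswith(owner)]


def trespass(owner):
    """Hypothetical ownership move to the peer SP (e.g. after an SP failure)."""
    return "SPB" if owner == "SPA" else "SPA"


owner = "SPA"
print(active_paths(owner))  # [('HBA_0', 'SPA0'), ('HBA_1', 'SPA1')]
owner = trespass(owner)
print(active_paths(owner))  # [('HBA_0', 'SPB0'), ('HBA_1', 'SPB1')]
```

Both HBAs carry I/O whichever SP owns the LUN. If HBA_0 were zoned only to SPA and HBA_1 only to SPB, a LUN owned by SPA would be left with a single active path.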
Then you have to figure out the per-port cost of the Symmetrix ports and whether it’s worth the expenditure. When a set of FAs can run $40–50K, that’s about $3,000 per port, plus about $1,000 per switch port. If you have an environment with more hosts, you wind up sacrificing addressing on more ports than you need to, for little or no return.
On to your second point:
Port zoning is good and bad. The security aspect is there, but you also have to understand that in order to spoof a World Wide Name, a hacker would need physical access, which means you’ve got a bigger problem.
It’s worth mentioning that EMC is very selective about the situations in which it will support port zoning. So if you’re planning on contacting the SAC for zoning/switch issues, make sure they’ve approved it first.
The real issue I have with port zoning is that if an SFP or port fails, you may be required to change the zoning to recover, rather than just moving the cable to a new port. This creates more work for what ends up being one of the most common failures. (Fixed optics on the HBA are the most common failure; there’s nothing you can do there but replace the card.)
One of the benefits of port zoning is in a blade environment where you are booting from the SAN and want to be able to do a quick swap in the event of a blade failure. I’ve worked in environments where port zoning was used and device masking was disabled on the Symm (so that clustered hosts would see a separate boot volume along with the shared data volumes), and if a blade failed you simply swapped it out and turned the new one on.
The downside to that plan (see my note above about port costs) is that it requires you to dedicate Symmetrix ports to a single host. For dual-pathed connections that’s almost $10k per host just in port costs.
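The port-cost arithmetic can be written out as a quick Python check. The figures are the rough numbers quoted above (about $3,000 per Symmetrix FA port and $1,000 per switch port); treat them as placeholders for your own pricing. Whether one or both switch ports on a path are counted is an assumption, and it is what moves the dual-path total between $8,000 and $10,000, bracketing the "almost $10k" figure.

```python
FA_PORT_COST = 3_000      # rough cost per Symmetrix FA port, from the text
SWITCH_PORT_COST = 1_000  # rough cost per switch port, from the text


def dedicated_port_cost(paths_per_host, switch_ports_per_path=1,
                        fa_port=FA_PORT_COST, switch_port=SWITCH_PORT_COST):
    """Port cost of giving a host its own FA port (and switch port[s]) on every path.

    switch_ports_per_path=1 counts only the array-side switch port;
    use 2 to also count the host-side port on the same switch.
    """
    return paths_per_host * (fa_port + switch_ports_per_path * switch_port)


print(dedicated_port_cost(2))                           # 8000
print(dedicated_port_cost(2, switch_ports_per_path=2))  # 10000
print(dedicated_port_cost(4))                           # 16000
```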
There are environments where the cost-benefit is there, such as environments where downtime is measured in thousands of dollars per minute, but in the “average” environment it doesn’t make much sense. You can avoid that cost by enabling LUN masking; however, when you swap a host you then have to do an HBA swap in Volume Logix as well, which is one added step.
Hope that wasn’t too rambling, trying to do three things at once.
December 10th, 2007 on 5:36 pm
Not sure I’d completely agree with your statement of “no need to zone an HBA to multiple targets for redundancy”
True, if you zone the same HBA to different ports on the same card, it doesn’t get you much. But I do believe there is a tangible benefit to multiple targets per HBA, while still keeping the rule that a zone contains only one HBA port and one target port. We have 4x director cards in our DMX (slots 7, 8, 9, 10); for any host attached to the SAN we create 4 zones:
hba1-> Director 7aa
hba1-> Director 8aa
hba2-> Director 9aa
hba2-> Director 10aa
What this gives me is the ability to keep using both HBAs on a host if I lose a path to the DMX. We’ve had a number of FA port failures over the years (not saying it’s frequent, but it’s happened more than twice in 8 years or so of using Symmetrix storage). True, I have to create twice as many zones, and my multipathing software now has 2x additional paths to deal with. But when a patch cord or director fails and needs to be replaced, I don’t have all my traffic slam over onto just one fabric.
But the real key for me is that I don’t have to be as paranoid when they pull a director card to fix one port on an 8-port director. If a host I don’t know about has a path that has already failed, I could be pulling its one remaining active path. For example:
Only zoning to 2x ports
Host A is attached to ports 7aa & 10aa
Host B is attached to ports 7ba & 10ba
Host B’s cable to port 10ba had a floor tile dropped on it, breaking the fibre, and this wasn’t caught by any reporting/monitoring software (you think it’s just fine)
Port 7aa failed inside the Symmetrix and EMC wants to reseat the whole line card to fix it
In this situation, Host B would have an unexpected outage unless someone went to every host on the SAN and checked that it had an active path on a different line card
Zoning to 4x ports
Host A is attached to ports 7aa, 8aa, 9aa & 10aa
Host B is attached to ports 7ba, 8ba, 9ba & 10ba
Host B’s cable to port 10ba had a floor tile dropped on it, breaking the fibre, and this wasn’t caught by any reporting/monitoring software (you think it’s just fine)
Port 7aa failed inside the Symmetrix and EMC wants to reseat the whole line card to fix it
In this situation, I would have no outage on Host B at all (being the paranoid type, I would still have done a quick double-check of the different hosts’ paths)
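The two scenarios above can be replayed in a few lines of Python. The sketch assumes the director number is the leading digits of a port name (so `7aa` lives on director card 7). It marks the broken cable and the failed port as dead paths, then asks which hosts would be left with no live path once card 7 is pulled. The port names come from the example; the helper names are hypothetical.

```python
import re


def director_of(port):
    """Director card number, assuming port names like '7aa' or '10ba'."""
    return int(re.match(r"\d+", port).group())


def hosts_without_path(zoning, dead_ports, pulled_card):
    """Hosts whose every path is either already dead or on the card being pulled."""
    stranded = []
    for host, ports in zoning.items():
        live = [p for p in ports
                if p not in dead_ports.get(host, set()) and director_of(p) != pulled_card]
        if not live:
            stranded.append(host)
    return stranded


# Host B's path to 10ba is silently broken; port 7aa failed, so card 7 gets reseated.
dead = {"Host A": {"7aa"}, "Host B": {"10ba"}}

two_ports = {"Host A": ["7aa", "10aa"], "Host B": ["7ba", "10ba"]}
four_ports = {"Host A": ["7aa", "8aa", "9aa", "10aa"],
              "Host B": ["7ba", "8ba", "9ba", "10ba"]}

print(hosts_without_path(two_ports, dead, pulled_card=7))   # ['Host B']
print(hosts_without_path(four_ports, dead, pulled_card=7))  # []
```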
It’s a fairly deep rabbit hole you can go down with paranoia about what you trust: heck, 8x paths would be better than 4x (more is always better); do I trust tier 1 storage, or should I mirror across DMX units in my volume manager; etc. But for me, 4x paths rather than 2x is the proper amount of paranoia. Paths do fail, sometimes those failures go unnoticed for extended periods (someone forgot to set up monitoring, cleared an alert prematurely, etc.), and this way I don’t have to flood all my traffic onto just one fabric/HBA. All that for a *very minimal* initial setup cost (~5 extra minutes to add 2x more zones, once in the life of the server). I don’t think about managing my ports, because I can sustain a failure of any one path without worrying about flooding the ports to my storage, and I can sustain 3x path failures and still have the host remain running. Inside the DMX you won’t get much performance improvement going across cards, but outside it I think there are benefits that will protect you more often than one in a million times, for almost no overhead or cost.
December 10th, 2007 on 10:38 pm
Oh yes, it’s very possible, and commonly done by end users, but there is no benefit other than a slight additional peace of mind. (It’s funny, actually: you can do multiple mirror positions, up to 4 in a non-SRDF volume, but people don’t do it because of the cost, whereas burning more than one port for a single HBA is a much more expensive proposition.)
I can’t remember doing this at EMC’s request even once in 8 years of designing for them.
What you get is that in the event of an FA failure you maintain multiple paths, which means your loss of performance during the loss of signal will be minimal. (You’ll run at around 75% with an FA failure as opposed to 50%.)
As far as performance goes, there is absolutely *NO* performance gain to be had from zoning, say, HBA_A to both FA7aA and FA8aA.
Think about it. From a bandwidth standpoint your bottleneck is the link from the HBA to the switch. Assuming 4-Gbit links across the board, you’re never going to use more than half of the bandwidth available to 7aA and 8aA. More importantly, the Emulex/QLogic cards most commonly in use right now have no internal processing power of their own, so they depend on host CPU cycles. The more paths / targets you put on a single HBA, the more processing power is taken from the host.
Meanwhile, each FA board has four processors, each running 2 ports. So in the above example, you’ve got two processors receiving data from one HBA.
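The two figures in the last few paragraphs (75% vs. 50% after an FA failure, and at most half of the FA bandwidth in use) are simple ratios. They are shown here in Python under the stated assumptions: equal-speed 4 Gbit links everywhere, and I/O spread evenly across the surviving paths. The function names are illustrative.

```python
def surviving_fraction(paths, failed):
    """Share of a host's paths still in service after `failed` paths are lost."""
    return (paths - failed) / paths


def fa_utilization_ceiling(hba_gbit, fa_gbits):
    """Best-case utilization of the FA ports one HBA is zoned to.

    The HBA link caps total throughput, so the FA side can never be busier
    than hba_gbit / sum(fa_gbits), whatever the multipath software does.
    """
    return min(hba_gbit, sum(fa_gbits)) / sum(fa_gbits)


print(surviving_fraction(4, 1))           # 0.75: four paths, one FA lost
print(surviving_fraction(2, 1))           # 0.5: two paths, one FA lost
print(fa_utilization_ceiling(4, [4, 4]))  # 0.5: one 4 Gbit HBA zoned to 7aA and 8aA
```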
Quite the contrary: I’ve actually done the opposite, zoning multiple HBAs from a host into a single FA to work around LUN numbering limitations (older Solaris hosts had a limit of 256 LUNs per HBA).
The real danger existed when the optics on the Symm were fixed, so a bad laser required the replacement of an entire back-end adapter. Now that that’s no longer the case, a cable failure is a 5-minute fix on the Symm side. (HBAs still need to be replaced, as most of them still use fixed optics; this is what reduces the effectiveness of single HBA –> multiple FA zoning.)
By making multiple paths from the switch to the Symm, you’re not really doing anything to mitigate the real weak link, which is the HBA-to-switch connection.
Truthfully, if you want to go with 4 FAs for added speed and redundancy, go with 4 HBAs as well. You’ll pick up the performance and get a more meaningful level of protection.
And whereas two FA ports can run you in the neighborhood of $10,000, two HBAs can run you as little as $700–$1,000 each. A minor investment to actually use the bandwidth you’re allocating.
December 12th, 2007 on 7:06 pm
We’ll probably have to agree to disagree.
I’ve had multiple FA issues (admittedly very few) that either required a whole FA board to be replaced or where support required the card to be reseated (and these were not simple SFP-based issues: at least twice on a DMX2000). It isn’t the most common failure, but I’ve had it happen more than once.
Like I said earlier, it isn’t anything inside the DMX (I agree there is very limited pickup in performance between different FAs); it’s outside the DMX, especially in a core-edge design. If I have to push *all* traffic for multiple hosts onto just one fabric because I have to replace/reseat an FA board, there is a possibility I will overrun an ISL (sometimes the path chosen for an HBA isn’t the best). True, it won’t give me a bit more performance when running with no failures (provided I’m not overrunning a port), but it can reduce the headaches when a failure occurs (been there, done that, got the call from management).
Management, I’ve got to say, isn’t harder: I set up 4x paths straight across the DMX. If I tell someone they are on port 7aa, they know they are also on 8aa, 9aa & 10aa. Creating an additional 2x zones and 2x masks takes no more than 5 minutes, and then I never touch them again. I’m not sure from your statement “burning more than one port for a single HBA is a much more expensive proposition” whether you are expecting to dedicate an FA to a single HBA. That is a *very* expensive proposition, but I’d ask why have a SAN at all if you are going to do that; why not cable it directly? We have more hosts than the DMX can support with 1:1 mappings, so we are going to be sharing FAs.
Doing this doesn’t cost me any capital (i.e. I don’t have to go to the business for more HBAs or FAs); it costs only 5 minutes of my time to set up initially. The host-level performance impact of 2x vs 4x paths on my equipment is really a nit (at 2x vs 128x you are probably talking about something tangible). In the event of a failure in a storage card I can still have all my HBAs working along with both fabrics, and it’s easier to manage (at least for me). Going to 4x paths has given me less pain than when I did 2x, for really no cost (but that’s just my experience). I’m all about fewer headaches; I’ve got enough battle scars to know I don’t want any more, and this has reduced the number of battles I’ve had to wage.
December 14th, 2007 on 12:55 pm
I dunno, I think I’m with SanGod on this one. You say:
“If I now have to push *all* traffic for multiple hosts onto just one fabric because I have to replace/reseat a FA board, there is a possibility I will over-run an ISL”
Seems to me you have a fabric design problem. You’re trying to mitigate it by over-zoning and mapping (which you agree isn’t going to help you performance-wise), but if you’re in a position where one FA failing on one storage array could overrun your ISLs, what happens if you lose a whole SAN fabric? You’re SOL... If you have redundant SAN fabrics, but one can’t handle the full load if the other fails, are you really redundant? Why bother with two fabrics if you’re going to underprovision them to the point where one chokes when the other fails, just when you need it the most?
December 14th, 2007 on 2:06 pm
True: if you’re putting traffic over an ISL in any case, you should have twice the number of ISLs you “need”.
The better solution is not to use ISLs at all. I know they’re handy, and that when you’re putting a large number of hosts into a SAN they are almost necessary, but if you’re operating redundant fabrics, you should never use more than half of your available ISL bandwidth, in preparation for the day you may need to temporarily support your environment with a single fabric.
This happened to me a while back when Cisco came out with the upgrade from 2.1.x to 3.0.x on our 9216 switches. The problem is that on the 9216 this is not a non-disruptive upgrade: the whole fabric has to bounce during the reboot.
Just always remember: any time you use an ISL to pass data traffic, you force all of that data down a limited pipe, imposing a limit where there doesn’t need to be one. If you have to ISL your switches, try to use the ISLs for SAN-A / SAN-B management only.
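The half-bandwidth rule from this comment amounts to checking that one fabric's ISLs could carry the combined load of both fabrics. Here is a minimal Python sketch. It assumes traffic and capacity are measured in the same units (MB/s here) and that load moves wholesale to the surviving fabric; the capacity figure is an illustrative example, not from the text. It checks aggregate bandwidth only, not how traffic is spread across individual ISLs.

```python
def utilization(isl_capacity, load):
    """Per-fabric ISL utilization (traffic / capacity)."""
    return {fabric: load[fabric] / isl_capacity[fabric] for fabric in isl_capacity}


def survives_single_fabric(isl_capacity, load):
    """True if every fabric's ISLs could carry both fabrics' traffic on their own."""
    total = sum(load.values())
    return all(cap >= total for cap in isl_capacity.values())


capacity = {"SAN-A": 600, "SAN-B": 600}  # e.g. three ~200 MB/s ISLs per fabric
load = {"SAN-A": 300, "SAN-B": 300}

print(utilization(capacity, load))             # {'SAN-A': 0.5, 'SAN-B': 0.5}
print(survives_single_fabric(capacity, load))  # True
print(survives_single_fabric(capacity, {"SAN-A": 350, "SAN-B": 350}))  # False
```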
December 14th, 2007 on 6:52 pm
One of the problems is... well, let’s say that sometimes the path a switch assigns from an HBA to storage can be non-optimal.
For example, say you have:
Three 2Gb ISL links between some Brocade switches
As an aggregate, the hosts normally push ~3Gb of ISL throughput across both fabrics
Theoretically you’re happy as a clam, since you actually have 2x the throughput you normally use (6Gb of ISLs, 3Gb of actual traffic)
The problem comes in when the lovely path algorithm comes into play. This is the tricky part:
Host A: uses 10MB
Host B: uses 10MB
Host C: uses 140MB
Host D: uses 140MB
Host E-?: uses whatever
There is a possibility that the Brocade has assigned hosts C & D to the same ISL port, since the route is calculated when the switch first sees the port come up and stays stuck there until an “event” (a failure or otherwise) causes the switch to recalculate paths. Even with double the ISL bandwidth I use, I can still run into a problem (280MB of traffic will not fit into a 200MB ISL); heck, I could quadruple my ISL bandwidth and still have this issue. Admittedly this is some old-school living, but I’ve felt the pain and, as you said, have “years of experience” and learned from it. Now you can buy channel bonding licenses for Brocade, etc. to reduce this, but I’ve lived the dream and had the experiences. Having run Brocade switches for years and years, I’ve had to take the whole fabric down multiple times (existing 1Gb switches and wanting to plug in 2Gb switches, needing to change core fabric values so they match, major code revs). I don’t remember exactly which code it was, but after going from 3.x to 4.x of the Brocade firmware, or 4.2 to 4.22 (it’s been a number of years, so it’s kind of fuzzy), any fabric event (not even major ones; simply rebooting a host and having its HBA log in) would kick running HBAs on completely different switches off the fabric (we were the 2nd customer in the world to run into this lovely one). This required us to power off all the switches in the entire fabric simultaneously, bring each switch online one by one, upgrade it, and power it off again until all the switches were done, and then power the entire fabric back on.
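The scenario above can be simulated to show how the two heavy hosts can end up sharing one ISL even when total ISL capacity is double the load. This is a simplified model, not Brocade's actual routing code. It assumes each host is assigned once, at login, to the next ISL in turn and is never rebalanced, and that a 2Gb ISL carries about 200 MB/s, matching the figure used above.

```python
from itertools import cycle


def assign_round_robin(login_order, isl_count):
    """Assign each host, in login order, to the next ISL in turn (no rebalancing)."""
    isls = cycle(range(isl_count))
    return {host: next(isls) for host in login_order}


def isl_loads(assignment, host_load):
    """Sum the traffic each ISL carries under a given assignment."""
    loads = {}
    for host, isl in assignment.items():
        loads[isl] = loads.get(isl, 0) + host_load[host]
    return loads


host_load = {"A": 10, "B": 10, "C": 140, "D": 140}  # MB/s, from the example above
ISL_CAPACITY = 200                                   # MB/s per 2Gb ISL

# Same three ISLs, same hosts; only the login order differs.
for order in ("ABCD", "CABD"):
    loads = isl_loads(assign_round_robin(order, 3), host_load)
    over = [isl for isl, mb in loads.items() if mb > ISL_CAPACITY]
    print(order, loads, "overloaded:", over)
# ABCD {0: 150, 1: 10, 2: 140} overloaded: []
# CABD {0: 280, 1: 10, 2: 10} overloaded: [0]
```

Total capacity (600 MB/s) is the same in both runs; only the login order decides whether C and D collide, which is why adding raw ISL bandwidth alone does not remove the risk.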
You still haven’t really given me a reason why it’s a bad idea, such as that it causes major headaches in day-to-day management. I have lived through some very fun experiences, and I’ve found this works best for me: the Brocade “2nd customer in the world” bug noted above; a tier 1 array power-cycled in production by a vendor service engineer (not in a nice way either: writes already acknowledged to hosts were not flushed from cache but lost... a very, very bad day); arrays with clustered heads where, instead of failing over to the working head, the working head shut down, losing *all* access to the storage (fix: power-cycle both heads). We’ve found very interesting ways for DMXs, Clariions, NetApps, Brocades, Ciscos, etc. to fail while in production; things that aren’t supposed to *ever* happen do. I’ve truly lived those pains. Maybe having 4x paths is not absolutely “needed”, but if the only penalty is 5 minutes of setup time, why wouldn’t one do it? That’s really the question, unless 5 minutes is really that much more important to you than a little more peace of mind. Not to be argumentative, but is it that much more painful? Is there a whitepaper that says this is against best practices? I’ve felt the goodness with no pain. What really is the heartburn you experience from this, other than it taking 5 minutes longer?
After 4 months or so of work, I just sent out a PO this afternoon covering *all* the chassis switches in the different fabrics and upgrading *all* DMX & Clariion units to a number of DMX-4s.
December 15th, 2007 on 9:54 am
That’s the danger of not using trunking. Brocade uses FSPF, literally Fabric Shortest Path First.
If there are multiple “shortest paths”, it will simply round-robin them. So if you have two ISLs and you connect Host-A, Host-B, Host-C, and Host-D in order, A and C will be on one path, and B and D will be on the other.
This is especially problematic in clustered environments, because there is a 25% chance that both active nodes will end up using one ISL, while the passive nodes sit on the second and do exactly NOTHING with it.
Trunking solves this problem, but requires that you burn more ports than you may like. Also, Brocade’s trunking is limited to a single ASIC, so you have to put all 4 cables into one ASIC to get a trunk, and a single ASIC failure results in a segmented fabric.
McData and Cisco switches both trunk independently of the ASIC, which is a good thing, because it means you can spread ISLs across blades and still get the performance.
I still maintain that pushing data across ISLs is universally a bad idea, but understandably it sometimes can’t be avoided. When I was out at Disney, they had the storage in the basement and the hosts one floor up, and they used ISLs between directors on each floor to hook everything together.
I guess the point is that if you’re going to run to 4 FA ports, you’re gaining nothing unless you’re running 4 HBAs. The FSPF calculation is still going to be based on the HBA path: if both FAs are on the same switch you’re gaining nothing, and if the FAs are on different switches you are potentially introducing extra, needless hops.
The best way is to divide the Symm up depending on the number of switches you have. If you have 2 switches, connect the low FAs to one and the high FAs to the other. If you have four, divide the Symm into quads, with Low/AB on the first, Low/CD on the second, High/AB on the third, and High/CD on the fourth.
If you have three switches, God help you. There really is no easy way to balance across three switches. Lord knows, I’ve tried. (Don’t ask, though I can say it involved a government agency and leave it at that.)
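The split described above can be expressed as a small lookup. The sketch assumes a hypothetical numbering in which directors 1 to 8 are the "low" FAs and 9 to 16 the "high" ones, with letters A to D on each; adjust `low_max` and the letters to your frame's actual layout. Three switches are rejected, in keeping with the advice above.

```python
def switch_for(director, letter, switches, low_max=8):
    """Pick a switch (0-based) for an FA port, following the low/high split above.

    director: FA director number; letter: 'A'-'D'.
    low_max is an assumed boundary between the low and high directors.
    """
    high = director > low_max
    if switches == 2:
        return int(high)                   # low FAs -> switch 0, high FAs -> switch 1
    if switches == 4:
        cd = letter.upper() in ("C", "D")  # A/B vs C/D within each half
        return 2 * int(high) + int(cd)     # Low/AB, Low/CD, High/AB, High/CD
    raise ValueError("no clean split for this many switches")


print(switch_for(3, "A", 2), switch_for(14, "A", 2))  # 0 1
print([switch_for(d, p, 4) for d, p in [(3, "B"), (3, "C"), (14, "A"), (14, "D")]])  # [0, 1, 2, 3]
```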
As far as doing one HBA to multiple FAs goes, again, you gain nothing but added complexity and cost. Say, for the sake of argument, you are following the 16:1 fan-in, which I believe is still the recommendation. Over 4 FAs (with single-initiator/single-target, and reserving the D ports for SRDF) that’s 192 total host ports you can connect to this Symm. If you go 2:4, that cuts your number in half, to 96. Remember, the fan-in refers to the number of HBAs zoned to a single FA, not the number of real hosts.
So to get back to the number of host ports you had before, you have to buy twice as many FAs. Not a problem from EMC’s perspective, but a waste nonetheless. If the SAN is assembled correctly, with dual fabrics that are connected properly, you gain absolutely nothing in the process.
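Here is the fan-in arithmetic above as a Python function. The port count is an inference rather than something stated: four FA directors with ports A through D, with D reserved for SRDF, leaves 12 host-facing ports, which at 16:1 reproduces the 192 figure. Each extra FA port an HBA is zoned to counts against the fan-in again.

```python
def max_hbas(fa_directors, ports_per_fa, reserved_per_fa, fan_in, fa_ports_per_hba):
    """HBA ports a frame can take when every HBA login counts against a port's fan-in."""
    usable_ports = fa_directors * (ports_per_fa - reserved_per_fa)
    return usable_ports * fan_in // fa_ports_per_hba


print(max_hbas(4, 4, 1, fan_in=16, fa_ports_per_hba=1))  # 192: single initiator, single target
print(max_hbas(4, 4, 1, fan_in=16, fa_ports_per_hba=2))  # 96: each HBA zoned to two FA ports
```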
December 17th, 2007 on 9:25 am
Well, I don’t think we’re going to convince InsaneGeek on this one. I agree that crossing ISLs should be avoided where possible, but in larger environments they’re a necessary evil. In our shop we’ve got (mostly full) Cisco MDS 9513s in the core and on the edges. We’d be managing 12-16 separate ‘fabrics’ if we didn’t have a core-edge design with ISLs, and then we’d be back to SAN islands. No thanks! If ISLs can be sized and balanced properly*, they’re not too bad to work with. Cisco port channels use the SCSI exchange ID in the round robin calculation to get very granular load balancing. I’ve yet to see a lopsided port channel. It’s pretty straightforward to keep an eye on them and if any are regularly seeing >50% utilization you can non-disruptively add bandwidth.
December 17th, 2007 on 10:50 am
That’s the great thing about Cisco – pretty much anything can be done to it ‘non-disruptively’
I agree, some people are going to do it the way they’re going to do it and that is that. At least the advent of 4Gbit FC means that when you trunk 4 cables together from core to edge, the odds of overutilizing your trunk are minimal: maybe in a burst, but highly unlikely under regular load.
As a consultant I can only make suggestions. What the end-customer does from that point on is up to them. I just find it ironic that people pay consultants only to ignore their suggestions.
December 18th, 2007 on 12:36 pm
Yup, Brocade trunking licenses help with that issue (which is what I meant by my “old-school living” statement; I had Brocades when they didn’t offer any trunking support: anybody remember the old 1Gb SC models with the LCD screens on the front?). I did it back then because of that (among other things), and haven’t seen any reason not to continue the practice today. Brocades still have some interesting technical “gotchas” even with trunking today (which you already covered), and that is the reason I’m purchasing Cisco now, since they’ve had enough time to “cook” since I last replaced all my switches.
Your fan-out concern is really the first and only legitimate, quantifiable statement of why zoning to two ports could be a problem. A DMX-3/4 has a fan-out ratio of 128:1, so that argument, while valid for a configuration with > 768 hosts going to a single DMX with FC SRDF (or > 1024 hosts with GigE SRDF), gets into levels where it’s semi-silly to worry about it for *general* usage statements (unless > 768 hosts going to a single DMX is the general deployment rule rather than the exception).
With that, I’m not trying to be aggressive, argumentative, a jerk, etc. about it. I’m very open to being convinced that 2x is better than 4x, that it’s the way of the future and my thinking is outdated stupidity; but nobody has given me any reasonable data to change my opinion. I’ve given a number of examples that are still relevant today, while nobody has been able to show any reasonable technical counterarguments. People have made statements about management, difficulty, and complexity, but nobody has made any quantifiable statements about why it’s so difficult/complex to manage. So nobody has shown a technical reason why it’s bad (except in the extremely rare case of a massive number of hosts), and nobody has been able to (or even attempted to) quantify why it’s difficult. It’s your blog, and I appreciate you maintaining a reasonable dialog when you don’t have to with a random guy on the internet, but I don’t think anybody has shown me anything substantial other than “I said it, so that’s the way it should be done”.
December 18th, 2007 on 1:23 pm
I suppose it’s a philosophical question on some level. You have to weigh the management and maintenance of extraneous FC zones, device mappings, and device masking entries against what it can buy you in the event of some component failure. I wouldn’t say you’ve chosen to do anything wrong, it’s just extra work that isn’t likely to buy you much of anything so long as you have your ducks in a row to begin with, so to speak. If you’re going to go above and beyond a fully redundant configuration (2 HBAs per host, 1 HBA -> 1 FA), and you’ve got a DMX4, you could just as well zone, map, and mask all your hosts to all your FAs. Or if you’ve got more than 128 hosts, maybe only zone to half of your FAs, since even beyond full redundancy, the logic goes “more is better”, right?
I dunno, I guess I see where you’re coming from in light of the ISL situation you’ve got, but it just doesn’t seem “clean” to me. I’m kind of anal that way I guess.
December 18th, 2007 on 1:36 pm
JM -you beat me to it.
IG – First off – this is not supposed to sound confrontational, and I’m sorry if it’s coming off that way. It’s always been my experience that this is how discussion works and how people learn, myself included. (I don’t pretend to know it all, I just try to make it a habit to know where to look)
As for my clever technical retort, It’s not a “you’re doing it wrong” it’s a “you’re doing more work than you need to”.
Two absolute truths in life:
(I keep trying to tell my 11 year old the latter, he’s not buying it)
The skill lies in finding the balance between those two statements.
At heart, I’m lazy. I don’t want to do more than I have to. I have a friend and former co-worker who followed me into two different jobs and said he loved it, because by the time I was done, things ran more smoothly, in fewer steps, than they had when I got there.
Provided you’re running powerpath, the following setup:
HBA1 – Fabric1 – FA1
HBA2 – Fabric2 – FA2
(and if you find the need for extra bandwidth or redundancy)
HBA3 – Fabric1 – FA3
HBA4 – Fabric2 – FA4
(if you have four fabrics, make the obvious substitutions)
Is optimal. Fewest moving parts, fewest places for configuration problems/mistakes to affect you, and you’re protected from the failure of any single component. EMC’s internal infrastructure is robust enough to handle anything the back end can throw at it. All you need is something to monitor PowerPath (which ECC does a pretty good job of) and you’re golden. Because the newer DMXs use SFPs and not fixed optics, you even have an easy recovery from a port failure. (An educated, experienced guess places 80% of port failures in the optics.)
It also means your configuration change scripts and LUN masking scripts are half as long, which means something when you’re presenting 100 devices at a time.
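The claim that this layout survives the loss of any single component can be checked mechanically. A minimal Python sketch, assuming each path is a chain of (HBA, fabric, FA) components and that a path is lost if any component on it fails: a component is a single point of failure exactly when every path runs through it. The component names follow the layout above; the function name is illustrative.

```python
def single_points_of_failure(paths):
    """Components whose individual failure would leave the host with no live path."""
    components = {c for path in paths for c in path}
    return sorted(c for c in components if all(c in path for path in paths))


two_hba = [("HBA1", "Fabric1", "FA1"), ("HBA2", "Fabric2", "FA2")]
four_hba = two_hba + [("HBA3", "Fabric1", "FA3"), ("HBA4", "Fabric2", "FA4")]
one_hba = [("HBA1", "Fabric1", "FA1"), ("HBA1", "Fabric1", "FA2")]

print(single_points_of_failure(two_hba))   # []
print(single_points_of_failure(four_hba))  # []
print(single_points_of_failure(one_hba))   # ['Fabric1', 'HBA1']
```

The last case is the single-HBA host from the top of the post: zoning it to two FAs still leaves the HBA (and its fabric) as a single point of failure.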
January 3rd, 2008 on 5:32 pm
Dredging this back up after the Holidays…
See for me it’s easier, everything on the SAN has 4x paths (excluding the obvious tape, etc).
I have scripts that go out and check powermt periodically (ECC is fine and dandy, but I’m a bit of an old curmudgeon in the trust department). We have the same host attached to both a DMX (primary storage) and a Clariion (Oracle flashback), and writing monitoring scripts that account for nuances like that is rather annoying. Having a set policy of 4x paths for everything means the scripts don’t have to worry about such things; otherwise I’d have to add a rather significant amount of logic to them.
For me, I don’t directly write symmask or symconfigure scripts, so whether it’s 1x device or 200x devices isn’t much different. I run things through a Perl script I made to create the scripts (a script to create the script), so I don’t have to worry about typos, etc., and it can do a bit of very rudimentary checking as well. Since I don’t create the scripts directly, 2x paths vs 4x paths vs 32x paths doesn’t add any risk of typos, configuration mistakes, etc. One could say that the end scripts have 2x the number of characters, so they would take longer to actually submit and run on the DMX; but in my experience, dropping a LUN onto 4x instead of 2x FAs really doesn’t take measurably longer (but I don’t stare at the screen with a stopwatch waiting for symconfigure to complete the full 85x steps either).
These above, for me, fill the requirements of (to steal a bit of your verbiage):
1) InsaneGeek abhors extra work and avoids it at all costs
2) If you script it right the first time you won’t need to do it (or type it) again
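The "script to create the script" idea is easy to sketch. The example below is Python rather than the author's Perl, and the command template is a deliberately generic placeholder, not real symconfigure syntax. The point is only that the device and path lists are expanded by code, so four paths cost no more typing than two. The duplicate-device check stands in for the "very rudimentary checking" mentioned above.

```python
def expand_commands(devices, fa_ports, template):
    """Render one line per (device, FA port) pair from a caller-supplied template."""
    if len(set(devices)) != len(devices):
        raise ValueError("duplicate device in list")  # rudimentary sanity check
    return [template.format(dev=dev, fa=fa) for dev in devices for fa in fa_ports]


# Placeholder only: NOT real symconfigure syntax. Substitute the command format
# from your Solutions Enabler documentation before using anything like this.
TEMPLATE = "MAP {dev} TO {fa};"

lines = expand_commands(["0A1", "0A2"], ["7BA", "8BA", "9BA", "10BA"], TEMPLATE)
print(len(lines))  # 8
print(lines[0])    # MAP 0A1 TO 7BA;
```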
I’ve been a Perl junkie for years and years, so I try to make a script for everything I do, and the SYMCLI is almost made for Perl scripting. When scripting, if I can reduce the amount of logic and exceptions I need to worry about, I have less extra work to do and there is less chance for mess-ups (and less spaghetti code).
For example, to check the status of DMX & Clariion devices using a script:
Using a requirement that Clariion has 4x paths and DMX has 2x paths:
parse the output of powermt display
Determine which devices are Clariion
Determine which devices are DMX
Match Clariion devices to number of active paths
Match DMX devices to number of active paths
if Clariion device 4x is the expected number of paths else alert
if DMX device 2x is the expected number of paths else alert
or
Using a requirement that Clariion has 4x paths and DMX has 4x paths:
parse the output of powermt display
Match device to number of active paths
If paths not equal to 4x alert
For me this is the cleanest, easiest, and simplest way to think of it. If all I know is that a host is attached to port BA, I know it should have access on 7BA, 8BA, 9BA & 10BA. This is very easy for people to understand: everything on the SAN should have 4x paths, rather than “these storage arrays should have 2x paths, these should have 4x paths, and here is how you find out which is which” (which is really a huge pickup). It isn’t that more is “better” so turn the paths up to “11” (it only goes to ten); it’s that 4x paths just seems to be the magic number that works from a simplicity, understandability, and availability perspective for general use. No special cases, no special rules, no special anything, just 4x; and being the anal person I am, being able to say “one shalt always have four paths to thine EMC storage no matter the make” makes me happy.
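The two pseudocode procedures in this comment, sketched in Python. Parsing `powermt display` output is deliberately left out, because its exact layout is not given here; the functions assume the output has already been reduced to a mapping of device name to (array type, live path count). The device names are made up.

```python
def alerts_per_array(devices, expected):
    """First policy: the expected path count depends on the array type."""
    return sorted(dev for dev, (array, live) in devices.items() if live != expected[array])


def alerts_uniform(devices, expected_paths=4):
    """Second policy: every device should have the same number of live paths."""
    return sorted(dev for dev, (_array, live) in devices.items() if live != expected_paths)


# Hypothetical, already-parsed status: device name -> (array type, live paths).
mixed = {"dev1": ("DMX", 2), "dev2": ("DMX", 1), "dev3": ("Clariion", 4)}
uniform = {"dev1": ("DMX", 4), "dev2": ("DMX", 3), "dev3": ("Clariion", 4)}

print(alerts_per_array(mixed, {"DMX": 2, "Clariion": 4}))  # ['dev2']
print(alerts_uniform(uniform))                             # ['dev2']
```

The uniform version never needs to know which array a device lives on, which is the simplification argued for above: the per-array table (and the logic that fills in the array type) simply disappears.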
May 11th, 2008 on 7:32 am
Back to the initiator / target zoning stuff.
What method should be used for a FAStT? I would have guessed it was the same as Clariion (single initiator / dual target, for similar reasons as above), but IBM docs seem to say it should be single/single.
If this is right, what is the theory behind this?
Thanks
May 11th, 2008 on 9:39 am
Ok – here is my take. For about 4 years I worked in R&D, first at EMC, then at MTI working on their ‘enterprise’ hardware.
In all the work I did in both arenas, I found that when multiple targets are put into a single zone, they will try to act like initiators and log into each other. While I never found a specific problem with this behavior (i.e. I couldn’t pin a failure down and attribute it to the behavior), I did notice that even from a pure management standpoint, keeping them separate was logically the best practice: if two devices aren’t passing data directly between them, they don’t belong in the same zone. I also noticed a slight (not really quantifiable, so it’s technically opinion) increase in stability when you do single-initiator, single-target zoning.
This came down to ‘single-pair zoning’, which has always been my preference (and practice), but EMC’s official policy is ‘single-initiator zoning’, which is slightly less restrictive: a single HBA and multiple targets in the same zone.
The behavior becomes apparent when you look at the login table on a Symm that is using single-initiator, multi-target zoning (using ‘symmask -sid xxx -dir 8a -p 0 list logins’, for example): you will see multiple Symm ports logged into your target.
The only time you should see this is in the case of RDF over Fibrechannel, when you are actually intending to Pair the SRDF devices together.
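The difference between the two zoning policies shows up in which target ports share a zone, since those are the ports that may try to log into each other. A small Python sketch, assuming zones are given as (initiators, targets) pairs with made-up member names:

```python
from itertools import combinations


def target_pairs_sharing_a_zone(zones):
    """Target-to-target pairs placed together in some zone (possible peer logins)."""
    pairs = set()
    for _initiators, targets in zones:
        pairs.update(combinations(sorted(targets), 2))
    return sorted(pairs)


# Single-initiator zoning: one HBA, several targets in the same zone.
single_initiator = [(["hba1"], ["FA-7A:0", "FA-8A:0"])]
# Single-pair zoning: each zone holds exactly one HBA and one target.
single_pair = [(["hba1"], ["FA-7A:0"]), (["hba1"], ["FA-8A:0"])]

print(target_pairs_sharing_a_zone(single_initiator))  # [('FA-7A:0', 'FA-8A:0')]
print(target_pairs_sharing_a_zone(single_pair))       # []
```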
Hope this answers your question.