<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Overcomplicating the world</title>
	<atom:link href="http://blog.50micron.com/2007/12/06/overcomplicating-the-world/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/</link>
	<description>Ranting and raving about storage and technology</description>
	<lastBuildDate>Mon, 19 Dec 2011 15:36:52 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: SanGod</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-6217</link>
		<dc:creator>SanGod</dc:creator>
		<pubDate>Sun, 11 May 2008 14:39:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-6217</guid>
		<description>Ok - here is my take.  For about 4 years I worked in R&amp;D, first at EMC, then at MTI working on their &#039;enterprise&#039; hardware.

In all the work I did in both arenas I found that when multiple targets are put into a single zone, they will try to act like initiators and log into each other.  While I never found a specific problem with this behavior (i.e. I couldn&#039;t pin a failure down and attribute it to the behavior) I did notice that even from a pure management standpoint, keeping them separate was logically the best practice.  I.e., if two devices aren&#039;t passing data directly between the two of them, they don&#039;t belong in the same zone.  I also noticed a slight (not really quantifiable, so it&#039;s technically opinion) increase in stability when you do single-initiator, multi-target.

This came down to &#039;single-pair zoning&#039; - which has always been my preference (and practice) but EMC&#039;s official policy is &#039;single initiator zoning&#039; which is slightly less restrictive.  It&#039;s a single HBA and multiple targets in the same zone.

The behavior becomes apparent when you look at the login table on a Symm that is using Single-Initiator, Multi-Target zoning.  Using &#039;symmask -sid xxx -dir 8a -p 0 list logins&#039;, for example, you will see multiple Symm ports logged into your target.

The only time you should see this is in the case of RDF over Fibrechannel, when you are actually intending to Pair the SRDF devices together.

Hope this answers your question.</description>
		<content:encoded><![CDATA[<p>Ok &#8211; here is my take.  For about 4 years I worked in R&#038;D, first at EMC, then at MTI working on their &#8216;enterprise&#8217; hardware.</p>
<p>In all the work I did in both arenas I found that when multiple targets are put into a single zone, they will try to act like initiators and log into each other.  While I never found a specific problem with this behavior (i.e. I couldn&#8217;t pin a failure down and attribute it to the behavior) I did notice that even from a pure management standpoint, keeping them separate was logically the best practice.  I.e., if two devices aren&#8217;t passing data directly between the two of them, they don&#8217;t belong in the same zone.  I also noticed a slight (not really quantifiable, so it&#8217;s technically opinion) increase in stability when you do single-initiator, multi-target.</p>
<p>This came down to &#8216;single-pair zoning&#8217; &#8211; which has always been my preference (and practice) but EMC&#8217;s official policy is &#8216;single initiator zoning&#8217; which is slightly less restrictive.  It&#8217;s a single HBA and multiple targets in the same zone.</p>
<p>The behavior becomes apparent when you look at the login table on a Symm that is using Single-Initiator, Multi-Target zoning.  Using &#8216;symmask -sid xxx -dir 8a -p 0 list logins&#8217;, for example, you will see multiple Symm ports logged into your target.</p>
<p>The only time you should see this is in the case of RDF over Fibrechannel, when you are actually intending to Pair the SRDF devices together.</p>
<p>Hope this answers your question.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: RacaSAN</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-6216</link>
		<dc:creator>RacaSAN</dc:creator>
		<pubDate>Sun, 11 May 2008 12:32:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-6216</guid>
		<description>Back to the initiator /  target zoning stuff.
What should be the method used for a FastT. I would have guessed at it being the same as Clariion (single initiator / dual target - for similar reasons as above) but IBM docs seem to say it should be single/single.
If this is right, what is the theory behind this?
Thanks</description>
		<content:encoded><![CDATA[<p>Back to the initiator /  target zoning stuff.<br />
What should be the method used for a FastT. I would have guessed at it being the same as Clariion (single initiator / dual target &#8211; for similar reasons as above) but IBM docs seem to say it should be single/single.<br />
If this is right, what is the theory behind this?<br />
Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: InsaneGeek</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5705</link>
		<dc:creator>InsaneGeek</dc:creator>
		<pubDate>Thu, 03 Jan 2008 22:32:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5705</guid>
		<description>Dredging this back up after the Holidays...

See for me it&#039;s easier, everything on the SAN has 4x paths (excluding the obvious tape, etc).

I have scripts that go out and check powermt periodically (ECC is fine and dandy, but I&#039;m a bit of an old curmudgeon in my trusting department).  We have the same host attached to both a DMX (primary storage) and a Clariion (Oracle flashback); trying to write monitoring scripts to take into account nuances like that is rather annoying.  Having a set policy of 4x paths for everything allows scripts to not worry about such things; otherwise I&#039;d have to add a rather significant amount of logic to the scripts.

For me I don&#039;t directly write symmask or symconfigure scripts so whether it&#039;s 1x device or 200x devices isn&#039;t much different.  I run things through a perl script I made to create the scripts (script to create the script) so I don&#039;t have to worry about typos, etc. and it can do a bit of very rudimentary checking as well.  Since I don&#039;t create the scripts directly, 2x paths vs 4x paths vs 32x paths doesn&#039;t add any risk in typos, configuration mistakes, etc.  Something could be said that the end scripts have 2x the number of characters in them, so it would take longer to actually submit and run on the DMX; but from my experience if I&#039;m dropping a LUN onto 4x instead of 2x FA&#039;s it really doesn&#039;t take measurably longer (but I don&#039;t stare at the screen with a stopwatch waiting for symconfigure to complete the full 85x steps either).

These above for me fill the requirements of (to steal a bit of your verbiage)

1) InsaneGeek abhors extra work and avoids it at all costs
2) If you script it right the first time you won’t need to do it (or type it) again

I&#039;ve been a perl junky for years and years, so everything I do I try to make a script for, and the symcli is almost made directly for doing perl scripts.  When scripting if I can reduce the amount of logic and exceptions I need to worry about I have less extra work to do and there is less chance for messups (and less spaghetti code).  

i.e. to check the status of DMX &amp; Clariion devices using a script

Using a requirement of Clariion has 4x paths and DMX has 2x paths
parse the output of powermt display
Determine which devices are Clariion
Determine which devices are DMX
Match Clariion devices to number of active paths
Match DMX devices to number of active paths
if Clariion device 4x is the expected number of paths else alert
if DMX device 2x is the expected number of paths else alert

or

Using a requirement of Clariion has 4x paths and DMX has 4x paths
parse the output of powermt display
Match device to number of active paths
If paths not equal to 4x alert
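
The second, uniform-policy variant can be sketched roughly as below. This is an illustration in Python rather than Perl, and it is only a sketch: the input layout (a "Pseudo name=" line per device followed by one line per path carrying "active" and "alive" tokens) is a simplified assumption and not verbatim powermt display output, and the function names are hypothetical.

```python
EXPECTED_PATHS = 4  # uniform policy: every device should show 4 active paths


def count_active_paths(report):
    """Return {device: active path count} from a simplified, assumed report layout."""
    counts = {}
    device = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Pseudo name="):
            # A new device block starts here (assumed header format).
            device = line.split("=", 1)[1]
            counts[device] = 0
        elif device is not None:
            tokens = line.split()
            # A path only counts when it is both active and alive.
            if "active" in tokens and "alive" in tokens:
                counts[device] += 1
    return counts


def devices_to_alert(report, expected=EXPECTED_PATHS):
    """List devices whose active path count differs from the single expected value."""
    counts = count_active_paths(report)
    return sorted(name for name, n in counts.items() if n != expected)


# Made-up sample data, for illustration only.
SAMPLE = """\
Pseudo name=emcpowera
0 hba0 sda FA 7bA active alive
1 hba1 sdb FA 8bA active alive
0 hba0 sdc FA 9bA active alive
1 hba1 sdd FA 10bA active alive
Pseudo name=emcpowerb
0 hba0 sde SP A0 active alive
1 hba1 sdf SP B0 active dead
"""

if __name__ == "__main__":
    for name in devices_to_alert(SAMPLE):
        print("ALERT:", name, "does not have", EXPECTED_PATHS, "active paths")
```

With one expected value there is no per-array branch: the array type never has to be determined before the comparison.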

For me this is the cleanest, easiest, and simplest way to think of it.  If all I know is a host is attached to port BA, I know that it should have access on 7BA, 8BA, 9BA &amp; 10BA.  This is very easy for people to understand: everything on the SAN should have 4x paths, rather than these storage arrays should have 2x paths and these should have 4x paths and these are how you find out which is which  (which is really a huge pickup).  It isn&#039;t that more is &quot;better&quot; so turn up the paths to &quot;11&quot; (it only goes to ten), but that 4x paths just seems to be the magic number that works, from a simplicity, understandability, and availability perspective for general use.  No special cases, no special rules, no special anything, just 4x; being the anal person as well, being able to say &quot;one shallt allways have four paths to thine EMC storage no matter the make&quot; makes me happy.</description>
		<content:encoded><![CDATA[<p>Dredging this back up after the Holidays&#8230;</p>
<p>See for me it&#8217;s easier, everything on the SAN has 4x paths (excluding the obvious tape, etc).</p>
<p>I have scripts that go out and check powermt periodically (ECC is fine and dandy, but I&#8217;m a bit of an old curmudgeon in my trusting department).  We have the same host attached to both a DMX (primary storage) and a Clariion (Oracle flashback); trying to write monitoring scripts to take into account nuances like that is rather annoying.  Having a set policy of 4x paths for everything allows scripts to not worry about such things; otherwise I&#8217;d have to add a rather significant amount of logic to the scripts.  </p>
<p>For me I don&#8217;t directly write symmask or symconfigure scripts so whether it&#8217;s 1x device or 200x devices isn&#8217;t much different.  I run things through a perl script I made to create the scripts (script to create the script) so I don&#8217;t have to worry about typos, etc. and it can do a bit of very rudimentary checking as well.  Since I don&#8217;t create the scripts directly, 2x paths vs 4x paths vs 32x paths doesn&#8217;t add any risk in typos, configuration mistakes, etc.  Something could be said that the end scripts have 2x the number of characters in them, so it would take longer to actually submit and run on the DMX; but from my experience if I&#8217;m dropping a LUN onto 4x instead of 2x FA&#8217;s it really doesn&#8217;t take measurably longer (but I don&#8217;t stare at the screen with a stopwatch waiting for symconfigure to complete the full 85x steps either).</p>
<p>These above for me fill the requirements of (to steal a bit of your verbiage)</p>
<p>1) InsaneGeek abhors extra work and avoids it at all costs<br />
2) If you script it right the first time you won’t need to do it (or type it) again</p>
<p>I&#8217;ve been a perl junky for years and years, so everything I do I try to make a script for, and the symcli is almost made directly for doing perl scripts.  When scripting if I can reduce the amount of logic and exceptions I need to worry about I have less extra work to do and there is less chance for messups (and less spaghetti code).  </p>
<p>i.e. to check the status of DMX &amp; Clariion devices using a script</p>
<p>Using a requirement of Clariion has 4x paths and DMX has 2x paths<br />
parse the output of powermt display<br />
Determine which devices are Clariion<br />
Determine which devices are DMX<br />
Match Clariion devices to number of active paths<br />
Match DMX devices to number of active paths<br />
if Clariion device 4x is the expected number of paths else alert<br />
if DMX device 2x is the expected number of paths else alert</p>
<p>or</p>
<p>Using a requirement of Clariion has 4x paths and DMX has 4x paths<br />
parse the output of powermt display<br />
Match device to number of active paths<br />
If paths not equal to 4x alert</p>
<p>For me this is the cleanest, easiest, and simplest way to think of it.  If all I know is a host is attached to port BA, I know that it should have access on 7BA, 8BA, 9BA &amp; 10BA.  This is very easy for people to understand: everything on the SAN should have 4x paths, rather than these storage arrays should have 2x paths and these should have 4x paths and these are how you find out which is which  (which is really a huge pickup).  It isn&#8217;t that more is &#8220;better&#8221; so turn up the paths to &#8220;11&#8221; (it only goes to ten), but that 4x paths just seems to be the magic number that works, from a simplicity, understandability, and availability perspective for general use.  No special cases, no special rules, no special anything, just 4x; being the anal person as well, being able to say &#8220;one shallt allways have four paths to thine EMC storage no matter the make&#8221; makes me happy.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SanGod</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5492</link>
		<dc:creator>SanGod</dc:creator>
		<pubDate>Tue, 18 Dec 2007 18:36:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5492</guid>
		<description>JM -you beat me to it. ;-)   

IG - First off - this is not supposed to sound confrontational, and I&#039;m sorry if it&#039;s coming off that way.  It&#039;s always been my experience that this is how discussion works and how people learn, myself included.  (I don&#039;t pretend to know it all, I just try to make it a habit to know where to look)

As for my clever technical retort, It&#039;s not a &quot;you&#039;re doing it wrong&quot; it&#039;s a &quot;you&#039;re doing more work than you need to&quot;.

Two absolute truths in life:

&lt;blockquote&gt;SanGod abhors extra work and avoids it at all costs.
If you do it right the first time you won&#039;t need to do it again&lt;/blockquote&gt;

(I keep trying to tell my 11 year old the latter, he&#039;s not buying it)

The skill lies in finding the balance between those two statements.

At heart, I&#039;m lazy.  I don&#039;t want to do more than I have to.  I have a friend/former co-worker who followed me into two different jobs and said he loved it, because by the time i was done, things ran more smoothly in fewer steps than they had when I got there.

Provided you&#039;re running powerpath, the following setup:

HBA1 - Fabric1 - FA1

HBA2 - Fabric2 - FA2

(and if you find the need for extra bandwidth or redundancy)

HBA3 - Fabric1 - FA3

HBA4 - Fabric2 - FA4

(if you have four fabrics, make the obvious substitutions)

Is optimal.  Fewest moving parts, Fewest places for configuration problems / mistakes to affect you, and you&#039;re protected from the failure of any component.  EMC&#039;s internal infrastructure is robust enough to handle anything that the back-end can throw at it.  And all you need is something to monitor PowerPath (which ECC does a pretty good job of) and you&#039;re golden.  Because the newer DMX&#039;s use SFP&#039;s and not fixed optics, you even have an easy recovery from a port failure.  (an educated, experienced guess places 80% of port failures in the optics)

it also means your configuration change scripts and lun masking scripts are half as long, which when you&#039;re presenting 100 devices at a time means something.</description>
		<content:encoded><![CDATA[<p>JM -you beat me to it. <img src='http://blog.50micron.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />    </p>
<p>IG &#8211; First off &#8211; this is not supposed to sound confrontational, and I&#8217;m sorry if it&#8217;s coming off that way.  It&#8217;s always been my experience that this is how discussion works and how people learn, myself included.  (I don&#8217;t pretend to know it all, I just try to make it a habit to know where to look)</p>
<p>As for my clever technical retort, It&#8217;s not a &#8220;you&#8217;re doing it wrong&#8221; it&#8217;s a &#8220;you&#8217;re doing more work than you need to&#8221;.</p>
<p>Two absolute truths in life:</p>
<blockquote><p>SanGod abhors extra work and avoids it at all costs.<br />
If you do it right the first time you won&#8217;t need to do it again</p></blockquote>
<p>(I keep trying to tell my 11 year old the latter, he&#8217;s not buying it)</p>
<p>The skill lies in finding the balance between those two statements.</p>
<p>At heart, I&#8217;m lazy.  I don&#8217;t want to do more than I have to.  I have a friend/former co-worker who followed me into two different jobs and said he loved it, because by the time i was done, things ran more smoothly in fewer steps than they had when I got there.</p>
<p>Provided you&#8217;re running powerpath, the following setup:</p>
<p>HBA1 &#8211; Fabric1 &#8211; FA1</p>
<p>HBA2 &#8211; Fabric2 &#8211; FA2</p>
<p>(and if you find the need for extra bandwidth or redundancy)</p>
<p>HBA3 &#8211; Fabric1 &#8211; FA3</p>
<p>HBA4 &#8211; Fabric2 &#8211; FA4</p>
<p>(if you have four fabrics, make the obvious substitutions)</p>
<p>Is optimal.  Fewest moving parts, Fewest places for configuration problems / mistakes to affect you, and you&#8217;re protected from the failure of any component.  EMC&#8217;s internal infrastructure is robust enough to handle anything that the back-end can throw at it.  And all you need is something to monitor PowerPath (which ECC does a pretty good job of) and you&#8217;re golden.  Because the newer DMX&#8217;s use SFP&#8217;s and not fixed optics, you even have an easy recovery from a port failure.  (an educated, experienced guess places 80% of port failures in the optics)</p>
<p>it also means your configuration change scripts and lun masking scripts are half as long, which when you&#8217;re presenting 100 devices at a time means something.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jm</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5491</link>
		<dc:creator>jm</dc:creator>
		<pubDate>Tue, 18 Dec 2007 18:23:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5491</guid>
		<description>I suppose it&#039;s a philosophical question on some level.  You have to weigh the management and maintenance of extraneous FC zones, device mappings, and device masking entries against what it can buy you in the event of some component failure.  I wouldn&#039;t say you&#039;ve chosen to do anything wrong, it&#039;s just extra work that isn&#039;t likely to buy you much of anything so long as you have your ducks in a row to begin with, so to speak.  If you&#039;re going to go above and beyond a fully redundant configuration (2 HBAs per host, 1 HBA -&gt; 1 FA), and you&#039;ve got a DMX4, you could just as well zone, map, and mask all your hosts to all your FAs.  Or if you&#039;ve got more than 128 hosts, maybe only zone to half of your FAs, since even beyond full redundancy, the logic goes &quot;more is better&quot;, right?

I dunno, I guess I see where you&#039;re coming from in light of the ISL situation you&#039;ve got, but it just doesn&#039;t seem &quot;clean&quot; to me.  I&#039;m kind of anal that way I guess.</description>
		<content:encoded><![CDATA[<p>I suppose it&#8217;s a philosophical question on some level.  You have to weigh the management and maintenance of extraneous FC zones, device mappings, and device masking entries against what it can buy you in the event of some component failure.  I wouldn&#8217;t say you&#8217;ve chosen to do anything wrong, it&#8217;s just extra work that isn&#8217;t likely to buy you much of anything so long as you have your ducks in a row to begin with, so to speak.  If you&#8217;re going to go above and beyond a fully redundant configuration (2 HBAs per host, 1 HBA -&gt; 1 FA), and you&#8217;ve got a DMX4, you could just as well zone, map, and mask all your hosts to all your FAs.  Or if you&#8217;ve got more than 128 hosts, maybe only zone to half of your FAs, since even beyond full redundancy, the logic goes &#8220;more is better&#8221;, right?</p>
<p>I dunno, I guess I see where you&#8217;re coming from in light of the ISL situation you&#8217;ve got, but it just doesn&#8217;t seem &#8220;clean&#8221; to me.  I&#8217;m kind of anal that way I guess.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: InsaneGeek</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5490</link>
		<dc:creator>InsaneGeek</dc:creator>
		<pubDate>Tue, 18 Dec 2007 17:36:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5490</guid>
		<description>Yup Brocade trunking licenses help with that issue (which is what I meant from my &quot;old-school living&quot; statement, I had Brocade&#039;s when they didn&#039;t offer any trunking support: anybody remember the old 1gb, SC models with the LCD screens on the front?).  I did it back then because of that (among other things), and haven&#039;t seen any reason not to continue the practice today.  Brocade still have some interesting technical &quot;gotchas&quot; even with trunking today (that you already covered), and is the reason I&#039;m purchasing Cisco now since they&#039;ve had enough time to &quot;cook&quot; since I last replaced all my switches.

Your fan-out concern is really the first and only legitimate quantifiable statement why zoning to two ports could be a problem.  A DMX-3/4 has a fan-out ratio of 128:1, so even that argument, while valid for a configuration with &gt; 768 hosts going to a single DMX with FC SRDF (or &gt;1024 hosts with gig-e SRDF), puts us into levels that are semi silly as to whether it&#039;s a concern or not for *general* usage statements (unless &gt; 768 hosts going to a single DMX is a general deployment rule rather than the exception).

With that I&#039;m not trying to be aggressive, argumentative, a jerk, etc. about it. I&#039;m very open to being convinced that 2x is better than 4x, that it&#039;s the way of the future and my thinking is out-dated stupidity; but nobody has given me any reasonable data to change my opinion.   I&#039;ve given a number of examples that are still relevant today, while nobody has been able to show any reasonable technical counter arguments.  People have made statements about managing, difficulty, and complexity, but nobody has made any quantifiable statements around why it&#039;s so difficult/complex to manage.  So nobody has shown a technical reason why it&#039;s bad (except in an extremely rare case with a massive number of hosts), and nobody has been able to (or even attempted to) quantify why it&#039;s difficult.  It&#039;s your blog and I appreciate you maintaining a reasonable dialog when you don&#039;t have to with a random guy on the internet, but I don&#039;t think anybody has shown me anything substantial other than &quot;I said it, so that&#039;s the way it should be done&quot;.</description>
		<content:encoded><![CDATA[<p>Yup Brocade trunking licenses help with that issue (which is what I meant from my &#8220;old-school living&#8221; statement, I had Brocade&#8217;s when they didn&#8217;t offer any trunking support: anybody remember the old 1gb, SC models with the LCD screens on the front?).  I did it back then because of that (among other things), and haven&#8217;t seen any reason not to continue the practice today.  Brocade still have some interesting technical &#8220;gotchas&#8221; even with trunking today (that you already covered), and is the reason I&#8217;m purchasing Cisco now since they&#8217;ve had enough time to &#8220;cook&#8221; since I last replaced all my switches.</p>
<p>Your fan-out concern is really the first and only legitimate quantifiable statement why zoning to two ports could be a problem.  A DMX-3/4 has a fan-out ratio of 128:1, so even that argument, while valid for a configuration with &gt; 768 hosts going to a single DMX with FC SRDF (or &gt;1024 hosts with gig-e SRDF), puts us into levels that are semi silly as to whether it&#8217;s a concern or not for *general* usage statements (unless &gt; 768 hosts going to a single DMX is a general deployment rule rather than the exception).</p>
<p>With that I&#8217;m not trying to be aggressive, argumentative, a jerk, etc. about it. I&#8217;m very open to being convinced that 2x is better than 4x, that it&#8217;s the way of the future and my thinking is out-dated stupidity; but nobody has given me any reasonable data to change my opinion.   I&#8217;ve given a number of examples that are still relevant today, while nobody has been able to show any reasonable technical counter arguments.  People have made statements about managing, difficulty, and complexity, but nobody has made any quantifiable statements around why it&#8217;s so difficult/complex to manage.  So nobody has shown a technical reason why it&#8217;s bad (except in an extremely rare case with a massive number of hosts), and nobody has been able to (or even attempted to) quantify why it&#8217;s difficult.  It&#8217;s your blog and I appreciate you maintaining a reasonable dialog when you don&#8217;t have to with a random guy on the internet, but I don&#8217;t think anybody has shown me anything substantial other than &#8220;I said it, so that&#8217;s the way it should be done&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SanGod</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5473</link>
		<dc:creator>SanGod</dc:creator>
		<pubDate>Mon, 17 Dec 2007 15:50:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5473</guid>
		<description>That&#039;s the great thing about Cisco - pretty much anything can be done to it &#039;non-disruptively&#039;

I agree, some people are going to do it the way they&#039;re going to do it and that is that.  At least the advent of 4gbit FC means that when you trunk 4 cables together from core to edge, the odds of you overutilizing your trunk is minimal.  maybe in a burst, but highly unlikely under regular load.

As a consultant I can only make suggestions.  What the end-customer does from that point on is up to them.  I just find it ironic that people pay consultants only to ignore their suggestions.  ;-)</description>
		<content:encoded><![CDATA[<p>That&#8217;s the great thing about Cisco &#8211; pretty much anything can be done to it &#8216;non-disruptively&#8217;</p>
<p>I agree, some people are going to do it the way they&#8217;re going to do it and that is that.  At least the advent of 4gbit FC means that when you trunk 4 cables together from core to edge, the odds of you overutilizing your trunk is minimal.  maybe in a burst, but highly unlikely under regular load.</p>
<p>As a consultant I can only make suggestions.  What the end-customer does from that point on is up to them.  I just find it ironic that people pay consultants only to ignore their suggestions.  <img src='http://blog.50micron.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jm</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5472</link>
		<dc:creator>jm</dc:creator>
		<pubDate>Mon, 17 Dec 2007 14:25:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5472</guid>
		<description>Well, I don&#039;t think we&#039;re going to convince InsaneGeek on this one.  I agree that crossing ISLs should be avoided where possible, but in larger environments they&#039;re a necessary evil.  In our shop we&#039;ve got (mostly full) Cisco MDS 9513s in the core and on the edges.  We&#039;d be managing 12-16 separate &#039;fabrics&#039; if we didn&#039;t have a core-edge design with ISLs, and then we&#039;d be back to SAN islands.  No thanks!  If ISLs can be sized and balanced properly*, they&#039;re not too bad to work with.  Cisco port channels use the SCSI exchange ID in the round robin calculation to get very granular load balancing.  I&#039;ve yet to see a lopsided port channel.  It&#039;s pretty straightforward to keep an eye on them and if any are regularly seeing &gt;50% utilization you can non-disruptively add bandwidth.</description>
		<content:encoded><![CDATA[<p>Well, I don&#8217;t think we&#8217;re going to convince InsaneGeek on this one.  I agree that crossing ISLs should be avoided where possible, but in larger environments they&#8217;re a necessary evil.  In our shop we&#8217;ve got (mostly full) Cisco MDS 9513s in the core and on the edges.  We&#8217;d be managing 12-16 separate &#8216;fabrics&#8217; if we didn&#8217;t have a core-edge design with ISLs, and then we&#8217;d be back to SAN islands.  No thanks!  If ISLs can be sized and balanced properly*, they&#8217;re not too bad to work with.  Cisco port channels use the SCSI exchange ID in the round robin calculation to get very granular load balancing.  I&#8217;ve yet to see a lopsided port channel.  It&#8217;s pretty straightforward to keep an eye on them and if any are regularly seeing &gt;50% utilization you can non-disruptively add bandwidth.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SanGod</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5458</link>
		<dc:creator>SanGod</dc:creator>
		<pubDate>Sat, 15 Dec 2007 14:54:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5458</guid>
		<description>That&#039;s the danger of not using trunking.  Brocade uses FSPF, literally, Fabric Shortest Path First.

If there are multiple &quot;shortest paths&quot; it will simply round robin them..   So if you have two ISL&#039;s and you connect Host-A, Host-B, Host-C, Host-D in order, A and C will be on one path, and B and D will be on the other.

This is especially problematic in clustered environments, because there is a 25% chance that both active nodes will end up utilizing one ISL, and the passive nodes will sit and exactly NOT utilize the second.

Trunking solves this problem, but requires that you burn more ports than you may like.  Also Brocade&#039;s trunking is limited to single-ASIC, so you have to put all 4 cables into one ASIC to get a trunk.  Single ASIC failure results in a segmented fabric.

McData and Cisco switches both trunk independent of ASIC, which is a good thing, because it means you can spread ISL&#039;s across blades and still get the performance.

I still maintain that pushing data across ISL&#039;s is universally a bad idea, but understandably sometimes it can&#039;t be avoided.  When I was out at Disney they had the storage in the basement and the hosts one floor up, they used ISL&#039;s between directors on each floor to hook everything together.

I guess the point is if you&#039;re going to run to 4 FA ports, you&#039;re gaining nothing by that unless you&#039;re running 4 HBA&#039;s.  The FSPF calculation is still going to be based on the HBA path, and if both FA&#039;s are in the same switch you&#039;re gaining nothing, if the FA&#039;s are in the different switch, you are potentially introducing extra, needless hops.

Best way is to divide the Symm up depending on the number of switches you have.  If you have 2 switches, use the Low-FA&#039;s to one, and the high-FA&#039;s to the other.  If you have four, divide the Symm into quads, with Low/AB on one, Low/CD on the second, High/AB on the third, and High/CD on the fourth.

If you have three switches, God help you.  There really is no easy way to balance across three switches.  Lord knows, I&#039;ve tried.  (Don&#039;t ask, though I can say it involved a government agency and leave it at that. ;-)

As far as doing one HBA to multiple FA&#039;s, again, you gain nothing but added complexity and cost.  Say, for the sake of argument, you are following the 16:1 fan-in, which I believe is still the recommended ratio.  Over 4 FA&#039;s (with single-initiator/single-target - and reserving the D ports for SRDF) that&#039;s 192 total hosts you can connect to this Symm.  If you go 2:4, that cuts your number in half, to 96.  Remember, the fan-in refers to the number of HBA&#039;s zoned to a single FA, not the number of real hosts.
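
The fan-in arithmetic can be sketched in a few lines of Python.  This assumes each FA has four ports (A through D) with the D port held back for SRDF, and reads 2:4 as each HBA being zoned to two FA ports; both are one reading of the numbers above, not an EMC formula, and the names are made up for illustration.

```python
FAN_IN = 16          # HBAs zoned per FA port, from the 16:1 figure above
FA_COUNT = 4         # FA's in the example
PORTS_PER_FA = 4     # assumption: ports A, B, C and D on each FA
RESERVED_PER_FA = 1  # assumption: the D port on each FA is kept for SRDF


def max_hbas(fa_ports_per_hba):
    """Return how many HBAs fit when each HBA is zoned to N FA ports."""
    host_ports = FA_COUNT * (PORTS_PER_FA - RESERVED_PER_FA)
    zoning_slots = host_ports * FAN_IN
    return zoning_slots // fa_ports_per_hba


print(max_hbas(1))  # single-initiator/single-target
print(max_hbas(2))  # each HBA zoned to two FA ports
```

Under those assumptions there are 12 host-facing ports, so the first call gives 192 and the second gives 96, matching the figures above.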

So to get to the number of host ports you should have had before, you have to buy twice as many FA&#039;s.  Not a problem from EMC&#039;s perspective, but a waste nonetheless.  If the SAN is assembled correctly, with dual fabrics that are connected properly, you gain absolutely nothing in the process.</description>
		<content:encoded><![CDATA[<p>That&#8217;s the danger of not using trunking.  Brocade uses FSPF, literally, Fabric Shortest Path First.</p>
<p>If there are multiple &#8220;shortest paths&#8221; it will simply round-robin them.  So if you have two ISL&#8217;s and you connect Host-A, Host-B, Host-C, Host-D in order, A and C will be on one path, and B and D will be on the other.</p>
<p>This is especially problematic in clustered environments, because there is a 25% chance that both active nodes will end up sharing one ISL while the passive nodes sit on the second and leave it idle.</p>
<p>Trunking solves this problem, but requires that you burn more ports than you may like.  Also, Brocade&#8217;s trunking is limited to a single ASIC, so you have to put all 4 cables into one ASIC to get a trunk.  A single ASIC failure then results in a segmented fabric.</p>
<p>McData and Cisco switches both trunk independent of ASIC, which is a good thing, because it means you can spread ISL&#8217;s across blades and still get the performance.</p>
<p>I still maintain that pushing data across ISL&#8217;s is universally a bad idea, but understandably sometimes it can&#8217;t be avoided.  When I was out at Disney they had the storage in the basement and the hosts one floor up, and they used ISL&#8217;s between directors on each floor to hook everything together.</p>
<p>I guess the point is that if you&#8217;re going to run to 4 FA ports, you&#8217;re gaining nothing by that unless you&#8217;re running 4 HBA&#8217;s.  The FSPF calculation is still going to be based on the HBA path, and if both FA&#8217;s are in the same switch you&#8217;re gaining nothing; if the FA&#8217;s are in different switches, you are potentially introducing extra, needless hops.</p>
<p>The best way is to divide the Symm up depending on the number of switches you have.  If you have 2 switches, use the Low-FA&#8217;s to one, and the High-FA&#8217;s to the other.  If you have four, divide the Symm into quads, with Low/AB on one, Low/CD on the second, High/AB on the third, and High/CD on the fourth.</p>
<p>If you have three switches, God help you.  There really is no easy way to balance across three switches.  Lord knows, I&#8217;ve tried.  (Don&#8217;t ask, though I can say it involved a government agency and leave it at that. <img src='http://blog.50micron.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>As far as doing one HBA to multiple FA&#8217;s, again, you gain nothing but added complexity and cost.  Say, for the sake of argument, you are following the 16:1 fan-in, which I believe is still the recommended ratio.  Over 4 FA&#8217;s (with single-initiator/single-target &#8211; and reserving the D ports for SRDF) that&#8217;s 192 total hosts you can connect to this Symm.  If you go 2:4, that cuts your number in half, to 96.  Remember, the fan-in refers to the number of HBA&#8217;s zoned to a single FA, not the number of real hosts.</p>
<p>So to get to the number of host ports you should have had before, you have to buy twice as many FA&#8217;s.  Not a problem from EMC&#8217;s perspective, but a waste nonetheless.  If the SAN is assembled correctly, with dual fabrics that are connected properly, you gain absolutely nothing in the process.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: InsaneGeek</title>
		<link>http://blog.50micron.com/2007/12/06/overcomplicating-the-world/comment-page-1/#comment-5422</link>
		<dc:creator>InsaneGeek</dc:creator>
		<pubDate>Fri, 14 Dec 2007 23:52:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.sangod.com/?p=188#comment-5422</guid>
		<description>One of the problems is... well, let&#039;s say that sometimes the path a switch assigns from an HBA to storage can be non-optimal.

For example, you have
3x 2Gb ISL links between some Brocade switches
In aggregate, the hosts normally push ~3Gb of ISL throughput across both fabrics
Theoretically you are happy as a clam, as you actually have 2x the throughput that you normally use (6Gb of ISL&#039;s, 3Gb of actual traffic)

The problem comes in when the lovely path algorithm comes into play; this is the tricky part:
Host A: uses 10MB/s
Host B: uses 10MB/s
Host C: uses 140MB/s
Host D: uses 140MB/s
Host E-?: uses whatever

There is a possibility that the Brocade has assigned hosts C &amp; D to the same ISL port, as the path is calculated when the switch first sees the port come up and stays stuck there until an &quot;event&quot; (failure or otherwise) causes the switch to recalculate paths.  Even with double the ISL bandwidth that I use, I can still run into a problem (280MB/s of traffic will not fit into a 200MB/s ISL); heck, I could quadruple my ISL bandwidth and still have this issue.  Admittedly this is some old-school living, but I&#039;ve felt the pain and, as you said, have &quot;years of experience&quot; and learned from it.  Now you can buy channel bonding licenses for Brocade, etc. to reduce this, but I&#039;ve lived the dream, had the experiences.

Having run Brocade switches for years and years, I&#039;ve had to do the whole &quot;fabric goes down&quot; thing multiple times (existing 1Gb switches, want to plug in 2Gb switches, need to change core fabric values so they match, major code revs).  I don&#039;t remember exactly what code it was, but after going from 3.x to 4.x of Brocade firmware, or 4.2 to 4.22 (it&#039;s been a number of years, kinda fuzzy), any fabric event (non-major ones, such as simply rebooting a host and having the HBA log in) would kick running HBA&#039;s on completely different switches off the fabric (2nd customer in the world to run into this lovely one).  This required us to power off all the switches in the entire fabric simultaneously, then one by one bring a switch online, upgrade it, and power it off until all the switches were complete, and then power the entire fabric on.
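
Going back to the case of hosts C and D sharing an ISL, a small Python sketch shows why spare aggregate bandwidth does not help.  The 200MB/s per-ISL capacity and the host-to-ISL mapping are illustrative assumptions built from the numbers above, and the function name is made up; a real switch does not expose its path assignment this way.

```python
ISL_CAPACITY_MB = 200  # roughly what one 2Gb ISL carries, per the figures above


def isl_loads(host_mb, assignment):
    """Sum the traffic of every host pinned to each ISL."""
    loads = {}
    for host, isl in assignment.items():
        loads[isl] = loads.get(isl, 0) + host_mb[host]
    return loads


host_mb = {"A": 10, "B": 10, "C": 140, "D": 140}
# Hypothetical static assignment in which C and D landed on the same ISL.
assignment = {"A": 0, "B": 1, "C": 2, "D": 2}

loads = isl_loads(host_mb, assignment)
overloaded = [isl for isl, mb in loads.items() if mb > ISL_CAPACITY_MB]
print(loads, overloaded)
```

Total traffic here is 300MB/s against 600MB/s of ISL capacity, yet ISL 2 carries 280MB/s and is the only link over its limit.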

You still have not really given me a reason why it is really bad to do it, or how it causes major headaches in day-to-day management.  I have lived through some very fun experiences, and I&#039;ve found this works best for me.  There was the ugly Brocade 2nd-customer-in-the-world bug noted above; a tier 1 array that power-cycled while in production due to a vendor service engineer (not in a nice way either: cache writes already committed to hosts were not flushed, but were instead lost... very, very bad day); and arrays with clustered heads where, instead of failing over to the working head, the working head shut down, losing *all* access to the storage (fix: power-cycle both heads).  We&#039;ve found very interesting ways for DMX&#039;s, Clariions, NetApps, Brocades, Cisco&#039;s, etc. to fail while they were in production; things that aren&#039;t supposed to *ever* happen do... I&#039;ve truly lived those pains.  Maybe having 4x paths to an HBA is not really absolutely &quot;needed&quot;, but if the only penalty is 5 minutes of setup time, why wouldn&#039;t one be doing it?  That really is the question, unless 5 minutes is really that much more important to you than a little more peace of mind.  Not to be argumentative, but is it that much more painful?  Is there a whitepaper that says this is against best practices, etc.?  I&#039;ve felt the goodness with no pain.  What really is the heartburn that you experience from this, other than that it takes 5 minutes longer to do?

After 4 months or so of work, I just sent out a PO this afternoon to do *all* chassis switches in the different fabrics and upgrade *all* DMX &amp; Clariion units to a number of DMX-4&#039;s.</description>
		<content:encoded><![CDATA[<p>One of the problems is&#8230; well, let&#8217;s say that sometimes the path a switch assigns from an HBA to storage can be non-optimal.</p>
<p>For example, you have<br />
3x 2Gb ISL links between some Brocade switches<br />
In aggregate, the hosts normally push ~3Gb of ISL throughput across both fabrics<br />
Theoretically you are happy as a clam, as you actually have 2x the throughput that you normally use (6Gb of ISL&#8217;s, 3Gb of actual traffic)</p>
<p>The problem comes in when the lovely path algorithm comes into play; this is the tricky part:<br />
Host A: uses 10MB/s<br />
Host B: uses 10MB/s<br />
Host C: uses 140MB/s<br />
Host D: uses 140MB/s<br />
Host E-?: uses whatever</p>
<p>There is a possibility that the Brocade has assigned hosts C &amp; D to the same ISL port, as the path is calculated when the switch first sees the port come up and stays stuck there until an &#8220;event&#8221; (failure or otherwise) causes the switch to recalculate paths.  Even with double the ISL bandwidth that I use, I can still run into a problem (280MB/s of traffic will not fit into a 200MB/s ISL); heck, I could quadruple my ISL bandwidth and still have this issue.  Admittedly this is some old-school living, but I&#8217;ve felt the pain and, as you said, have &#8220;years of experience&#8221; and learned from it.  Now you can buy channel bonding licenses for Brocade, etc. to reduce this, but I&#8217;ve lived the dream, had the experiences.</p>
<p>Having run Brocade switches for years and years, I&#8217;ve had to do the whole &#8220;fabric goes down&#8221; thing multiple times (existing 1Gb switches, want to plug in 2Gb switches, need to change core fabric values so they match, major code revs).  I don&#8217;t remember exactly what code it was, but after going from 3.x to 4.x of Brocade firmware, or 4.2 to 4.22 (it&#8217;s been a number of years, kinda fuzzy), any fabric event (non-major ones, such as simply rebooting a host and having the HBA log in) would kick running HBA&#8217;s on completely different switches off the fabric (2nd customer in the world to run into this lovely one).  This required us to power off all the switches in the entire fabric simultaneously, then one by one bring a switch online, upgrade it, and power it off until all the switches were complete, and then power the entire fabric on.</p>
<p>You still have not really given me a reason why it is really bad to do it, or how it causes major headaches in day-to-day management.  I have lived through some very fun experiences, and I&#8217;ve found this works best for me.  There was the ugly Brocade 2nd-customer-in-the-world bug noted above; a tier 1 array that power-cycled while in production due to a vendor service engineer (not in a nice way either: cache writes already committed to hosts were not flushed, but were instead lost&#8230; very, very bad day); and arrays with clustered heads where, instead of failing over to the working head, the working head shut down, losing *all* access to the storage (fix: power-cycle both heads).  We&#8217;ve found very interesting ways for DMX&#8217;s, Clariions, NetApps, Brocades, Cisco&#8217;s, etc. to fail while they were in production; things that aren&#8217;t supposed to *ever* happen do&#8230; I&#8217;ve truly lived those pains.  Maybe having 4x paths to an HBA is not really absolutely &#8220;needed&#8221;, but if the only penalty is 5 minutes of setup time, why wouldn&#8217;t one be doing it?  That really is the question, unless 5 minutes is really that much more important to you than a little more peace of mind.  Not to be argumentative, but is it that much more painful?  Is there a whitepaper that says this is against best practices, etc.?  I&#8217;ve felt the goodness with no pain.  What really is the heartburn that you experience from this, other than that it takes 5 minutes longer to do?</p>
<p>After 4 months or so of work, I just sent out a PO this afternoon to do *all* chassis switches in the different fabrics and upgrade *all* DMX &amp; Clariion units to a number of DMX-4&#8217;s.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

