Archive for the '“Cloud”' Category

Cloud Computing….

Thursday, December 15th, 2011

Ok, Chuck Hollis posted a great article on Cloud but I had to get my two-one-hundredths in.  Enough so that I’ve temporarily decided to come out of retirement.

Here goes.

The problem with “Cloud” is that most people don’t realize that while you might be gaining in areas of cost, possibly (but not likely) performance, and scalability, IMHO what you’re giving up is far worse.

Control.

Putting your application in “The Cloud” is the ultimate abdication of an IT Manager’s responsibility. It’s saying “If it breaks, I have someone else to blame.”

Cloud computing has been around for years. They just called it “Managed Services” or “Hosted Services” or any of a number of other marketing catch-words before.

Myself, I’ve told people that if you “…can’t point to the system that is hosting your application, it’s technically in the cloud.”  A simple two-node VMware cluster, technology that’s been around for years, is suddenly a “Private Cloud.”

I’ve often said that Marketing is nothing but Sales without the ethics involved.  My wife (The marketing person) and I argue on this point regularly.  :)

Cloud is ambiguity wrapped in uncertainty.  It’s a hope without proof that what you need will be there when you need it.

This is obviously an oversimplification but nonetheless it’s accurate.

Now don’t get me wrong – “Cloud” computing is perfect for small businesses who don’t want to host an IT staff or dedicate 250 square feet to a computer room. It’s good for medium businesses that are trying to control licensing costs or who expect random expansions/contractions of their user base.  Heck, even *I* provide similar services to a few local businesses simply because I’ve already got the disks spinning (and it helps pay the electric bill)

But if you have data-retention compliance requirements or any of a dozen other regulatory hurdles, or if you just plain want to *KNOW* where your data is, you’re often better off keeping the application in house.

*ESPECIALLY* if you already have a datacenter you’re using for other purposes.

As a consultant for a company that is experimenting with moving its corporate email system to Google, I’ve been asked a number of times what the backup policies are, what the retention policies are, and what the RTO and RPO are in the event of a failure.

The only answer I can (and will) give is “Well Google says it’s this, Google says it’s that.”

When they ask me what *I* think I simply tell them I not only don’t know, but that I can’t know.  It’s unknowable.  Since I don’t have control over the application as such, I absolutely refuse to speculate as to the actions and abilities of others whom I don’t know and who are not under my direct control.  I hope the managed services company has someone competent in their employ, but as most of them hire based on labor cost over skill-set (I’ve interviewed for the positions, I’ve seen the depth-charge offers they throw out) I won’t count on it.

Because the truth is, you don’t know. Sure you know what the marketing says, what the sales rep told you, what the tech-support person tells you when you call in. But if your hands aren’t the ones shuttling the tapes from the library to the vault, you don’t actually *KNOW*, you suspect.

Finally, “Cloud” services are fine until there is a failure. The problem with a failure, as this year’s EC2 failure illustrated, is that when there *IS* an enterprise failure, whether it be due to a lack of planning, a lack of infrastructure, or just a plain, old-fashioned act of god, you’re not in control.

You have to wait on the guys at Amazon to fix the problem. And if you’re one of a thousand customers affected by an outage, odds are pretty strong your application isn’t the first one that’s being worked on. And no amount of yelling or screaming is going to change that…

Personally, I would always prefer to be in a position to make a single phone call and get someone out of bed whose sole job it is to get *MY* Exchange server back online, or handle a failover, etc.  If I measure downtime in thousands of dollars per minute, I want to *KNOW* that my sites/applications are being worked on first.  The only way to know that is to sign the paycheck of the guy who is actually hands-on.  (Or in my case to *BE* the guy who is actually hands-on.)

Again, just my two cents.

To Cloud, or not to Cloud…

Thursday, July 28th, 2011

It really does seem to be the question…  the sad part is how many people I talk to in my travels don’t really understand what cloud even is, let alone what the pros and cons are of moving your applications into it.

Background – a company is considering moving probably 3,000-5,000+ users to Gmail as a ‘corporate’ email system…  They are running Exchange currently…

Apparently, they don’t read the news and have missed out on the multiple spectacular failures of services like Google, Amazon and the like.

Cloud services are GREAT if you are running a small business, don’t want to / can’t afford an IT budget, or just plain don’t want to deal with it.

If you’re a billion dollar corporation with a multi-million dollar IT infrastructure already in place, outsourcing email seems a bit…odd.

Granted, if you are this company, you are obviously going to get the top-of-the-line service, dedicated support personnel, etc.  You’re also buying plausible deniability should data-loss put you in jeopardy under subpoena. (While “I disposed of the data” is bad, “The company I was outsourcing to lost it” is not as bad.)

“Honest your honor, we had the emails but Google deleted them by accident.”

*DISCLAIMER – I’m not implying that Google would ever do something like this on purpose; I’m just using them as a generic, like Xerox.

** It’s Google’s fault…they’re big enough to have become the verb.

***Does anyone actually own a Xerox branded machine anymore?

So if you’re SuperMegaCorp, LLC…you pay for the real service.  You get dedicated support staff, a private line to call, etc.  But to be honest, you might as well keep it in house because hey, you already have the staff, the datacenter, the VMware farm, etc.  At that point you’re talking a few dollars in licensing and you’ve got email addresses for your thousands of employees for pennies each.  (Ok, yes, add in replication, backup, etc and it gets a bit higher, but the point is you’ve already commoditized it. (is too a word))

But think about it this way.  The company you’re contracting to has to pay for the same things *YOU* have to pay for.  *PLUS* they have to make enough of a profit to keep their shareholders off their back.  They do get a bit of a discount for bulk licensing, hardware, etc…

But what you GET for hosting it in house is immeasurable.  You get control.

At my last gig I heard the following phrase over and over again.  “I want one neck to choke.” (Oddly enough it was the argument given for moving AWAY from their previously preferred vendor, but you get the idea.)

When the email admin works for you, you have one neck to choke.  You get immediate results. Or you get the pleasure of firing someone.  (Can be fun in the right circumstances, ask The Donald.)

Now say you hosted with Amazon, just for grins.

Not only are your hosts down, potentially THOUSANDS of other hosts are down as well.  Now while we would like to believe they have a thousand techs on staff to give each customer equal time…let’s face it, it’s not going to happen.  They have, EXTREMELY generously, 10 technicians per thousand customers.  The techs will bring hosts up as soon as they can…

In an egalitarian society, odds are quite simply about 1000:1 against your site being the first one brought up…  990:1 against it being the second, etc.  See where I’m going?  Eventually they’ll get around to it, but unless they figured out time travel and can loop back and do them all at the same point in time…you’re out of luck.  Yes, you’ve probably got a 99.999% uptime guarantee…but read the small print of your contract…  Their liability to you cannot exceed the cost of the hosting, if that, or some similar legalese that limits their liability for downtime and, god forbid, data loss.
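
To put rough numbers on both halves of that argument, here is a minimal Python sketch. It assumes a non-leap year, and the 30-minute fix time per host is purely a made-up figure for illustration (as is the idea that techs work strictly in queue order); only the 99.999% guarantee and the 10 techs per 1,000 customers come from the text above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes per year a provider can be down and still meet the guarantee."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def start_delay_minutes(position: int, techs: int, minutes_per_fix: float) -> float:
    """How long the customer at `position` (1-based) waits before a tech starts.

    Assumes every fix takes the same time and techs take hosts strictly in
    queue order, which is a simplification.
    """
    rounds_ahead = (position - 1) // techs
    return rounds_ahead * minutes_per_fix


if __name__ == "__main__":
    print(f"99.999% uptime allows {allowed_downtime_minutes(99.999):.2f} min/year")
    # Hypothetical outage: 1,000 affected customers, 10 techs, 30 minutes per host.
    for position in (1, 500, 1000):
        hours = start_delay_minutes(position, techs=10, minutes_per_fix=30) / 60
        print(f"customer #{position}: work starts after {hours:.1f} hours")
```

Under those assumptions, five nines allows a little over five minutes of downtime a year, while the customer at the back of a 1,000-deep queue waits about two days before anyone even starts on their host.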

But this is not an egalitarian society…  Pure capitalism and “he who has the most gold gets their email back first.” If you’re with Amazon, well they host some PRETTY big sites…including their own.  Netflix comes to mind.  So in a downtime event if it comes down to bringing Joe the Plumber’s CRM app or Netflix’s east-coast streaming…which one do you think is going to get priority?

Right.

I have one neck to choke…  50Micron is hosted by Catbytes… the company that I do my consulting through.  Reason being that I maintain the lab anyway for “play” (officially: self-education and training) purposes, so it’s easy for me to spin up an extra VM and put Exchange on it, a couple of CentOS Mailscanners, a few webservers, etc, even off-site replication of backups over a 10 Mbit link to a “DR” site (that happens to be in my basement).  (If someone wants to donate another CX3-20i or a couple of FCIP bridges I’ll have block-level replication. ;-) )

When Amazon EC2 had their issues, suspiciously I had a pretty major crash as well… (As did the customer I was working for at the time, don’t get me started on my paranoid theories.)

But when my stuff breaks… It’s my fault, it’s my responsibility, and *I* am the only one in line.  If I had hosted with Google or Amazon I might have been down for weeks…

I was back up in about 2 hours.  The time it took me to cycle the environment remotely. :)

Yes…building an IT infrastructure if you don’t already have one can be pricey.  But paying someone else for hosting when you already HAVE an IT infrastructure just plain doesn’t make sense.

P.S. The funniest part is I’m now hosting about a half-dozen servers for friends/family (not free, I’m ugly, not stupid; and co-lo cages are NOT cheap) and about 40-50 websites that I’ve gotten via friends and word-of-mouth…

Of course my guarantee is as follows:

“Best effort, and you have to realize I have a day job that by its very nature comes first.”  :)

Good Cloud, Bad Cloud, a Titanic story…

Saturday, April 23rd, 2011

This week’s abject failure of Amazon.com’s EC2 hosting environment has caused quite the stir.  There are those who say that this incident “Proves Cloud Failure Recovery is a Myth” and others who say that we should just give it a chance.

Facts are facts.  Amazon screwed the pooch big-time this week.  Their outage caused ripple effects nation-wide.  But while it’s easy to throw the blame at Amazon for the failure, it’s important to remember that cloud computing is still only in its infancy, and this mad rush to adopt it is part and parcel of the reason these problems are happening.  Customers rushing for a new product creates demand, and companies looking to be the first to capitalize on that demand create a product that may or may not be ready for prime time.

But because no-one ever (because it’s impossible) thought to test the kind of cascade failure they experienced, they were pushing the high-availability envelope right out of the gate.

So no big deal, right?  Foursquare, parts of Netflix, etc. were down due to the outage.  Other than inconvenience and the inability of narcissistic people to let the world know where they are and what they’re doing, it’s not really that big a deal (for us).

And then this came out: https://forums.aws.amazon.com/thread.jspa?threadID=65649&tstart=0

Specifically this line:

“We are a monitoring company and are monitoring hundreds of cardiac patients at home.  We were unable to see their ECG signals since 21st of April.”

Really?  You have a life-critical application and you hosted it “in the cloud”?  Did it never occur to you that it’s probably *NOT* a good place for a life-or-death application?  While I would consider it as a backup, definitely not my one and only.

People who know me know I have a rule.  I don’t say it works until I’ve seen it work at least once, and even then I’ll qualify my statement with “well I saw it work under THESE conditions.”  I do *NOT* say something works based on what some sales or marketing person tells me works.  (Trust me, this has been a major sticking point between me and my sales team. ;-)

That being said, you have to accept that if you put your critical apps in “the cloud,” by its very nature you are abdicating your control over them, and putting your full faith in someone ELSE to fix the problem.  Someone who may not think your application is as important as the one in the rack next to yours.

Are you going to take someone’s word that something is “Highly Available” if you haven’t actually pulled the plug yourself and watched it fail over?  I won’t.  I will candidly couch my answer in “That’s the way it’s supposed to work” or “That’s the way it’s designed to work.”  But until you see a failover, that’s not the way it DOES work, because it never has.

I run my own email, my own webserver, my own infrastructure. I prefer it this way, because now if the system goes down, I know exactly whose butt to kick.

As a rule, if I’m paying someone else to provide a service… I make sure I know where, how, and who to call when it blows up.  It’s probably the best advice I can give.

Amazon billed this as being “highly available” and maybe it is, for the most part.  But obviously if you think of a million ways for something to go wrong, you can bet even money on there being at least a million and one ways for it to fail.

Instead of EC2, they should have named it “Titanic” because everyone knows the easiest way to invite disaster is to tell the world you’re immune to it.

Backup Vs. Archive

Tuesday, September 15th, 2009

The fundamental difference between BACKUP and ARCHIVE.

A backup is there to help you deal with a crisis such as “My datacenter is a smoking hole in the ground, now what do I do?” or something not quite as dramatic like “A virus ate my data.”  You recover from the backup to the last known good and all is happy, right?  Well, except for the two or three days that might have gone by since your last good backup…  (I was in one law firm that lost a drive only to find out their backups hadn’t been running for two months.  I came back two weeks later to find a COMPLETE change in personnel had gone on while I was gone – lawyers are not very forgiving when they lose two months’ worth of email.)

An archive is data that, while not “Active” still might be required on a day-to-day basis.  Film / Video / Image archives are a good candidate for and example of that.

So on a disk-based archive you have some platform, ostensibly EMC/Legato DiskExtender or Rainfinity or something along those lines – that will move the data from “Active” storage to “Archive” storage.  In some applications you can even set up a true HSM, moving data that hasn’t been accessed to Tier-2(Enterprise SATA) and even Tier-3(yes, tape) as it ages, only to be recalled to Tier-1 when it’s accessed.
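
The age-based HSM idea above boils down to a very small policy. Here is a minimal Python sketch of it; the 30-day and 180-day thresholds are hypothetical numbers picked for illustration, not defaults from DiskExtender, Rainfinity, or any other product.

```python
from datetime import datetime, timedelta

# Hypothetical aging thresholds, chosen only to illustrate the policy.
TIER2_AFTER = timedelta(days=30)    # untouched this long: move to Tier-2 (SATA)
TIER3_AFTER = timedelta(days=180)   # untouched this long: move to Tier-3 (tape)


def tier_for(last_access: datetime, now: datetime) -> int:
    """Pick a storage tier from how long a file has sat idle."""
    idle = now - last_access
    if idle >= TIER3_AFTER:
        return 3
    if idle >= TIER2_AFTER:
        return 2
    return 1


if __name__ == "__main__":
    now = datetime(2009, 9, 15)
    for days_idle in (1, 45, 400):
        tier = tier_for(now - timedelta(days=days_idle), now)
        print(f"idle {days_idle} days -> Tier-{tier}")
```

Recall falls out of the same rule: accessing a file resets its last-access time, so the next policy pass puts it back on Tier-1.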

More often than not I’m brought face to face with people who don’t understand that very subtle difference.  One of my recent customers is actually doing it appropriately, using DX and a smallish Centerra to archive data that, while retention is required, is almost never actually accessed.

Then there are the people who use backup technology for archival purposes.

I’m pretty “old school” when it comes down to it.

Tape is for backup.  Tape is *NOT* supposed to be used as nearline storage when there are equally inexpensive (and more reliable) disk methods out there.

My main complaint about tape as archive: You don’t know if it’s bad until you try to read it.  And every time you read it, the simple act of moving the tape into a tape drive that was manufactured under less than ideal conditions means you are putting your data at risk.

Spending millions of dollars on a new Room-Sized tape library doesn’t make sense when Centerra storage is fairly inexpensive *AND* provides redundancy of the data automatically.

Spending more millions of dollars on three of them is lunacy when one EMC Atmos setup could provide redundancy and a single namespace for recall.  (And if you go whole hog, geographically relevant retrieval is an option too, so you automatically get it from the closest copy.)

It pains me to see it done wrong.  Especially when it involves trying to shoe-horn two more STK monsters into an already cramped datacenter when the work of it could be done in a couple of floor-tiles of spinning disks.

Storage Tiering…

Thursday, July 9th, 2009

Ok, given the changes to the storage arena I’ve been working on a revised “Tiering system” to incorporate all of the levels of data…importance?

My version of Storage Tiering is (or should be) as follows:

  • Tier-1    – Symmetrix/Replicated – High Performance/Critical Data
  • Tier-2    – Symmetrix/NonReplicated – High Performance/Non-Critical Data
  • Tier-3   – Symmetrix/SATA/Replicated – High-Medium Performance/Critical Data
  • Tier-4   – Symmetrix/SATA/NonReplicated – High-Medium Performance/Non-Critical Data
  • Tier-5    – Clariion/FC/Replicated – Medium Performance/Critical Data
  • Tier-6    – Clariion/FC/NonReplicated – Medium Performance/Non-Critical Data
  • Tier-7    – Clariion/SATA/Replicated – Low Performance/Critical Data
  • Tier-8    – Clariion/SATA/NonReplicated – Low Performance/Non-Critical Data
  • Tier-9    – CelerraNAS/Replicated – Network Attached/Critical Data
  • Tier-10  – CelerraNAS/NonReplicated – Network Attached/Non-Critical Data
  • Tier-11  – Atmos – Network Attached / Low Performance
  • Tier-12  – Centerra (Content Addressable Storage) – Low Performance Archive / Highly Available
  • Tier-13  – Primary Tape-In-Library (Automatic loading on demand via HSM)
  • Tier-14  – Primary Tape-Out-Of-Library (Manual Intervention Required)

“Critical Data” vs. “Non-Critical Data” is simply a matter of how long you can be without the data should a failure or accidental deletion occur, as all data is available in Tier8/9 storage (in theory).
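
For what it’s worth, the first ten tiers follow a simple pattern: each platform gets a pair of tiers, replicated (critical) first and non-replicated second. A minimal Python sketch of that lookup follows; the function and variable names are mine and purely illustrative, and tiers 11 through 14 (Atmos, Centerra, tape) are left out because they don’t come in pairs.

```python
# Platforms in the order they appear in the tier list above, fastest first.
PLATFORMS = [
    "Symmetrix",
    "Symmetrix/SATA",
    "Clariion/FC",
    "Clariion/SATA",
    "CelerraNAS",
]


def tier_number(platform: str, replicated: bool) -> int:
    """Return the tier (1-10) for a platform and its replication status.

    Replicated storage holds critical data and takes the odd tier of the
    pair; non-replicated storage takes the even tier.
    """
    index = PLATFORMS.index(platform)  # raises ValueError for unknown platforms
    return 2 * index + (1 if replicated else 2)


if __name__ == "__main__":
    for platform in PLATFORMS:
        for replicated in (True, False):
            label = "Replicated" if replicated else "NonReplicated"
            print(f"Tier-{tier_number(platform, replicated)}: {platform}/{label}")
```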

I’ve also considered using Tier1/Tier1B to describe DMX storage vs. Clariion storage, given that there is a LOT of overlap in performance characteristics these days…

Oh, and iSCSI would be somewhere between 10 and 13….

Any thoughts?

EMC Atmos

Saturday, April 4th, 2009

Got my first presentation on EMC’s new “Atmos” storage platform.

Now granted this was kind of a sales-ey (is too a word) presentation but I’m pretty impressed so far.

It seems what EMC has done is combined the best of Celerra and Centerra. (In fact, the gentleman giving the presentation sort of placed it on the map right between the two)

The basics of it are that they get a bunch of 1U (Presumably Dell) Pizza-Box type servers and put them in front of a bunch of really *REALLY* cheap storage.

They then present the storage out using a variety of protocols: CIFS/NFS, and the REST/SOAP APIs.  Rumors of an iSCSI option could not be confirmed…or explained (how in the world would you convert block-storage to object-storage and expect any kind of real performance?)

Downsides….well, there are multiple single-points-of-failure in each frame, which is why when you invest in the Atmos hardware you will buy a minimum of two frames.  I think this could have been avoided in a more robust deployment.

There is no “Compliance” edition (yet?)  This would/could easily be the replacement for the Centerra, if they can just get past that little hurdle.  I’ve known many customers (and been one myself) who have chosen the NetApp filer over Centerra for archiving because all we wanted/needed was a CIFS share that we could guarantee the content on.

I was not able to get reasonable performance numbers from the presenter.  Assuming Gigabit-Ethernet off the internal switch/bus/apparatus, the maximum sustained transfer rate would be 125 MBytes/Sec.  10Gig-Ethernet is currently running at substantially less than the 1.25 GBytes/Sec that you would expect.
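
The line-rate figures are just bits divided by eight. A quick Python sketch of that arithmetic follows, using decimal units (1 Gbit = 10^9 bits) and ignoring protocol overhead entirely, so real-world numbers will be lower. The 1 TB transfer is a hypothetical example, not something from the presentation.

```python
BITS_PER_BYTE = 8


def line_rate_mbytes_per_sec(gbits_per_sec: float) -> float:
    """Theoretical ceiling in MBytes/sec for a link speed given in Gbit/sec."""
    return gbits_per_sec * 1e9 / BITS_PER_BYTE / 1e6


def transfer_hours(terabytes: float, gbits_per_sec: float) -> float:
    """Hours needed to move `terabytes` of data at full line rate."""
    seconds = terabytes * 1e12 / (gbits_per_sec * 1e9 / BITS_PER_BYTE)
    return seconds / 3600


if __name__ == "__main__":
    for speed in (1, 10):
        rate = line_rate_mbytes_per_sec(speed)
        print(f"{speed} Gbit Ethernet tops out at {rate:.0f} MBytes/sec")
    # Hypothetical: how long a 1 TB ingest takes over a single gigabit link.
    print(f"1 TB over 1 Gbit: {transfer_hours(1, 1):.2f} hours at line rate")
```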

I’m curious as to what the world’s thoughts are on “Cloud” storage (I hate the term “Cloud” anything – it’s a mostly meaningless term that describes nothing but outsourcing.)

Next step: Get my hands on one and try it out.  This may not be as much of a long-shot as it seems.  :)