Centerra
Backup Vs. Archive
by Jesse on Sep.15, 2009, under "Cloud", Archive, Backup, Best Practices, Centerra, Deduplication, Gripe
The fundamental difference between BACKUP and ARCHIVE.
A backup is there to help you deal with a crisis such as “My datacenter is a smoking hole in the ground, now what do I do?” or something not quite as dramatic like “A virus ate my data.” You recover from the backup to the last known good state and all is happy, right? Well, except for the two or three days that might have gone by since your last good backup… (I was in one law firm that lost a drive only to find out their backups hadn’t been running for two months. I came back two weeks later to find that a COMPLETE change in personnel had taken place while I was gone. Lawyers are not very forgiving when they lose two months’ worth of email.)
An archive is data that, while not “Active”, still might be required on a day-to-day basis. Film / Video / Image archives are good candidates for, and a good example of, that.
So on a disk-based archive you have some platform (ostensibly EMC/Legato DiskExtender, Rainfinity, or something along those lines) that will move the data from “Active” storage to “Archive” storage. In some applications you can even set up a true HSM, moving data that hasn’t been accessed to Tier-2 (Enterprise SATA) and even Tier-3 (yes, tape) as it ages, only to be recalled to Tier-1 when it’s accessed.
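The age-based HSM movement described above can be sketched roughly as follows. This is only an illustration: the 90-day and 365-day thresholds and the `pick_tier` name are assumptions made up for the example, not settings from DiskExtender, Rainfinity, or any other product.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; real HSM policies are site-specific.
TIER2_AFTER = timedelta(days=90)    # untouched this long -> Enterprise SATA
TIER3_AFTER = timedelta(days=365)   # untouched this long -> tape


def pick_tier(last_access: datetime, now: datetime) -> int:
    """Return the tier a file belongs on, based on time since last access."""
    age = now - last_access
    if age >= TIER3_AFTER:
        return 3  # tape
    if age >= TIER2_AFTER:
        return 2  # Enterprise SATA
    return 1      # active storage


now = datetime(2009, 9, 15)
stale = now - timedelta(days=400)
print(pick_tier(stale, now))  # aged-out file sits on Tier-3
print(pick_tier(now, now))    # an access resets the age, so it is recalled to Tier-1
```

The recall to Tier-1 falls out of the same rule: accessing a file resets its last-access time, so the next policy pass puts it back on active storage.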
More often than not I’m brought face to face with people who don’t understand that very subtle difference. One of my recent customers is actually doing it appropriately, using DX and a smallish Centerra to archive data that, while retention is required, is almost never actually accessed.
Then there are the people who use backup technology for archival purposes.
I’m pretty “old school” when it comes down to it.
Tape is for backup. Tape is *NOT* supposed to be used as nearline storage when there are equally inexpensive (and more reliable) disk methods out there.
My main complaint about tape as archive: You don’t know if it’s bad until you try to read it. And every time you read it, the simple act of moving the tape into a tape drive that was manufactured under less than ideal conditions means you are putting your data at risk.
Spending millions of dollars on a new Room-Sized tape library doesn’t make sense when Centerra storage is fairly inexpensive *AND* provides redundancy of the data automatically.
Spending more millions of dollars on three of them is lunacy when one EMC Atmos setup could provide redundancy and a single namespace for recall. (And if you go whole hog, geographically relevant retrieval is an option too, so you automatically get it from the closest copy.)
It pains me to see it done wrong. Especially when it involves trying to shoe-horn two more STK monsters into an already cramped datacenter when the work of it could be done in a couple of floor-tiles of spinning disks.
Storage Tiering…
by Jesse on Jul.09, 2009, under "Cloud", Backup, Celerra, Centerra, Clariion, DR/COOP, ILM, RAID, Symmetrix
Ok, given the changes to the storage arena I’ve been working on a revised “Tiering system” to incorporate all of the levels of data…importance?
My version of Storage Tiering is (or should be) as follows:
- Tier-1 – Symmetrix/Replicated – High Performance/Critical Data
- Tier-2 – Symmetrix/NonReplicated – High Performance/Non-Critical Data
- Tier-3 – Symmetrix/SATA/Replicated – High-Medium Performance/Critical Data
- Tier-4 – Symmetrix/SATA/NonReplicated – High-Medium Performance/Non-Critical Data
- Tier-5 – Clariion/FC/Replicated – Medium Performance/Critical Data
- Tier-6 – Clariion/FC/NonReplicated – Medium Performance/Non-Critical Data
- Tier-7 – Clariion/SATA/Replicated – Low Performance/Critical Data
- Tier-8 – Clariion/SATA/NonReplicated – Low Performance/Non-Critical Data
- Tier-9 – CelerraNAS/Replicated – Network Attached/Critical Data
- Tier-10 – CelerraNAS/NonReplicated – Network Attached/Non-Critical Data
- Tier-11 – Atmos – Network Attached / Low Performance
- Tier-12 – Centerra (Content Addressable Storage) – Low Performance Archive / Highly Available
- Tier-13 – Primary Tape-In-Library (Automatic loading on demand via HSM)
- Tier-14 – Primary Tape-Out-Of-Library (Manual Intervention Required)
“Critical Data” vs. “Non-Critical Data” is simply a matter of how long you can be without the data should a failure or accidental deletion occur, as all data is available in Tier8/9 storage (in theory).
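As a rough sketch, the first ten tiers above boil down to a lookup on two questions: how fast, and is it replicated (i.e. critical)? The table below just transcribes the list; the short performance labels and the `tier_for` helper are made up for illustration.

```python
# (tier, platform, performance label, replicated) for Tier-1 through Tier-10.
# The performance labels are shorthand invented for this example.
TIERS = [
    (1, "Symmetrix", "high", True),
    (2, "Symmetrix", "high", False),
    (3, "Symmetrix/SATA", "high-medium", True),
    (4, "Symmetrix/SATA", "high-medium", False),
    (5, "Clariion/FC", "medium", True),
    (6, "Clariion/FC", "medium", False),
    (7, "Clariion/SATA", "low", True),
    (8, "Clariion/SATA", "low", False),
    (9, "CelerraNAS", "nas", True),
    (10, "CelerraNAS", "nas", False),
]


def tier_for(performance: str, critical: bool) -> int:
    """Find the tier for a performance class; critical data gets the replicated tier."""
    for tier, _platform, perf, replicated in TIERS:
        if perf == performance and replicated == critical:
            return tier
    raise ValueError(f"no tier for performance={performance!r}")


print(tier_for("medium", True))   # replicated Clariion/FC
print(tier_for("low", False))     # non-replicated Clariion/SATA
```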
I’ve also considered using Tier1/Tier1B to describe DMX storage vs. Clariion storage, given that there is a LOT of overlap in performance characteristics these days…
Oh, and iSCSI would be somewhere between 10 and 13….
Any thoughts?
EMC Atmos
by Jesse on Apr.04, 2009, under "Cloud", Celerra, Centerra
Got my first presentation on EMC’s new “Atmos” storage platform.
Now granted, this was kind of a sales-ey (is that a word?) presentation, but I’m pretty impressed so far.
It seems what EMC has done is combine the best of Celerra and Centerra. (In fact, the gentleman giving the presentation sort of placed it on the map right between the two.)
The basics of it are that they get a bunch of 1U (presumably Dell) pizza-box type servers and put them in front of a bunch of really *REALLY* cheap storage.
They then present the storage out using a variety of protocols: CIFS/NFS, and the REST/SOAP APIs. Rumors of an iSCSI interface could not be confirmed…or explained (how in the world would you convert block-storage to object-storage and expect any kind of real performance?)
Downsides….well, there are multiple single-points-of-failure in each frame, which is why when you invest in the Atmos hardware you will buy a minimum of two frames. I think this could have been avoided in a more robust deployment.
There is no “Compliance” edition (yet?). This would/could easily be the replacement for the Centerra, if they can just get past that little hurdle. I’ve known many customers (and been one myself) who have chosen the NetApp filer over Centerra for archiving because all we wanted/needed was a CIFS share that we could guarantee the content on.
I was not able to get reasonable performance numbers from the presenter. Assuming Gigabit Ethernet off the internal switch/bus/apparatus, the maximum sustained transfer rate would be 125 MBytes/sec. 10-Gigabit Ethernet is currently running at substantially less than the 1.25 GBytes/sec that you would expect.
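For what it’s worth, the back-of-the-envelope math behind those two figures looks like this. It assumes decimal units and ignores protocol overhead entirely, so treat these as ceilings, not as anything measured on an Atmos; the function names are just for the example.

```python
def line_rate_mbytes_per_sec(gigabits_per_sec: float) -> float:
    """Theoretical line rate in MBytes/sec (decimal units, no protocol overhead)."""
    bits_per_sec = gigabits_per_sec * 1_000_000_000
    return bits_per_sec / 8 / 1_000_000


def seconds_to_move(terabytes: float, gigabits_per_sec: float) -> float:
    """Best-case time to push a given amount of data down the link."""
    total_bytes = terabytes * 1_000_000_000_000
    return total_bytes / (line_rate_mbytes_per_sec(gigabits_per_sec) * 1_000_000)


print(line_rate_mbytes_per_sec(1))    # 125.0
print(line_rate_mbytes_per_sec(10))   # 1250.0
print(seconds_to_move(1, 1) / 3600)   # hours to move 1 TB over Gigabit Ethernet
```

Even at the theoretical ceiling, 1 TB over a single Gigabit link is a bit over two hours.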
I’m curious as to what the world’s thoughts are on “Cloud” storage (I hate the term “Cloud” anything – it’s a mostly meaningless term that describes nothing but outsourcing.)
Next step: Get my hands on one and try it out. This may not be as much of a long-shot as it seems.
Network Appliance
by Jesse on Jul.03, 2007, under Centerra, NetApp
I went to a NetApp demo today, and they were trying desperately to show me where they competed with the Centerra.
First off, I think the demo went in the wrong direction. I am not the “average” customer, and I wouldn’t have been there if I wasn’t interested, so it should have been much less ‘sales-pitch’ and more nuts and bolts, ‘geeky details’.
My first question, and one that they were not able to answer was about the compliance clock.
First off, the coolest part of the NetApp is that the structure of the fileserver itself is stored within the metadata on the disks, as well as in the processor. This means that (in theory, because I’ve never seen it happen) you can pull the disks out of one filer, put them in a new one, power it up, and have everything exactly as it was when you shut down the original.
Now this is a good thing, except that I understand the compliance clock exists and has to be initialized within the processor. Once it’s set, it is locked. The gentleman who ran the demo even admitted he doesn’t know of a way to “clear” it, though I’m sure it can be done through a fairly routine clearing of the NVRAM in the storage processor.
So if you’ve got data on a raid group that can’t be deleted, you shut the array down, move the disks to a new array, and boot it. You then go and initialize and set the compliance clock in the new unit to 30 years ahead and poof, you can now delete data from the disks.
Yes – it’s an unrealistic scenario, but I have always pictured my job in situations like this to be to find the hole in the ruleset and drive a truck through it.
If you can move the disks to a new array and tinker with the clock there, then it’s not a true compliance product.
Can anyone tell me if I’m off base? Is the compliance clock dependent on the disks as well as the array?
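To make the hole concrete, here is a toy model of the scenario. It encodes my assumption (not NetApp’s actual implementation) that the retention date travels with the disks while the compliance clock lives only in the controller; the `Array` class and all names are hypothetical.

```python
from datetime import datetime, timedelta


class Array:
    """Toy controller whose compliance clock is NOT tied to the disks (the assumption in question)."""

    def __init__(self, compliance_clock: datetime):
        self.compliance_clock = compliance_clock

    def can_delete(self, retain_until: datetime) -> bool:
        # Deletion is allowed only once the controller's clock passes the retention date.
        return self.compliance_clock >= retain_until


written = datetime(2007, 7, 3)
retain_until = written + timedelta(days=7 * 365)  # retention date stored with the data

original = Array(compliance_clock=written)
replacement = Array(compliance_clock=written + timedelta(days=30 * 365))  # clock set 30 years ahead

print(original.can_delete(retain_until))     # False: still under retention
print(replacement.can_delete(retain_until))  # True: same disks, new clock, retention "expired"
```

If the clock is also anchored to the disks, the second check should fail too, which is exactly the question.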
My second problem is the idea of “block-level” remote replication. The one thing I liked about the Centerra is that its policy-based replication is object based, meaning that when a file is replicated it’s pushed to the remote array as a whole object. This, among other things, protects the integrity of the remote filesystem (not that Centerra has a filesystem per se). Block-level writes, when interrupted, can cause filesystem-wide corruption and other general weirdness.
On another (minor) point, the fact that replication is accomplished by reading the data just written to the disk would double the IO load on the devices. (Why do it that way, when it could be simply written directly from cache to two locations…but that’s just crazy talk, right?)
Centerra vs. NetApp
by Jesse on Jun.17, 2007, under Centerra, Comparison Shopping, NetApp
Interestingly enough, my favourite Veritas sales guy from Strategic Technologies (www.stratech.com) actually managed to do the virtually impossible.
He got me thinking about, and questioning, my blind belief in “what EMC says.”
I’m looking at different options for WORM archiving right now. Of course the first player in the game is the G5 Centerra. It’s reportedly bulletproof, and when the auditors come through testing your compliance, their sales shtick is that “they just look at the Centerra and wave it through”. (Much like the San Diego border patrol, right?)
So what got me thinking about the NetApp “Archive and Compliance Solution” is that it offers everything Centerra does, without locking you into the API that Centerra does.
One of the biggest problems with the Centerra is that you are locked into their technology. Once you start archiving to centerra, it’s a nightmare to get off it should you decide to years down the line. This is because there is no “filesystem” per-se to migrate off of. Everything going to the centerra has to go through their API.
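A toy sketch shows why content-addressed storage has no “filesystem” to browse. This is generic content addressing, not the Centerra API: the `ToyCAS` class, its method names, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib


class ToyCAS:
    """Minimal content-addressed store: objects are reachable only by their address."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself; there is no path or filename.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]


cas = ToyCAS()
addr = cas.put(b"scanned invoice #1")
print(addr[:16])                    # opaque handle the application must remember
print(cas.get(addr))                # retrieval works only with that handle
print(cas.put(b"scanned invoice #1") == addr)  # same content, same address
```

The application has to keep track of every address it gets back, so there is no directory tree to walk or copy out of, and that is what makes migrating off such a system painful.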
The Network Appliance product however offers a CIFS/NFS solution, so saving files to the archive can be as simple as copying files to a directory. (I don’t know the details of how revisions are kept yet. I got about 100 pages of documentation that I was planning on going through this weekend, before the yard-work hit me.)
This means that not only can you browse the filesystem and copy anything out of it you want to, but that you can also migrate out of it with a minimum of fuss if you need to.
The CIFS/NFS solution also makes it more compatible than the Centerra. Since the Centerra CAS system requires the Centerra API, a limited number of applications work with it. Now as of this writing the Centerra meets my needs; however, who knows what the higher-ups are going to decide to bring in. And if they bring in a new application that A- requires archiving of data, and B- doesn’t support the Centerra, then we’re screwed and have to go out and get something new anyway.
Now the other bonus is that it’s my understanding that the price point of 4TB (Usable – Replicated) of NetApp storage is much more pleasant than 4TB (Usable – Replicated) of Centerra storage.
Now I know that most of my readership are Hitachi/NetApp people, so I know the way the responses to this are going to go. My question is actually this:
Does anyone (other than my EMC sales team) see a compelling reason to stick to the Centerra?
Centerra – love it?
by Jesse on Jan.05, 2007, under Backup, Centerra, ILM, Replication
We just had our sales presentation on EMC’s Centerra Content Addressable Storage system. I have to admit, I went into it knowing a little about it, and even the 60,000 foot “executive summary” EMC put together really impressed me.
The idea of putting so much data to tape but keeping it up and available just floors me. But for a “reasonable” price, I can offload all of our imaging (we don’t use paper records), voice recordings for the call center, and email traffic to a system that is widely considered to be so bulletproof (when in a multi-location DR environment) that it doesn’t require backups.
By doing object-level mirroring it seems like they’ve really conquered the need for backups, as well as the management nightmare that is records retention.  Since the objects can be mirrored within the frame, as well as to a remote frame, that makes it even more solid.
I have to say I’m impressed – now to sell it to the Execs….. (Actually our CEO is so “compliance driven” it may not be much of a hard sell)