“Cloud”
Backup Vs. Archive
by Jesse on Sep.15, 2009, under "Cloud", Archive, Backup, Best Practices, Centerra, Deduplication, Gripe
The fundamental difference between BACKUP and ARCHIVE.
A backup is there to help you deal with a crisis such as “My datacenter is a smoking hole in the ground, now what do I do?” or something not quite as dramatic like “A virus ate my data.” You recover from the backup to the last known good state and all is happy, right? Well, except for the two or three days that might have gone by since your last good backup… (I was in one law firm that lost a drive, only to find out their backups hadn’t been running for two months. I came back two weeks later to find that a COMPLETE change in personnel had taken place while I was gone. Lawyers are not very forgiving when they lose two months’ worth of email.)
An archive is data that, while not “Active,” still might be required on a day-to-day basis. Film / Video / Image archives are a good example of that.
So on a disk-based archive you have some platform, typically EMC/Legato DiskExtender or Rainfinity or something along those lines, that will move the data from “Active” storage to “Archive” storage. In some applications you can even set up a true HSM (Hierarchical Storage Management), moving data that hasn’t been accessed to Tier-2 (Enterprise SATA) and even Tier-3 (yes, tape) as it ages, only to be recalled to Tier-1 when it’s accessed.
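To make the age-based movement concrete, here is a rough sketch of the kind of policy an HSM applies. It is not how DiskExtender or Rainfinity actually work internally; the thresholds (30 and 180 idle days) and the names `TIER_THRESHOLDS`, `ACTIVE_TIER` and `pick_tier` are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical policy: how long data must sit idle before it drops a tier.
# Real HSM products make this configurable; these numbers are illustrative only.
TIER_THRESHOLDS = [
    (timedelta(days=180), "Tier-3 (tape)"),
    (timedelta(days=30), "Tier-2 (Enterprise SATA)"),
]
ACTIVE_TIER = "Tier-1 (active)"


def pick_tier(last_access: datetime, now: datetime) -> str:
    """Return the tier a file belongs on, based on time since last access."""
    idle = now - last_access
    # Thresholds are listed longest-first, so the first match is the lowest tier.
    for threshold, tier in TIER_THRESHOLDS:
        if idle >= threshold:
            return tier
    return ACTIVE_TIER


now = datetime(2009, 9, 15)
for days_idle in (1, 45, 400):
    print(days_idle, "days idle ->", pick_tier(now - timedelta(days=days_idle), now))
```

A recall is just the reverse case: the access resets the last-access time, so the next policy pass puts the file back on Tier-1.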
More often than not I’m brought face to face with people who don’t understand that very subtle difference. One of my recent customers is actually doing it appropriately, using DX and a smallish Centerra to archive data that, while retention is required, is almost never actually accessed.
Then there are the people who use backup technology for archival purposes.
I’m pretty “old school” when it comes down to it.
Tape is for backup. Tape is *NOT* supposed to be used as nearline storage when there are equally inexpensive (and more reliable) disk methods out there.
My main complaint about tape as archive: You don’t know if it’s bad until you try to read it. And every time you read it, the simple act of moving the tape into a tape drive that was manufactured under less than ideal conditions means you are putting your data at risk.
Spending millions of dollars on a new Room-Sized tape library doesn’t make sense when Centerra storage is fairly inexpensive *AND* provides redundancy of the data automatically.
Spending more millions of dollars on three of them is lunacy when one EMC Atmos setup could provide redundancy and a single namespace for recall. (And if you go whole hog, geographically relevant retrieval is an option too, so you automatically get it from the closest copy.)
It pains me to see it done wrong. Especially when it involves trying to shoe-horn two more STK monsters into an already cramped datacenter when the work of it could be done in a couple of floor-tiles of spinning disks.
Storage Tiering…
by Jesse on Jul.09, 2009, under "Cloud", Backup, Celerra, Centerra, Clariion, DR/COOP, ILM, RAID, Symmetrix
Ok, given the changes to the storage arena I’ve been working on a revised “Tiering system” to incorporate all of the levels of data…importance?
My version of Storage Tiering is (or should be) as follows:
- Tier-1 – Symmetrix/Replicated – High Performance/Critical Data
- Tier-2 – Symmetrix/NonReplicated – High Performance/Non-Critical Data
- Tier-3 – Symmetrix/SATA/Replicated – High-Medium Performance/Critical Data
- Tier-4 – Symmetrix/SATA/NonReplicated – High-Medium Performance/Non-Critical Data
- Tier-5 – Clariion/FC/Replicated – Medium Performance/Critical Data
- Tier-6 – Clariion/FC/NonReplicated – Medium Performance/Non-Critical Data
- Tier-7 – Clariion/SATA/Replicated – Low Performance/Critical Data
- Tier-8 – Clariion/SATA/NonReplicated – Low Performance/Non-Critical Data
- Tier-9 – CelerraNAS/Replicated – Network Attached/Critical Data
- Tier-10 – CelerraNAS/NonReplicated – Network Attached/Non-Critical Data
- Tier-11 – Atmos – Network Attached / Low Performance
- Tier-12 – Centerra (Content Addressable Storage) – Low Performance Archive / Highly Available
- Tier-13 – Primary Tape-In-Library (Automatic loading on demand via HSM)
- Tier-14 – Primary Tape-Out-Of-Library (Manual Intervention Required)
“Critical Data” vs. “Non-Critical Data” is simply a matter of how long you can be without the data should a failure or accidental deletion occur, since all data is (in theory) available in Tier8/9 storage.
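The first ten tiers in the list follow a regular pattern: platform classes in descending performance order, each split into a replicated (critical) and a non-replicated (non-critical) tier. A small sketch of that numbering, with `PLATFORMS` and `tier_number` as hypothetical names and Tier-11 through Tier-14 deliberately left out because they don't fit the pattern:

```python
# Platform classes in the order they appear for Tier-1 through Tier-10.
PLATFORMS = [
    "Symmetrix",
    "Symmetrix/SATA",
    "Clariion/FC",
    "Clariion/SATA",
    "CelerraNAS",
]


def tier_number(platform: str, replicated: bool) -> int:
    """Map a platform class and replication flag to its tier number (1-10)."""
    index = PLATFORMS.index(platform)  # raises ValueError for unknown platforms
    # Each platform takes two slots: replicated first, then non-replicated.
    return 2 * index + (1 if replicated else 2)


print(tier_number("Symmetrix", replicated=True))
print(tier_number("Clariion/SATA", replicated=False))
```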
I’ve also considered using Tier1/Tier1B to describe DMX storage vs. Clariion storage, given that there is a LOT of overlap in performance characteristics these days…
Oh, and iSCSI would be somewhere between 10 and 13….
Any thoughts?
EMC Atmos
by Jesse on Apr.04, 2009, under "Cloud", Celerra, Centerra
Got my first presentation on EMC’s new “Atmos” storage platform.
Now granted this was kind of a sales-ey (is that a word?) presentation, but I’m pretty impressed so far.
It seems what EMC has done is combine the best of Celerra and Centerra. (In fact, the gentleman giving the presentation sort of placed it on the map right between the two.)
The basics of it are that they get a bunch of 1U (Presumably Dell) Pizza-Box type servers and put them in front of a bunch of really *REALLY* cheap storage.
They then present the storage out using a variety of protocols: CIFS/NFS and the REST/SOAP APIs. Rumors of an iSCSI option could not be confirmed…or explained (how in the world would you convert block-storage to object-storage and expect any kind of real performance?)
Downsides….well, there are multiple single-points-of-failure in each frame, which is why when you invest in the Atmos hardware you will buy a minimum of two frames. I think this could have been avoided in a more robust deployment.
There is no “Compliance” edition (yet?). This would/could easily be the replacement for the Centerra, if they can just get past that little hurdle. I’ve known many customers (and been one myself) who have chosen the NetApp filer over Centerra for archiving because all we wanted/needed was a CIFS share that we could guarantee the content on.
I was not able to get reasonable performance numbers from the presenter. Assuming Gigabit Ethernet off the internal switch/bus/apparatus, the maximum sustained transfer rate would be 125 MBytes/sec. 10-Gigabit Ethernet is currently running at substantially less than the 1.25 GBytes/sec that you would expect.
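For anyone checking the math, those ceilings are just the line rate divided by eight bits per byte, ignoring all protocol overhead. A quick sketch (the function name `line_rate_mbytes_per_sec` is mine, not anything from EMC):

```python
def line_rate_mbytes_per_sec(gigabits_per_sec: float) -> float:
    """Theoretical ceiling in MBytes/sec: bits to bytes, no protocol overhead."""
    bits_per_sec = gigabits_per_sec * 1_000_000_000
    return bits_per_sec / 8 / 1_000_000


print(line_rate_mbytes_per_sec(1))   # Gigabit Ethernet
print(line_rate_mbytes_per_sec(10))  # 10-Gigabit Ethernet
```

Real-world numbers will come in under these, since Ethernet, IP and TCP framing all take their cut before your data does.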
I’m curious as to what the world’s thoughts are on “Cloud” storage (I hate the term “Cloud” anything – it’s a mostly meaningless term that describes nothing but outsourcing.)
Next step: Get my hands on one and try it out. This may not be as much of a long-shot as it seems.