Archive for the 'Deduplication' Category

Backup Vs. Archive

Tuesday, September 15th, 2009

The fundamental difference between BACKUP and ARCHIVE.

A backup is there to help you deal with a crisis such as “My datacenter is a smoking hole in the ground, now what do I do?” or something not quite as dramatic like “A virus ate my data.”  You recover from the backup to the last known good state and all is happy, right?  Well, except for the two or three days that might have gone by since your last good backup…  (I was in one law firm that lost a drive only to find out their backups hadn’t been running for two months. I came back two weeks later to find that a COMPLETE change in personnel had taken place while I was gone – lawyers are not very forgiving when they lose two months’ worth of email.)

An archive is data that, while not “Active,” still might be required on a day-to-day basis.  Film / Video / Image archives are a good example of that.

So on a disk-based archive you have some platform, ostensibly EMC/Legato DiskExtender or Rainfinity or something along those lines, that will move the data from “Active” storage to “Archive” storage.  In some applications you can even set up a true HSM, moving data that hasn’t been accessed to Tier-2 (Enterprise SATA) and even Tier-3 (yes, tape) as it ages, only to be recalled to Tier-1 when it’s accessed.
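To make the HSM idea concrete, here is a minimal sketch of an age-based tiering policy in Python. It is not how DiskExtender or Rainfinity actually work; the 90-day and 365-day thresholds, the FileRecord class, and the function names are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds (in days); real HSM policies are site-specific.
TIER2_AFTER_DAYS = 90    # demote to Tier-2 (Enterprise SATA)
TIER3_AFTER_DAYS = 365   # demote to Tier-3 (tape)


@dataclass
class FileRecord:
    """Illustrative stand-in for a file tracked by an HSM platform."""
    name: str
    tier: int = 1
    days_since_access: int = 0


def age_out(record: FileRecord) -> int:
    """Demote a file based on how long it has gone unaccessed."""
    if record.days_since_access >= TIER3_AFTER_DAYS:
        record.tier = 3
    elif record.days_since_access >= TIER2_AFTER_DAYS:
        record.tier = 2
    return record.tier


def recall(record: FileRecord) -> int:
    """An access pulls the file back to Tier-1 and resets its age."""
    record.tier = 1
    record.days_since_access = 0
    return record.tier
```

A file untouched for 400 days would land on Tier-3 after age_out(), and the first read would trigger recall() and bring it back to Tier-1.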

More often than not I’m brought face to face with people who don’t understand that very subtle difference.  One of my recent customers is actually doing it appropriately, using DX and a smallish Centera to archive data that, while retention is required, is almost never actually accessed.

Then there are the people who use backup technology for archival purposes.

I’m pretty “old school” when it comes down to it.

Tape is for backup.  Tape is *NOT* supposed to be used as nearline storage when there are equally inexpensive (and more reliable) disk methods out there.

My main complaint about tape as archive: You don’t know if it’s bad until you try to read it.  And every time you read it, the simple act of moving the tape into a tape drive that was manufactured under less-than-ideal conditions means you are putting your data at risk.

Spending millions of dollars on a new room-sized tape library doesn’t make sense when Centera storage is fairly inexpensive *AND* provides redundancy of the data automatically.

Spending more millions of dollars on three of them is lunacy when one EMC Atmos setup could provide redundancy and a single namespace for recall.  (And if you go whole hog, geographically relevant retrieval is an option too, so you automatically get it from the closest copy.)

It pains me to see it done wrong.  Especially when it involves trying to shoe-horn two more STK monsters into an already cramped datacenter when the work of it could be done in a couple of floor-tiles of spinning disks.

Data De-Duplication

Wednesday, May 14th, 2008

Data De-Duplication on SearchStorage.com

Beth on SearchStorage.com started this great thread and I wanted to comment on it on my own home turf, as it were.

Data DeDuplication, also known as compression, hasn’t changed since the early days of PKZIP 1.0.

Compression works by identifying like blocks of data and replacing them with a single block and pointers to every place the block was found.  One of the main reasons it works so well in plain text applications is that there are only so many combinations of ASCII characters that can be found.
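Here is a rough Python sketch of that "single block plus pointers" idea. It is only an illustration, not how PKZIP or any particular deduplication product does it: the fixed 4-byte block size, the use of SHA-256 to spot like blocks, and the function names are all assumptions of mine.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny so the effect is easy to see


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks, keeping one copy of each unique block."""
    store = {}     # digest -> the single stored copy of that block
    pointers = []  # one digest per original block, in order
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy seen
        pointers.append(digest)
    return store, pointers


def rebuild(store, pointers) -> bytes:
    """Follow the pointers to reassemble the original data."""
    return b"".join(store[d] for d in pointers)
```

Feeding it b"ABCDABCDABCDXYZ!" yields four pointers but only two stored blocks, and rebuild() returns the original bytes.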

I find it interesting that this seemingly old technology has found new life in the form of the seemingly complicated “Data DeDuplication”.

So far – no one has sufficiently explained to me the benefits of using a Data Deduplication product over the conventional in-band tape compression.  Obviously offloading compression to something with a real processor might gain you some performance and maybe even allow compression to happen without causing a tape to ‘shoe-shine’ across the head as it keeps having to back up.  However I have not yet seen a single example that justifies the cost and effort involved.

Anyone?  Bueller?