Backup Vs. Archive

Tuesday, September 15th, 2009

The fundamental difference between BACKUP and ARCHIVE.

A backup is there to help you deal with a crisis such as “My datacenter is a smoking hole in the ground, now what do I do?” or something not quite as dramatic like “A virus ate my data.”  You recover from the backup to the last known good state and all is happy, right?  Well, except for the two or three days that might have gone by since your last good backup...  (I was in one law firm that lost a drive, only to find out their backups hadn’t been running for two months. I came back two weeks later to find that a COMPLETE change in personnel had taken place while I was gone. Lawyers are not very forgiving when they lose two months’ worth of email.)

An archive is data that, while not “Active,” still might be required on a day-to-day basis.  Film, video, and image archives are a good example of that.

So for a disk-based archive you have some platform, ostensibly EMC/Legato DiskExtender or Rainfinity or something along those lines, that will move the data from “Active” storage to “Archive” storage.  In some applications you can even set up a true HSM (hierarchical storage management), moving data that hasn’t been accessed to Tier-2 (Enterprise SATA) and even Tier-3 (yes, tape) as it ages, only to be recalled to Tier-1 when it’s accessed.
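
To make the tiering idea concrete, here is a rough sketch of that kind of age-based policy in Python. It is not how DiskExtender, Rainfinity, or any other real product works internally; the `FileRecord` type, the function names, and the 90-day and 365-day thresholds are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; a real policy would be tuned per site.
TIER2_AFTER = timedelta(days=90)    # idle this long: move to Tier-2 (SATA)
TIER3_AFTER = timedelta(days=365)   # idle this long: move to Tier-3 (tape)


@dataclass
class FileRecord:
    """Illustrative stand-in for a file tracked by an HSM catalog."""
    path: str
    last_access: datetime
    tier: int = 1  # 1 = active storage, 2 = SATA, 3 = tape


def target_tier(record: FileRecord, now: datetime) -> int:
    """Pick the tier a file belongs on, based only on time since last access."""
    idle = now - record.last_access
    if idle >= TIER3_AFTER:
        return 3
    if idle >= TIER2_AFTER:
        return 2
    return 1


def apply_policy(records: list[FileRecord], now: datetime) -> None:
    """Move every file to the tier its age calls for."""
    for record in records:
        record.tier = target_tier(record, now)


def recall(record: FileRecord, now: datetime) -> FileRecord:
    """Accessing a file brings it back to Tier-1 and resets its idle clock."""
    record.last_access = now
    record.tier = 1
    return record
```

Run `apply_policy` on a schedule and call `recall` whenever a migrated file is opened, and you get the behavior described above: data drifts down the tiers as it ages and comes back to Tier-1 the moment someone touches it.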

More often than not I’m brought face to face with people who don’t understand that very subtle difference.  One of my recent customers is actually doing it appropriately, using DiskExtender (DX) and a smallish Centerra to archive data that, while retention is required, is almost never actually accessed.

Then there are the people who use backup technology for archival purposes.

I’m pretty “old school” when it comes down to it.

Tape is for backup.  Tape is *NOT* supposed to be used as nearline storage when there are equally inexpensive (and more reliable) disk methods out there.

My main complaint about tape as archive: you don’t know if it’s bad until you try to read it.  And every time you read it, the simple act of moving the tape into a tape drive that was manufactured under less than ideal conditions means you are putting your data at risk.

Spending millions of dollars on a new Room-Sized tape library doesn’t make sense when Centerra storage is fairly inexpensive *AND* provides redundancy of the data automatically.

Spending more millions of dollars on three of them is lunacy when one EMC Atmos setup could provide redundancy and a single namespace for recall.  (And if you go whole hog, geographically relevant retrieval is an option too, so you automatically get it from the closest copy.)

It pains me to see it done wrong.  Especially when it involves trying to shoe-horn two more STK monsters into an already cramped datacenter when the work of it could be done in a couple of floor-tiles of spinning disks.