50Micron.com

Backup

Veritas

by Jesse on May.02, 2007, under Backup, Veritas NetBackup

Just a warning: keep an eye on the install drive for Veritas.  Letting the drive fill can result in a corrupted EMM database and Veritas forgetting things like the last year’s worth of backups.

So I ended up rebuilding both the master server and one of the media servers.

One thing that might come in handy. 

Mount a secondary volume as “C:\Program Files\VERITAS\Netbackup\db\images” and copy the original images catalog there.  This allows for massive growth, plus the ability to expand the images database without having to reinstall Veritas (again).

Even Veritas said it was an interesting solution to the problem.  My goal was as much flexibility as possible going forward, because I have found out the hard way what happens when you don’t allow enough space to grow.  If the images catalog fills up, it can cause the EMM (Enterprise Media Manager) database to corrupt itself.

So with the images catalog intact but the EMM database gone, you end up with a list of files and what tapes they’re on, but not a list of tapes, volume pools, EXPIRATION DATES, etc.
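A small scheduled check can catch a filling volume before it gets that far.  What follows is only a minimal sketch using Python's standard library; the 15% threshold and the function names are my own assumptions for illustration, not anything NetBackup ships.

```python
import shutil


def is_low_on_space(free_bytes, total_bytes, min_free_fraction=0.15):
    """Return True when the free fraction drops below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (free_bytes / total_bytes) < min_free_fraction


def check_volume(path, min_free_fraction=0.15):
    """Check the volume that holds `path`.

    Returns a (low, free_fraction) tuple. The threshold is an
    arbitrary example value; pick one that suits your catalog growth.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    free_fraction = usage.free / usage.total
    low = is_low_on_space(usage.free, usage.total, min_free_fraction)
    return low, free_fraction
```

Point `check_volume` at the install drive and at the mount point of the images catalog, run it from whatever scheduler you already have, and send an alert when the first element of the result is true.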


Veritas Media Server Encryption Option…

by Jesse on Apr.12, 2007, under Backup, Encryption, Veritas NetBackup

So the biggest problem with Veritas is that their client-side encryption option, which is the standard deployment, negates the use of Veritas Bare-Metal-Restore (BMR).

For those who aren’t Veritas geeks like me, BMR is the handy-dandy application that allows you to rebuild a server from scratch using only a floppy (or bootable CD) with little or no input.  All of the particulars of a server are captured when it is backed up.  Drivers, hardware, IP settings, hostname, etc.  You then build a BMR boot disk for that server.  When it crashes and you have to replace it, you boot from the boot disk and it takes all the settings and builds a new server from the last backup from absolute scratch.

I’ve seen it work, it’s a miracle in the making.

However, if you’re using the Veritas client-side encryption, the key is managed by the client server.  And for some reason, this key is not included in the BMR boot disk that is generated by the BMR boot server.  This means that while it can start to rebuild the environment, it can’t restore the last backup because it can’t decrypt it.

I’ve been looking at options, such as Decru’s Data-Fort inline FC encryption engine, as well as some of the options from Neoscale.

Both would have done the job nicely; however, the prices quoted made selling these options up the river to those with the three-letter initials painful.

Now I find that Veritas has a recently released MSOE, or Media-Server-Encryption-Option.   Since the encryption is done at the media server, the BMR incompatibility is done away with, and lo and behold, everything works as advertised.  The only real down-side I think I can come up with is the increase in host-overhead on the media server, which means I may have to increase the number of media servers in the environment, which of course makes Veritas more expensive.

I’ve not gotten the quote on this, but I’m assuming it’s going to be less than the almost $50K some of the other options have come to.  I’ll let you know.


Woah – Bigtime Duck

by Jesse on Mar.20, 2007, under Backup, Duck

I think I’m going to send this guy the biggest duck ever.

Oops! Techie wipes out $38 billion fund

We’ve all done it: accidentally deleted a file.  Maybe some of us have even formatted the wrong drive.

But I think it’s safe to say that most of us have not wiped out $38 billion.

According to the MSNBC story, this guy formatted a drive at the Alaska Department of Revenue, accidentally deleting application information.

If that wasn’t enough, the guy then follows this with a perfect encore:  He formats the backup drive.

*THEN* they realize that their backups are useless.

Talk about the perfect storm.  Three disasters.  Three totally preventable mistakes.  Put together, it cost them over $200,000 to recover, and that was only possible because they had one more backup: 300 cardboard boxes containing some 800,000 applications.

The impressive thing was the following quote, from Revenue Commissioner Bill Corbus.  “Everybody felt very bad about it and we all learned a lesson. There was no witch hunt,”

I’m impressed.

 


TimeFinder Clone Part II

by Jesse on Feb.27, 2007, under Backup, TimeFinder

A user posted this as a response to “The Many flavors of EMC TimeFinder”.  I felt it rated its own post.

———————————————

Q: “My experience is mostly using IBM Sharks. I’m now working in a very large EMC environment, focused on backup. I’m wondering if the TimeFinder/Clone’s cloned volume can be permanently mounted on another host. Specifically, I want to avoid importing and exporting DGs from Veritas every time I need to ‘split’ the mirror, as I would have to do with TimeFinder/Mirror.”

A: TimeFinder/Clone can in fact be a permanent copy of the data, as long as one of the copy options is set: either -precopy, which copies all data before the session is activated, or -copy, which performs a full background copy of the data while the target (clone) device is available to the backup/development host.

The default (no switches used) will not result in a full clone, and as such the copy is only available while the clone session is in the “Active” state.

Q: What I’m looking for is a point-in-time copy which is mounted on a backup media server while the production data disks are mounted on the production server. The application on the production server would be paused, then the clone would happen, then the application could be restarted. The cloned data would “magically” appear on the clone volume set mounted on the backup media server.

A: If you’re doing this simply for backup, then copy-on-access is easier and more flexible.  COA allows you to create and activate a session, copy the data, and terminate it immediately.  Truthfully TimeFinder/SNAP is equally up to this task (as it’s largely the same thing) but in my opinion you get more flexibility by purchasing clone instead of snap, though you do spend more money on disks in the process.

Unfortunately, as I’m assuming you’re talking about a Windows environment, there isn’t much “Magical” about it. 

Q: The largest issue I’ve run into is the insistence by a few EMC folks that the clone volume must be mounted on the prod host, then unmounted and mounted on the backup host.

A: Again, assuming you’re talking about Windows, this is incorrect.  You can’t remount the snap or clone of a production volume back to the same host because Windows is largely dependent on the signature of the drive.  Doing so can actually confuse Windows into thinking that there are two copies of the disk and cause data corruption.  (I can’t say anything, the same is true of AIX, which, if you’re using LVM and not raw disks, has the same signature (they call it PVID) dependency.)

I’ve worked in several environments where the target volumes are simply unmounted, synched, and remounted.  So long as the target ID and the signature don’t change, this is not usually an issue.

The benefit to using clone is that you can leave the session active, or use one of the ‘copy’ options to produce a full copy.  Then, restoring from the disk in the case of a failure becomes a real option.  You simply reverse the sync, remount the production volumes, and start the application or database from there.  (Ok, it’s not simple, but I’m not sure even I have the drive space to post the full procedure here.  :) )

 


Has anyone found anything good about Microsoft servers yet?

by Jesse on Feb.10, 2007, under Backup, Bug, General, Microsuck, TimeFinder

I’m not really bashing their workstations; I’m actually quite fond of Vista on my laptop.

However, when it comes to servers, I view being in an environment where Microsoft is the PRIMARY operating system by a factor of 20:1 as a form of torture akin to having my fingernails pulled out or being tied to a chair and forced to listen to “Barney” all day.

What I hate most about Microsoft – (and if I keep this up, I’m going to have to rename this site to Microsoft-Hates-Me.Com) – is that it can’t handle the simplest tasks.

For a split-mirror backup, whether it be TimeFinder/Mirror, TF/Clone, or TF/SNAP, the process is the same:

1.  Freeze the database / filesystem
2.  Snap the volumes.
3.  Thaw the database / filesystem
4.  Mount the volumes on your media server host.
5.  Back the filesystems up.
6.  Unmount the volumes from the media server
7.  Terminate the Snap session
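The ordering matters most when something fails halfway through, so here is a rough sketch of the seven steps as a script skeleton.  Every step is passed in as a plain callable; those callables are placeholders I made up for illustration, and you would wire them to whatever array and backup commands your site actually uses.

```python
def split_mirror_backup(freeze, snap, thaw, mount, backup, unmount, terminate):
    """Run the seven split-mirror steps in order.

    Each argument is a zero-argument callable (hypothetical hooks).
    The try/finally blocks make sure the database is thawed and the
    snap session is cleaned up even when a step raises.
    """
    freeze()                  # 1. freeze the database / filesystem
    try:
        snap()                # 2. snap the volumes
    finally:
        thaw()                # 3. always thaw, even if the snap failed

    try:
        mount()               # 4. mount the volumes on the media server
        try:
            backup()          # 5. back the filesystems up
        finally:
            unmount()         # 6. unmount whenever the mount succeeded
    finally:
        terminate()           # 7. terminate the snap session
```

If the snap itself fails, the database is thawed and the error propagates without touching the media server.  If the backup fails, the volumes are still unmounted and the session is still terminated before the error surfaces.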

Seems pretty basic.  Microsoft seems to have trouble with #4 and #6.  Seems this “Super OS” they’ve got can’t handle the idea that SCSI devices might go on and off the bus at different times. 

EMC gives a tool, TFIM (TimeFinder Integration Modules), that at least allows you to perform the commands that Microsoft doesn’t even make available: mount, unmount, flush, etc.  But god forbid you reboot a host while the SNAPS are inactive or the BCV’s are established (and thereby not ready to the host).  You’re screwed.

Can *SOMEONE* please write a decent SCSI driver for Windows?  Please?


TimeFinder Integration

by Jesse on Jan.28, 2007, under Backup, General, Microsuck

I’ve been reading the TimeFinder Integration guide for SQL and Exchange – this is going to be FUN…. (not)

It would be nice if, at least from the SQL standpoint, they did transaction logging in the traditional sense, so that I could just snap the database, snap the transaction logs, and back the flat-files up.

What MS seems to want, is for you to snap the database and logs, and then mount them on a remote SQL system and back it up using the SQL tools, which is just about 10 more steps than should be required to perform this.

Why does Microsoft feel it necessary to complicate things so painfully?


Centera – love it?

by Jesse on Jan.05, 2007, under Backup, Centera, ILM, Replication

We just had our sales presentation on EMC’s Centera Content Addressable Storage system.  I have to admit, I went into it knowing a little about it, and even the 60,000 foot “executive summary” EMC put together really impressed me.

The idea of putting so much data to tape but keeping it up and available just floors me.  But for a “reasonable” price, I can offload all of our imaging (we don’t use paper records) voice recordings for the call center, and email traffic to a system that is widely considered to be so bulletproof (when in a multi-location DR environment) that it doesn’t require backups.

By doing object-level mirroring it seems like they’ve really conquered the need for backups, as well as the management nightmare that is records retention.   Since the objects can be mirrored within the frame, as well as to a remote frame, that makes it even more solid.

I have to say I’m impressed – now to sell it to the Execs…..  (Actually our CEO is so “compliance driven” it may not be much of a hard sell)


Enterprise Vault for Exchange

by Jesse on Dec.06, 2006, under Backup, Data Migration, ILM

The boys over at Symantec (www.symantec.com) just came by last week and gave us an interesting presentation on Enterprise Vault.  (Not to be confused with the Vault extension for NetBackup, which is a different beast)

The short answer is this.  EV is an application that dives into your Exchange environment and strips out any email/attachments over (x) days old.  It then creates an HTML view copy of it and stores the original out of the Exchange information store in Tier-2 storage.  Then, after an even longer period, say a couple of years or so, you can even stage it from Tier-2 (say slow disks like Clariion ATA) to Tier-3 (Tape) storage.

The cool part is that there is a header file that stays in the user’s email that shows that it’s a vaulted email.  If they double click to open it like they would a normal email, it figures out where it is.  If it’s in Tier-2 storage it brings it up; if it’s in Tier-3 storage, it sends a tape request to the appropriate person so the tape can be recalled from off-site storage and restored.

For companies that have to retain data for 20+ years, how bloated can an email infrastructure get?  I’ve got 90G in my information store after the first year, and it’s only going to get worse from this point on.

Though I’ll bet EMC is frothing at the mouth at the idea. ;-)


TimeFinder on MSSQL a possibility?

by Jesse on Dec.04, 2006, under Backup, Microsuck, TimeFinder

Let’s face it, if it’s Oracle, DB2, or anything along those lines, I can snap a copy and back it up with my eyes closed.

MSSQL, being a pretend database, has me stumped.  I’m so used to archive logs that I’m not even sure how to use TF/Snap to back up the database.

This is my understanding.  MSSQL doesn’t do “Archive Logging” in the traditional sense.

In a “REAL” database system the process is as follows:

1.  You put the database into “Hot Backup” mode.  In Oracle this quiesces the data files and writes all changes to the transaction log.  (When you take the database out of backup mode, the transactions in the log are then played into the database)

2.  When the above is complete, you can issue a command in one form or another to switch out the last transaction log, which closes one file and opens the next one, and then back up the database files along with the closed transaction log files via whatever file level backup process you have in place, whether it be TimeFinder or just having NetBackup pull the files from that server.
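For the snapshot flavor of this, the order of operations looks roughly like the sketch below.  The three SQL strings are standard Oracle statements (database-wide BEGIN BACKUP needs 10g or later); everything else, including the `run_sql`, `snap_volumes` and `backup_files` hooks, is a placeholder I invented for illustration, so treat it as an outline rather than a finished procedure.

```python
BEGIN_BACKUP = "ALTER DATABASE BEGIN BACKUP"
END_BACKUP = "ALTER DATABASE END BACKUP"
SWITCH_LOG = "ALTER SYSTEM ARCHIVE LOG CURRENT"


def hot_backup(run_sql, snap_volumes, backup_files):
    """Outline of a snapshot-based hot backup.

    run_sql(statement) executes one SQL statement against the database;
    snap_volumes() takes the point-in-time copy of the data volumes;
    backup_files() sends the snapped data files and the archived logs
    to tape. All three are hypothetical hooks supplied by the caller.
    """
    run_sql(BEGIN_BACKUP)          # step 1: enter hot backup mode
    try:
        snap_volumes()             # copy the data files while in backup mode
    finally:
        run_sql(END_BACKUP)        # never leave the database in backup mode
    run_sql(SWITCH_LOG)            # step 2: close out the current log
    backup_files()                 # back up data files plus closed logs
```

With a plain file-level backup and no snapshot, the copy of the data files would happen where `snap_volumes()` sits, which keeps the database in backup mode for much longer.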

At Disney we did just that, with DB2, and moved in the neighborhood of 250+ Terabytes to tape every night.

At a number of other sites I’ve done the exact same process with Oracle.

Enter MSSQL, a Playschool excuse for an RDBMS, and I’m stumped.  See – the problem is there is never more than one “database.LDF” file for logging.  How am I supposed to quiesce writes to a logfile when it never closes it?

Then add to that the fact that the process for rolling transaction log backups forward in MSSQL depends on your having used the MS Backup process to back it up.  It seems to be completely unaware of file level backups of the database.

I’m at a loss here – any ideas?


RPO vs. RTO

by Jesse on Oct.25, 2006, under Backup, DR/COOP, Replication

An engineer friend of mine (a real engineer, not affiliated with computers) once told me:

“There are three options:

     1. You can have it faster.
     2. You can have it smaller.
     3. You can have it cheaper.

….Now pick any two.”

Over and over in my life I’ve put that theory to the test and to this day it has always held true.  The smaller and faster something is, the more expensive it gets.  The cheaper something is, the slower and less portable it is.

Disaster Recovery and COOP (Continuation of OPerations – for the layman) follow a lot of the same rules.

There are three main criteria you’re aiming for.  The main two are RPO and RTO.  That’s “Recovery Point Objective” and “Recovery Time Objective.”

The third is, of course, cost.

RPO is defined by the point at which you need to be able to recover to.  Goals are sometimes easy to obtain.  “Midnight on the morning of the failure” is usually pretty easily obtainable, as you can do that by restoring from backups.  Financial institutions aim for somewhat stricter objectives.  Most banks will require an RPO of “Zero” meaning “I want to see the last committed transaction on my DR site in the event the source site becomes a smoking hole in the ground.”

This is doable of course, provided the DR site is close enough to the source site to run dark fibre between the two with low enough latency to add negligible impact to production.  (The rule of thumb for synchronous replication is 2ms per 10k, that is for every 10 kilometers you’re adding 2ms of latency.  A normal physical drive has a latency of about 9-14ms, so if you go too far you’re going to slow your system to a crawl.)

RTO is defined as “how long can I afford to have my environment down to effect a failover.”  I’ve worked in one environment where transaction logs were backed up to tape and shipped across country from the L.A. area to Orlando, Florida, where the tapes were then restored into a standby system.  The recovery time to a 15 minute increment was effectively days, because they actually had to wait for the last tape to make it to the target site before they could restore it and bring the system on-line.  It was insane.

Your goal is to get RPO and RTO to as close to zero as possible without bankrupting the budget (or the company).

An RPO of zero can be obtained with a DR site within about 10 kilometers, 20 if you can live with the slower response times in production.  This is full synchronous transfer from one array to another, every write from host to disk has to be acknowledged by the REMOTE array before it is reported to the host that the write is committed.
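To put numbers on that, here is a tiny sketch that applies the rule of thumb quoted above (2ms per 10 kilometers) to a local write time.  The rule is taken at face value as a parameter, the function names are made up, and real links will vary with equipment and routing, so measure your own before trusting any of it.

```python
def added_latency_ms(distance_km, ms_per_10km=2.0):
    """Extra write latency from synchronous replication over distance.

    ms_per_10km defaults to the rule of thumb from the post; replace it
    with a measured value for your own link.
    """
    if distance_km < 0:
        raise ValueError("distance_km must not be negative")
    return distance_km / 10.0 * ms_per_10km


def sync_write_ms(local_write_ms, distance_km, ms_per_10km=2.0):
    """Local write time plus the replication penalty, in milliseconds."""
    return local_write_ms + added_latency_ms(distance_km, ms_per_10km)


if __name__ == "__main__":
    for km in (10, 20, 50, 100):
        total = sync_write_ms(9.0, km)
        print(f"{km:>4} km: {total:.1f} ms per write")
```

Under that rule a 9ms write becomes 13ms at 20 kilometers, which is why I call 20 the edge of what production can live with.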

EMC’s SRDF/A and SRDF/AR mitigate that in environments where the DR site is far enough away as to kill any chance of SRDF/Synchronous working.

SRDF/A is a “packetized” SRDF, where the receiving Symm has to receive two consecutive “checkpoints” before it commits the block of data.  That way if an incomplete block is received, it’s discarded to prevent data corruption resulting from incomplete write information.  The downside to SRDF/A of course is that it requires an insane amount of cache to function properly.  (And don’t let an SE tell you it doesn’t; he’s lying or not capable of understanding that for the remote Symm to receive a block of data, it has to be able to store it somewhere other than disk until it receives two checkpoints.)

SRDF/AR is an automated replication product.  You are essentially mirroring production to a TimeFinder BCV, which is then sent synchronously to the remote site.  You can run a Sync transfer because the BCV’s are not connected to the production volumes, and as such the production volumes do not require any ACK/NAK from the remote system.  Depending on the time it takes to replicate (how fast the pipe is between the two sites) you can get RTO to about 10 minutes, which is good enough for most.  The effects of SRDF/AR can be duplicated by anyone proficient in Korn shell, as it literally runs a series of waits and whiles for each stage of the process.  AR has the added bonus that you can actually keep a second set of BCV’s on the target host and run your backups from them.  The down-side to the AR type of scenario (whether it be SRDF/AR or a scripted set-up) is that it costs disks – and lots of them.  There are the production volumes, mirrored, the first set of BCV’s, unprotected, the SRDF target devices (Mirrored or Raid5) and the second set of BCV’s.

Scary huh?
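On the scripting point: the “waits and whiles” really are most of it.  Below is a minimal sketch of that pattern, written in Python rather than Korn shell.  Each stage is a pair of callables, one to kick the stage off and one to report whether it has finished; both are hypothetical hooks, and the sketch deliberately calls no EMC command.

```python
import time


def wait_until(is_done, poll_seconds=30.0, timeout_seconds=3600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True or the timeout expires."""
    deadline = clock() + timeout_seconds
    while not is_done():
        if clock() >= deadline:
            raise TimeoutError("stage did not complete in time")
        sleep(poll_seconds)


def replication_cycle(stages, **wait_options):
    """Run one cycle: start each stage, then wait for it to finish.

    stages is a list of (start, is_done) pairs of callables, for example
    one pair each for syncing the local BCVs, splitting them, and pushing
    them to the remote site (the pairs themselves are up to the caller).
    """
    for start, is_done in stages:
        start()
        wait_until(is_done, **wait_options)
```

Wrap `replication_cycle` in an outer loop with a sleep between passes and you have the skeleton of a scripted set-up.  The timeout is there so that a hung stage raises an error instead of silently stalling the cycle.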

As I prepare to start my own replication design this was foremost on my mind, which is how it ended up here.  (This is after all the dumping ground for my random thoughts.)

