Backup
Is Microsoft VSS a real Snap? Maybe. Does it suck? Absolutely!
by Jesse on Oct.17, 2006, under Backup, Microsuck, Symmetrix, TimeFinder
I can’t even talk during the day because of the great sucking sound coming from our Microsoft infrastructure.
From the storage end, it’s even harder, because natively Microsoft doesn’t have ANY tools to unmount a filesystem or quiesce a production volume so you can take a hardware-based snapshot of it.
Of course, they’ve introduced VSS, which is their way of saying there is no way but their way to clone a volume.
The main problem with VSS (besides it being a product of the limited minds at Microsoft) is that it’s yet another stupid host-based application that consumes system resources on the host when engaged.
VSS, and most other volume “Snapshot” providers, work in the same way. The simplistic description is “Copy on first write.”
Let’s go over it step-by-step.
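Copy on first write also explains the host-resource complaint: the first write to any block after the snapshot is taken costs extra I/O, because the original block has to be saved before it is overwritten. Here is a minimal sketch in Python, assuming one read plus one write to preserve each block in a separate save area and ignoring the caching and block-size details of any real VSS provider; the function name is hypothetical.

```python
from typing import Iterable


def cofw_io_cost(write_blocks: Iterable[int]) -> int:
    """Count physical I/Os for a stream of block writes while a
    copy-on-first-write snapshot is active (simplified model)."""
    preserved = set()  # blocks whose original contents are already saved
    ios = 0
    for block in write_blocks:
        if block not in preserved:
            # First write since the snapshot: read the original block and
            # write it to the save area before it can be overwritten.
            ios += 2
            preserved.add(block)
        ios += 1  # the application's own write
    return ios


# Four writes to three distinct blocks: 3 * 2 extra I/Os + 4 writes = 10,
# versus 4 I/Os with no snapshot active.
print(cofw_io_cost([1, 2, 1, 3]))
```

Rewrites of an already-preserved block cost nothing extra, so the overhead is worst right after the snapshot is taken and on workloads that touch many distinct blocks.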
Veritas NetBackup 6.0 – Synthetic Backups?
by Jesse on Oct.06, 2006, under Backup
Just an interesting note.
When I took the NBU 6.0 management course earlier this year, we briefly touched on “Synthetic Backups.”
Synthetic backups are a great idea: you can build a full backup by taking the last full backup and rolling the incrementals into it. This results in a backup that takes 1/10th the time (assuming a disk storage unit; it might take longer going directly to tape) and requires minimal bandwidth.
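The “roll the incrementals into the last full” idea can be sketched in a few lines of Python. This is a hypothetical model, not NetBackup’s implementation: a backup image is just a mapping of file path to contents, and an incremental marks a deleted file with `None`. How NetBackup actually tracks deletions in its catalog is not modelled here.

```python
from typing import Dict, List, Optional

# Hypothetical model: a backup image maps file path -> contents.
# In an incremental image, None marks a file deleted since the previous backup.
FullImage = Dict[str, bytes]
IncrementalImage = Dict[str, Optional[bytes]]


def synthesize_full(last_full: FullImage,
                    incrementals: List[IncrementalImage]) -> FullImage:
    """Roll incrementals (oldest first) into the last full backup."""
    synthetic = dict(last_full)  # never modify the original full image
    for inc in incrementals:
        for path, data in inc.items():
            if data is None:
                synthetic.pop(path, None)  # deleted after the full was taken
            else:
                synthetic[path] = data     # newer version wins
    return synthetic


full = {"/etc/hosts": b"v1", "/tmp/scratch": b"x"}
incs = [{"/etc/hosts": b"v2"}, {"/tmp/scratch": None, "/home/new.txt": b"n"}]
print(synthesize_full(full, incs))
```

Because everything happens between existing images, no data has to cross the network from the client, which is why the synthetic full is so much cheaper than a real one.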
In my test environment at home, this works without an issue. The first full backup (over 10/100 network connections) took about 10 hours; the second backup took slightly over 3 hours and captured all the changes perfectly.
This works on any filesystem backup, but not on database or Exchange backups.
The Different Flavors of EMC TimeFinder
by Jesse on Sep.26, 2006, under Backup, General, RAID, Replication, TimeFinder
I don’t know how much most people know about TimeFinder, so I’ll start with a short introduction.
EMC TimeFinder was developed to give customers a dynamic mirror they could use to cut some of the tedium out of copying data, whether from one host to another or within the same host.
When I was working for EMC, TimeFinder was still more or less in its infancy and only came in one flavor (now referred to as “TimeFinder/Mirror”).
A TimeFinder volume is essentially a dynamic mirror of a production volume. Called a ‘BCV’ (for Business Continuance Volume), it was a straight 1:1 mirror of your production data. (If you were running in a two-way mirrored configuration, the BCV essentially became a third mirror.) At a point in time, you could split the third mirror off its production pair and make it available to another host.
The main benefits were obviously discovered in backup. You could split a BCV of your production data off, mount it to another host, and back it up to a locally attached tape library with zero network overhead. You could also use a single backup host to back up BCV copies from multiple production hosts.
Another good use for BCVs was in development. One story I like to tell is from when I was an admin several years ago. Developers liked to “break” their database on a Friday afternoon, knowing that the restore from tape would take the better part of a day, which guaranteed they would make their tee time. With the advent of TimeFinder, I was able to tell them, “Not a problem, I’ll have it back up in 20 minutes.” The reason was that I could restore from the mirror almost instantly.
The main negative of TF/Mirror is that in all cases the initial synchronization has to complete before the data can be made available to the target system. This is mitigated by the fact that once the initial relationship is established, all further synchronizations are incremental, meaning only changed tracks are copied to the BCV, but it can still be a time-consuming process.
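The full-then-incremental behaviour of a BCV pair can be sketched as a toy model in Python. The class and method names are hypothetical, tracks are list entries, and the model only tracks changes on the production side; a real incremental establish also has to deal with tracks changed on the BCV while it was split off.

```python
from typing import List, Optional


class BcvPair:
    """Toy model of a standard/BCV mirror pair (hypothetical, not EMC's code)."""

    def __init__(self, tracks: List[bytes]):
        self.std = list(tracks)                       # production (standard) volume
        self.bcv: List[Optional[bytes]] = [None] * len(tracks)
        self.changed = set(range(len(tracks)))        # first establish copies every track
        self.split = True                             # pair starts unestablished

    def write(self, track: int, data: bytes) -> None:
        self.std[track] = data
        if self.split:
            self.changed.add(track)   # remember tracks touched while the BCV is split off
        else:
            self.bcv[track] = data    # while established, writes go to both mirrors

    def establish(self) -> int:
        """Synchronize the BCV; returns how many tracks had to be copied."""
        copied = len(self.changed)
        for t in self.changed:
            self.bcv[t] = self.std[t]
        self.changed.clear()
        self.split = False
        return copied

    def split_off(self) -> List[Optional[bytes]]:
        """Split the BCV; returns the point-in-time image handed to another host."""
        self.split = True
        return list(self.bcv)


pair = BcvPair([b"t"] * 1000)
print(pair.establish())   # initial sync copies every track
image = pair.split_off()
pair.write(5, b"z")
pair.write(7, b"q")
print(pair.establish())   # incremental re-establish copies only tracks 5 and 7
```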
Now EMC has come out with two new forms of TimeFinder in Symmetrix, very similar to the Clariion functionality.
TimeFinder/SNAP
  SNAP uses a process called “Copy On First Write.” This uses a much smaller volume than the production volume as a “virtual device” (called a VDEV). The VDEV serves as a list of pointers, one for each track in the volume. Reads are serviced from the original volume until the track is changed. When a track is changed, the original data is first copied to a cache area, and the VDEV’s pointer for that track is changed to point to the cached original. This way the VDEV always presents an exact copy of the production data as it was when the Snap session was activated. When the data is no longer needed, you terminate the Snap session and the cached changes are discarded.
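Here is a minimal Python sketch of that copy-on-first-write behaviour. It is a toy model under stated assumptions, not EMC’s implementation: tracks are list entries, the VDEV’s pointers are modelled as “look in the save area first, otherwise read production,” and the class and method names are hypothetical. Unchanged tracks are read straight from the production volume, which is where SNAP’s read load on production comes from.

```python
from typing import Dict, List


class SnapSession:
    """Toy copy-on-first-write snapshot (hypothetical model, not EMC's code)."""

    def __init__(self, production: List[bytes]):
        self.production = production      # live volume, updated in place
        self.saved: Dict[int, bytes] = {}  # save area: track -> original contents
        self.active = True

    def host_write(self, track: int, data: bytes) -> None:
        """Production write; the first write to a track preserves the original."""
        if self.active and track not in self.saved:
            self.saved[track] = self.production[track]
        self.production[track] = data

    def snap_read(self, track: int) -> bytes:
        """Read through the VDEV: saved copy if the track changed, else production."""
        if not self.active:
            raise RuntimeError("snap session terminated")
        if track in self.saved:
            return self.saved[track]
        return self.production[track]

    def terminate(self) -> None:
        self.saved.clear()  # preserved tracks are discarded with the session
        self.active = False


vol = [b"a", b"b", b"c"]
snap = SnapSession(vol)
snap.host_write(1, b"X")
snap.host_write(1, b"Y")          # second write: nothing more to preserve
print(snap.snap_read(1), vol[1])  # snapshot still sees the original track
```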
The data is available the instant the SNAP session is activated.
The downside is that reads of unchanged tracks touch the production volumes, so on a heavily utilized system there can be a noticeable impact. Another negative is that SNAP sessions are limited by the amount of cache set aside. A usual configuration is to set aside about 20% of the space used by production as “SnapCache,” which can then be used as needed. If the SnapCache fills up, the Snap session ends, and that is that.
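To see how the SnapCache limit plays out, here is a small Python sketch that walks a stream of track writes and reports the write at which a cache sized at 20% of production would overflow. It assumes the cache is measured in whole tracks and that repeated writes to an already-preserved track consume no extra cache; the function name is hypothetical.

```python
from typing import Iterable, Optional


def first_overflowing_write(production_tracks: int, writes: Iterable[int],
                            cache_pct: int = 20) -> Optional[int]:
    """Index of the write that overflows the SnapCache, or None if it fits."""
    capacity = production_tracks * cache_pct // 100  # tracks the cache can hold
    preserved = set()
    for i, track in enumerate(writes):
        if track in preserved:
            continue              # already saved; rewrites use no more cache
        if len(preserved) == capacity:
            return i              # nowhere to save the original: session ends
        preserved.add(track)
    return None


# 10 tracks at 20% gives room for 2 preserved tracks; the third distinct
# track written (index 4) is the one that kills the session.
print(first_overflowing_write(10, [0, 0, 1, 1, 2]))
```

What matters is how many *distinct* tracks change during the session, not how many writes happen, so a hot but small working set can live comfortably inside a 20% cache.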
TimeFinder/CLONE
  Clone uses another process, similar to SNAP, called “Copy On Access.” Clone volumes are the same size as the production volumes, which of course uses more space but provides a more permanent home for the data. This gives you the data permanence of TimeFinder/Mirror, the speed of TimeFinder/Snap, and the agility to move data from one standard volume to another. (RAID-1 production volumes to RAID-5 development volumes is a good example.)
What copy-on-access offers is the unique ability to use the target volume before it has finished mirroring. When a clone session is first started, the target volume contains nothing but pointers to the source volume. Every time a track is accessed (read or write), it is first copied to the target volume, before any write operation proceeds. If no options are selected, this is the only time a track is copied. If the -copy option is selected when the Clone session is created, a background copy of the production volume is started. This eventually results in a copy of the data that persists after the clone session is terminated. (When no option is specified, the data disappears when the session is terminated.) There is also the option to copy (mirror) the entire production volume to the target volume before the session is activated. This is called “Precopy,” and it closely emulates what TimeFinder/Mirror does, without the limitation of having to use BCVs as targets.
TF/Clone has to be the best of all worlds. It gives you the speed of Snap, the data resilience of Mirror, and the flexibility to go from one volume to another without restrictions on what type of volume your target is. (TimeFinder/Mirror requires BCV volumes as targets.)
TimeFinder is the product that introduced me to EMC. TimeFinder and SRDF are also the technologies I’ve implemented more often than any others in my work for (and with) EMC.
If you’ve got questions, feel free to post them. You’re probably not the only one asking.
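A rough Python sketch of copy-on-access, under stated assumptions: it is a hypothetical toy model rather than EMC’s implementation, tracks are list entries, `copy=True` stands in for the -copy option, Precopy is not modelled, and “the data persists after termination” is simplified to “the target is returned only if every track made it across.”

```python
from typing import List, Optional


class CloneSession:
    """Toy copy-on-access clone (hypothetical model, not EMC's code)."""

    def __init__(self, source: List[bytes], copy: bool = False):
        self.source = source                          # production volume
        self.target: List[Optional[bytes]] = [None] * len(source)
        self.copied = set()                           # tracks already on the target
        self.copy = copy                              # models the -copy option

    def _pull(self, track: int) -> None:
        # Copy on access: bring the point-in-time track over before use.
        if track not in self.copied:
            self.target[track] = self.source[track]
            self.copied.add(track)

    def read_target(self, track: int) -> Optional[bytes]:
        self._pull(track)
        return self.target[track]

    def write_target(self, track: int, data: bytes) -> None:
        self._pull(track)
        self.target[track] = data

    def write_source(self, track: int, data: bytes) -> None:
        self._pull(track)  # preserve the original for the clone first
        self.source[track] = data

    def background_copy(self, max_tracks: int) -> int:
        """With copy=True, copy up to max_tracks pending tracks; returns the count."""
        if not self.copy:
            return 0
        pending = [t for t in range(len(self.source))
                   if t not in self.copied][:max_tracks]
        for t in pending:
            self._pull(t)
        return len(pending)

    def terminate(self) -> Optional[List[Optional[bytes]]]:
        """Return the clone if it survives termination in this model, else None."""
        if self.copy and len(self.copied) == len(self.source):
            return list(self.target)
        return None


src = [b"a", b"b", b"c", b"d"]
clone = CloneSession(src, copy=True)
clone.write_source(0, b"X")       # production changes; clone keeps b"a"
print(clone.read_target(0))
clone.background_copy(10)
print(clone.terminate())
```

The target is usable from the moment the session starts; the background copy only decides whether the data outlives the session.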
Are backup tapes headed for the graveyard?
by Jesse on Sep.14, 2006, under Backup
I read an article today in Byte&Switch called “Can’t quite kick the tape habit.” This article seems to imply that tape backups may be going by the wayside in favor of disk storage units and VTL (Virtual Tape Library) systems.
I think the author is on crack. The disk vendors would very much like to see tape go by the wayside, but it’s simply not going to happen.
Add to that the fact that most companies, including mine, will always require the ability to archive off to tape. This presents a challenge for people who use backup products like Veritas (www.veritas.com): Veritas writes all its tapes in GnuTAR format, so are we really sure that format is going to be readable after so many years?
Virtual tape library systems, at least the ones I’ve read up on in my research, are basically disk arrays filled with inexpensive, not-so-fast disks that are carved up and used exactly like a tape library, with volume numbers, a “robot arm” and so on. (Network Appliance (www.netapp.com) and FalconStor (www.falconstor.com) both make VTL units.)
I’m sorry, but there isn’t a virtual tape library in the world that will handle very-long-term retention, especially at the volume we deal with. Our disk storage unit is 17TB of 500GB SATA drives. At any given time, with only two weeks of backup data (fulls on weekends and differentials daily), that storage unit runs at about 70% capacity. And this is with just six months of data. I can’t conceive of how much storage would be required to maintain 30 years of data. (Actually, I can: without growth it’s 6.35 PETABYTES in 27 years.) I had planned for at least a terabyte a year in growth (and some say my estimates are very low), and if my math is right, 234 × 27 + 52 × 27², that’s about 44 petabytes of data after the 27-year mark. Mind you, that’s the point where we can start expiring our first tapes.
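The back-of-the-envelope formula above is easy to check in Python. This simply reproduces the post’s own estimate, base × years + growth × years², with the 234 TB/year base and the 52 growth coefficient taken as given rather than derived; the function name is hypothetical.

```python
def retention_tb(tb_per_year: float, years: int, growth_coeff: float = 0.0) -> float:
    """The post's estimate: tb_per_year * years + growth_coeff * years**2, in TB."""
    return tb_per_year * years + growth_coeff * years ** 2


flat = retention_tb(234, 27)       # 6,318 TB with no growth
grown = retention_tb(234, 27, 52)  # 44,226 TB with the growth term
print(f"no growth: {flat / 1000:.2f} PB, with growth: {grown / 1000:.1f} PB")
```

The first term alone comes to roughly 6.3 PB, in line with the no-growth figure, and the squared growth term is what pushes the total past 44 PB.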
As much as EMC would love us for it, I don’t think we’ll be buying that much Clariion space in the near future.
The downside to tape is obvious: no tape manufacturer will guarantee their tapes in storage for more than 7 years. So to ensure restorability, once tapes are five years old we will bring a set back every month, duplicate them to whatever the “format-du-jour” is, and send the new tapes back out.
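The monthly recall can be driven by a simple age check. Here is a minimal Python sketch, assuming each tape set is tracked by an ID and the date it was written, and that age is counted in calendar months; the tape IDs and function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def due_for_refresh(tape_sets: List[Tuple[str, date]], today: date,
                    refresh_after_months: int = 60) -> List[str]:
    """IDs of tape sets old enough to be recalled and duplicated.

    60 months (five years) leaves a two-year margin before the
    7-year manufacturer limit.
    """
    return [tape_id for tape_id, written in tape_sets
            if months_between(written, today) >= refresh_after_months]


# Hypothetical tape-set IDs and write dates.
vault = [("WK-2001-40", date(2001, 10, 6)), ("WK-2006-35", date(2006, 9, 2))]
print(due_for_refresh(vault, today=date(2006, 10, 17)))
```

Running this once a month against the vault inventory gives the list of sets to bring back and copy to the current format.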
Then again, I know companies who put all their faith in CD-R storage, before all the stories of “laser rot” started coming out. (If you didn’t use a CD-friendly marker on a CD-R, the ink would seep in and oxidize the surface of the disc, making it unreadable. Sharpie(tm) brand markers are not CD-R friendly by default, though they do now make one that is.)
Exchange backups a problem?
by Jesse on Sep.11, 2006, under Backup
74 gigs should *NOT* take 24 hours to back up. Keep in mind, we are not going straight to tape; we are backing up to disk storage units and then copying the backup images from disk to tape later. So tape bandwidth is not the issue here.
I’ve been working on an Exchange backup problem. I know the Exchange server in question was not set up according to best practices: a single information store (it used to be on the C: drive; we finally moved it) for about 350 users. The new Exchange server is coming online soon (not soon enough for my tastes), but for now this is what I have to work with.
A single-stream backup has been taking about 24 hours to complete, even for differentials, using the default directive:
Microsoft Exchange Mailboxes:\
This creates a single stream for all mailboxes.
So knowing there has to be a better way to do this, I tried the usual wildcard, as follows:
Microsoft Exchange Mailboxes:\*
With disastrous results. The system spawned 400+ backup streams, which held the entire backup environment hostage. Half the backup jobs couldn’t run within the 5-hour window we had set for ourselves. A little research through the Symantec/Veritas site (which is not exactly easy to sift through) turns up the following set of directives in the “Exchange Administrator’s Guide”:
NEW_STREAM
Microsoft Exchange Public Folders:\
NEW_STREAM
Microsoft Information Store:\
NEW_STREAM
Microsoft Exchange Mailboxes:\[a-e]*
NEW_STREAM
Microsoft Exchange Mailboxes:\[f-j]*
NEW_STREAM
Microsoft Exchange Mailboxes:\[k-o]*
NEW_STREAM
Microsoft Exchange Mailboxes:\[p-t]*
NEW_STREAM
Microsoft Exchange Mailboxes:\[u-z]*
The first two are easy: back up the public folders, and back up the information store as a whole. Backing up the information store as well as the mailboxes could be called a bit redundant, but it makes a big difference if you have to do a full restore: restoring from the information store backup is much faster than restoring the item-by-item mailbox backup. It looks like a waste of time until you’re down and need it, so it is worth the extra time and storage.
The remaining directives each group a collection of mailboxes into a single stream. In our case, all mailboxes starting with A through E go in the first stream, F through J in the second stream, and so on.
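To check which mailboxes each `[x-y]*` directive would pick up, you can run the patterns through Python’s `fnmatch` glob matching. This is a sketch under assumptions: it treats NetBackup’s wildcard matching as ordinary shell-style globbing and assumes it is case-insensitive, which I have not verified. A bare `*` would instead give one entry per mailbox, which is how you end up with hundreds of streams. It also shows a side effect of letter ranges: names starting with a digit or underscore are matched by none of the five patterns.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# The five mailbox patterns from the directive list above.
PATTERNS = ["[a-e]*", "[f-j]*", "[k-o]*", "[p-t]*", "[u-z]*"]


def assign_streams(mailboxes: List[str],
                   patterns: List[str] = PATTERNS) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map each pattern to the mailboxes it selects; collect uncovered names."""
    streams: Dict[str, List[str]] = {p: [] for p in patterns}
    uncovered: List[str] = []
    for name in mailboxes:
        # Assumption: matching is case-insensitive, so compare lowercased names.
        hits = [p for p in patterns if fnmatchcase(name.lower(), p)]
        for p in hits:
            streams[p].append(name)
        if not hits:
            uncovered.append(name)
    return streams, uncovered


streams, uncovered = assign_streams(["Alice", "Frank", "zoe", "_svc_backup", "2ndfloor"])
print(streams)
print("not covered by any directive:", uncovered)
```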
There is further tuning that can be done, moving sets of mailboxes from one stream to another to balance them out.
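One way to do that balancing is to total mailbox sizes per leading letter and cut the alphabet into contiguous ranges of roughly equal size. The Python sketch below does that greedily and prints directives in the same `NEW_STREAM` format as the guide’s list. The mailbox sizes, function names, and the greedy cut are all assumptions for illustration; the ranges always cover a through z so new mailboxes are not left out, and a single very large letter cannot be split because the directives select by first letter.

```python
import string
from typing import Dict, List, Tuple


def balance_streams(mailbox_mb: Dict[str, float],
                    n_streams: int) -> List[Tuple[str, str, float]]:
    """Split a-z into contiguous letter ranges of roughly equal total size."""
    # Names that do not start with a-z are ignored: no letter range covers them.
    by_letter = {c: 0.0 for c in string.ascii_lowercase}
    for name, size in mailbox_mb.items():
        first = name[:1].lower()
        if first in by_letter:
            by_letter[first] += size
    target = sum(by_letter.values()) / n_streams
    letters = string.ascii_lowercase
    streams: List[Tuple[str, str, float]] = []
    start, running = "a", 0.0
    for i, letter in enumerate(letters):
        running += by_letter[letter]
        is_last = i == len(letters) - 1
        if not is_last and running >= target and len(streams) < n_streams - 1:
            streams.append((start, letter, running))  # range reached its share
            start, running = letters[i + 1], 0.0
    streams.append((start, "z", running))  # final range always runs through z
    return streams


def to_directives(streams: List[Tuple[str, str, float]]) -> str:
    lines = []
    for first, last, _ in streams:
        lines.append("NEW_STREAM")
        lines.append(f"Microsoft Exchange Mailboxes:\\[{first}-{last}]*")
    return "\n".join(lines)


sizes = {"alice": 100, "bob": 100, "carol": 100, "dave": 100}  # hypothetical MB
print(to_directives(balance_streams(sizes, 2)))
```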
Only time will tell if this really helps. My testing with 5 streams shows the per-stream rate dropping from 500-600 KB/s to 300-400 KB/s, but multiplied across 5 streams, it still looks like an improvement.
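The arithmetic behind “still an improvement” is that aggregate throughput is per-stream rate times stream count: 5 × 300-400 KB/s is 1,500-2,000 KB/s, against 500-600 KB/s for one stream. A quick Python sketch, treating the rates above as kilobytes per second and assuming every stream holds its rate for the whole job (in practice streams finish at different times); the function name is hypothetical.

```python
def hours_to_back_up(gigabytes: float, kb_per_sec: float, streams: int = 1) -> float:
    """Wall-clock hours for a job, assuming every stream sustains kb_per_sec."""
    total_kb = gigabytes * 1024 * 1024
    return total_kb / (kb_per_sec * streams) / 3600


single = hours_to_back_up(74, 550)    # midpoint of 500-600, one stream
multi = hours_to_back_up(74, 350, 5)  # midpoint of 300-400, five streams
print(f"1 stream: {single:.1f} h, 5 streams: {multi:.1f} h")
```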
RAID as backup? Only if you have your resume spell-checked.
by Jesse on Sep.09, 2006, under Backup
So my cousin runs a small ISP in the Phoenix, Arizona area. Nothing special: a few hundred DSL and dial-up users, webmail, etc.
A couple of weeks ago, I was down visiting, and he was wringing his hands over a technical “glitch.” Apparently a drive in his RAID set had failed and, due to either a controller bug (rare but possible) or user error (much more common), the blank disk started rebuilding over the parity information.
He asked my advice, and I told him it was easy: let the RAID set finish rebuilding, restore from your most recent full backup, then lay whatever incremental backups you have over that to bring it as close to POF (Point-of-Failure) as humanly possible. It’s not a sophisticated enough system to expect that they would be doing any kind of transaction logging.
Apparently they weren’t doing backups at all. The feeling was that since their disks were protected in a RAID configuration, backups were a uselessly redundant exercise.
Let me explain why this is a bad idea: garbage in, garbage out. RAID, whether it be RAID 1 or RAID 5, only gets you uptime, because a corruption will replicate and spread from disk to disk before the user is aware there is a problem.
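A tiny Python model of RAID-5 style XOR parity makes the “garbage in, garbage out” point concrete. This is a simplified sketch (one stripe, parity recomputed in full rather than by a controller’s read-modify-write, hypothetical function names): parity lets you rebuild a lost disk, but a bad write is still a valid write, so parity is updated to match it and a rebuild faithfully brings back the garbage.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[bytes]) -> bytes:
    """RAID-5 style parity: byte-wise XOR across equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(stripe: List[bytes], parity: bytes, lost: int) -> bytes:
    """Reconstruct the block at index `lost` from the survivors plus parity."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost]
    return xor_parity(survivors + [parity])


stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(stripe)
assert rebuild(stripe, parity, lost=1) == b"BBBB"  # a failed disk is recovered

# A bad write (application bug, accidental delete) is still a valid write,
# so the parity is updated to match it.
stripe[1] = b"\x00\x00\x00\x00"
parity = xor_parity(stripe)
print(rebuild(stripe, parity, lost=1))  # RAID faithfully "recovers" the garbage
```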
The same holds true for replication. If you’re replicating a hard drive offsite, a database corruption will replicate right along with the production data. The only exception is transactional replication (such as Quest Software’s old “NetBase” product, which detects an invalid change and halts replication before sending the change to the target system).
So how do you implement a backup? It’s easy. Follow the 2-of-3 rule (faster, cheaper, or smaller: pick any two) and you will have a backup solution you can live with.
Or keep your resume spell-checked, because it’s not a matter of if, it’s a matter of when.