General
The Clariion@Home Project
by Jesse on Jul.06, 2009, under General
Well – it’s been interesting. If I’ve been remiss in my postings of late it’s because I haven’t had time to scratch, let alone actually spend any time working on non-work stuff… (Which, sadly, this qualifies as, since no-one is paying me to do this.)
It’s in – it’s up – it’s running. After literally years of working at getting one, I finally have a *REAL* Clariion running in my basement-cum-datacenter. As an added bonus I now have a Celerra running in here as well.
The Clariion I ended up with started out life as an NS502 with 15x 73G Fibrechannel drives and 15x 500G SATA drives, for almost 8 TB of storage. Some clever ebaying then got me 10x 146G FC drives to upgrade the non-vault drives, so now I’m at almost 9 TB of storage.
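For anyone who wants to check the math, here is a rough sketch of the raw capacity before and after the upgrade. It assumes the five vault drives stayed at 73G (15 FC slots minus the 10 that were upgraded) and uses the nominal drive sizes; the function name is just for illustration.

```python
GB = 10**9    # drive vendors count in decimal gigabytes
TIB = 2**40   # most tools report binary terabytes

def raw_capacity_gb(layout):
    """Sum raw capacity in decimal GB for (drive_count, size_gb) pairs."""
    return sum(count * size_gb for count, size_gb in layout)

# As shipped: 15x 73G FC plus 15x 500G SATA.
original = [(15, 73), (15, 500)]
# After the upgrade: 5 vault drives still at 73G (assumption), 10x 146G FC, 15x 500G SATA.
upgraded = [(5, 73), (10, 146), (15, 500)]

for label, layout in (("original", original), ("upgraded", upgraded)):
    gb = raw_capacity_gb(layout)
    print(f"{label}: {gb} GB raw, about {gb * GB / TIB:.2f} TiB")
```

These are raw numbers only; RAID parity, hot spares and formatting all take their cut before any of it becomes usable space.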
It was a fun project. It took a bit longer than I expected (silly me, shoulda known) and didn’t go exactly as I’d planned (also shoulda known.)
Started with the rack. Had to get the PowerVault 660F out. So I put the ESX servers in stand-by, vmotioned everything to one server, powered it off and moved it to my happy baker’s rack in the corner.

The old PV660F that I used to run.
So I took it out, direct attached the PowerVault to one ESX server and ran the site off it for about a week while I rebuilt the rack. (You’ll know when this happened, because if you went to the site during this time it probably just finished loading)
The new system started out as an NS502, which as some will know is a Celerra with a “captured” Clariion CX500 back-end.
Well I have nothing against captivity as a practice, but if I’m going to run a SAN in my house it’s damned sure going to be a SAN, not some NAS pretending it’s a SAN. (I believe my thoughts on iSCSI are fairly well known?)
So a few things I learned. The NS502 boxes rarely get upgraded beyond the code they ship with, because most customers are of the ‘if it ain’t broke don’t fix it’ mentality. (Which in some ways I approve of)
I learned that when you update the Clariion FLARE code from FLARE19 to FLARE26, you *WILL* kill the attached Celerra unless it was updated ahead of time.
I also learned that you can’t simply replace the back-end cables (between the Celerra and ‘captured’ Clariion) with fibre SFPs and hook cables up to the switch. I’m not entirely sure why, but the optical SFPs I connected caused the Storage Processors to hang on boot. The SPs would boot fine with the SFPs not in place, and then continue to function perfectly when the SFPs were plugged in after the fact, but they would not boot with them installed. So I replaced the optical SFPs and cables with copper SFP-to-SFP cables, and lo-and-behold everything works.
The "Finished" Product
Long story short: zone the switch, install the Celerra, and it’s now a CX500 with an NS502G attached. My 3-node VMWare ESX cluster works fine and dandy on the pair of 500G LUNs I provisioned for it on the 146G FibreChannel drives, in 8+1 Raid-5 because speed isn’t really an issue. The SATA drives were carved into two 6+1 raid-groups with a hotspare and provisioned almost entirely to the Celerra – I kept two 500G LUNs in reserve in case I should need to move the ESX storage off the 146G drives for any reason.
Now the cool part. The webcontent for this (and other) sites is on the Celerra, NFS mounted to this server. I have SnapSure enabled, keeping 30 days of checkpoints in case I should manage to delete something important and forget about it for weeks at a time. The content (MySQL database) is still on the VMWare virtual disks. I’m not sure how moving the database to NAS as well would affect the site, and as I’m not horrifically short on space I’m not horribly worried about it.
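A quick back-of-the-envelope sketch of that layout, using nominal drive sizes and ignoring formatting overhead, so treat the numbers as approximate. The helper name is made up for the example; the only rule it encodes is that an N+1 Raid-5 group gives up one drive’s worth of space to parity.

```python
def raid5_usable_gb(data_drives, drive_gb):
    """Nominal usable space of an N+1 RAID-5 group: N data drives' worth."""
    return data_drives * drive_gb

# FC side: one 8+1 group on the 146G drives, carrying the two 500G ESX LUNs.
fc_usable = raid5_usable_gb(8, 146)
esx_luns = 2 * 500

# SATA side: two 6+1 groups plus one hot spare accounts for all 15 drives.
sata_drives_used = 2 * (6 + 1) + 1
sata_usable = 2 * raid5_usable_gb(6, 500)
sata_reserve = 2 * 500  # the two 500G LUNs held back from the Celerra

print(f"FC group: {fc_usable} GB usable, {fc_usable - esx_luns} GB left after the ESX LUNs")
print(f"SATA: {sata_drives_used} drives, {sata_usable} GB usable, "
      f"{sata_usable - sata_reserve} GB to the Celerra")
```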
So there we are. Now all I need is a place to replicate and a government grant to help pay for power and all will be good.
On Security….
by Jesse on Mar.25, 2009, under Best Practices, General, Job Market, Security
Security is a good thing….until it isn’t.
Security isn’t a good thing when it interferes needlessly with productivity. By needlessly I mean when you don’t get the security you’re looking for, but instead make it harder than it needs to be for your people to do their jobs.
A few examples:
1. Company “A” hires consultants to perform day-to-day tasks. Company “A” then refuses to give them access to the troubleshooting tools and software downloads they are supposed to be supporting.
2. Company “B” decides that its employees can’t be trusted. (If you can’t trust an employee, why are they an employee?) Company “B” then decides to lock down PC workstations so that *NO* software can be installed or removed by said employee. Company “B” instructs their helpdesk to ignore all requests for installation of needed software.
3. Company “C” requires a contractor to be on-call for 24×7 support. Company “C” refuses to grant said contractor remote access to support the equipment he’s on-call to support, forcing a 45 minute drive in the event of an emergency. Company “C” then reams the contractor for not being timely in his/her support.
4. (My Favourite) Company “D” gets *VERY* creative with Windows Group Policies on a workstation, rendering said workstation a paperweight. Company “D” then neglects to block access to the system BIOS and leaves USB booting enabled, which lets any user introduce any unlocked/unguarded operating system in the world into their environment by way of a thumbdrive.
In my career, I’ve been said employee/contractor in every one of these instances.
(Just an aside - my favorite gotcha came from watching a help-desk guy come in and disable the USB ports in the BIOS of a system, only to be rudely reminded that the keyboard and mouse are USB (and that they don’t make PS/2 connections for them any longer))
My point is this: If you’re going to implement security make sure it’s effective security that also allows your employees to do their jobs.
If it’s not effective security (i.e. it isn’t going to show a security benefit, that benefit being a quantifiable improvement in the security of your data or the stability of your environment), don’t bother with it – you do nothing but alienate the people you hire to work for you and make them want to go elsewhere.
Contrary to popular belief, there are still elsewheres to go.
Support Calls
by Jesse on Mar.24, 2009, under General
So I had a failure – Tuesday night last week, which caused me (forced me, really) to write this post:
Now in the grand scheme of things, it was probably a bit snarky, but at 4am I think I legally am not responsible for my actions.
But the bottom line is on Tuesday at about 5pm I called to open a hardware case. Hardware. The high-school kid who answers the phone routes me into the Software group. Correct me if I’m wrong, but that’s kind of the opposite of hardware. (And no, if this kid was a college graduate, please god tell me what college he graduated from so I can forbid my son from going there)
2 hour wait for the call-back, 2 hours of trading email back and forth before I convince him this is a hardware case and to please route it to the hardware people.
He never does the transfer.
In the meantime, it’s about 2am and I go fix the bloody problem myself, restart my change scripts and all is happy. I got home about 4 that morning after an 18 hour day.
Leap forward to Thursday night. Same process, different array, same failure. Now it’s 10pm and I call in and STRESS to the triage guy that this is a HARDWARE case. He routes me to hardware. 75 minutes to call-back, 2 hours to fix the problem and step the script through to the end, plus I get handy knowledge like root-cause analysis and a set of steps to ensure it doesn’t happen again.
My point is this. So many problems can be avoided if you simply LISTEN to the person who is calling in. Assess their skill level, and if someone asks to be transferred to a specific group (ESPECIALLY if he knows the actual name of the group he wants to be transferred to – means he’s done this before), transfer him.
Long and the short of it was that on Tuesday, a drive on the target array had already failed when the script started. This caused it to error out because it saw invalid tracks existing between the mirrors.
Same thing happened on Thursday, different drive, different symm, and out of 4,000 volumes it happened to hit a volume I was working with both times.
I should play the lottery… You’ll know if I win too because my blackberry doesn’t work on the beach in Cozumel.
Reverse Darwinism in Professional Services
by Jesse on Feb.16, 2009, under General
This started off as a comment to “A bail-out for little-old me?” But on careful consideration I think the subject warranted a post of its own.
If a company is experiencing a slow-down in sales, PS work especially suffers. (Companies still need to buy the storage, but may be more likely to handle more of the installation/configuration themselves to save money) So what ends up happening is that the field-engineers will find themselves short on work first.
Most companies are loath to give up their sales staff too early, simply because the only way they’re going to stay in business is to continue to try to sell. They all think that when they book more work they can always re-hire their engineers.
Here is the problem with this theory – they act like we’re just sitting around on our thumbs waiting for their business to pick up.
No, we’re out hustling new work, and the smarter/better/more well-known of us are getting new jobs, despite the down-turn. So when their business DOES finally turn around (and it will, eventually) they’ll find the best and the brightest have locked in new engagements, and the ones that they have to choose from are the ones that haven’t found new work for a reason.
It’s reverse-Darwinism at work. The company is forced to hire from what amounts to the bottom 50% of the people they laid off two or three months earlier. They may save a few bucks, but their customer satisfaction numbers go through the floor. I’ve been hearing customer-service complaint stories throughout this downturn, and they’ve gotten quite a lot louder in the past 6-8 weeks.
When the stockholders, most of whom couldn’t navigate their E-Trade account without help, start making technical decisions for a technical company, the end of the road is not far away.
Now I fully understand a company laying people off to keep from going under, but that’s what it should be. A last ditch effort.
If you as a company are laying people off to protect profit margins and dividend payments, you are RESPONSIBLE for the financial crisis we find ourselves in.
Mind you – I’m not targeting any particular organization with this story. I’m sure Every Major Corporation has this problem.
Three simple words….
by Jesse on Feb.12, 2009, under General
For anyone new to the Cisco world, whether it be Fibrechannel MDS switches or IOS network switches, remember the three magic words:
‘copy run start’
Missed that on my home switch… apparently about 4 months and numerous vlan changes ago. Today at about 10am we had a power failure, the first one long enough to cause the rack to go dark.
When it came back up, no network. Took me a while to realize that I had re-partitioned the switch a while ago and it didn’t stick.
So ‘copy run start’
Good night.
Good luck.
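If you would rather find out before the power goes, one option is to compare the running and startup configs every so often. The sketch below assumes you have already pulled both as plain text by whatever means you like; the sample configs are made up, and skipping lines that start with "!" is a simplification for the example, not a complete IOS parser.

```python
import difflib

def unsaved_changes(running, startup):
    """Return unified-diff lines showing what is in running but not saved to startup."""
    def significant(text):
        # Drop blank lines and "!" comment/separator lines so only real config is compared.
        return [line.rstrip() for line in text.splitlines()
                if line.strip() and not line.lstrip().startswith("!")]
    return list(difflib.unified_diff(
        significant(startup), significant(running),
        "startup-config", "running-config", lineterm=""))

# Hypothetical sample text standing in for the two configs.
startup_text = "hostname sw1\n!\nvlan 10\n"
running_text = "hostname sw1\n!\nvlan 10\nvlan 20\n"

for line in unsaved_changes(running_text, startup_text):
    print(line)
```

An empty result means the two match; anything else is a change that will vanish on the next reboot unless it gets saved.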
A bail-out for little-old me?
by Jesse on Feb.10, 2009, under General
It’s like being strapped into the gurney with the I.V. line in your arm and getting the last minute phone-call from the governor.
This week the federal government bailed me and my little almost-a-real-company out.
Several years ago I worked for a particular federal client. Data migration, switch migration, cable remediation, and a number of other projects.
Apparently it went well because for years they’ve been trying to get me back in and it just came down the wire that I’ve just been awarded a 1-year contract to go in as their new storage admin.
So the company is saved, or at least has been granted a one-year stay of execution.
This was, in very large part, the reason I turned down the Pre-Sales gig I was offered last month, among others. It’s also the reason the EMC Reseller that I was consulting through did a wonderful job keeping me on throughout the holidays, something I will always remember. (As in – some day you’re going to come to me for a favor…. and I will grant this favor…)
So wish me luck in my new endeavors.

Internal Celerra Migration
by Jesse on Feb.03, 2009, under General
I got sucked into this job, and the only benefit of it is that it’s in California, which, when it comes down to it is not a bad place to be when it’s snowing back home. Bygones.
Anyway, my job, whether I want it or not, is to figure out a way to move about 4 Terabytes of Celerra data from one set of disks (6+1 R5, 7.2K SATA) to newer, faster disks (4+1 R5, 15K FC).
And the rub is, that I have to do it online.
This is one of those places where I hate the Celerra. Found this great Primus article (emc144545 if you’re interested) that states quite unequivocally that you can only use the back-end Clariion LUN migration if you are migrating to an identical raid group, which to me negates the reasoning for doing it in the first place.
Identical raid group. If it’s SATA, the target has to be SATA. If it’s 4+1 Raid-5 the target MUST be 4+1 Raid-5.
Near as I can figure, and this is not stated clearly in the article, it has to do with how the Celerra builds its raid-pools. Since you USUALLY build filesystems and set them to expand into a raid-pool, my guess is that changing the make-up of the disks underneath the filesystem screws up the pool database.
Come on guys, this should be an easy fix. (This and the ability to easily shrink a filesystem would be nice) When a customer makes one of those mistakes, you know, buying the wrong disks from the outset because they’re focused on capacity and forget a little thing called performance, the hardware should offer an easy way to fix this.
In the case of the customer I’m working on now, Clariion LUN migration was out because of the disk-mark issue, and the standard SecureCopy is out because only minimal downtime is allowed.
Long and the short of it is I’m getting ready to do an internal CDMS migration. Now anyone who has used CDMS knows it’s not the fastest product in the world. You also know it can be maddening because one of the things you *STILL* can’t see is what percentage complete the migration is.
But as far as technology goes, the usefulness of it is awe inspiring.
CDMS is a “Copy-On-Access” file-level clone. Essentially it builds a duplicate inode table pointing to the old files, and presents this to the client. Browsing the new directory structure shows you all of the filesystem structure exactly as it is on the old source box. When you attempt to access a file for the first time, it copies that file from the old filesystem to the new filesystem and then passes it to the end-client.
Now this is where it’s a pain. It’s a slow process; depending on the speed of the source system/network/etc. you can increase your initial access time 20-fold. (Subsequent accesses come from the new disks, so it’s a one-time hit.)
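To make the copy-on-access idea concrete, here is a toy model in Python. It is only an illustration of the behaviour described above and has nothing to do with how CDMS is actually implemented; the class and method names are invented, and two ordinary directories stand in for the old and new filesystems.

```python
import shutil
import tempfile
from pathlib import Path

class CopyOnAccessFS:
    """Toy model: the new filesystem shows everything on the old one,
    but a file is only copied across the first time it is read."""

    def __init__(self, old_root, new_root):
        self.old_root = Path(old_root)
        self.new_root = Path(new_root)
        self.new_root.mkdir(parents=True, exist_ok=True)

    def listdir(self):
        # The whole namespace is visible right away, migrated or not.
        old = {p.relative_to(self.old_root) for p in self.old_root.rglob("*") if p.is_file()}
        new = {p.relative_to(self.new_root) for p in self.new_root.rglob("*") if p.is_file()}
        return sorted(old | new)

    def is_migrated(self, rel_path):
        return (self.new_root / rel_path).exists()

    def read(self, rel_path):
        target = self.new_root / rel_path
        if not target.exists():
            # First access: pay the one-time copy from the old disks.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.old_root / rel_path, target)
        # Every later access is served from the new disks.
        return target.read_bytes()

with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp, "old"), Path(tmp, "new")
    old.mkdir()
    (old / "index.html").write_text("hello")
    fs = CopyOnAccessFS(old, new)
    print(fs.listdir())                  # listed before anything is copied
    print(fs.is_migrated("index.html"))  # not on the new disks yet
    fs.read("index.html")                # first read triggers the copy
    print(fs.is_migrated("index.html"))
```

The slow first read and the fast second read both fall out of the `read` method: the copy only happens when the target is missing.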
So tomorrow, I start moving almost 4TB this way. Running 32 threads internally (this is inter-Celerra migration) it should run fairly fast, depending on how fast the network stack can process it.
To EMC – fix the disk-mark database to allow a Celerra LUN to be migrated without penalty (or at least with easy-to-moderate reconfiguration). You’ll sell more disks because people won’t feel married to the disks they’ve got, or worry about committing to a disk-type if they’re not absolutely sure of its performance numbers.
To all Sales people. Don’t sell SATA disks for production-level applications. They don’t work. (see my next post, SATA, SAS, and Fibrechannel)
The best Ubuntu yet!
by Jesse on Jan.29, 2009, under General
Just installed Ubuntu 8.10 on my laptop, upgraded from 8.04 – and I’m telling you. I think I’m home.
So far, EVERYTHING works, right out of the box. Including, with a very minor modification, my internal Sprint CDMA wireless card.
I think I posted about this before. When I ordered my (then) new Dell D620 I ordered it with an internal (mini-PCI) SprintPCS CDMA wireless card. This is a big boost for me because I *HATE* having things sticking out of my laptop when I’m using it, e.g. PCMCIA cards, etc.
But to date, the only operating system I could use this card with without a major amount of hassle, was Windows. And we all know how I feel about that.
So on a whim, I tried 8.10 and I have to say I’m IMPRESSED.
The only customization I had to make was to include the device specific data for the card in /etc/modprobe.d/options in order for the native usb-to-serial driver to take over.
For the faint of heart, here are the specifics.
Find the vendor and product data for the card in question – this can sometimes be trial and error, but isn’t too difficult in most cases. (Mine was pretty straightforward.)
# lsusb
Bus 005 Device 006: ID 0b97:7762 O2 Micro, Inc. Oz776 SmartCard Reader
Bus 005 Device 005: ID 413c:8103 Dell Computer Corp. Wireless 350 Bluetooth
Bus 005 Device 004: ID 0b97:7761 O2 Micro, Inc. Oz776 1.1 Hub
Bus 005 Device 003: ID 413c:8134 Dell Computer Corp. Wireless 5720 Sprint Mobile Broadband (EVDO Rev-A) Minicard Status Port
Bus 005 Device 002: ID 413c:a005 Dell Computer Corp. Internal 2.0 Hub
Then add the entry to /etc/modprobe.d/options to enable the /dev/ttyUSB0 device.
# cat /etc/modprobe.d/options
…..
# usbserial mod for Sprint CDMA card.
options usbserial vendor=0x413c product=0x8134
And we’re good.
The card should automatically be detected and show up in the drop-down under the network icon in the upper-right corner of the screen. (Or wherever you keep your network icon)
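If you would rather not copy the IDs by hand, here is a small sketch that pulls them out of lsusb output and builds the options line. The function name and the idea of matching on a word in the description are my own for the example; the sample text is the lsusb output from above.

```python
import re

# Matches lines like: "Bus 005 Device 003: ID 413c:8134 Dell Computer Corp. ..."
LSUSB_LINE = re.compile(
    r"^Bus \d+ Device \d+: ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$")

def usbserial_options(lsusb_output, match_text):
    """Build the modprobe options line for the first device whose
    description contains match_text (case-insensitive); None if no match."""
    for line in lsusb_output.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m and match_text.lower() in m.group(3).lower():
            vendor, product = m.group(1), m.group(2)
            return f"options usbserial vendor=0x{vendor} product=0x{product}"
    return None

SAMPLE = """\
Bus 005 Device 006: ID 0b97:7762 O2 Micro, Inc. Oz776 SmartCard Reader
Bus 005 Device 005: ID 413c:8103 Dell Computer Corp. Wireless 350 Bluetooth
Bus 005 Device 004: ID 0b97:7761 O2 Micro, Inc. Oz776 1.1 Hub
Bus 005 Device 003: ID 413c:8134 Dell Computer Corp. Wireless 5720 Sprint Mobile Broadband (EVDO Rev-A) Minicard Status Port
Bus 005 Device 002: ID 413c:a005 Dell Computer Corp. Internal 2.0 Hub
"""

print(usbserial_options(SAMPLE, "Sprint"))
```

On a live system you would feed it the real lsusb output instead of the sample, and check the resulting line before pasting it into /etc/modprobe.d/options.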
Enjoy the freedom from Microsoft’s oppression!!!!
Powerlink is down – Thanks EMC!
by Jesse on Jan.27, 2009, under General
Service Unavailable – Zero size object
The server is temporarily unable to service your request. Please try again later.
Reference #15.8e7ffea5.1233071610.2502e4
Powerlink has been up and down for the past few days – fun stuff.
The hilarious part is…there is a powerlink article saying how powerlink is experiencing issues. Isn’t that like sending an email out saying that email is down?
This wouldn’t be such a problem if so many products didn’t depend on the licensing portal. Forcing customers to go through this extra step does NOT make life easy on anyone involved.
Jobs’ leaving Apple?
by Jesse on Jan.15, 2009, under General
A sad day for Apple. This gives me cause for concern for the health of Apple as a whole, not just the “core” (no pun intended, Steve Jobs is indeed central to Apple’s recent successes.)
Pancreatic cancer is nothing to sneeze at. It kills people – Maybe not right away, but inevitably it will. I’ve often said that Apple doesn’t seem to have a strategy for the inevitable departure of Steve Jobs, and I am maintaining that position.
I wish Mr. Jobs well… But more realistically I hope for him a painless departure.
Mr. Jobs sent this letter to his employees:
Team,
I am sure all of you saw my letter last week sharing something very personal with the Apple community. Unfortunately, the curiosity over my personal health continues to be a distraction not only for me and my family, but everyone else at Apple as well. In addition, during the past week I have learned that my health-related issues are more complex than I originally thought.
In order to take myself out of the limelight and focus on my health, and to allow everyone at Apple to focus on delivering extraordinary products, I have decided to take a medical leave of absence until the end of June.
I have asked Tim Cook to be responsible for Apple’s day to day operations, and I know he and the rest of the executive management team will do a great job. As CEO, I plan to remain involved in major strategic decisions while I am out. Our board of directors fully supports this plan.
I look forward to seeing all of you this summer.
Steve