HomeLab – The Next Generation

HomeLab – Before

This is the “Lab” I’ve used for probably the last 5 years or so. It’s a Dell Precision T7400 with 64G of PC2-5300F memory, dual Xeon X5460 quad-core processors, 8x 2T Seagate SATA drives, and 2x 500G Samsung EVO850 SSDs, and it runs VMware ESXi 6.0 nicely.

Not a bad toy, in the grand scheme. Yes, it’s *WAY* out of date… but when it comes to playing with VMware, it’s been a great dumping ground for the various “test cases” I’ve needed over the years.

For Example:

  • A 4-node Isilon cluster for testing and scripting.
  • A full Windows AD/Exchange infrastructure for backup testing.
  • An EMC Control Center infrastructure (5 hosts, back in the day).
  • Various coding environments: CentOS 7, Fedora 28.
  • A Windows 2012 server running Veritas Volume Manager (for migration testing).

You get my drift… It’s great to have a “burn down” environment that is entirely within your own control. (And to have the power switch within reach for emergencies.)

But there have been a few things I’ve not been able to play with.

The X5460 processor isn’t supported by ESXi 6.5, so I’ve been hamstrung there: I’d reached a plateau in what I could test and play with as far as upgrades go.

It’s also still a single host, so anything involving vMotion, HA/DRS, or VSAN was beyond my reach.

So I decided I needed an upgrade. A friend had what can only be described as an early “blade” enclosure floating around, and donated it to the cause. It’s a Dell C6100 enclosure: 4 blades, distinct cross-connects for disks. Not a bad toy.

The first new member of my family.

These are great little blades: dual E5540 (quad-core, 2.5GHz) processors and 32G of PC3-10600 RAM in each blade. I did some research and found that these can be had for anywhere from $250 (for a 2-blade unit) up to $1,000 (for a fully loaded 4-blade unit).

Dell C6100 – Side-by-Side/Over-Under configuration (top panel removed)

They’ll support up to 12 sticks of PC3-10600 RAM per node, so if you wanted to, you could fill all 4 nodes with 96G of RAM each without really breaking the bank. (I bought 4x 8G sticks for about $50 to fill out the last node.)

Placement was my next issue. I don’t have a rack in my basement anymore (don’t judge), so I needed a spot that allowed easy access and kept them stable. A $25 wire shelf from Lowes did the trick nicely. I added a Dell PowerConnect 5324 managed gigabit switch (which I also had lying around) to the rack as both my interconnect and “back-end” switch.

I also, because I had a specific purpose in mind, found a second C6100 on eBay so that I’d have 8 nodes to play with, and “mounted” both in the rack with the network switch.

That’s 8 VMWare Nodes in 4U of rack space.

The Front-View – each enclosure has 12 3.5″ disk bays. I found 120G SSD disks on Amazon for $21 each for the cache volumes, and repurposed the 8x 2TB drives from my old lab box so that each node got one 2T volume. I redistributed a set of 8x 500G SATA disks as well, so each node ends up with 1x 120G SSD, 1x 500G SATA, and 1x 2TB SATA.

The back-end..  (Ugly, but functional)

So I carved the switch into 3 parts… sort of. The blue links are the “primary” network (vmnic0), used for data and external access; they’re in VLAN 2. The white links are the “storage” back-end network (vmnic1), used for vMotion, HA/DRS, and VSAN; those are in VLAN 100, which doesn’t have an uplink.

Same gigabit switch, so performance isn’t great at the moment, but it works.

The black links are for IPMI/management. I put them (also) in VLAN 2 so I can reach them from my desktop. I screwed up my math, though: 8 nodes × 3 links = 24, and I have a 24-port switch, which leaves no port for an uplink. So I removed one link and will shuffle things around as needed. I have a keyboard and mouse I can move between the units as necessary, so it’s not hyper-critical.

My 4-node VSAN cluster

So here you see each node’s 2T SATA disk and each node’s 120G SSD. This is the part I’m still learning about. My understanding (and I encourage anyone to correct me if I’m wrong) is that the SSD is used as a sort of flash cache: writes land on the local SSD and are then de-staged from there to the other nodes in the cluster. I still haven’t quite figured out how the back-end protection is handled; I just know there’s some level of redundancy to guard against a single node failure. I’ll keep reading.
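From what I’ve read so far (and again, correct me if I’m wrong), the default VSAN storage policy mirrors each object RAID-1 style, keeping “failures to tolerate” plus one copies, with witness components making up the quorum. A quick sketch of that math, using the rule-of-thumb numbers from my reading rather than anything I’ve verified on the cluster:

```python
# Rough sketch of VSAN's mirrored "failures to tolerate" (FTT) math,
# based on my reading of the docs rather than anything verified here.

def vsan_ftt_requirements(ftt, usable_tb):
    """For a RAID-1 policy tolerating `ftt` host failures, VSAN keeps
    ftt+1 full copies of each object plus witness components, so the
    cluster needs at least 2*ftt+1 hosts."""
    copies = ftt + 1               # full data replicas
    min_hosts = 2 * ftt + 1        # replicas plus witness quorum
    raw_tb = usable_tb * copies    # raw capacity consumed
    return {"copies": copies, "min_hosts": min_hosts, "raw_tb": raw_tb}

# Default policy is FTT=1: two copies, three hosts minimum.
print(vsan_ftt_requirements(1, 2.0))
```

So at the default FTT=1, 2TB of usable data costs 4TB raw across the cluster, three hosts is the minimum, and (as I understand it) my fourth node is what gives a rebuild somewhere to go after a failure.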

Before you go trying to hack in, 50micron.net is my internal network. No, I’m not stupid enough to make it available to the outside world. 😉

The goal, in all of this, is to have a platform where I can easily simulate a “production” VSAN environment and see what breaks it, what works, and what doesn’t, so that when someone asks me “Have you ever done xxxx” I can answer honestly. (I’ve never, in my career, told a customer something worked that I hadn’t actually seen work – something that drove my salespeople nuts sometimes – but there is often a bit of a disconnect between marketing and reality, and the one thing I’ve got to my name, that no one can take away, is my sense of ethics.)

So, next steps… I need a better back-end. I’ve run storage on Gig-E before, and while it works in a pinch when you don’t have other options, it isn’t a great option either. In looking for a better storage back-end, I started thinking about InfiniBand, and a little digging provided the win: I’m waiting on a 32-port InfiniBand switch I found on eBay for $56, and 8 low-profile QLogic 7340 InfiniBand adapters. It was a shot in the dark, but I think the 40Gbit back-end will be a big step up… and it will be fun to see if I can get it configured without breaking my current storage. 🙂
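Some back-of-the-envelope math on why I’m excited about that jump, sketched below. The ~70% effective-throughput figure is my own guess at protocol and encoding overhead, not a measurement:

```python
# Back-of-envelope transfer times for the storage back-end. The 70%
# link-efficiency figure is my own guess at protocol overhead, not a
# measured number.

def transfer_hours(data_tb, link_gbps, efficiency=0.7):
    data_bits = data_tb * 1e12 * 8            # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return data_bits / effective_bps / 3600   # seconds -> hours

for name, gbps in [("Gig-E", 1), ("40Gb InfiniBand", 40)]:
    print(f"{name}: {transfer_hours(2.0, gbps):.2f} hours")
```

Call it 6+ hours to move one node’s 2TB disk worth of data over Gig-E versus about 10 minutes over the InfiniBand link. Even if I only get half the rated speed out of it, it’s a different world.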

I’ll keep you in the loop.

Idiot-proofing…

We all know the term “idiot-proofing”: making a job or a task so foolproof that any idiot can do it.

It seems I’m asked every day to “idiot-proof” some process or another. Make it so the junior guys can handle it. Make it so they couldn’t break it if they tried. Make it so we can hire 5 guys at 20% of your salary to do your job.

Here’s an idea.  How about instead of idiot-proofing every process, we stop hiring, for lack of a better word, idiots?

If you have to worry about the fact that your Unix engineers don’t know their way around vi, or your Windows engineers are too frightened to type ‘regedit’, maybe the problem isn’t in the process. Maybe it’s in the hiring.

I know everyone wants to save money… but staffing is the WRONG place to save it, mostly because it doesn’t work. I have billed out more consulting hours trying to ‘idiot-proof’ a process than those companies have probably saved in 5 years of cheap labor. And guess what? Technology changes, and people who can’t learn and adapt to a new technology or situation aren’t going to cut it. Then you’re going to be calling me (or someone like me) in to do it all over again.

I don’t mind; I love having the work. Being a storage guy who specializes in disaster recovery, replication, migration, and automation has meant I haven’t had a single period of unemployment longer than a vacation since 1996.

But when you make me jump through hoops designed to keep the $25/hr “admins” from breaking the SAN, it gets old, and it slows me down.

Star Trek: Discovery (nerd-post)

So… my honest take on Star Trek: Discovery last night.

So far, so good. Compelling story, GREAT effects… Cast seems to be fleshing out nicely, though I anticipate some dramatic changes in the first couple of episodes.

Because of the delay in programming, the ending felt a little rushed, which meant you really didn’t know when it was over. (We got caught by surprise by the end credits.)

A few homages to previous Star Trek movies and series, and the occasional tongue-in-cheek, snarky remark, make me look forward to the next episode. It really is everything I expect in a Star Trek.

And it’s good to have Star Trek back on the air..

Episodes 1&2 are available on CBS All Access – I haven’t seen Ep.2 yet so no spoilers.

If you haven’t signed up for All Access, I suggest it. I’m fairly certain that “a-la-carte” programming is the direction television is headed.

http://www.cbs.com/shows/star-trek-discovery/


On Backups…

A friend passed away recently. Going through his computer files, we found years’ worth of photos with a .ccc suffix… ransomware. With two teenagers in the house, my biggest fear is some network-replicating bug that takes down my entire network.

Apparently it hit him a while ago, and he didn’t tell me (I was his IT guy, but he hated the idea that he might have made a mistake). Years of pictures, potentially important, lost, probably forever. (Since the files in his user directory had been restored long after the computer in question had been wiped, there was no indication of which virus caused the problem.)

So, what to do about backups?

The only totally secure system is one disconnected from the network and powered off. The minute you connect *ANY* computer to the internet, it becomes vulnerable. Sure, there are steps you can take to prevent data loss: anti-virus, a good firewall, etc. But eventually you’re probably going to run into a site with embedded malware, and it’s all over.

Personally, I like the idea of off-host, disconnected backups.

Every morning I wake up, stumble downstairs, get a cup of coffee and a bowl of cereal, and sit down for my morning staff meeting. (Where I find out the messes I’m going to have to clean up from the night before)

While I’m sitting there, I take the 4TB drive out of the removable bay and replace it with the OTHER 4TB drive I’ve got sitting on a shelf – one marked Odd, one marked Even. At midnight every day, Acronis True Image kicks off a disk-image backup of my boot drive and my important data drive. (My games and multimedia drives are ignored, because Steam, Origin, and iTunes pretty much cover those; it only takes bandwidth to recover them from the cloud, and they can’t be modified by my computer.)
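The rotation itself is dead simple: odd-numbered days go to the Odd drive, even days to the Even drive. A toy sketch of that scheme (the labels and calendar rule are my description of my own habit, nothing Acronis-specific):

```python
from datetime import date

def drive_for(day: date) -> str:
    """Pick today's backup target: odd-numbered calendar days go to the
    drive labeled 'Odd', even days to 'Even', so yesterday's disk is
    always sitting offline on the shelf."""
    return "Odd" if day.day % 2 == 1 else "Even"

print(drive_for(date(2017, 9, 25)))  # an odd day -> 'Odd'
print(drive_for(date(2017, 9, 26)))  # an even day -> 'Even'
```

The one quirk: a 31st followed by a 1st gives you two odd days in a row, so one drive pulls a double shift. That’s fine, since the whole point is just that yesterday’s backup is always off the network.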

I like this approach because (a) I’m never out more than 24 hours’ worth of work, and (b) I know there is no virus on the planet that can infect a hard drive sitting on a shelf in a plastic case. (Though I bet some idiot somewhere is trying to figure that one out.)

So my RPO (Recovery Point Objective) is usually “within 24 hours”; my RTO is about 5-6 hours for a full restore. (I back up the data drive even though it’s synced to Office 365 because, as we all know, corruption mirrors just as fast as good data.)

All that for a few hundred dollars in hard drives and a $25 removable drive bay for my PC, and I’m protected.

So my question is this: What do you use for your home/home-office backups? Acronis is getting a bit long in the tooth, and I’m considering alternatives.

Side Projects…

My wife used to tell me that I’m the only person she knew who could relax, after working all day in front of a computer, by sitting in front of a computer.

She doesn’t know IT geeks well, does she? 😉

She’s right. I’ve recently re-discovered computer gaming, and PC building in general… Just for grins, I decided I was going to build myself a no-holds-barred monster PC for both gaming and work-related stuff.

So what I ended up with:

CPU: Intel Core i7-5820K 3.3GHz 6-Core Processor
Motherboard: MSI X99A GAMING 7 ATX LGA2011-3 Motherboard
Memory: 4x Crucial Ballistix Sport 16GB (2 x 8GB) DDR4-2400 kits (64GB total)
Storage: 3x Samsung EVO850 500GB 2.5″ Solid State Drives
Storage: 3x Seagate Barracuda 2TB 3.5″ 7200RPM Internal Hard Drives
Video Card: 2x EVGA GeForce GTX 980 4GB Superclocked Video Cards (2-Way SLI)
Case: Nanoxia NXDS6B ATX Full Tower Case
Power Supply: EVGA 1300W 80+ Gold Certified Fully-Modular ATX Power Supply
Optical Drive: Sony BD-5300S Blu-Ray/DVD/CD Writer
Operating System: Microsoft Windows 10 Pro (64-bit)
Monitor: 3x LG 23MP55HQ-P 60Hz 23.0″ Monitors
Keyboard: Logitech G910 Orion Spark Wired Gaming Keyboard
Mouse: Logitech G502 Wired Optical Mouse
Speakers: Logitech Z313 25W 2.1ch Speakers
UPS: APC BX1500G UPS

Custom Water Loop:
XSPC Raystorm CPU Block
2x XSPC Memory Waterblocks
2x XSPC Razor GTX980 Blocks / Backplates
XSPC Photon 170 Reservoir / D5 Pump Combo
XSPC 360mm Radiator
XSPC 280mm Radiator


So, a couple of interesting bits. I started out with a single GTX 980 and an all-in-one water cooler for the CPU only. Then the upgrade bug hit and I took it the rest of the way: a second GPU; a custom watercooling loop (including, for some reason, RAM coolers, which are pretty but don’t do a lot); then a third GPU; then backing that out because 3-way SLI isn’t as stable as I would have liked; and, most importantly, modifying the case to add a window, which for some reason wasn’t available in the case I purchased.

The Nanoxia Deep Silence 6 is an amazing case. All 1mm steel, weighs a ton, but quiet as hell. I got sick of my office sounding like a server room, so I opted for quieter cooling options. (The water loop is quiet: just a couple of fans that all run at slow speed unless I’m gaming.)

The end result is a computer that runs Rise of the Tomb Raider on Ultra graphics across a 5760×1080 “Surround” display without breaking 55 degrees.

Next up: replacing the 3x 23″ monitors with 3x 27″ 4K monitors. 🙂 (Might need that third GPU for that. Good thing I kept it.) 🙂


Is this thing on?

Surprised to find this blog still here. It’s been… oh… a long time since I’ve ventured into the blogging world. Work has kept me busy (going into year 4 of a six-month contract), and I’ve been making all sorts of discoveries of late.

Discovery #1 – Brocade is still a third-rate switch company. The hardware is fairly bulletproof when it comes to reliability, but they’re still married to the idea of “local switching” as an alternative to building a backplane that’s worth a damn. Sorry, I’ll take the Cisco MDS 9700 series any day of the week and twice on Sunday.

Discovery #2 – Well, not a discovery really. The EMC Symmetrix (Symmetrix/VMAX) is still the flagship storage array. If you put anything else in, you’re going cheap. Not that there’s anything wrong with that, but it’s time to admit that’s what you’re doing.

I say that having now worked with HP 3PAR, which I put as equivalent to the Clariion/VNX line in stature and performance, and the HP XP7 (Hitachi G1000), which is higher-end but a bloody nightmare to manage. (I don’t know if it’s Hitachi, or just HP’s version of Hitachi, that makes it a nightmare; I’ll have to wait until I get my hands on an actual Hitachi to see.)

Let me be clear: this is a personal preference. Both arrays, 3PAR to some extent and the XP7 to a greater extent, seem to be trying to steer people away from using the CLI to manage them. GUIs are fine, but they don’t offer the level of control you need to micromanage the hell out of your storage (as I like to). GUIs also make scripting changes more difficult, and more prone to error.
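To illustrate what I mean about scripting: provisioning a hundred LUNs through a GUI is a hundred chances to fat-finger a field, while a generated batch of CLI commands is one artifact you can review, re-run, and diff. A generic sketch (the `array_cli` command below is a made-up placeholder, not any vendor’s actual tool):

```python
# 'array_cli' is a hypothetical placeholder command, not any vendor's
# real CLI -- the point is that a generated batch of commands is one
# reviewable, repeatable artifact instead of a hundred GUI dialogs.

def provision_commands(count, size_gb, prefix="lun"):
    """Generate one create-volume command per LUN, zero-padded so the
    names sort cleanly."""
    return [
        f"array_cli create-volume --name {prefix}{i:03d} --size {size_gb}GB"
        for i in range(count)
    ]

for cmd in provision_commands(3, 100):
    print(cmd)
```

You review the list once, run it once, and every volume comes out identical. Try doing that with a hundred trips through a wizard.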

I haven’t had a chance to really beat the daylights out of the XP7 yet, but I will in the months to come. I’ll report further as I discover.


Losing the cloud, Part 2.

When I have an application that goes down (and face it, it does happen), I want the person responsible for getting it back up and running to be within choking distance. And if he’s within choking distance, the servers need to be as well, because otherwise he’s powerless to actually fix the problem, and I’m putting my business in the hands of someone paid minimum wage (or only slightly better, night-time computer-operator wages) and his ability to go out and physically push a button (and God willing, it’s the right one).

If you don’t hold your data, you don’t really own it.  If you don’t hold your data it can go away at any point.

Several years ago I was renting space in a datacenter up in Springfield, partly for a little web-hosting business I was running, but also so I could run some equipment for testing and training. (The hosting almost paid for the space, so it wasn’t out of line.)

Someone on the datacenter network had a PXE server running to install software. On the public network.

Well, the hosting company, which was incompetent to its core, didn’t put their users in separate VLANs as would normally be done in shared environments.

They also did “cloud application” hosting on crappy 1-CPU, 1-PSU Supermicro servers that came with PXE boot enabled.

They lost a half-dozen servers before they realized what was going on. Lost, as in: the servers PXE-booted, wiped their drives, and started installing the custom application belonging to another customer’s systems. (Thankfully I had my environment firewalled off from the datacenter network, so I was pretty safe.)

That was customer data that was just GONE.  No backups, just missing servers.  Servers that they were paid to keep safe and secure.

This is obviously a worst-case scenario… but clearly it does happen.


Losing the cloud… (Part 1)

There’s a Dilbert comic strip that I found hilarious a while back…

The hilarious part is that the chance of this happening in real life is non-zero. Not that it’s likely to happen, but it’s impossible, statistically speaking, to completely rule out the idea.

Now, there are “big” cloud providers like AWS or… well… AWS. The chances of your datacenter getting lost there are lower: they’re not going to disappear, and they’re a pretty together company, so the odds are in your favor.

But what if it were to happen?

Say I’m a small business (I am, actually), and because I’m cheap, I want to outsource all of my datacenter operations to “Bob’s Clouds and Stuff.” Email, database, custom widget application, all of it.

The migration is easy: virtualize my systems and upload them, right? (The smarter way is to create new ones and migrate to them, but that’s a different story.)

But what if Bob decides he’s done, that he’s going to shut everything down and run to Aruba because his ex-wife is after him for 10 years of back child support? Or he comes down with a rash no one can identify and dies?

Okay, a little far-fetched, but you get the drift. What’s a small business’s recourse if its cloud provider just folds? Do you have any? Can you pay the lawyers to fight out who owns what while you’re not making any money because your entire operation has been “turned off”?

It’s a horrifically overstated problem, but it brings out the potential downside of cloud computing: you don’t actually have control. You are putting your data, your livelihood, your company’s very being, in the hands of someone else who may or may not care.

I’m a control freak.  Anyone who knows me or has tried unsuccessfully to have me committed in the past 20 years knows that.

I want control of my data.  I want it in my hot little hands.  I want to have tapes.  I want to know where they are and I want to have instant access to them at 2am if I wake up and find I’ve had a nightmare about all of my data being gone.


Competitive Marketing advice…

Last week I had to sit through one of those “competitive sales pitch” meetings. You know, where Company A compares their product to Company B’s and, of course, tries to make you draw the conclusion that Company A’s product is light-years ahead of the competition, even if it isn’t.

Now, I’m under NDA, so I can’t disclose the brands, or in fact anything about the specs involved, but I can speak to the tone of the meeting.

It was mean, and spiteful, and nasty, and put me off Company A’s product entirely.  (Needless to say, we’re not buying any)

Listen. I know every hardware vendor thinks their product is the best thing since sliced bread (and really, whose isn’t, right?). But if you’re going to do a comparison, make it about how great your product is, not how lousy your competitor’s is. When you do the latter, you come off as petty, bitter, spiteful, and not very believable.

Show me the numbers. And not the marketing numbers, the real numbers. You say your array can do 1.5 million IOPS? Show me the breakdown. You say your switch can do sub-microsecond switching? Don’t forget to clarify that that’s only between adjacent ports. You say your backup software can back up a multi-terabyte system? Show me that it can restore it as well.

And don’t show me slides with pictures of your parts while talking about how much better-looking, prettier, or better laid out your hardware is. It means nothing; functionality is everything. Yes, you’ve combined multiple redundant components into one chip, but now, if that one chip fails, you lose 8x the functionality. (I.e., the only thing you’ve taken out of the system is the redundancy.)

I’m a big proponent of “you get what you pay for,” especially in enterprise systems. Show me a vendor selling their hardware for 10% of what another “comparable” vendor charges, and my first question is “What’s missing?”

That’s all.

</rant>

Cutting the ends off the roast…

When I used to teach, I always told the story of making the roast. It’s a parable, but it works.

As follows:

I was making a roast one day, and I cut the ends off it before I put it in the pan. My kid asked, “Why did you cut the ends off the roast?”

“Because that’s how my mom did it.”

Curiosity got the better of me, and I asked my mom: “Mom, why do you cut the ends off a roast when you make it?”

“Because that’s how Grandma did it.”

Again, curiosity. I called my grandmother and asked HER: “Grandma, why do you cut the ends off the roast?”

“Oh, well my pan is too short.”

<head meets desk>

There is an inherent danger in doing things the way they’ve always been done without giving thought to why. Situations change, technology evolves, and suddenly “the way you’ve always done it” becomes the most inefficient way possible because some new method has come along, or, even worse, becomes the WRONG way to do something because the underlying technology has changed.

“Hard” vs. “soft” zoning comes to mind. No one in their right mind does hard zoning anymore; most vendors discourage it, and a few won’t even support it.

But 15 years ago, it was best practice. Things change, technology changes, so people MUST change along with them.