This week is a bigger geek milestone than last week. I’ve spent the past two days sitting next to a truly monstrous storage system. Each disk tray contains 16 750GB disks. Each rack contains 10 trays and two controllers. We have SIXTEEN racks. Theoretically, that’s 1.9 petabytes of raw space. In reality, depending on how we divvy it up, we’ll have somewhere between 1.2 and 1.5PB of usable space. Enh! Enh!
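For anyone who wants to check the math, here is a quick Python sketch of that raw-capacity number. It assumes 10 trays per rack and decimal units (1 PB = 1,000,000 GB); the function and parameter names are made up for illustration.

```python
def raw_capacity_pb(disks_per_tray, disk_gb, trays_per_rack, racks):
    """Raw capacity in decimal petabytes (assumes 1 PB = 1,000,000 GB)."""
    total_gb = disks_per_tray * disk_gb * trays_per_rack * racks
    return total_gb / 1_000_000


# Figures from the post: 16 x 750GB disks per tray, 10 trays per rack, 16 racks.
raw = raw_capacity_pb(disks_per_tray=16, disk_gb=750, trays_per_rack=10, racks=16)
print(f"raw: {raw:.2f} PB")

# Share of raw space left at each end of the usable range quoted above.
for usable_pb in (1.2, 1.5):
    print(f"{usable_pb} PB usable is {usable_pb / raw:.1%} of raw")
```

The gap between raw and usable is whatever the chosen layout eats (redundancy, spares, filesystem overhead), which is why the usable figure is a range rather than a single number.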
That’s more than 10 times bigger than the system from last week. Fortunately, I don’t have to bleed and lift heavy crap on this one. There are vendor people, local sysadmins, and very interested customers working feverishly just to get the damn thing wired up. We’re still testing basic connectivity and power. Next week is the OS install and filesystem build. It doesn’t go “live” for customers until December, so we’ve got time to “do it right.” A rare luxury in this world.
I’ve been writing crap-loads of documentation. My role is to capture how it’s being built, what the options were, and why we’re doing it the way we are. The hope is that I’ll produce the cheat sheet that serves as a guide through the various manuals, diagrams, and pictures being produced by each of the sub-unit teams. Along the way I’m learning all about really big storage. For example, the core fibre switch currently tops out at 308 ports. We’ve got 8 connections per disk rack, so that’s 128 right there. Throw in the compute cluster (yeah, baby!) and the three 64-way SMP machines (YEAH!) … plus the GPU processors from Tesla (Unf! Unf!) and I’m basically walking around with vague geeky arousal the whole time I’m here. Anyway, I believe I was about to mention that we had to order another 48 ports of fibre overnighted … or else fail to use all the available ports on the systems.
On a sadder note, I saw one of the first clusters that I ever built, sitting turned off and unused. There have been troubles with power and storage, leading the users to wander off and find other ways to do their thing. This filled me with sadness, and a vague desire to stay up all night making the damn thing bulletproof again. I was stopped because I don’t have time to do the job I’m here for now … much less back out and make systems from three years ago work again.