Doing what we are passionate about
By joe
I am lucky. I fully admit this. There are people out there who will tell you that it's pure skill that they have been in business and been successful for a long time. Others will admit luck is part of it, but will again pat themselves on the back for their intestinal fortitude. Few will say "I am lucky." Which is a shame, as luck, timing (which you can never really, truly control), and any number of other factors are critical to having the luxury of doing what we are doing.

I've been, and remain, a speed demon. No, not the way I drive … but the way we design and build systems. We've shown again and again how well-designed and well-implemented systems can demolish more general systems that aren't designed for the problems at hand. I enjoy this, and I have been doing things like this in one form or another since the mid-1980s: optimizing code, rebuilding hardware (yeah, I put NEC V20s or V30s in my IBM PCs for a "free" speedup), tweaking OSes, tuning drivers. Though our current Linux kernel patch sets are smaller than they've been in the past, they are still important. Tuning apps … These are the things we're passionate about: building hellaciously fast hardware and software stacks.

We've done great things over the last 12.7 years. Really incredible things. Things that no small company ought to be able to do. We've come a long way from being a one-man shop operating out of my basement. But we've never lost that passion, the drive to go faster. Or bigger. If anything, that drive has gotten worse.

We see data growth rates driving 60+% growth per year in capacities. We see data motion, which I referred to as being "hard" in 2002, as one of the most critical factors going forward in any storage, computing, and networking system. I had my eye on that exponential curve 12 years ago, worried about when it would start flattening out. It hasn't yet. It has to some time soon, though.

Because we have to store and process all that data, we have to move all that data. That realization ~10 years ago led me to explore cluster and scale-out file systems … ones without single central points of information flow. Ones where, when you add capacity, you add bandwidth. These are the only systems that matter now, as the filer/array model is a rapidly declining legacy platform … suitable for replacement, not refresh. It's that way because you can't scale that type of design out horizontally, and that simply doesn't work for gargantuan data volumes, which must be scaled out in exactly that manner. Object storage is a great case in point.

It was the realization 8 years ago that RAID builds and rebuilds were highly problematic for reliability, due to potential correlated device failures, and that recomputing block parity/CRCs on unused blocks made no sense (yet this is what all hardware RAID does), that led us to research FEC methods and develop new ways of thinking about these things. We can implement very space- and performance-efficient designs leveraging smart algorithms and acceleration technologies.
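To make that rebuild argument concrete, here is a rough back-of-the-envelope sketch. The drive size, occupancy, rebuild rate, and drive count below are illustrative assumptions of mine, not measured figures: a hardware RAID rebuild has to touch every block on the replacement drive whether it holds data or not, while an FEC/object-style layout only needs to reconstruct live data and can spread that work across many drives.

```python
# Back-of-the-envelope comparison: classic RAID rebuild vs. a declustered,
# FEC/object-style rebuild. All numbers below are illustrative assumptions.

TB = 1e12  # bytes

drive_capacity = 4 * TB    # size of the failed drive
occupancy      = 0.30      # fraction of the drive actually holding data
rebuild_rate   = 100e6     # bytes/s a single drive can sustain during rebuild
helper_drives  = 20        # drives sharing the reconstruction work in a
                           # declustered / erasure-coded layout

# Hardware RAID: every block on the replacement drive is recomputed,
# used or not, and a single drive's write speed is the bottleneck.
raid_rebuild_s = drive_capacity / rebuild_rate

# FEC/object rebuild: only live data is reconstructed, and the reads and
# writes are spread across many drives.
fec_rebuild_s = (drive_capacity * occupancy) / (rebuild_rate * helper_drives)


def hours(seconds: float) -> float:
    return seconds / 3600.0


print(f"classic RAID rebuild : {hours(raid_rebuild_s):6.1f} hours of exposure")
print(f"declustered FEC      : {hours(fec_rebuild_s):6.1f} hours of exposure")
```

The longer that rebuild window, the more exposure there is to a second, correlated failure, which is exactly the reliability problem described above.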
It was the realization 10 years ago that HPC had, to a degree, hit a clock-speed wall that drove us to look at accelerators, what I called APUs (Accelerator Processor Units), as a riff on CPU, back then. AMD appropriated the name (which is fine, as they paid us to write white papers for them, in which I used the term) and slightly changed its meaning. But the concept was that a very fast processing system, designed to perform one type of task very well, could do an outstanding job offloading those tasks from the CPU. You needed a powerful software stack, and as I had learned developing CT-BLAST (SGI's GenomeCluster product), you need to make it drop-in easy to deploy. It has to be simply faster (hey, that's our tagline!), which means you can't make it hard to use … it has to be bloody easy, and demonstrably faster.

We got there with accelerators for specific problem sets, but all of our attempts (pre-2006) to raise capital to build them and their ecosystem failed. No one cared about this space, we were told, and accelerators were unimportant. But we were passionate about this, and we kept banging on it until we could see that no one really was interested … in funding it. Today, in 2014, the point at which we predicted accelerators would hit roughly 5-10% penetration of the computing market … we may well have been low in that estimate.

Everything we've done, we've done because we believe our future is data- and processing-rich. Moving data is hard, so you need that motion to occur in parallel and simultaneously be as local as possible. Computing on this data is hard and often slow, so you need acceleration technology. Extracting useful insights is demanding, so we've developed very high-performance appliances focused on enabling people to seamlessly use massive quantities of data, very, very rapidly, for their computations.

It's because performance is an enabling technology. It really is. It's also a green technology. I'll explain in a moment.

As for the enabling technology: this is something that opens up completely new possibilities that would simply have been out of reach in the past, for any reason … cost prohibitive, performance restricted, etc. Accelerated processing enables many more, and more efficient, processor cycles per unit time. So it reduces the energy cost per cycle (hey, doesn't that help make it greener?). This makes computationally expensive problems more tractable. Heck, if you have the right gear, it takes things that would be hard and makes them possible as part of day-to-day usage, not merely for special cases. It's these things that change the landscape. We gave examples of what we could do with one of the technologies to a few customers, and most of them went away thinking "meh." Every one of them came back later on and used words to the effect of "you can really do that? That would change everything for us …" Really … they are right. It does change everything.

I wasn't kidding about greener. I didn't mention costs, but it's fairly trivial to show that you need (often far) less hardware to provide the same performance with a well-designed computing and storage system, which lowers both your acquisition cost and your TCO. Alternatively, for a fixed cost you get more performance. So you can play that either way. And again, it's that efficiency on a per-processor-cycle basis that drives the relative green-ness.

What? Joe is concerned about "green"? Damn right. I want all of my systems to use less power, and to use what they do draw more efficiently. I want to be able to pack my systems denser, and use less cooling for the same performance/capacity as I've done before. It's an "elegance in engineering" bias on my part; I like more efficient systems. And more efficient systems should cost less to deploy over their lives. Not always (CF lights anyone? Good concept, terrible implementation, and the added Hg? Not so smart).
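To put a rough number on that "greener" point, here is a minimal sketch of the energy-per-job arithmetic. The power draws and throughputs are assumed, illustrative values of mine, not figures for any particular system:

```python
# Energy per unit of useful work, for a plain CPU node versus the same node
# with an accelerator attached. The power and throughput figures are
# illustrative assumptions only, chosen to show the arithmetic.

def joules_per_job(power_watts: float, jobs_per_hour: float) -> float:
    """Energy consumed per completed job, in joules."""
    return power_watts * 3600.0 / jobs_per_hour


cpu_only    = joules_per_job(power_watts=400.0, jobs_per_hour=10.0)
accelerated = joules_per_job(power_watts=650.0, jobs_per_hour=60.0)

print(f"CPU only    : {cpu_only / 1000:8.1f} kJ per job")
print(f"accelerated : {accelerated / 1000:8.1f} kJ per job")

# The accelerated node draws more watts, but finishes work so much faster
# that the energy (and cost) per job drops sharply: that is the efficiency-
# per-cycle argument in the paragraph above.
```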
Why all of this? Well, past is prologue, as Shakespeare said. All of these bits are needed for the next acts. And they will be doozies. The first hints of them are going to be discussed this year. Never mind the next set of benchmarks and other bits; we'll get those done, and we'll probably set a few more records. Guessing at this … :D

But what I am talking about is enabling something … It's far more than fast storage, fast computing, etc. This isn't a solution looking for a problem. This is much different. None of this would have been possible without the passion to build something that enables people to think and work differently.

It's the passion that matters. Building things that people can use to solve effectively intractable problems … yeah, that makes us feel good.