Many happenings in HPC ...

By joe

August 8, 2011 - 10 minutes read - 2024 words

I’ve been mostly heads down for the last month, with very little time to work on posts. This is a good thing, as it has been mostly (new) business bits. We’ve got a range of new products we’ve been working on to address specific market segments, and we have a number of nice new wins in a market segment we’ve been working on for a while. We’re working on more, of course, as well as on our core markets: signing up more reseller partners, opening new geos, and working with some of the best partners I could have ever hoped for. I’ve been exceptionally busy, and add to that a short “vacation” (i.e. working from a hotel room rather than my office :( ) and a nasty illness (working from home rather than the office, on some mind-altering meds) … Yeah, not much time for posting. I still don’t have huge amounts of time, but I did want to touch on some of the things we’ve all heard about, as well as address an … well … unfortunate … article on HPCwire.

First off, Dell bought Force10. This gives them (expensive) networking. Force10 is good; there is no doubt about that. But their gear generally sits toward the high end, and I am not sure they are developing in-blade chassis switches. Not that they won’t, but I think Dell might like to use them for this. This is interesting, as Dell tries to become a more vertically oriented player in order to counter HP, Cisco, IBM, and others. If I had to guess, I would have bet they would buy Arista. Force10 makes a little less sense to me, but maybe it was just too good a deal to pass up.

So now Dell, after digesting Compellent, Force10, and a few others, is going to be vertically integrated. All it needs is a virtualization solution. It is supporting OpenStack, as are others. Could it buy Citrix? Possibly. What does this mean for HPC? More “Dell” content from Dell HPC. Less dependency upon external folks for Dell HPC. Probably a move toward 10GbE/40GbE and away from InfiniBand for HPC. But Dell still doesn’t have an HPC storage play (Compellent is a data center play, not an HPC play) outside of their JBOD/RAID units. There is the “Lustre GUI” from Terascala, but that plays in a limited set of domains. The existing solution appears to be a bit slower than current market leaders, requiring more spindles to achieve the same performance as competitors. I’d look for Dell to make an acquisition in this space, as the gap leaves large market swaths unavailable to them (and again, Lustre doesn’t play in any but extremely narrow markets). All of the LSI Engenio and related chassis customers are smarting from NetApp’s purchase of Engenio, and NetApp’s decision to go direct for HPC storage. That could be a ripe opportunity to pick up one of the better ones (DDN).

Of course FusionIO went public, and it’s done pretty well. There are several other major vendors (Virident, Texas Memory Systems, Stec, …) in this market. TMS has been around the longest. Virident is, IMO, ripe for acquisition. They are a partner of ours and we do work with them, but I am not sharing any privileged information, just what I think based upon public information. We are also starting to work with TMS. A few months ago I had asked out loud whether or not there was a real market for PCIe Flash. I’ve been thinking hard about this, and I think the answer is a resounding yes, but probably more on the MLC side than the SLC side.
Customers see SSD pricing dropping like a rock, and expect similar things out of Flash (I differentiate the disk form factor from the PCIe form factor by calling the former SSD and the latter Flash … it’s a dubious distinction, but it helps to segment some market realities). They expect Flash pricing to drop rapidly as well. I do see some segmentation in this market … there are hobbyist-like PCIe units, and there are professional units. The pro units have some seriously interesting engineering in their PCIe logic. The question is whether or not there is a significant difference in performance between the two. A 10-20% performance difference will not justify a 4-8x price delta. What we’ve observed is that SSDs advertise 50-80k IOPS, but in reality they deliver closer to 10k (current crop) for real test cases. Some of them, from large, well-known vendors, advertise 50k and perform closer to 2-5k (ahem!). We’ve also observed that the professional PCIe Flash units promise 150k+ IOPS, and … they deliver. Actually more than that.

We are leveraging both of these technologies in a set of new products. One leverages our already unbeatable streaming performance on JackRabbit and adds a Flash or SSD cache, to elevate our IOPS rates to ~50-100k for the SSD version, and >300k for the Flash version. It is seamless to the user, and the cache is tunable. But we are also building all-SSD and all-Flash arrays. We have one 48-drive SSD array in the lab right now for a customer, and expect to have another order for this soon. Our measurements … well … the ones we haven’t released … have blown back what remains of my hair. We build these units at some very nice price points for customers. This gets very interesting from an application perspective, as there are some extremely high IOPS problems that we can attack with these units. Spinning rust isn’t dead. Not by a long shot. But SSD and Flash are changing the storage landscape rapidly.
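The price/performance argument above is simple arithmetic, and a small sketch makes it concrete. This is a back-of-the-envelope illustration only: it uses just the ratios quoted here (10-20% faster for 4-8x the price; 50-80k IOPS advertised vs. ~10k delivered), normalizes the cheaper unit to 1.0 because no absolute prices are given, and the helper names `value_ratio` and `delivered_fraction` are hypothetical, not part of any tool we ship.

```python
"""Back-of-the-envelope IOPS-per-dollar comparison (illustrative sketch).

Assumption: only the ratios quoted in the post are used; absolute prices
are not given, so the cheaper ("hobbyist") unit is normalized to 1.0.
"""


def value_ratio(perf_ratio: float, price_ratio: float) -> float:
    """Performance per dollar of the pricier unit relative to the cheaper one.

    A result below 1.0 means the cheaper unit delivers more IOPS per dollar.
    """
    return perf_ratio / price_ratio


def delivered_fraction(advertised_iops: float, measured_iops: float) -> float:
    """Fraction of the advertised IOPS actually observed in a test."""
    return measured_iops / advertised_iops


if __name__ == "__main__":
    # Pro unit: 10-20% faster, 4-8x the price (ranges from the text).
    best = value_ratio(1.20, 4.0)   # most favorable case for the pro unit
    worst = value_ratio(1.10, 8.0)  # least favorable case for the pro unit
    print(f"pro vs. hobbyist IOPS per dollar: {worst:.3f} to {best:.3f}")

    # SSDs advertised at 50-80k IOPS, measured closer to 10k.
    for advertised in (50_000, 80_000):
        frac = delivered_fraction(advertised, 10_000)
        print(f"advertised {advertised:,} IOPS -> delivered {frac:.1%}")
```

Even in the most favorable case the pricier unit delivers well under half the IOPS per dollar, which is why a modest performance edge can't carry a multiple-x price premium.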
Expect to hear more about these storage changes over the next few months.

Mellanox completed its acquisition of Voltaire and then pushed hard into converged 10/40GbE. This is getting very interesting. I like Arista switches, and I like HP ProCurve for 1GbE. But along comes Mellanox and builds a single switch we can put at the top of a rack to handle the high performance fabric. Arista is coming out with its own, as are Gnodal and others. Now if we could converge 10/40GbE and IB into one 36-port top-of-rack switch, autoselecting the technology … hmmmm …. There have been other acquisitions of note, but I won’t cover them here.

Finally, I note with sadness a recent article in HPCwire that appeared to attack me, obliquely, for daring to point out the fractured and uncertain nature of Lustre over the last year. This is an unfortunate article on multiple levels. Let’s just deal with the sad reality of Lustre up until, call it, March 2011. Prior to March 2011, some of us with live customers running Lustre, customers asking us to support something whose upstream project was in serious existential crisis … some of us had the temerity to point out the shaky ground upon which Lustre stood. Yeah, Oracle effectively canceled Lustre support en masse, though they do seem to retain a small group to handle existing support contracts. And we had called this as a possible, though hopefully unlikely, event. Check the blog from last year. Several scenarios had been possible; we looked on cancellation as the worst of the lot, especially if Oracle retained the IP and refused to transfer it to another entity to continue the work. Which, sadly, is exactly what happened. Shooting the messenger gets you no extra points. Between the end of last year and the beginning of the 2011 LUG, three groups sprang up with differing objectives and focus. There really wasn’t as much coordination as people would have liked (I spoke to quite a few people in the community about this).
Which meant, likely, three different branches of a fork going forward. Again, we called it, and noted that the best possible case was for all three groups to sit down, join forces, and present a single face to the world. This is what happened around LUG 2011. Again, note that shooting the messenger does little to gain you points.

Fast forward to today. It would seem that Lustre is in a good spot. It forked. Yeah, everyone bent over backward to avoid using that word, and I called it, euphemistically, a spork … but the code base at Oracle, which owns the IP up to Dec 2011, and the code base on the Whamcloud servers now differ. That is a fork, like it or not. And even more important, it’s a damn good thing it did. See the notes on shooting the messenger again. The HPCwire article obliquely and indirectly fired upon this messenger. I could level some pretty harsh fire back if I wished, but there is no reason to. I was right about every single point I raised, and more to the point, history has done a pretty good job of agreeing with me. So taking out hostility on me … maybe it’s not such a good idea to fire at your natural allies? Just a thought.

I will point out something pretty obvious to me and to our customer base. There is a need for a very good, well integrated, very scalable file system, in general. One that is easy to use, easy to administer, scales in all directions, is safe to use, handles errors competently, etc. This isn’t just an HPC problem. But the question I have is, can Lustre (or something named Lustre) become this? Or will something else eclipse it? Right now, Lustre’s use case is quite limited. It is, I am sorry for being blunt, hard to use. It is hard to deal with its more esoteric failure modes, and some of these failure modes crop up with regularity. It has no real conception of how to handle a failure in a component … no replication of critical elements, no real fault tolerance built in.
It sits atop ext4 (probably not the wisest move in retrospect). The short version is that the Lustre of today has issues, some of them implementation issues (Jeff Darcy has a brilliant deconstruction of them on his Canned Platypus site; google for it). And it isn’t the only choice out there. Not that the other choices are perfect. They aren’t. We have had … well … maddening failures … in pretty much every system we’ve deployed internally or at customer sites. The choices range from pretty interesting to less interesting. But there are choices out there. The ones that win in the business marketspace will be the ones that address the really big data issues there, as these are HPC problems. That’s systems like Gluster, Hadoop, and a number of others. I think part of the issue, though, is that the author of the unfortunate article is the CEO of a company completely committed to only one of these choices. We are, by necessity, agnostic. We use several of the choices to deliver solutions to customers, and have the scars to prove it. Sometimes from a product that sorta-kinda worked, that we had to go back and fix. Sometimes from a product set that did not live up to customers’ expectations. When a file system goes down on one of our units, we get concerned, as we are usually on the hook to help get it going again (assuming the customer purchased support). This is true regardless of which file system it is.

Yeah, the article shot/kicked the messenger for having the temerity to note what was, effectively, the very obvious. Sad. But, as I note in the statement of this blog … I call ’em as I see ’em. And history seems to show I was dead-on right on every point I made. So yelling at me serves no real purpose, other than alienating a natural ally. I am much more positive about Lustre’s future now than I was in the past. With the fork, there is real leadership for Lustre going forward.
Which is important, as there are real issues to work through, real problems to address. Just don’t shoot the messengers for pointing out the obvious.