SC'05 wrap up
By joe
This took me a while to post, in part due to heavy year-end load, but also because I wanted to think through what I did see, and what I didn't. It is important in many processes to take a moment, step back from where you are, and try to assemble the bigger picture of the situation. This introspection can yield invaluable insights. Failing to do it can blind you to what was there, leaving you focused mostly on the minutiae. If you read the livewire blogs, you can see what I mean.

Big picture time. A customer asked me a few days ago whether I saw anything exciting there. Absolutely. He then asked me if there was anything there that would get him orders of magnitude (OOM) better performance on his codes with minimal effort. Unfortunately, we didn't see such a thing.

Basically there was little at this conference that one might call game changing, earth shattering, or anything like that. You had vendors with row after row after row of similar machines. Sure, some are in nice non-rectangular cases. You had booth after booth of similar software functionality. This is not a slap in the faces of the hardware and software vendors, just noting that relative differentiation amounting to a few tenths to hundredths of a microsecond of network latency is not what one might call earth shattering, though your marketing-bots might take issue with this.

Some of the more original vendors, that is, the vendors who built somewhat more original machines, were there in force, though less so than before, and now with some competition.

Some of us have been talking about the era of personal supercomputing for the last 6+ years (I have found it in slides I gave at presentations back in 1998). Some of us have been talking about the era of disposable computing (vis-à-vis low cost compute nodes) for quite a while as well. These memes were growing at SC'05.

Personal supercomputing is exciting. Providing a user with a box, a card, a chip, whatever, that gets them 100x-type performance at a reasonable cost is a game changer, judging from what every one of our customers tells us. What many vendors don't get is that you cannot charge linearly for the linear performance difference. IBM and others will gladly sell huge Blue Gene boxes, for millions of dollars. They may even be able to give you 1000x performance, for something that looks like 1000x the price of a node. Is this game changing? Not as far as we can see. It still costs you 1000x for the 1000x performance.

Other vendors, which are augering in for a pretty rough landing, were there in far less force than in the past. One past employer was there; they have had a long history of choosing the wrong horses to back, and of making bad decisions in the process. They are paying a heavy price for this, and they are probably not long for this world. Bad decisions are expensive, and the SC market in general is very unforgiving.

Also, there has been much hype in recent months about FPGAs. There is a nice site named OpenFPGA.org with lots of good info. Much of the hype glosses over some important details. First, FPGAs are very hard to program, as you are designing circuits, not writing code. This is changing, but the C-to-Verilog compilers are still not even close to performing as well as manual Verilog-based circuit design. Each new code port could take many weeks to several months. This does not lend itself to dealing with rapidly changing programs, or with many programs for that matter.
Second, and this one is a doozy, the bitfiles are not compatible across a single vendor's different FPGAs, never mind another vendor's FPGAs. Moreover, the board design impacts how the circuit needs to be built, so even the same FPGA on a different board design will require a "port".

Third, these boards are not cheap. This is a volume issue in part, though US$2-4k per FPGA doesn't help either. We have seen boards that at the low end are in the US$6-7k region, and at the high end, over US$20k. You can always design your own, and then you need to amortize the design cost across many of them.

That's the downside. The upside is that a dedicated circuit can be 10-100x faster than a general purpose CPU on specific calculations. Put a few of them on a board, as they usually consume under 20W at peak, and you can have quite a specialized computing platform. This is a really good upside. If you can mitigate the problems, this could be very interesting. More on this in a minute.

The hype has been about how many companies are going FPGA routes. We did see lots of FPGA board vendors, and lots of building-block tools companies. We didn't see many solution companies actually using these things, though we did see one or two. This was interesting. The hype is about the potential, but we don't see people doing much to tap into that potential. This is a shame, as the upside is huge.

Microsoft was there. No, really, they were. Were they talking about HPC? Not sure, as they had a collection of vendors in their booth. I won't go into all of the details or vendors, but the sense I got was one of a reach for validation. That is, Microsoft wanted these folks to validate their vision of HPC. What was that vision? I don't know. And if you are a marketing manager at Microsoft, you need to be quite worried that I don't know.

I did see Bill Gates talk, and I liked it. Mr. Gates was, at one point in time, an entrepreneur at a small company with an idea and a vision. He took that company from a tiny entity to a world-dominating monopoly. Regardless of what you think of his company's products and tactics, you have got to admire him for getting his company to where it is. His talk made the point of making high performance computing cycles easily accessible. This is exactly correct. Access to high performance computing cycles needs to be simple and transparent. It also needs to be cross platform. And standards based (not necessarily de facto standards, or vendor-imposed and protected proprietary "standards").

But was that Microsoft's message? I am not sure. With all those partners in their booth, they were trying to make sure people got the message that they can run applications. But we already knew that. The question is, can we use Microsoft tools to build high performance computing systems? This is less well known or understood. I suppose this is what Windows Compute Cluster Server 2003 (WCC2003) is all about. If it can make things as easy and low cost as Linux, this could be a good thing. If the licensing model for a 128 node cluster starts with "you must pay $X per node, and $y per client connecting," then this is a non-starter. I like to tell people that things designed to fail often do. That model is designed to fail. Hopefully, hopefully, this is not the model they are going to market with. It is hard to get control of the TCO if you start by blowing up the acquisition cost.

Now Microsoft is on a buying binge, stocking their war chest with high powered people. They just snagged Dr. Burton Smith.
They have had Gordon Bell for a while. They will likely grab a few more luminaries. But grabbing luminaries does not an HPC strategy make. Giving these folks the tools and the mandate to do good things, and specifically giving them latitude, could do wonders. Will that happen? Not sure. Nothing in life is guaranteed, and more than one company has snatched defeat from the jaws of victory with missteps. I am not sure if Microsoft can get out of the way of these folks to give them enough room to do good for HPC (and ultimately for Microsoft).

Will Microsoft be a factor in HPC? Probably, in some form or other. They are the 8000 lb gorilla in the room you cannot and should not ignore. I disagree with colleagues, friends, and other pundits when they suggest that Microsoft has no chance in this space. I remember hearing that years ago about Unix. Remember Unix? It was in the data centers before Microsoft tools displaced it. Were the Microsoft tools better? It didn't matter then. Remember our friend George Santayana. It would be ... unwise ... to discount Microsoft in this space. You would do so at your own peril. It makes more sense, in my opinion, to constructively engage them, to get them to do right by this market, than it does to ignore and deride them.

Of course, an issue is that Microsoft has in the past demonstrated tremendous hubris and contempt for all things non-Microsoft. This may result in them trying to force their particular world view on the market. That would be a shame, and like other things (Microsoft Bob), it would not likely be a spectacular success. Customers and end users vote with their buying habits. Right now they are buying into Linux clusters in a big way. This is not a slap at Microsoft; they need to understand why people are doing this, and how they can work in this world. I don't expect these buying habits to change just because Microsoft releases something called "Windows Cluster....".

On other fronts, some vendors were less prevalent than in past years. Then again, that happens. That's part of the market. Even some of the powerhouses were mere shadows of themselves. Others had dueling presentation booths, spaced maybe 20 feet apart, with simultaneous presentations going on.

So far, I haven't talked about the exciting stuff. In large part that is because most of it wasn't at vendor booths, or in vendor talks. There were a few exceptions, such as LightSpace Technologies, which showed what an incredible thing real 3D can be. It was amazing to see a molecule on the screen that you could walk around. This molecule had a channel through which ions could pass. From one view, the channel was obscured, but by walking about 45 degrees off the normal to the screen, you could look down the channel without rotating the image in software. This was absolutely incredible. The possibilities of this for a number of areas are just astounding. This is not 3D on a 2D screen, this is real 3D. Using the display, and some software they were developing, we explored virtual segmentation on the virtual human models. This is a remarkable display. It is not inexpensive, but if you have to do what it does (volumetric data display), it is really hard to beat, and it is cheap relative to rear-projected displays with 3D glasses... and you don't need 3D glasses. Sure, you might say, but there is that nice 3D laptop display. Well, I compared the two. I urge you to as well. Night and day. You will understand after the comparison. Really.

There is some really neat research on FPGA application compilers.
It was hidden over at the GRAPE booth from U-Tokyo. Very cool stuff. FPGAs are traditionally not very good at floating point calculations, as the logic required to implement full IEEE 754/854 arithmetic is huge. The fastest implementations of FPGA-based IEEE math may be on the SRC MAP processors, and those running at full speed would give about the same performance as a dual core Opteron running SSE2 code. Channeling Barbie: IEEE math is hard. This will eventually change, but not likely by orders of magnitude. However, if you are willing to work with non-IEEE math ... they were demonstrating a gravity simulator running on 4 FPGAs, and it was hitting about 150 gigaflops. On a single board. Now take 8 of these boards together, write some nice software to load balance, put it in a 4U chassis with lots of cooling ... and you have in excess of 1 teraflop. (A back-of-the-envelope sketch of the kind of kernel these pipelines chew through appears a bit further down.) As I said, the exciting stuff was the hidden stuff.

Well, not all of it was on the show floor or in the talks. Some of it was in what the companies were doing. Some companies are busy re-inventing themselves. They will continue to have lots of legacy stuff, but a smart organization realizes it needs to adapt to thrive. Ossification is a fast path to extinction. How many petrified-in-the-wool companies did you spot on the show floor? Hint: they were the folks who started to blink quickly and stammer when you asked them why they were purveying technology with terrible price/performance, or who offered up meaningless FUD about their competition. It is hard to justify difficult positions. As someone who had to go through that in a past life, I took the approach of articulating the positive aspects and benefits. I left that company after realizing that there really were no more benefits.

Of the companies in turn-about mode, Sun in particular, in embracing the Opteron, is now a contender in HPC clusters and related systems. Their win at TiTech was not only interesting, it underscored a point your humble author has been making for the last few years about computational acceleration tools. We were once asked if the market for hardware accelerators was as large as 200 units per year. I think we can answer that now. The market is there, the demand is there, and companies such as Sun, who no longer appear to be afraid to challenge or break some rules (in a good sense), are going to help drive the demand for such things. The era of embedded supercomputing is beginning. And this is exciting.

Some companies have been flying in stealth mode, and are now showing a little bit. Panta Systems is one such company. Very interesting architecture and design. The issue may just be the cost structure of the 8xx series of Opteron. With dual core now online, and quad core coming online, this architecture gets very interesting for integrated, high density processing power. They have some other quite exciting stuff, but I am not sure what the NDA lets me write about. This is an interesting company, one to watch.

Some have been out there with products which are interesting, though I wonder what their limitations are. Penguin showed a personal supercluster with 48 cores in a deskside box. This unit will likely rather badly thrash the 96 processor, Efficeon-based Orion offerings on FP-heavy codes. As Orion appears to be in the process of a forced CPU shift due to Transmeta's woes, their next CPU choice is critical. And it is interesting. I am guessing that the real seller is the 12 processor personal cluster unit. That is priced right.
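To put the GRAPE demo numbers in perspective, here is a minimal Python sketch of the kind of pairwise gravity kernel those boards are built to accelerate. This is not the GRAPE code or its actual flop accounting, just an illustration under my own assumptions (G = 1 units, a softening length `eps`, and the commonly quoted figure of roughly 38 flops per pairwise interaction). The point is that the inner loop is the same simple arithmetic repeated N² times, which is exactly what a dedicated pipeline is good at.

```python
import numpy as np

def accelerations(pos, mass, eps=1.0e-2):
    """Direct-summation gravitational accelerations, O(N^2), in G = 1 units.

    pos  : (N, 3) array of particle positions
    mass : (N,)   array of particle masses
    eps  : softening length, keeps 1/r^2 finite at close encounters

    Every particle interacts with every other particle, so the work grows
    as N*N -- the kind of simple, regular pairwise kernel that a dedicated
    pipeline (GRAPE, or an FPGA circuit) streams through far faster than a
    general purpose CPU.
    """
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        d = pos - pos[i]                      # displacements to all particles
        r2 = (d * d).sum(axis=1) + eps * eps  # softened squared distances
        inv_r3 = r2 ** -1.5
        inv_r3[i] = 0.0                       # no self-interaction
        acc[i] = (mass[:, None] * d * inv_r3[:, None]).sum(axis=0)
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2048
    pos = rng.standard_normal((n, 3))
    mass = np.full(n, 1.0 / n)
    acc = accelerations(pos, mass)
    # Back-of-the-envelope: at ~38 flops per interaction, n*n interactions
    # is ~1.6e8 flops per step here; a board sustaining 150 gigaflops chews
    # through that in about a millisecond, and 8 such boards give ~1.2
    # teraflops aggregate -- the "in excess of 1 teraflop" figure above.
    print(acc[:3])
```

The kernel needs neither full IEEE 754 semantics nor much memory traffic per flop, which is why a non-IEEE, reduced-precision hardware pipeline can get away with it and still post numbers like those quoted above.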
There were 9500+ attendees, and happily, unlike other conferences we have attended, they were not just vendors and their agents. We spoke with quite a few customers. Not so many VCs; I only had reports of two confirmed sightings, with probably a few more lurkers. Considering that Microsoft is getting the message that HPC is mainstream and a growth industry, hopefully it should not be too long before the money folks come along. However, as others have noted recently, there is far too much money chasing far too few deals. A fair number of VCs seem to have forgotten what the "V" means. This may change at some point when they realize that large rewards imply commensurate risks, and that most of them will not uncover the next Google, or Microsoft (and how much VC money did Microsoft receive? More like angel money from friends and relatives of Bill and team).

Next year should be interesting. Let's see what Microsoft's vision evolves into (or solidifies into). Let's see if some vendors are still there, and how well and how committed others are. That seems to be an issue in this market. It is tough.

The big picture take-away message is that this is a vital, mature, yet rapidly growing market. Some folks are doing real innovation, but at the end of the day, you still need to convince customers to buy your stuff. Pursuing speed can kill if you cannot make the speed economical. Access to cycles needs to be easy.