From today’s HPCWire.
What Professor Snir said for programming HPC also holds true for designing HPC systems and clusters.
Anyone can take a group of machines and “turn them into a cluster.” Heck, you can even ask your local neighborhood MCSE to do it for you.
And it may work well for some set of problems.
But what happens when performance goes into the dirt on a critical code, and you don’t understand why? This is not a theoretical problem. We get questions like this all the time from people with such ad-hoc clusters; we are working on resolving one right now.
Professor Snir continues:
Similarly for designing and building a scalable computing resource.
Most people can program. Few can milk every last bit of performance out of a program. Which of these groups do you want designing and building your algorithms? Remember, both will use the same hardware. One will get far better performance than the other.
Same thing for cluster design. Anyone can string Cat-6 between machines and a switch. But who can architect the system so it scales under tremendous demand for resources? It might help to start with a group that knows what “to scale” actually means.
You shouldn’t get your brain surgery from a nurse’s aide. You shouldn’t get your HPC systems from a PC technician. In both cases, the upside is that you might save a little money by doing what you shouldn’t do. The downside can be really bad. Better than even odds that things go pear-shaped awfully fast.