"The Grid"(TM) (with extra hype, no information content ...)
By joe
I read an “amusing” piece this past weekend, where people connected with the LHC project at CERN talked about how they would do data distribution and computation. Basically they are building their own data network, and doing some interesting bits with large-volume data caching/distribution.
Ok. Then we get this. You know, the “Grid” will make the internet obsolete. Oh bother.

Let me ask a simple question of said media outlets. Is it possible to, say, assign stories to people with an understanding of what they write about? Or at least vet the story with someone with a clue? Please? (shakes his head in disbelief)

The Grid is an unfortunately overused/abused term describing a distributed computing paradigm that is more loosely coupled than a cluster. Its current incarnation is what you hear in the “cloud” computing discussions. I would argue that a better term would be “nebula”, though that is a little tongue-in-cheek.

Using the current “cloud computing” terms, there are quite a few applications that do benefit from distributed computing implementations. Distributed data processing is one of them, and that is what is being used for the LHC work.

The problem in any distributed computing model is data motion. If data motion is your problem, as in your code's algorithms depend upon timely delivery of data, then you are bound by one or both of latency (how long you have to wait for data to be available after requesting it) and bandwidth (how much data you can pump down the pipe to your program).

The LHC folks built a data distribution network above the normal internet, as the internet connections did not have the necessary bandwidth out to the primary data sites. The secondary sites are connected (as I understand it) by dedicated bandwidth lines, possibly with bandwidth reservation over the internet connections, or possibly with their own fibre pulls. The tertiary sites are connected via the internet as far as I know.

Yet this article breathlessly talks about downloading DVDs in 2 seconds. It talks about how the Internet has been made obsolete by this distribution network. Ugh.

Maybe it is time for those media outlets to hire scientists and engineers who can write and explain things, rather than having a non-technical person attempt to string nice-sounding technical “factoids” together into something that, on its face, looks like a story, but really isn't.

The LHC is building a data distribution network. They are doing so because the existing internet connections between primary and secondary data centers are not fast enough to handle the extreme data outflows. From there, they will distribute the data to computing systems in a loosely coupled conglomeration called “the grid”, which will handle the tremendous volume of computing tasks needed to find signals in their noise.

They need so much data because their events are very rare; they have to gather as much information as possible in order to get reasonable statistical measures and error estimates. The computing load is intense. Only distributed groups of machines (clusters and grids) can handle it. The data load is intense: several gigabytes per second.

Sounds quite a bit different from “downloading DVDs in 2 seconds” now, doesn't it?
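To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch using the simple transfer model above (time = latency + size/bandwidth). The 4.7 GB single-layer DVD size, the 3 GB/s sustained figure (standing in for “several gigabytes per second”), and the 10 Mbit/s home link are my illustrative assumptions, not numbers from the LHC project:

```python
# Back-of-the-envelope numbers for the claims above.
# Assumptions (mine, for illustration): a single-layer DVD holds 4.7 GB,
# and transfers follow the simple model time = latency + size / bandwidth.

def transfer_time(size_gb, bandwidth_gbps, latency_s=0.0):
    """Seconds to move size_gb gigabytes over a bandwidth_gbps
    (gigabits/second) link, after an initial latency_s wait."""
    return latency_s + (size_gb * 8) / bandwidth_gbps

DVD_GB = 4.7  # single-layer DVD capacity

# Bandwidth needed to actually pull a DVD down in 2 seconds:
needed_gbps = (DVD_GB * 8) / 2.0
print(f"'DVD in 2 seconds' implies ~{needed_gbps:.1f} Gbit/s to one desktop")

# The LHC-style load quoted above: call it 3 GB/s sustained (illustrative).
sustained_gb_per_s = 3.0
per_day_tb = sustained_gb_per_s * 86_400 / 1_000
print(f"{sustained_gb_per_s} GB/s sustained is ~{per_day_tb:.0f} TB/day")

# For contrast, a typical home connection of the era, say 10 Mbit/s:
print(f"DVD at 10 Mbit/s: ~{transfer_time(DVD_GB, 0.010)/60:.0f} minutes")
```

That works out to roughly 19 Gbit/s per desktop for the “DVD in 2 seconds” claim, versus about an hour on an ordinary home link, which is exactly why a dedicated distribution network between data centers says nothing about your internet connection becoming obsolete.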