Almost two decades ago, Kendall Square Research, an ill-fated Massachusetts start-up company, hinted at the future by making the microprocessors for its KSR-1 massively parallel supercomputer on the same chip-making factory production line that made the processor chip that powered the Sharp Wizard, one of the first P.D.A.’s.
The Cold War was over. It was obvious that the economies of scale that once led innovation in computing to trickle down from the high end of the industry to the bottom were being turned topsy-turvy.
The tail was wagging the dog: the consumer electronics industry was where innovation happened first, and computing would never be the same.
I took to describing the change by saying: “The companies that make the world’s fastest computers are the companies that make things that go under Christmas trees.”
It may not have been precisely accurate, but it underscored the fundamental fact of modern computing: By increasing the number of transistors on a single sliver of silicon at an accelerating rate described as Moore’s Law, the computer industry is literally remaking the world in the space of a few decades. Not only does each computing generation upend the previously dominant technology, the free-falling cost of computing is remaking science, industry, medicine and every other nook and cranny of modern life.
Lo and behold, almost two decades after the ill-fated KSR-1, my statement is actually true, more or less. The world’s fastest computer, as measured by the ability to solve an array of linear equations, is a Los Alamos National Laboratory supercomputer, assembled from components originally designed for Sony PS3 video game machines. In the twice-annual rankings called the Top 500 list, published on Wednesday morning, the machine dubbed Roadrunner reached a long-sought-after computing milestone by processing more than 1.026 quadrillion calculations per second.
To be sure, there should be an asterisk after the record, for one quadrillion floating-point operations a second, or a petaflop, is a peak number, achieved while calculating just one problem with the aid of extensive hand tuning.
To the extent that the Roadrunner’s 12,960 I.B.M. Cell processor chips cannot be kept busy simultaneously by the machine’s programmers, what the Laboratory has is merely a very impressive space heater. And programming these machines has long been their Achilles heel. Anecdotal reports from the designers and the programmers of massively parallel computers indicate that efficient use of tens of thousands of processor chips on a single problem is generally the exception rather than the rule, even after more than two decades of experience.
There are some hopeful signs. A technical paper produced by Los Alamos to be presented in Germany indicates that the Roadrunner is capable of producing impressive results on the computing problems the nation’s nuclear weapons stockpile stewardship program cares about most. In running a program called Sweep3D, which is apparently useful for simulating the three-dimensional shock waves generated by nuclear bomb blasts, it was possible to accomplish the task between three and seven times faster.
But that also suggests that performance increases won’t come for free. Each new problem will have to be painstakingly optimized to extract the full potential from the Roadrunner.
I.B.M. again dominated the peak of the Top 500 list, with five of the top 10 machines. SGI has two computers in the top 10, while Cray and Sun Microsystems each have one in the summer 2008 ranking.
The Sun Ranger, which ranked fourth, is a particularly sad story. The machine, designed by Sun co-founder and uber-architect Andreas Bechtolsheim, was originally planned to be completed for last November’s ranking, and it stood a good chance of topping the list. But because of a delay in the arrival of the 15,000-plus quad-core A.M.D. Opterons that power the computer, it missed its window of opportunity. John Fowler, Sun’s executive vice president, said in a telephone interview on Tuesday that the Ranger, which was funded by the National Science Foundation, is an unusually well “balanced” machine. That means that there are large data pipes that connect the A.M.D. processor chips, ensuring that they won’t starve for data to calculate. Less time waiting means less time a supercomputer spends as a space heater.
A footnote: What comes after a petaflop, or a quadrillion calculations per second? Jack Dongarra, a University of Tennessee computer scientist who has been one of the keepers of the Top 500 list, provides a roadmap: “There’s peta, exa, zetta, and yotta,” he wrote in an e-mail recently. “Then beyond yotta it begins to get a bit vague.” He pointed to one effort to follow multiple orders of magnitude.
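For readers who want to see the roadmap as arithmetic, here is a minimal sketch in Python. It assumes the standard SI meanings of the four prefixes Mr. Dongarra names (each is 1,000 times the one before, starting at 10 to the 15th power for peta). The names `PREFIXES` and `describe_flops` are made up for this illustration and come from no particular library.

```python
# SI prefixes from the roadmap above, as powers of ten.
# Each step up is a factor of 1,000.
PREFIXES = [
    ("peta", 10**15),   # quadrillion
    ("exa", 10**18),    # quintillion
    ("zetta", 10**21),  # sextillion
    ("yotta", 10**24),  # septillion
]


def describe_flops(ops_per_second):
    """Express a rate using the largest listed prefix that does not exceed it.

    Rates below one petaflop fall back to a fraction of a petaflop.
    """
    name, scale = PREFIXES[0]
    for candidate_name, candidate_scale in PREFIXES:
        if ops_per_second >= candidate_scale:
            name, scale = candidate_name, candidate_scale
    return f"{ops_per_second / scale:.3f} {name}flops"


# The Roadrunner figure quoted earlier: 1.026 quadrillion calculations per second.
print(describe_flops(1.026e15))
```

A machine 1,000 times faster than the Roadrunner would cross into exaflop territory, and another factor of 1,000 would reach zettaflops.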