A couple of weeks ago, the R&D Daily featured news of the Tianhe-1A supercomputer assuming top-dog status in the global list of the 500 fastest supercomputers.

The buzz was, unfortunately, less about the relative merits of the computer’s performance (and what it might contribute to humanity) than it was a fearful assessment of where the U.S. stands in global innovation rankings. While some warned of China’s ascendancy, others pointed to Tianhe’s reliance on cheaper GPU architecture as an indication that China has work to do if it wants to be considered at the top in overall computing technology. Others rejoined that IBM is working on Sequoia, a 20-petaflop wunder-machine that will, if finished to spec, be faster than all of the Top500 computers combined. Its performance will spring from 1.6 million 45-nm cores and 1.6 petabytes of memory, generating enough heat to require 96 refrigerators covering about 3,000 square feet.

The new 45-nm cores will be powerful, but that 1.6 million figure indicates where supercomputing has gone: the processors no longer matter as much as how data is shared between processing nodes. New supercomputers are constructed from commodity processors and other parts joined together with custom interconnects and, often, highly sophisticated software schemes that mitigate data latency, a major problem in massively parallel systems. It’s all carefully orchestrated engineering.

In computing parlance, it’s the difference between a compute-bound and an I/O-bound problem. In the early days of supercomputing, great effort went into developing specialized microprocessors. Later, developers saw that processors were only part of the problem, as latencies in massively parallel processor setups could be in the microsecond range. That raised the question: What’s the point of using so many processors if they all have to wait for data to arrive?
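A little back-of-the-envelope arithmetic shows why that question matters. The sketch below is illustrative only, with hypothetical numbers: each node computes its slice of a fixed chunk of work, then stalls for a fixed interconnect latency before the next step. As processor count grows, the latency term dominates and the speedup curve flattens.

```python
# Illustrative sketch (not from the article): how per-step communication
# latency erodes the benefit of adding processors. All numbers are
# hypothetical, chosen only to show the shape of the problem.

def effective_speedup(n_procs, compute_time_us, latency_us):
    """Ideal speedup divided down by time spent waiting on the interconnect.

    Each step, a node computes its 1/n_procs share of the work, then
    stalls for latency_us while data from its neighbors arrives.
    """
    step_time = compute_time_us / n_procs + latency_us
    return compute_time_us / step_time

# A 1000-microsecond chunk of work with 1 microsecond of latency per step:
for n in (10, 100, 1000, 10000):
    print(n, round(effective_speedup(n, 1000.0, 1.0), 1))
```

With these toy numbers, going from 1,000 to 10,000 processors buys less than a 2x improvement: waiting on data, not computing, is the bottleneck, which is exactly why the interconnect engineering gets so much attention.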

The answer is that it’s cheaper to deal with latency than it is to design new microprocessors. The IBM Roadrunner at Los Alamos National Laboratory, for example, derives thousands of its nearly 20,000 processors from the ones used in Sony’s PlayStation 3; it was the top supercomputer in 2008 after nearly pegging 1.5 petaflops. The Cell Broadband Engine chips are impressive, but so is the engineering that went into making Roadrunner operate efficiently.

The lightning-fast flow of data is the great advantage of supercomputers. Without it, their reason to exist would evaporate. Take CERN. The monumental number-crunching tasks generated by experiments at the Large Hadron Collider can be broken into pieces, so it makes sense to process them on relatively small nodes and combine the results as quickly as the Internet allows. The grid-computing approach is a sensible, cost-effective solution to what would otherwise require a number of exceedingly expensive supercomputers.
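The split-process-combine pattern behind grid computing can be sketched in a few lines. This is a toy illustration, not CERN’s actual software: the function names and the stand-in “analysis” (summing event values) are hypothetical, chosen only to show why independent pieces tolerate slow links.

```python
# A minimal sketch of the grid-computing pattern: split a big job into
# independent pieces, process each piece on whatever node is available,
# then merge the partial results. All names and the toy "analysis"
# step are hypothetical.

def split(events, n_chunks):
    """Partition a list of events into roughly equal chunks."""
    size = (len(events) + n_chunks - 1) // n_chunks
    return [events[i:i + size] for i in range(0, len(events), size)]

def analyze(chunk):
    """Stand-in for per-node number crunching: sum the event values."""
    return sum(chunk)

def combine(partials):
    """Merge the partial results returned by every node."""
    return sum(partials)

events = list(range(1, 101))                       # 100 toy "events"
partials = [analyze(c) for c in split(events, 4)]  # each chunk could run on a different node
print(combine(partials))                           # same answer as analyzing in one place
```

Because each chunk is processed with no dependence on the others, the nodes never wait on one another mid-computation; network speed only affects how fast the pieces are scattered and gathered. That is the property the indivisible problems in the next paragraph lack.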

But some tasks resist fracturing: the modeling of weather, the evolution of a supernova, the prediction of an earthquake. Grid computing won’t fulfill this need in the next few years. Network speeds are improving, but even 100G isn’t nearly enough to keep pace with a Roadrunner. Or a Tianhe. And especially not a Sequoia. So for the immediate future, we can expect the supercomputer horse race to continue.