National labs join forces to develop next supercomputers
Lawrence Livermore National Laboratory (LLNL) has joined forces with two other national laboratories, Oak Ridge and Argonne, to deliver next-generation supercomputers with peak performance of up to 200 petaflops, about 10 times faster than today's most powerful high-performance computing (HPC) systems.
The Collaboration of Oak Ridge, Argonne and Livermore (CORAL) national laboratories will produce systems in the 2017 to 2018 timeframe to support the research missions at their respective institutions. At LLNL, the system will serve NNSA's Advanced Simulation and Computing (ASC) program in support of stockpile stewardship.
CORAL is an important step in the development of the exascale systems needed to take on complex scientific problems, such as global climate and weather modeling, that today's top HPC machines cannot address with sufficient resolution. Such HPC systems are a cornerstone of the nation's effort to ensure the safety, security and reliability of its aging nuclear deterrent and to address other national security challenges. The first well-balanced and power-efficient exascale systems are expected before 2025, assuming the country embarks on an exascale initiative. It’s possible, however, that inefficient systems could appear as early as 2020 somewhere in the world.
A joint Request for Proposals for the CORAL procurement was issued January 6, 2014, and responses were submitted February 18, 2014. These are now being evaluated. The CORAL partners intend to select two different vendors and procure a total of three systems, two from one vendor and one from the other. Livermore is leading the procurement process.
Livermore's system, to be called Sierra, will be best suited to support the applications critical to stockpile stewardship. Oak Ridge and Argonne will employ systems that meet the needs of their DOE Office of Science missions under the Advanced Scientific Computing Research (ASCR) program. Because of the technological advances required for CORAL systems, a "deliberate and strategic" investment plan is an integral part of the collaboration. Consequently, CORAL includes targeted investment in "non-recurring engineering" (NRE) research and development contracts.
"The NRE R&D contracts enhance what we would otherwise get," said Bronis de Supinski, noting the R&D allows for earlier optimization of the applications that will run on the new system. "Vendors work with our application teams transferring knowledge of system architecture to our applications." The objective is for scientists to be able to run their applications as soon as possible on the new system, according to de Supinski.
The R&D contracts also help address the technological challenges of developing new systems, such as containing power requirements; ensuring memory bandwidth is sufficient to give scientists the full benefit of the machine's computing power; and making sure the system is reliable and resilient given the machine's many components.
The selected vendors will build small prototype systems, which will inform the final decision on building the full systems.