As researchers scramble to deliver R&D results and bring products to market, they are
turning to high-performance computing. Vendors are competing for their business. Can everyone
adapt to the cloud?
ANSYS-CFX was used in the cloud via Windows HPC Server to depict wave formulation around a seafaring vessel. ANSYS is one of many vendors to develop software specifically designed to remotely take advantage of highly parallel computing systems, offering customers high-end performance and faster results. Image: ANSYS
What laboratory tool has made the most difference in research and development? Arguably, it’s
the personal computer. In the early days of computing, specialized clusters of high-performing
processors were often needed for data-intensive tasks. But as chipmakers upheld Moore’s Law,
desktop machines and even laptops became powerful enough to handle complex design and processing
tasks.
Personal computers are ubiquitous and indispensable, but often are no longer powerful enough,
even for daily research tasks such as processing a Microsoft Excel spreadsheet. Circumstances have
conspired to force researchers to seek a better solution. As microprocessor speed has stalled,
data volume has exploded. In 2004, according to Dave Turek, vice president of deep computing at
IBM Corp., Armonk, N.Y., computer scientists recognized the limits of
microprocessor technology and realized the best avenue for more performance was to group large
numbers of processors together and leverage strength in numbers. Multi-core was born.
Now, through a combination of multicore processing, commoditization of high-end service
components, and high-speed communications, high-performance computing (HPC) is handling the heavy
lifting of high-technology R&D.
“What really has changed is the migration of traditional techniques and approaches into a
non-classical domain. When you peel back the covers, at its core, software is sophisticated
mathematics used to answer problems,” says Turek.
HPC represents this new domain, where sequential programming gives way to counter-intuitive
parallel processing and where researchers stand to make tremendous gains in knowledge, if they
know how to get the most out of it. As a result, research organizations cannot consider adopting HPC
without gaining knowledge of the associated software, tools, components, storage, and services
that together form the infrastructure for intensive computation.
A cloudbank of high performance
HPC has typically been associated with supercomputers. The world’s first petascale computer, the
U.S. Dept. of Energy’s Roadrunner at Los Alamos (N.M.) National Laboratory, is a hybrid, comprised
of 12,960 dual-core processors from both IBM and AMD, using specialized blade servers. Its
distributed computing software is optimized for large-scale parallel computing based on Open MPI,
an open-source implementation of the Message Passing Interface (MPI) standard.
As complex as it is, however, Roadrunner shares a large number of commonly available components
with advanced desktop computers. Dual-core processors are widely available at reasonable prices.
InfiniBand interconnects are an industry standard. Roadrunner’s Red Hat Enterprise Linux is a
widely available platform. Simply put, even the most advanced supercomputers are essentially a
collection of commoditized components.
Speedo’s LZR RACER swimsuit made a splash at the Beijing Olympics, helping break a number of swimming records. The design features panels that reduce drag and are positioned based on fluid flow analysis using ANSYS software. Product design projects like this are increasingly taking place using cloud-based HPC. Image: ANSYS
As a result, HPC has become available to companies large and small at costs that are a small
fraction of what was required just a few years ago. Even the U.S. government has been able to
economize. The U.S. Air Force in 2010 completed a half-petaflop supercomputer using more than
1,700 Sony PlayStations, which themselves rely on high-performance graphics processing units
developed for the gaming market. The cost, $2 million, is modest compared with the expense of a
supercomputer in the 1990s.
HPC, therefore, is a broad term for what has become a multi-billion dollar industry
centered around servers, clusters, and supercomputers. With the advent of virtualization software
and hosted online work environments—often referred to as “the cloud”—the distinctions of a
“computer” break down. Computing is now measured in CPU cycles. The most important metric of a
physical cluster is now watts per flop.
Users of common online tools like Gmail, Flickr, and YouTube already know the beauty of the
cloud: spreadsheets, photos, videos, and documents can be edited, processed, reshaped, and
distributed without running anything other than a Web browser. Researchers who use tools like
GAUSSIAN (Gaussian, Inc., Wallingford, Conn.) for chemical research, MATLAB
(MathWorks, Natick, Mass.) for numerical computation, or ANSYS
(ANSYS, Canonsburg, Pa.) for modeling and simulation are just as likely now to
rent space on a high-performance cluster as they are to run their routines locally.
Why the rush to abandon the desktop processor? Speed and convenience.
Microsoft, which began its cloud-based service effort in 2005 with the Compute
Cluster, cites several case studies of companies that benefit from HPC, including
ORECA, a Toulon, France-based car racing company that was able to reduce its
computational fluid dynamics calculation times from a day and a half to four hours.
Scripps Research Institute (La Jolla, Calif.), which also adopted the Windows HPC
Server offering, was able to scale up data throughput of cancer analysis by 30 to 50 times.
“In the last few years, we have quickly seen a growth in customers using cloud-based resources to
address customer workloads,” says Bill Hamilton, director of technical computing at Microsoft.
While relatively new to HPC, Microsoft quickly established a pricing and licensing model geared to
short-term demand.
“It’s pretty cheap. Users can rent a core for an hour for 4 cents,” says Hamilton. “Some
application providers are still trying to figure this out. It’s a consumption-based model versus a
fixed-price model.”
Hamilton says the burst scenario has become popular with users. Bursting refers to on-demand
access to cloud HPC only after demand exceeds the capacity of on-premises hardware. This
allows users to maximize the return on investment in their own computing resources while
seamlessly leveraging cloud HPC services without losing time.
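The economics of bursting can be sketched in a few lines of Python. This is only an illustration: the 4-cent core-hour rate is Hamilton's figure, while the function name, the on-premises capacity, and the hourly demand profile are hypothetical.

```python
def burst_cost(hourly_demand, local_cores, rate_per_core_hour=0.04):
    """Return (cloud_core_hours, cost_in_dollars) for a demand profile.

    hourly_demand: cores needed in each hour of a job (hypothetical data).
    local_cores: cores available on premises; only demand above this bursts out.
    rate_per_core_hour: assumed cloud price (4 cents, per the quote above).
    """
    cloud_core_hours = sum(max(0, need - local_cores) for need in hourly_demand)
    return cloud_core_hours, cloud_core_hours * rate_per_core_hour


if __name__ == "__main__":
    # Hypothetical six-hour job that briefly needs far more than 64 local cores.
    demand = [32, 64, 256, 512, 128, 48]
    core_hours, cost = burst_cost(demand, local_cores=64)
    print(f"Cloud core-hours: {core_hours}, cost: ${cost:.2f}")
```

Under these assumed numbers, only the three hours in which demand exceeds local capacity incur a cloud charge; the rest of the job runs on hardware the organization already owns.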
The ability of third-party HPC service providers to accommodate independent software vendors
(ISVs) has gone a long way toward giving researchers the courage to adopt virtualized work
environments. The makers of major applications such as MATLAB, ABAQUS/Simulia (Dassault
Systèmes, Cedex, France), and STAR-CCM+ (CD-adapco, Melville, N.Y.) have
worked with HPC providers like Microsoft to develop tools that take advantage of parallel
processing.
Broad industry applications
Based on what R&D Magazine learned from vendors and computing industry experts and in a survey
of readers (see sidebar), as many as three-quarters of research organizations in all fields have
been using HPC in some manner. John Fruehe, director of product marketing for server/workstation
products at AMD, Sunnyvale, Calif., cites automobile manufacturers, many of which
are now reliant on HPC.
“Typically, to test crash performance, you build a car, put it on a closed track, shoot it down
the track, and then look at it, see how it reacted. This is very costly and time-consuming. People
who are now doing that type of modeling in the lab using HPC can smash that same car into the wall
on a hundred different angles. It allows them to prototype and build a car a lot faster,” he
says.
An IBM Blue Gene installation. IBM has been at the forefront of high-performance computing developments, and its InfiniBand interconnect technology is at the heart of many computer clusters used for HPC on the cloud. Image: IBM
Another attraction of cloud-based HPC is that it requires no on-site infrastructure. By moving
major projects off-site, companies can complete them without having to invest in brick-and-mortar
computing resources.
In addition to the cost of purchasing a high-performance cluster, the skill set required to
maintain such a system pushes costs much higher. An IT department that is geared to handle a
network of desktops will need to be upgraded to handle a cluster that requires optimization for
parallel processing.
At Cornell Univ., Ithaca, N.Y., Dave Lifka, director of the Center for
Advanced Computing, helps manage a 512-core cluster that coordinates with a developer of
scientific software, MathWorks, to allow researchers to conduct high-level computing using the
company’s popular MATLAB numerical computing environment and Simulink algorithm engine.
“There’s a nice buzzword connection with cloud, but I prefer to call it utility computing,”
says Lifka. To that end, his cluster at Cornell offers a seamless interface to parallel MATLAB.
“The motivation for providing this is two-fold,” he says. “There are more users of MATLAB than
there are users of national supercomputing centers. Whereas people using high-end supercomputers
at national labs optimize their code for any new type of hardware or networking technology, and
technology enthusiasts at that level can’t wait to see how far they can push the envelope, we want
to broaden participation in HPC and help those with scientific needs that might be less
demanding.”
Anticipating the growth of HPC, MathWorks has been building parallelization modifications into
the software. Typically, the user can adopt a parallelized approach to software execution by
inserting a few key commands. The complicated routines necessary to distribute computations over
multiple cores are accomplished in the background by MATLAB.
Without these measures, “the user can get into deadlock situations that are difficult to
overcome,” says Silvina Grad-Freilich, parallel computing marketing manager at MathWorks. “That’s
skipping into low-level programming. That is a show-stopper for engineers to get into HPC.”
To further the convenience, MathWorks has built toolboxes that sit on top of MATLAB that
include a cluster interface. Users can jump from one cluster to another if the first is not
fast enough. These measures are not without responsibility on the part of users, however.
“The programming aspect is very simple, but you need to know what you are doing. There is the
cost of bringing the data and pushing it out,” says Grad-Freilich.
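Grad-Freilich's caution about data movement can be made concrete with a rough model. The sketch below assumes ideal linear speedup and a fixed-bandwidth link, both simplifications, and every name and number in it is illustrative rather than drawn from MathWorks.

```python
def remote_runtime(serial_seconds, cores, data_gb, link_gb_per_s):
    """Estimate wall time on a remote cluster: data transfer plus compute.

    Assumes perfectly parallel work (an idealization) and counts the total
    data moved in and out of the cluster as data_gb.
    """
    transfer_seconds = data_gb / link_gb_per_s
    compute_seconds = serial_seconds / cores
    return transfer_seconds + compute_seconds


def worth_offloading(serial_seconds, cores, data_gb, link_gb_per_s):
    """True when the remote estimate beats running serially on the desktop."""
    return remote_runtime(serial_seconds, cores, data_gb, link_gb_per_s) < serial_seconds


if __name__ == "__main__":
    # Hypothetical jobs: 10 GB of data over a link of roughly 100 Mbit/s.
    for serial_seconds in (600, 3600):
        remote = remote_runtime(serial_seconds, cores=64, data_gb=10, link_gb_per_s=0.0125)
        verdict = "offload" if remote < serial_seconds else "stay local"
        print(f"serial {serial_seconds} s, remote estimate {remote:.1f} s: {verdict}")
```

In this toy model the transfer time is the same for both jobs, so the short job is better left on the desktop while the long one benefits from the cluster.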
At Oak Ridge National Laboratory, Tenn., one of the world’s fastest computers,
Jaguar, is serving as a test bed not only for the way high-level computing is performed, but also for
how the latest software for HPC is being developed. Some projects at ORNL had never been attempted
before the installation of Jaguar: global climate models, supernova explosion models, weapons
testing, and combustion modeling.
“HPC is enabling things that couldn’t be done anywhere else,” says Buddy Bland, project
director of ORNL’s Leadership Computing Facility, Oak Ridge, Tenn. The top application, he says,
is modeling global climate change. “The computational climate group here at Oak Ridge is working
on various big systems, and models we had no confidence in five years ago have become models we can
use with a very high degree of confidence,” says Bland.
Getting there hasn’t been easy, however. According to Ricky Kendall, Group Leader, Scientific
Computing, National Center for Computational Science, Oak Ridge, Tenn., the
biggest challenge he faces is not running programs on Jaguar. It’s working with scientists.
“We’re software detectives. What happens a lot of the time is that users think they know how
the code works and they don’t,” says Kendall. “Some users are sophisticated and can appreciate the
measures that need to be taken to run a complex project efficiently on a supercomputer. Others are
not.”
This isn’t surprising, admits Kendall. Researchers tend to specialize in their domain, and even
if they’ve optimized their model to run on a cluster, they would still have to make adjustments
when running on a larger system like Jaguar, which has specialized algorithms and hardware.
“The game plan, usually, is to conduct an analysis of the code and convince them it’s a good
idea to fix the code,” says Kendall. Not everyone is amenable to “fixing” code, but Kendall has
gotten used to it. Citing an example of the pitfalls that face researchers who want to use HPC,
Kendall described a project team that was analyzing the way the nuclei of atoms are built and
running a sorting algorithm. A member of the code team at ORNL found a bad “bubble” sort and
replaced it with a “heap” sort. The sort had consumed 50% of the model’s runtime; that one change
dropped the sort load to 3% of overall runtime, a huge savings of CPU cycles.

HPC Survey Graph 1
“We told them about the new sort, and they told us they didn’t know they were running a sort,” says
Kendall. “They’d been running the code for years without needing it.”
The function could have been introduced as a debugging tool at some point in the past, says
Kendall, and just never removed.
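The kind of swap Kendall describes can be illustrated with a short Python sketch. It is not the ORNL code, which the article does not show; the data is random, the function names are hypothetical, and the heap sort relies on the standard-library heapq module.

```python
import heapq
import random
import time


def bubble_sort(values):
    """O(n^2) bubble sort; returns (sorted_list, comparison_count)."""
    items = list(values)
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # already sorted, so stop early
            break
    return items, comparisons


def heap_sort(values):
    """O(n log n) heap sort: build a heap, then pop the smallest item n times."""
    heap = list(values)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(3000)]  # illustrative input only

    start = time.perf_counter()
    bubble_result, comparisons = bubble_sort(data)
    bubble_seconds = time.perf_counter() - start

    start = time.perf_counter()
    heap_result = heap_sort(data)
    heap_seconds = time.perf_counter() - start

    assert bubble_result == heap_result
    print(f"bubble sort: {bubble_seconds:.3f} s ({comparisons} comparisons)")
    print(f"heap sort:   {heap_seconds:.3f} s")
```

Both routines return the same ordering; the difference lies in how the work grows with input size, which is why a single substitution can move a sort from dominating a model's runtime to a small share of it.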
Another adjustment researchers will have to make is to know which ISVs are ready for HPC.
Already, Microsoft, Dell, HP, Amazon, and other major
HPC environment providers offer major software titles that have been pre-optimized. In addition,
there are so-called virtualization providers that go to the next level by providing an entire work
environment with local file access. ScaleMP, Cupertino, Calif., is one such
provider, having developed its Versatile SMP (symmetric multiprocessor) architecture to aggregate
multiple x86 systems into a single virtual system. The process is akin to building a parallel
supercomputer on the fly. The service combines up to 128 x86 systems with a capacity of up to
1,000 processors (or 8,000 cores) and up to 64 TB of shared memory.
Shai Fultheim, founder, president, and CEO of ScaleMP, found that a surprising number of large
organizations were being stymied by both the cost of building parallel systems and by the demands
of running complex computing projects on parallel systems.
“What I found out from talking to customers and running development groups, scientists, whether
they are in life science or materials, they understand their domains, they can write serial code,
but when it gets to parallelization, that’s where it ends,” he says.
Right now, his virtualization business is dominated by higher education, which accounts for
eight in 10 customers. But that ratio is changing with the addition of customers such as
STMicroelectronics, Volkswagen, and Pfizer.
According to Fultheim, in the last 12 to 18 months, “we are seeing a diffusion of
virtualization into the commercial space.”
Management, he says, has caught wind of the benefits of full virtualization and is beginning a
move in that direction. By specializing in virtualized HPC, ScaleMP has earned business from other
HPC providers, such as Dell and HP.
HPC becomes big business
According to market research and consulting firm Intersect360 Research’s report Traditional HPC
Total Market Forecast: 2010 to 2014, the traditional HPC industry will be worth $21.8 billion by
the year 2014.
However, for research labs, access to capital is still limited, and for this reason cost
remains a major concern for laboratory managers. The cost concerns stem from whether on-demand
services are less expensive than physical cluster assets, especially in the context of long-term
HPC projects. Public cloud services can charge according to the number of CPUs an organization
uses, the memory and network throughput it consumes, and the amount of data it needs to store. According to
U.K.-based research firm Ovum, CPU costs are still relatively high on the cloud, while data
storage has become relatively inexpensive. An application that uses a lot of processing cycles but
not a lot of data might not benefit from a cloud service. However, the advantages really depend on
how the companies charge users. Some offer services based on data footprints.

HPC Survey Graph 2
The decisions run deeper still. Some HPC routines, such as those using the Gaussian quantum
chemistry engine, benefit from a large number of processing cores (64 or more). Others, according to
Fultheim, require lots of memory, such as Pfizer’s or Roche’s terabyte-scale bioinformatics
tasks.
Another factor is timescale. A project that scales out but concludes quickly, such as a
comparative proteomics analysis, should deliver a high return on investment. But HPC might be
costly for a long-running series of design simulations.
Finally, the headaches of acquiring HPC the traditional way might not be worth the extra cost.
Companies that build clusters must factor the cost of the IT infrastructure, as well as the IT
personnel and skill set, into the project cost. This potentially destroys the ROI.
“There are certainly challenges from the standpoint of IT,” says Fruehe. “Instead of building a
cluster, people can buy compute cycles. The biggest benefit of HPC to the researcher is not having
to buy hardware.”
Implicit in Fruehe’s statement is the additional savings from not having to lease space or pay
for power use when using HPC on the cloud.
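The trade-off between renting compute cycles and building a cluster can be sketched as a simple break-even calculation. Apart from the 4-cent core-hour rate quoted earlier, every figure below is a hypothetical placeholder, and the model ignores depreciation, idle time, and staffing beyond a flat hourly operating cost.

```python
def on_demand_cost(cores, hours, rate_per_core_hour):
    """Cost of renting the given number of cores for the given hours."""
    return cores * hours * rate_per_core_hour


def owned_cost(purchase_price, hourly_operating_cost, hours):
    """Cost of buying a cluster and running it (power, space, staff) for the given hours."""
    return purchase_price + hourly_operating_cost * hours


def break_even_hours(cores, rate_per_core_hour, purchase_price, hourly_operating_cost):
    """Hours of use beyond which owning becomes cheaper than renting.

    Returns None when renting is cheaper per hour than merely operating
    an owned cluster, in which case ownership never pays off.
    """
    hourly_saving = cores * rate_per_core_hour - hourly_operating_cost
    if hourly_saving <= 0:
        return None
    return purchase_price / hourly_saving


if __name__ == "__main__":
    # Hypothetical 512-core cluster: $300,000 to buy, $5.48 per hour to operate.
    hours = break_even_hours(512, 0.04, 300_000, 5.48)
    print(f"Owning pays off after about {hours:,.0f} hours of full use")
```

A short, bursty project falls well below the break-even point and favors renting, while a long-running series of simulations at full utilization may eventually favor ownership.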
Software is the core
The other major concern of R&D’s readers is software. Vendors are scrambling to provide revisions
to their products that allow users to seamlessly integrate parallel processing routines in their
projects without having to rewrite a massive quantity of code. Fultheim sees a large gap between
the capabilities of the HPC ecosystem and the ability of the end user to utilize them to the
highest degree. “One of the biggest problems is that it’s hard to build parallel software,” he
says.
Hamilton agrees that parallel processing has been a roadblock for software developers. “Going
from single-core to multi-core is a really difficult computer science problem. We need to make it
easier for developers to program for multi-core, and this will continue to provide innovation in
the cloud and make it easier to consume HPC,” he says.
This responsibility will lie with HPC providers and ISVs. For researchers, the future of
computing is one that hopefully features less frustration and more performance.
“The beauty of a cloud-based system is that you don’t have to be at the cutting-edge of
hardware,” says Lifka, who, in his years at Cornell has noticed that the focus has shifted away
from the prestige of owning and running a cluster and has moved more toward simply enabling
research.
“Now what’s important,” he says, “is the ability to quickly stand up a powerful resource
that researchers can take advantage of quickly. In this way it helps researchers to become more
competitive in their research and win more grants.”
High-Performance Computing Survey
R&D Magazine surveyed readers to discover the impact high-performance computing is
having on researchers and laboratory managers.
The survey revealed that nearly three-quarters (about 75%) of participants have already used
HPC for their work.
More than a quarter of respondents (27.9%) reported that their R&D work involves engineering.
This is followed by about 15% who conduct research in life science, 13.5% in manufacturing and 6%
in materials R&D. More specifically, nearly a quarter (24.3%) of survey participants listed their
job function as engineer, followed closely by general researcher (22.9%). Executive- and
director-level participants accounted for approximately 40% of survey participants.
In fundamental research, respondents who use HPC claimed it was either extremely important
(44.1%) or very important (38.2%). When asked about its importance for research “results”, 88.2%
of respondents believed HPC to be extremely or very important. In terms of cost savings, product
development time, and budget requirements, HPC was less valuable, but no major categories of
HPC-enabled R&D scored less than “somewhat important”. Those who do use HPC resources do so
frequently: 42.9% use it at least once per day.
What is HPC used for?
The results, unsurprisingly, show a wide range of disciplines deriving benefit from HPC, from
astrophysics to chemistry (see chart 1 in article above).
Categories that saw frequent use of HPC by respondents included data acquisition and testing
(55.2%), design (40.8%), simulation (40.6%), modeling (37.9%), and materials sciences (25.9%).
Respondents described their tasks using HPC, including “modeling and simulation using finite
element analysis”, “computation fluid dynamics”, “forming simulation with multiphysics”,
“high-speed data acquisition and simulations”, and “predictive life cycle performance of advanced
high-performance materials systems”.
The life science disciplines were not as well represented among frequent HPC users, but the
tasks that usually do require HPC to handle massive amounts of data (genomics, proteomics, drug
development) are likely a minority within the larger scope of biological research. Respondents in
this research area cited their HPC usage as involving “in-silico screening for drug
discovery” or “molecular modeling for prediction of molecular properties”.
Where does HPC fall short?
The cost of using HPC and the constraints of R&D budgets were top concerns, with nearly 70% of
respondents ranking this issue as either “extremely important” or “very important”. Worries about
data integrity and security followed, with about 60% placing them in the top two rankings.
Nearly 40% of survey participants said that software optimized for parallel processing will be
“extremely important” for HPC success in the next five years. The expansion of HPC for small- and
medium-sized businesses (29.7%), better software integration (data and graphics) (24.3%),
massively parallel multi-core systems (22%), and the spread of graphics processing units (21.7%)
followed as the crucial trends that users say will transform HPC in the near future.
Published in R & D magazine: Vol. 53, No. 1, February, 2011.