In this image, 3 million red blood cells are being simulated moving through a microvasculature topology. A circulating tumor cell is shown in white. 1586 MPI tasks were used on the Duke Compute Cluster. Credit: John Gounley, Duke University.

When a cancer cell detaches from a tumor and enters the bloodstream, where in the body is it likely to come to rest and possibly metastasize? When a cardiac surgeon needs to implant a shunt to bypass a faulty ventricle, where exactly should the surgeon place the device to best ensure the patient’s long-term survival? And if we know that hardening of the arteries contributes to hypertension and plaque buildup, how can we measure an individual patient’s level of risk from those diseases?

Those are the kinds of questions Dr. Amanda Randles, an assistant professor of biomedical engineering at Duke University, is working to answer. She and her team at Duke are modeling human blood flow on supercomputers as a way to improve both the diagnosis and treatment of a wide variety of conditions. 

She cites the treatment of a particular type of heart disease as a case in point.

“We are working with a pediatric cardiologist here at Duke where we’re looking at patients with hypoplastic left heart syndrome. It’s a defect where the [right] side of the heart is the only side that’s really working. So the surgeon has to go in and shunt blood to the coronary artery in a way that bypasses part of the heart. And what we want to understand is where the shunt or conduit should be. What’s the best location? What’s the best geometry? How would you actually place that shunt and plan that treatment? Before the physician ever goes into the operating room we would model maybe a thousand different options for how they could put that shunt in and identify using the simulations which ones have the best blood flow setup.”

The simulation, Randles says, can help prevent the shunt from failing. “If we know the risk factors — like if we know there is high pressure and the pressure goes above a certain point at certain angles — then we know when the shunt is likely to fail. So we can modify the geometry of the blood flow setup until we have mitigated that pressure.”
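The workflow Randles describes, simulating many candidate placements and discarding those where pressure crosses a risk threshold, can be sketched as a simple screening loop. This is a minimal illustration, not the Duke team’s code: `ShuntOption`, its fields, the pressure limit, and the `toy_pressure` stand-in are all hypothetical, and in practice each call to the simulator would be a full blood flow simulation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ShuntOption:
    """Hypothetical description of one candidate shunt placement."""
    location_mm: float  # position along the vessel (illustrative)
    angle_deg: float    # insertion angle (illustrative)


def screen_options(
    options: Iterable[ShuntOption],
    simulate_peak_pressure: Callable[[ShuntOption], float],
    pressure_limit: float,
) -> List[ShuntOption]:
    """Keep options whose simulated peak pressure stays below the limit,
    ordered from lowest to highest peak pressure."""
    scored = [(simulate_peak_pressure(opt), opt) for opt in options]
    safe = [pair for pair in scored if pair[0] < pressure_limit]
    safe.sort(key=lambda pair: pair[0])  # stable sort: ties keep input order
    return [opt for _, opt in safe]


def toy_pressure(opt: ShuntOption) -> float:
    """Stand-in for a full flow simulation (hypothetical model, arbitrary units)."""
    return 80.0 + 0.5 * abs(opt.angle_deg - 45.0)


candidates = [ShuntOption(10.0, float(a)) for a in range(0, 91, 15)]
ranked = screen_options(candidates, toy_pressure, pressure_limit=90.0)
print([opt.angle_deg for opt in ranked])
```

Passing the simulator in as a function keeps the screening logic separate from the (expensive) physics, so the same loop works whether each evaluation takes microseconds or hours on a cluster.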

Another use case is predicting the spread of cancer.

“For one of our projects we’re trying to see where the cancer cell is going to go once it has moved off of the tumor and gone back into the bloodstream. So, we want to understand how the cancer cell’s interacting with the fluid and how it is interacting with all the red blood cells around it — when they’re bumping around and moving through the vessels together and how they interact with the walls — and how that might change the trajectory of how that cancer cell’s going to move on through your vessels.”

How the model works — optimizing Lattice Boltzmann

Randles’ team wrote their models from scratch, basing them on the Lattice Boltzmann method rather than on the Navier-Stokes equations that underpin more traditional computational fluid dynamics (CFD) approaches. That’s because Lattice Boltzmann represents fluid flow on a multi-dimensional mesh, or lattice, which lends itself very well to the parallel processing that enables supercomputers to compute at very high speed. Each cell, or zone, in the mesh represents a point in the fluid: the higher the mesh resolution, the smaller the zones, the more zones there are to process, and the more accurately each zone corresponds to the physical fluid at that point. The method rests on the idea that the speed, direction, and other fluid properties at each point are determined partly by those at its neighbors. So different CPUs in a supercomputer can solve for these parameters at different mesh points at the same time, using the results of previous calculations at the closest neighboring points as inputs. This reliance on nearest-neighbor communication is in fact a critical advantage: it greatly improves computational scalability compared to methods that require communication among points beyond just immediate neighbors.
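To make the nearest-neighbor idea concrete, here is a minimal sketch of a standard two-dimensional lattice Boltzmann scheme: the textbook D2Q9 lattice with a single-relaxation-time (BGK) collision. It is an illustrative assumption, not the team’s code, which works in three dimensions and handles vessel walls and cells. The collision step uses only data at each node, and the streaming step moves values only to adjacent nodes, which is why a mesh split across processors needs to exchange only a thin layer of boundary values with immediate neighbors. Periodic boundaries keep the example self-contained.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities (rest, 4 axis, 4 diagonal) and weights.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)
CX = C[:, 0, None, None]  # shape (9, 1, 1) for broadcasting over the grid
CY = C[:, 1, None, None]


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations, shape (9, nx, ny)."""
    cu = 3.0 * (CX * ux + CY * uy)
    usq = 1.5 * (ux ** 2 + uy ** 2)
    return W[:, None, None] * rho * (1.0 + cu + 0.5 * cu ** 2 - usq)


def step(f, tau):
    """One BGK collide-and-stream step on a periodic grid."""
    rho = f.sum(axis=0)
    ux = (f * CX).sum(axis=0) / rho
    uy = (f * CY).sum(axis=0) / rho
    # Collision: purely local, each node relaxes toward its own equilibrium.
    f = f - (f - equilibrium(rho, ux, uy)) / tau
    # Streaming: each population moves one node along its velocity,
    # so a node only ever receives data from its immediate neighbors.
    for i, (cx, cy) in enumerate(C):
        f[i] = np.roll(f[i], shift=(cx, cy), axis=(0, 1))
    return f


# Usage: a decaying sinusoidal shear wave on a 32 x 16 periodic grid.
nx, ny = 32, 16
rho0 = np.ones((nx, ny))
ux0 = 0.05 * np.sin(2 * np.pi * np.arange(ny) / ny)[None, :] * np.ones((nx, 1))
uy0 = np.zeros((nx, ny))
f = equilibrium(rho0, ux0, uy0)
initial_mass = f.sum()
for _ in range(100):
    f = step(f, tau=0.8)
ux_final = (f * CX).sum(axis=0) / f.sum(axis=0)
```

The shear wave loses amplitude over time as viscosity acts on it, while total mass stays constant because both collision and streaming conserve it exactly.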

Very high scalability is needed because the meshes are both very long and very finely resolved. The human vasculature is more than 10,000 miles long, not counting capillaries under one millimeter in diameter. And because the model captures not just blood flow but also the movement of cells within the blood, the resolution must be very high, which makes the number of mesh points to be processed individually very high as well.
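A back-of-the-envelope count shows why resolution drives the workload. The sketch below assumes a straight cylindrical vessel filled with a uniform cubic lattice and estimates how many mesh points it contains; the vessel dimensions and spacings in the example are hypothetical, chosen only for illustration.

```python
import math


def lattice_points_in_vessel(diameter_m: float, length_m: float, dx_m: float) -> float:
    """Approximate number of lattice points filling a straight cylindrical
    vessel when space is divided into cubes of side dx_m."""
    volume_m3 = math.pi * (diameter_m / 2.0) ** 2 * length_m
    return volume_m3 / dx_m ** 3


# Hypothetical example: a 1 mm wide, 10 cm long vessel.
coarse = lattice_points_in_vessel(1e-3, 0.10, 10e-6)  # 10 micrometer spacing, about 7.9e7 points
fine = lattice_points_in_vessel(1e-3, 0.10, 5e-6)     # 5 micrometer spacing, 8x as many
print(f"{coarse:.3g} points at 10 um, {fine:.3g} at 5 um")
```

Because the count scales with 1/dx³, halving the spacing multiplies the number of points by eight, before counting the extra time steps a finer mesh usually needs.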

“When we’re looking at something like cancer,” Randles says, “what we’re interested in is the trajectory of this cancer cell through a maze of red blood cells. So we actually do have to model at the cellular level and take into account the interactions with the red blood cells.”

Shapes during constriction passage (from left to right): entering, transiting, exiting, and returning to the full channel, for simple (top) and compound (bottom) capsules. Parameters are Ca = 0.5 and φ = 0.125. Colored by membrane velocity in m/s. Ref: Gounley, J., Draeger, E.W., and Randles, A. “Numerical simulation of a compound capsule in a constricted microchannel.” Elsevier Procedia Computer Science, International Conference on Computational Science, Vol. 108 (2017).

Huge computing workload

Even when optimized, the simulation still makes for a huge computing workload. Randles’ (and the world’s) first three-dimensional full-body blood flow simulation (excluding capillaries) at cellular resolution consumed the entire 20-petaflop Sequoia system at the Lawrence Livermore National Laboratory. That is the same system used to simulate nuclear weapons tests.

And this was without simulating cellular interactions. Simulating those interactions requires modeling not just the blood and blood vessels (using parameters like fluid viscosity, vessel geometry, and wall stiffness) but also the fluid inside the cells and the cells’ membranes.

“Right now it’s pretty coarse,” she says, referring to the model. “We just have a two-membrane system. We have the outer cell membrane, which is kind of similar to how you would normally do it for red blood cells [which have no nucleus]. It’s just showing the shape and you can take into account the Young’s Modulus [stiffness] of the cell. But for a cancer cell we model another membrane embedded inside to represent the nucleus. And then we have to account for the different viscosities — because the fluid inside the nucleus versus the fluid inside the cell versus the plasma outside the cell could all have different viscosities.”
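In a BGK-style lattice Boltzmann code, fluid viscosity is set through the relaxation time τ, using the lattice-unit relation ν = c_s²(τ - 1/2) with c_s² = 1/3. One common way to give the plasma, the cell interior, and the nucleus different viscosities is therefore to assign each lattice node the τ of the region it lies in. The article does not say how Randles’ team handles this, so treat the sketch as an assumption; the region names and viscosity values are hypothetical, and locating each node inside or outside a membrane is not shown.

```python
def relaxation_time(nu_phys: float, dx: float, dt: float) -> float:
    """BGK relaxation time giving kinematic viscosity nu_phys.

    Converts nu_phys to lattice units (nu * dt / dx**2), then inverts
    nu_lattice = (tau - 1/2) / 3, i.e. c_s^2 = 1/3.
    """
    nu_lattice = nu_phys * dt / dx ** 2
    return 3.0 * nu_lattice + 0.5


# Hypothetical viscosities, already in lattice units (dx = dt = 1).
region_viscosity = {"plasma": 1 / 6, "cytoplasm": 5 / 6, "nucleus": 10 / 6}
region_tau = {
    region: relaxation_time(nu, dx=1.0, dt=1.0)
    for region, nu in region_viscosity.items()
}
print(region_tau)
```

In practice τ values very close to 1/2 tend to make the scheme unstable, which constrains how the grid spacing and time step can be chosen for the least viscous fluid.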

More optimizations coming

Simulating all this on systems that caregivers might someday actually access requires further model optimizations, making the model not just more robust but also much more scalable than it already is. To accomplish that goal, Randles and her team are doing all their development work on a 50-node Intel-based cluster with two Intel Xeon processors E5-2699 v4 per node. Each node has 44 cores with two threads per core, of which 40 cores are available to the SLURM workload manager. In addition, there are 400GB of RAM and 350GB of SSD scratch storage. In total, that means 2,000 two-threaded cores perform the calculations that resolve each point in the blood flow mesh.
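The cluster totals follow from simple arithmetic on the node figures above. This sketch assumes the 400GB of RAM is per node, which the text does not state explicitly.

```python
nodes = 50
scheduled_cores_per_node = 40  # of 44 physical cores, per the text
threads_per_core = 2
ram_gb_per_node = 400          # assumption: the 400GB figure is per node

total_cores = nodes * scheduled_cores_per_node     # 2,000 cores
total_threads = total_cores * threads_per_core     # 4,000 hardware threads
total_ram_tb = nodes * ram_gb_per_node / 1000      # 20 TB if RAM is per node
print(total_cores, total_threads, total_ram_tb)
```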

The processor choice makes sense given that, with Lattice Boltzmann, each processor must wait for results from other processors before it can resolve the next mesh point. “Our code is extremely bandwidth limited,” Randles says. “The lower the bandwidth, the more time processors wait and the slower the simulation runs.”
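A rough way to see why such a code is bandwidth limited is to count the bytes each lattice-point update must move. This sketch assumes a three-dimensional D3Q19 lattice (19 values per point) stored in double precision, with each update reading and writing every value once; the article does not specify the lattice, and real codes typically move somewhat more data than this lower bound. The 100 GB/s bandwidth in the example is a hypothetical round number.

```python
def bandwidth_bound_mlups(bandwidth_bytes_per_s: float,
                          q: int = 19, bytes_per_value: int = 8) -> float:
    """Ceiling on million lattice-point updates per second (MLUPS) when memory
    bandwidth is the only limit. Each update is assumed to read and write all
    q populations exactly once (no cache reuse, no extra metadata traffic)."""
    bytes_per_update = 2 * q * bytes_per_value  # 304 bytes for D3Q19 in fp64
    return bandwidth_bytes_per_s / bytes_per_update / 1e6


# Hypothetical 100 GB/s of sustained memory bandwidth.
print(f"{bandwidth_bound_mlups(100e9):.0f} MLUPS")
```

Under this model, faster arithmetic does not raise the ceiling; only more bandwidth, or fewer bytes per update (for example, single-precision storage), does.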

The Intel Xeon processor has several architectural features designed to speed up bandwidth-bound applications like Randles’. One is support for fast DDR4-2400 memory, which provides high memory bandwidth. Another is the large (55MB) last level cache (LLC), the on-chip cache shared by all of the processor’s cores, which gives those cores low-latency access to shared data. In addition, Intel Xeon processors (formerly known as Broadwell) support Intel® Advanced Vector Extensions 2 (Intel AVX2), so each processor core can execute 16 double-precision floating point operations (FLOPs) per cycle.
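The 16-FLOPs-per-cycle figure, and the theoretical bandwidth of DDR4-2400 memory, can be reproduced with a few lines of arithmetic. The sketch assumes double-precision values, two fused multiply-add (FMA) units per core, and four memory channels per socket; these are stated assumptions rather than figures from the article.

```python
# Peak double-precision FLOPs per core per cycle with AVX2.
register_bits = 256
fp64_lanes = register_bits // 64   # 4 doubles per AVX2 register
fma_units_per_core = 2             # assumption: two FMA-capable units per core
flops_per_fma = 2                  # a fused multiply-add counts as 2 FLOPs
flops_per_cycle = fp64_lanes * fma_units_per_core * flops_per_fma  # 16

# Theoretical bandwidth of DDR4-2400 memory per socket.
transfers_per_s = 2400e6           # DDR4-2400: 2,400 million transfers per second
bytes_per_transfer = 8             # 64-bit channel
channels_per_socket = 4            # assumption: four memory channels per socket
peak_bandwidth_gb_s = transfers_per_s * bytes_per_transfer * channels_per_socket / 1e9

print(flops_per_cycle, peak_bandwidth_gb_s)
```

Sustained bandwidth in practice falls below this theoretical figure, so the gap between peak FLOPs and achievable bandwidth is usually even wider than these numbers suggest.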

Randles is also relying on Intel experts to help optimize the code. “One of the big ways Intel helps is to show us the bottlenecks these kinds of applications face. I think that’s really helpful. We want hardware partners who really understand the underlying architecture and will help us make the most of it.”

The sooner those optimizations occur, the sooner patients will begin to enjoy the benefits that Randles foresees coming from this technology.

“This will help diagnose in a non-invasive way — like you’ll be able to simulate blood flow from a CT scan and identify regions of low wall shear stress to identify patients at risk of developing certain diseases. But I think the bigger impact will come on the treatment side. You’ll be able to model different options and provide the best treatment on a per patient basis.”
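As a simplified illustration of the wall shear stress idea, the sketch below uses the closed-form result for steady laminar (Poiseuille) flow in a straight, rigid tube, τ_w = 4μQ/(πR³), and flags segments that fall below a threshold. A patient-specific simulation would instead compute shear stress from the simulated velocity field at the vessel wall; the blood viscosity, segment dimensions, flow rates, and 0.4 Pa threshold here are illustrative assumptions, not clinical values.

```python
import math


def poiseuille_wall_shear_stress(mu_pa_s: float, flow_m3_s: float, radius_m: float) -> float:
    """Wall shear stress (Pa) for steady laminar flow in a straight rigid
    tube: tau_w = 4 * mu * Q / (pi * R**3)."""
    return 4.0 * mu_pa_s * flow_m3_s / (math.pi * radius_m ** 3)


def low_wss_segments(segments, mu_pa_s: float, threshold_pa: float):
    """Names of segments whose wall shear stress is below threshold_pa.
    Each segment is a (name, flow rate in m^3/s, radius in m) tuple."""
    return [name for name, flow, radius in segments
            if poiseuille_wall_shear_stress(mu_pa_s, flow, radius) < threshold_pa]


# Illustrative values only: blood viscosity ~3.5 mPa*s, 1 mL/s flow.
segments = [("A", 1e-6, 2e-3), ("B", 1e-6, 3e-3)]
print(low_wss_segments(segments, mu_pa_s=3.5e-3, threshold_pa=0.4))
```

The same flow through a wider segment produces lower wall shear stress, since τ_w falls with the cube of the radius.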

Randall Cronk is the owner of greatwriting, LLC, a technology marketing writing company in Boston, Massachusetts.

This article was produced as part of Intel’s HPC editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC community through advanced technology.