
Neko – a Swedish-led code development effort shortlisted for the 2023 ACM Gordon Bell Prize

It has been a while since the last European nomination for the prestigious ACM Gordon Bell Prize. For the 2023 edition, a group of researchers from Europe was shortlisted for the prize, often regarded as “the Nobel Prize” of HPC, for simulations done with a code called Neko. Neko is a new-generation computational fluid dynamics (CFD) framework for high-order spectral element flow simulations. The simulations for the Gordon Bell Prize proposal were run on the EuroHPC Joint Undertaking’s supercomputers LUMI and Leonardo.

This article appeared originally on the LUMI website.

– It is game-changing to have access to these leadership machines in Europe. These systems really put Europe on the same level as the USA, Japan and China in terms of supercomputing capabilities. Now we can finally do research at this level in Europe, thanks to the EuroHPC JU. But it’s not only about the machines, it’s also about the software that needs to be developed and invested in, says the Neko code’s lead developer, researcher Niclas Jansson at PDC, KTH Royal Institute of Technology in Stockholm, Sweden.

The code is developed at KTH by an interdisciplinary team of HPC experts, computer scientists and domain scientists from various departments. The work that led to the Gordon Bell Prize nomination also involved collaboration with several universities across Europe: a team of 12 researchers from KTH, the University of Erlangen-Nuremberg, Ilmenau University of Technology and the Max Planck Institute in Germany took part in the effort.

Neko computational fluid dynamics
Source: Neko’s website

Fine-grained turbulence simulations, reducing energy consumption

Let’s dig deeper into Neko and computational fluid dynamics, starting with how it all began.

– Back in 2016 we were using a code called Nek5000, developed at MIT and Argonne National Laboratory in the US, for high-fidelity spectral element simulations. But we realized that the code wasn’t very flexible, as all new machines will use GPUs or other accelerators. We couldn’t really see how to get that into the old code. We decided to write something completely new based on the good numerical methods that we knew from the previous code. So that’s how it started, Jansson recalls.

Neko is currently mainly used for very detailed turbulence simulations.

– Around 10% of the world’s energy is spent on overcoming turbulent friction in pipes, on wings and the like, so any better understanding of the phenomenon has a significant impact on both engineering and energy consumption, Jansson says.

– We have, for example, used Neko to simulate something called Flettner rotors. These rotors are placed on top of ships. Each one is a big rotating cylinder that acts as a wind sail, reducing the energy consumption of the ship, he describes.

What got Neko into the shortlist for the Gordon Bell Prize?

For the Gordon Bell Prize nomination, the group was exploring a physical phenomenon called the Rayleigh-Bénard ultimate regime of convection.

– We tried to answer a long-standing debate in turbulence research about the ultimate regime in Rayleigh-Bénard convection. You have a big cylinder that is heated at the bottom and cooled at the top, and you get thermal convection of the flow. You may think of it as a lava lamp, with lumps going up and down. And there’s a big debate about whether, if you have this at a very fine scale, the thin boundary layers become turbulent. This cannot be done via experiments, it has to be done via simulations. No one has been able to get this high in what’s called the Rayleigh number until now. So, with our simulations we are trying to settle this debate in the turbulence research community, he explains.
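
For readers unfamiliar with the term, the Rayleigh number measures how strongly buoyancy drives the flow compared with the damping effects of viscosity and heat diffusion. The standard textbook definition is given below; the symbols are not from the interview and are introduced here only for illustration: g is gravitational acceleration, α the thermal expansion coefficient, ΔT the temperature difference between the bottom and top plates, H the height of the fluid layer, ν the kinematic viscosity and κ the thermal diffusivity.

```latex
\[
  \mathrm{Ra} = \frac{g \, \alpha \, \Delta T \, H^{3}}{\nu \, \kappa}
\]
```

A higher Rayleigh number means stronger thermal driving, and the ultimate-regime question concerns what happens when this number becomes very large.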

The convection studies relate to, for example, what happens inside the Sun, where hot plasma rises and cooler plasma sinks.

A computational fluid dynamics code for all platforms

The runs for the Gordon Bell Prize nominated work were done on EuroHPC’s LUMI and Leonardo supercomputers. What makes this interesting is that LUMI and Leonardo have different architectures: LUMI’s massive computing capacity is based on AMD GPUs, whereas Leonardo relies on Nvidia GPUs. The Neko code was developed from the very beginning with a multi-platform strategy.

– The entire code is written in modern Fortran. For the accelerator part we investigated different options, and ended up writing it in the GPUs’ native languages: HIP for AMD and CUDA for Nvidia. The code actually has both CUDA code and HIP code for the kernels that go on to the GPU. When you compile the code, you select which GPU you will target. The code development of Neko will continue with both AMD and Nvidia platforms, and we also need to consider what to do with Intel. We want to target all different exascale platforms with this code, Jansson outlines.

– It sounds like a lot of code to maintain, but the methods we are using allow us to really minimize the amount of duplicated code. A positive side effect is that this has also turned into an interesting software engineering exercise, he continues.
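
To make the idea of compile-time backend selection concrete, here is a minimal sketch of one common pattern: the numerical core is written once, and a thin layer of macros maps a neutral name onto either the CUDA or the HIP runtime. This is not taken from Neko’s source, which is Fortran with device kernels. The names `USE_CUDA`, `USE_HIP`, `gpuMalloc`, `axpy_point` and `axpy` are hypothetical, and error checking is omitted for brevity.

```cpp
// Hypothetical sketch of compile-time GPU backend selection (not Neko's code).
// Build with -DUSE_CUDA (nvcc), -DUSE_HIP (hipcc), or neither (plain C++ fallback).
#include <cstddef>
#include <vector>

#if defined(USE_CUDA)
  #include <cuda_runtime.h>
  // Map neutral names onto the CUDA runtime API.
  #define gpuMalloc cudaMalloc
  #define gpuMemcpy cudaMemcpy
  #define gpuFree cudaFree
  #define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
  #define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
  #define DEVICE_FN __host__ __device__
  #define HAVE_GPU 1
#elif defined(USE_HIP)
  #include <hip/hip_runtime.h>
  // Map the same neutral names onto the HIP runtime API.
  #define gpuMalloc hipMalloc
  #define gpuMemcpy hipMemcpy
  #define gpuFree hipFree
  #define gpuMemcpyHostToDevice hipMemcpyHostToDevice
  #define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
  #define DEVICE_FN __host__ __device__
  #define HAVE_GPU 1
#else
  #define DEVICE_FN
#endif

// Shared numerical core: written once, reused by every backend.
DEVICE_FN inline double axpy_point(double a, double x, double y) {
    return a * x + y;
}

#ifdef HAVE_GPU
// One kernel source for both GPU backends; each thread updates one entry.
__global__ void axpy_kernel(double a, const double* x, double* y, std::size_t n) {
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) y[i] = axpy_point(a, x[i], y[i]);
}
#endif

// Computes y <- a*x + y. Assumes x and y have the same length.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    const std::size_t n = y.size();
    if (n == 0) return;
#ifdef HAVE_GPU
    const std::size_t bytes = n * sizeof(double);
    double* dx = nullptr;
    double* dy = nullptr;
    gpuMalloc(reinterpret_cast<void**>(&dx), bytes);
    gpuMalloc(reinterpret_cast<void**>(&dy), bytes);
    gpuMemcpy(dx, x.data(), bytes, gpuMemcpyHostToDevice);
    gpuMemcpy(dy, y.data(), bytes, gpuMemcpyHostToDevice);
    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    axpy_kernel<<<blocks, threads>>>(a, dx, dy, n);
    // The blocking copy back also waits for the kernel to finish.
    gpuMemcpy(y.data(), dy, bytes, gpuMemcpyDeviceToHost);
    gpuFree(dx);
    gpuFree(dy);
#else
    // CPU fallback uses the same numerical core.
    for (std::size_t i = 0; i < n; ++i) y[i] = axpy_point(a, x[i], y[i]);
#endif
}
```

Calling `axpy(2.0, x, y)` gives the same result whichever backend was chosen at build time. Only the macro block at the top differs between platforms, which is one way to keep duplicated code small.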

Perfect scaling on an unprecedented scale

In addition to supporting multiple platforms, the team was able to use 16,384 GPUs on LUMI simultaneously. To manage this, the team needed to develop a completely new algorithm to keep the GPUs busy.

– In the Gordon Bell Prize proposal, we achieved perfect scaling of the code on 80% load on LUMI. To get this, we developed novel task-parallel algorithms to better utilize the GPUs and new workflows to handle the data. We did what is called in situ compression of the data, because at this scale the output would otherwise be too large. I believe these are all reasons why we were nominated as a Gordon Bell Prize finalist, he says.

How did LUMI help?

The team was amazed by the performance they were able to reach on LUMI.

– We also really liked the tightly integrated environment that LUMI, an HPE Cray EX system, provides. Having the network directly on the GPUs is a very important thing for us, as we do quite a lot of communication, he explains.

The team is looking forward to the exascale and post-exascale era:

– Computational fluid dynamics is one of the few fields that greatly benefits from larger systems. We are not even close to getting the scientific results we need from the simulations, so we definitely need even bigger systems. We don’t even know if we can answer this Rayleigh-Bénard ultimate regime question on an exascale machine. We probably need a post-exascale machine for that. More is more in our case.

The 2023 ACM Gordon Bell Prize winners will be awarded at the SC23 conference in Denver, Colorado, this November.

Niclas Jansson will be visiting the SC23 LUMI booth #206 on Wednesday 15 November at 12.30–13.00.

ENCCS supports, and has previously supported, code development efforts in a wide range of sectors. See our work here.