Recently we performed Nek5000 benchmarking tests on several heterogeneous HPC systems: Piz Daint at CSCS (Switzerland), Longhorn at TACC (USA), JUWELS Booster at Juelich (Germany), and Berzelius at NSC (Sweden).
On these systems, the GPU-accelerated version can achieve a speed-up of 3 to 5 times over the CPU version. The following figure shows the CPU and GPU performance for a pipe-flow simulation at Re_τ = 550 over 20 time steps on the JUWELS Booster system, where each node is equipped with 48 CPU cores and 4 NVIDIA A100 GPUs. The case consists of 853,632 elements at polynomial order 9.
Vincent et al., "Strong Scaling of OpenACC Enabled Nek5000 on Several GPU Based HPC Systems", preprint submitted to HPC Asia 2022.
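To give a sense of the problem size behind these numbers, here is a small back-of-the-envelope sketch. It assumes the usual spectral-element count of (N + 1)^3 Gauss-Lobatto-Legendre points per hexahedral element for polynomial order N, counted per element (points on shared faces are counted once per element). The helper names and the 16-node configuration are hypothetical and are only for illustration. They do not come from the benchmark study. The speed-up is taken simply as the ratio of CPU to GPU wall-clock time.

```python
"""Back-of-the-envelope problem size for the pipe case (illustrative sketch).

Assumption: (N + 1)**3 grid points per hexahedral element for polynomial order N,
counted element-locally (points on shared faces are counted once per element).
"""

ELEMENTS = 853_632    # number of spectral elements (from the text)
POLY_ORDER = 9        # polynomial order N (from the text)
GPUS_PER_NODE = 4     # NVIDIA A100 GPUs per JUWELS Booster node (from the text)


def local_grid_points(elements: int, order: int) -> int:
    """Element-local grid points: (N + 1)^3 per element."""
    return elements * (order + 1) ** 3


def points_per_gpu(elements: int, order: int, nodes: int,
                   gpus_per_node: int = GPUS_PER_NODE) -> float:
    """Average element-local grid points per GPU for a given (hypothetical) node count."""
    return local_grid_points(elements, order) / (nodes * gpus_per_node)


def speedup(t_cpu: float, t_gpu: float) -> float:
    """Speed-up of the GPU run relative to the CPU run (ratio of wall-clock times)."""
    return t_cpu / t_gpu


if __name__ == "__main__":
    total = local_grid_points(ELEMENTS, POLY_ORDER)
    print(f"element-local grid points: {total:,}")  # 853,632,000
    # 16 nodes is a hypothetical example, not a configuration reported in the study.
    per_gpu = points_per_gpu(ELEMENTS, POLY_ORDER, nodes=16)
    print(f"points per GPU on 16 nodes: {per_gpu:,.0f}")  # 13,338,000
```

With polynomial order 9, each element carries 10^3 = 1,000 points, so the case holds roughly 8.5 × 10^8 element-local grid points. Spread over 64 GPUs (16 hypothetical nodes), that is about 13.3 million points per GPU.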