The LUMI GPU / Nomad CoE Hackathon was hosted at CSC in Finland on Sept. 4–6. It was aimed at research software developer teams targeting the GPU partition of LUMI, which is equipped with AMD MI250X GPUs. Seven teams took part, working on projects in computational materials science and computational fluid dynamics under the mentorship of experts from AMD, HPE and National Competence Centres.
Two members of ENCCS, Yonglei Wang and Wei Li, served as mentors at this GPU hackathon.
During the three-day hackathon, Yonglei (ENCCS Training Coordinator) worked with the QuantumESPRESSO team (Fabrizio Ferrari Ruffino, Ivan Carnimeo, Oscar Baseggio, and Laura Bellentani). QuantumESPRESSO is a suite for first-principles electronic-structure calculations and materials modelling. The work focused on two topics: batched/streamed FFTs (asynchronous execution and data movement), and porting and profiling the Hubbard code (matrix operations and optimal batch sizes).
For the first topic, we proposed implementing the double loops as HIP kernels so that they can be launched on given streams, implementing the relevant models, and then comparing the performance of the FFT schemes with and without streamed computation in both the CUDA and HIP versions of the code.
For the second topic, we worked on unifying the interfaces of the Hubbard forces and stress routines across the different offload models (OpenACC vs. OpenMP), identified the bottlenecks in the Hubbard code, and found a suitable test case for tracing the performance of individual code blocks.
The QuantumESPRESSO team from MaX CoE with ENCCS Training Coordinator Yonglei Wang during the hackathon.
Wei Li served as one of the mentors for the FHI-AIMS team, which consisted of three talented members from TU Dresden and Aalto University. They quickly adapted their CUDA code to run on LUMI-G nodes using HIPIFY.
At first, the ported code was even slower than the CPU version, but they soon realised that creating a stream and allocating arrays inside the nested loops caused a large overhead. With guidance from the AMD expert, they solved this problem completely.
Three days were not enough, however, for them to implement their final idea: reordering the loops in the tensor operations so that they map better onto the GPU. We wish the FHI-AIMS team good luck in their future work!
Over the three days of the hackathon, not only the participating teams but also the ENCCS mentors substantially deepened their experience in applying GPU programming to specific scientific codes.