The National Library of Sweden (Kungliga biblioteket, KB) has been granted access to the MeluXina supercomputer through the EuroHPC JU Regular Call, with support from ENCCS. This is the second time KB gains access to a EuroHPC JU supercomputer, and it makes KB the first public administration entity to be accepted through the EuroHPC JU Regular Call, which provides access to an even larger amount of CPU and GPU cores.
With this access, KB is now able to use 10 000 000 GPU core hours to train their Natural Language Processing models using their enormous text, audio, and video archive.
But let’s take a step back and explain why and how KB is using such large computing power.
The transformer neural network (Vaswani et al. 2017) and the subsequent transformer language models (LMs) such as GPT (Radford et al. 2018) and BERT (Devlin et al. 2018) have become the new standard in a pretraining-finetuning paradigm. This approach leverages large amounts of unannotated text data in a self-supervised pretraining step, producing generalist models that can then be finetuned on specific tasks.
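To make the pretraining-finetuning idea concrete, here is a minimal sketch of the finetuning step, assuming the Hugging Face transformers and PyTorch libraries; the Swedish BERT checkpoint name and the toy sentiment labels are illustrative assumptions, not a description of KB's actual pipeline.

```python
# Minimal finetuning sketch (illustrative, not KB's actual training code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint name, used here only for illustration.
model_name = "KB/bert-base-swedish-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny labelled batch standing in for a task-specific dataset.
texts = ["Det här är bra.", "Det här är dåligt."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One supervised gradient step: the generalist pretrained model adapts to the task.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```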
The largest of these models consist of several hundred billion parameters, requiring the model itself to be split over multiple GPUs and up to several thousand GPUs to be trained in a reasonable amount of time. Due to their extreme size and their ability to absorb massive datasets, these models have been shown to be very adept at learning new tasks from only a few training examples, or even none, needing only a prompt describing the task (Brown et al. 2020).
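This few-shot and zero-shot behaviour can be illustrated with prompting: instead of updating any weights, the task is simply described in the model's input, optionally with a couple of examples. The sketch below assumes the Hugging Face transformers library and uses the small public gpt2 checkpoint purely for illustration; the models discussed here are orders of magnitude larger.

```python
# Few-shot prompting sketch: the task is given in the prompt, no finetuning.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small illustrative model

prompt = (
    "Translate English to Swedish:\n"
    "cat -> katt\n"
    "house -> hus\n"
    "library ->"
)
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```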
Due to the high financial cost of training these models, most of them have been trained by large companies and only on English data. By developing a competitively sized model for Swedish, the National Library of Sweden hopes to enable both commercial and non-commercial use of this technology, while allowing researchers to study what these models learn in languages other than English.
ENCCS has produced a video with Love Börjeson, the director of KBLab at the National Library of Sweden, explaining their work on building Swedish language models and the importance of European supercomputing infrastructure.