The Natural Language Processing (NLP) Group at RISE was awarded 3500 GPU node hours on the EuroHPC JU system Leonardo. The group will use large language models (LLMs) to extract information on extreme climate events from online texts.
Organisations involved
RISE is an independent, state-owned research institute that conducts research in a broad range of areas, including artificial intelligence and advanced data analysis. RISE’s mission is to support the renewal, sustainability and competitiveness of the public and private sectors in Sweden. The NLP group carries out research on fundamental topics such as resource efficiency, multilinguality and language model evaluation, and on applications such as political survey analysis, detection of adverse effects of medicines from medical data, and retrieval of the impacts of extreme climate events by mining online texts.
Technical/scientific challenge
Extreme climate events cause adverse environmental and socio-economic impacts. Mitigating these impacts requires understanding how they link to climatic drivers. However, existing climate hazard impact databases have limitations in terms of completeness and accuracy. While a wealth of detailed climate impact information is available from online text sources such as news websites and reports, its extraction by human experts is time-consuming, and it is thus not typically used in impact databases. The NLP group seeks to close this gap by producing a database of the impacts of climate extremes through mining of online text sources.
Proposed solution
By using large language models (LLMs) on Leonardo, the NLP group at RISE will develop and evaluate an approach to extract impact data, such as place, time, number of people affected and monetary costs. For this purpose, the group will use the open-source suite of Llama 2 models, tuned using an annotated set of impact data.
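To make the idea of extracting impact data more concrete, the following is a minimal sketch of what a single extracted record could look like and how a model's answer might be turned into one. It is not the group's actual pipeline: the `ImpactRecord` schema, the prompt wording, the function names and the example values are all hypothetical, and no model is called.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImpactRecord:
    """One extracted impact of an extreme climate event (hypothetical schema)."""
    place: Optional[str] = None
    time: Optional[str] = None
    people_affected: Optional[int] = None
    monetary_cost: Optional[str] = None


# The four fields named in the text; the key names themselves are assumptions.
FIELDS = ("place", "time", "people_affected", "monetary_cost")


def build_prompt(text: str) -> str:
    """Build an instruction asking a model to return the fields as JSON."""
    return (
        "Extract the impact of the extreme climate event described below. "
        f"Answer with a JSON object with the keys {', '.join(FIELDS)}; "
        "use null for anything the text does not state.\n\n"
        f"Text: {text}"
    )


def parse_response(response: str) -> ImpactRecord:
    """Turn a JSON answer into a record, ignoring any unexpected keys."""
    data = json.loads(response)
    return ImpactRecord(**{key: data.get(key) for key in FIELDS})


# Hand-written stand-in for a model answer, with invented values.
example_response = (
    '{"place": "Exampletown", "time": "2020-01-01", '
    '"people_affected": 120, "monetary_cost": null}'
)
record = parse_response(example_response)
print(record)
```

In a real setting, the output of `build_prompt` would be sent to a tuned model, and records parsed this way could be compared against the annotated impact data for evaluation.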
Business impact
Supercomputing access to Leonardo will enable the NLP Group at RISE to apply LLMs that constitute the current state of the art in natural language processing. These models, such as Llama 2, have impressive language generation and information extraction capabilities, at the cost of being computationally very demanding. Systematic, task-specific tuning and evaluation therefore require supercomputing resources.
Benefits
- The supercomputing resources will enable the group to load full LLMs rather than slimmed-down versions.
- The group will be able to run multiple experiments within a reasonable amount of time. Insights gained from these experiments will inform future experiments.
- Faster turnaround of experiments will improve research productivity, the capabilities of the models developed and, by extension, the quality of the impact data extracted.
You can get supercomputing access too!
Did you know that any small or medium-sized company in Europe can get access to European supercomputers for free? We help you every step of the way. You can learn more here.