This event has passed.

High-Performance Data Analytics with Python (Online)

Start: 2025-01-21T09:00:00+01:00
End: 2025-01-23T12:00:00+01:00

Jan 21 • 09:00 – Jan 23 • 12:00 CET

About the course

Welcome to the online workshop on High-Performance Data Analytics in Python, held on Jan. 21-23, 2025. Python is a modern, object-oriented, and industry-standard programming language for working with data at all levels of the data analytics pipeline. A rich ecosystem of libraries, ranging from generic numerical libraries to special-purpose and domain-specific packages, has developed around Python for data analysis and scientific computing.

This online workshop, spread over three half-days, gives an overview of working with research data in Python using general libraries for storing, processing, analyzing and sharing data. The focus is on improving performance. After covering tools for performant processing on single workstations (NetCDF, NumPy, pandas, SciPy), the focus shifts to parallel, distributed and GPU computing (Snakemake, Numba, Dask, multiprocessing, mpi4py).

Who is this workshop for?

High-Performance Data Analytics in Python is for all researchers and engineers who work with large or small datasets and who want to learn powerful tools and best practices for writing more performant, parallelised, robust, and reproducible data analysis pipelines. This workshop is an interactive online event, featuring live coding, demos, and practical exercises. We aim to equip you with the tools and knowledge to write efficient, high-performance code using Python.

Prerequisites

Basic experience with Python
Basic experience in working in a Linux-like terminal
Some prior experience in working with large or small datasets

Key takeaways

After attending the workshop, you should:

Have a good overview of available tools and libraries for improving performance in Python
Know what libraries are available for efficiently storing, reading and writing large data
Be comfortable working with NumPy arrays and pandas DataFrames
Be able to explain why Python code is often slow
Understand the concept of vectorisation
Understand the importance of measuring performance and profiling code before optimizing
Be able to describe the difference between “embarrassing”, shared-memory and distributed-memory parallelism
Know the basics of parallel workflows, multiprocessing, multithreading and MPI
Understand pre-compilation and know basic usage of Numba and Cython
Have a mental model of how Dask achieves parallelism
Remember key hardware differences between CPUs and GPUs
Be able to create simple GPU kernels with Numba

Day	Time	Contents
Day 1 (Jan. 21)	09:00-09:10	Welcome
	09:10-09:20	Motivation
	09:20-10:00	Scientific data
	10:00-10:20	Break
	10:20-11:00	Efficient array computing
	11:00-11:20	Break
	11:20-12:00	Efficient array computing

Day 2 (Jan. 22)	09:00-09:40	Parallel computing
	09:40-09:50	Break
	09:50-10:20	Parallel computing
	10:20-10:40	Break
	10:40-11:20	Profiling and optimizing
	11:20-11:30	Break
	11:30-12:00	Profiling and optimizing

Day 3 (Jan. 23)	09:00-09:40	Performance boosting
	09:40-09:50	Break
	09:50-10:20	Performance boosting
	10:20-10:40	Break
	10:40-11:20	Dask for scalable analytics
	11:20-11:30	Break
	11:30-12:00	Dask for scalable analytics

More events & contact

Check out more upcoming events from ENCCS and our European network at https://enccs.se/events.

For questions regarding this workshop or general questions about ENCCS training events, please contact training@enccs.se.

Schedules can change!

To ensure that everyone has the opportunity to participate, we kindly request that you let us know as soon as possible if you are unable to attend an event after registering.

Please send us an email at training@enccs.se to cancel your attendance.

We understand things can change, but repeated cancellations without notice may unfortunately result in your name being removed from future event registration lists.

Regulations

Due to EuroCC2 regulations, we CAN NOT ACCEPT generic or private email addresses. Please use your official university or company email address for registration.

This training is for users who live and work in the European Union or a country associated with Horizon 2020. You can read more about the countries associated with Horizon 2020 HERE.


This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and its associated countries.