
In an international collaboration of the Virtual Earth System Research Institute, ICCS Research Software Engineers (RSEs) Jack Atkinson, Athena Elafrou, Simon Clifford and Tom Meltzer partnered with Minah Yang, Laura Mansfield, Dave Connelly, Aditi Sheshadri, Qiang Sun and Ed Gerber of DataWave to optimise a PyTorch-Fortran coupling of a machine learning gravity wave parameterisation to the MiMA climate model.


We sat down with Dr. Jack Atkinson to talk about his latest project with DataWave.

Jack is a research software engineer who works at the Institute of Computing for Climate Science (ICCS). Having enjoyed maths and science from a young age, he studied Engineering at the University of Cambridge, specialising in fluid mechanics for his Master's. He went on to complete an engineering PhD followed by postdoctoral research on volcanic plumes. After working as a Radiation Belt Scientist at the British Antarctic Survey and developing a forecast system at the Met Office, Jack took on the role of an RSE, where he is involved in several projects and has recently completed a collaboration with DataWave.

 

What’s the main challenge of this project?

We can think about climate models as being made up of several constituent parts. You typically have a dynamical core, which calculates how the air flows, how water moves, how temperature evolves and so on. Coupled to that, you have separate modules describing smaller processes like convection and turbulence. In this project we looked at gravity waves: we unplugged the gravity wave section of the model, developed a deep-learning emulation of it, and plugged this new, and hopefully more efficient, version back into the main model. Importantly, the gravity wave model was developed in Python using the PyTorch machine learning framework, but the main model is written in Fortran. Coupling two components written in different languages becomes a translation problem, and that's the main challenge.

 

What is Datawave trying to do, more specifically?

DataWave is looking at gravity waves, which are waves travelling in the atmosphere. On a simple level, you can think of them as blobs of air moving up and down in the atmosphere. If a blob is denser than the air around it, it will sink, moving down into regions where the surrounding air is denser still. There the blob finds itself less dense than its surroundings, and buoyancy pushes it back up. It wants to settle at equilibrium but overshoots, which creates an oscillating pattern. In reality it's a bit more complicated than that, but they're generally waves driven by buoyancy that travel within the atmosphere.
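In textbook terms (a standard result, not something discussed in the interview itself), the natural frequency of this buoyancy oscillation is the Brunt–Väisälä frequency $N$, defined by

$$N^2 = -\frac{g}{\rho_0}\,\frac{d\bar{\rho}}{dz},$$

where $g$ is gravitational acceleration, $\rho_0$ a reference density and $d\bar{\rho}/dz$ the background vertical density gradient; when $N^2 > 0$, a displaced parcel of air oscillates stably about its equilibrium height.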

 

Why are gravity waves important?

Gravity waves can carry energy and momentum from one place to another. The waves can be generated by airflow over mountains, or where weather fronts collide, with one air mass sinking as the other lifts. Often you find that energy is transported upwards, and the waves break high up in the atmosphere to deposit that energy. This influences flows in the stratosphere that can have an effect on the weather.

 

Gravity waves are particularly important in driving the quasi-biennial oscillation, or QBO. This is a pattern of strong winds that flow in the stratosphere. They blow from east to west, and then approximately two years later the system reverses, continuing like this on a roughly two-year cycle. It has important implications for the jet stream, which can lead to either colder or milder winters in the US and UK, and it also has effects on the South Asian monsoon.

 

How is DataWave incorporating gravity waves in their climate models?

DataWave is developing a few different approaches to improve the modelling of gravity waves, because they're important to climate models. It's important to replicate how they interact with the system in real life, but that is quite complicated, so a parameterisation is needed. This is a method that represents a process using mathematical parameters, emulating the effects of a phenomenon like gravity waves without actually having to include all the detailed processes in our models.

 

DataWave is using the MiMA model, which is a fairly simple model of the atmosphere. They wanted to use a machine learning parameterisation published by Zac Espinosa. The problem was that when they coupled it to the atmospheric model (using an external library called forpy) it was very, very slow, even slower than the original parameterisation, which meant that there wasn't really any benefit to it. So they approached us at the ICCS to see if we could improve the coupling to the point where the machine learning parameterisation actually delivered a benefit.

 

How did you tackle the challenge of making the code run faster?

The machine learning code was written in Python, making use of the popular and highly functional machine learning libraries PyTorch and TensorFlow. The MiMA model, however, is written in Fortran, which is a much more efficient language for running on high-performance computers and is what most climate models use. This means your machine learning computations are done in Python, the results need to be passed back to the big model in Fortran, and so on back and forth.

 

We asked ourselves, how can we make this faster? PyTorch provides an API, or Application Programming Interface, written in C++. It allows you to access all of the functionality of PyTorch without having to use Python at all. So we took the existing C++ interface and developed a Fortran library that wraps around it. To you, it looks like you're calling a Python function from Fortran, but really you're calling a C++ function from the underlying Torch library.
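As a rough illustration of what that C++ layer looks like (a minimal sketch using PyTorch's LibTorch C++ API; the model file name and tensor shape are placeholders, not details from the project):

```cpp
#include <torch/script.h>  // LibTorch: TorchScript loading and execution
#include <iostream>
#include <vector>

int main() {
    // Load a network previously saved from Python with torch.jit.trace
    // or torch.jit.script. "gw_model.pt" is a hypothetical file name.
    torch::jit::script::Module model = torch::jit::load("gw_model.pt");

    // Build an input tensor; the shape {1, 40} is illustrative only.
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 40}));

    // Run the forward pass entirely in C++ -- no Python interpreter.
    torch::Tensor output = model.forward(inputs).toTensor();
    std::cout << output.sizes() << std::endl;
    return 0;
}
```

The Fortran library then wraps calls like these behind Fortran interfaces (via the standard `iso_c_binding` module), so the climate model code never touches Python directly.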

 

The advantage of C++ is that, rather than copying all the data over to Python for the machine learning computations, you can leave the data where it sits in memory. Then you can tell Torch, hey look, the data is over here, and it can treat that memory as if it were one of its own tensors. This is called a no-copy transfer, which is much cheaper than copying large datasets. The only catch is that C++ and Python store arrays row by row, whereas Fortran stores them column by column. We therefore added some functionality that tells C++ and PyTorch how to step through the Fortran data in memory. That was another efficiency boost.
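To make the no-copy idea concrete, here is a minimal sketch (not the project's actual wrapper code) of how LibTorch's `torch::from_blob` can view an existing column-major, Fortran-ordered buffer as a tensor by supplying explicit strides:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    // A 2x3 matrix stored column-major, as Fortran would lay it out:
    //   logical matrix: [1 3 5]
    //                   [2 4 6]
    float fortran_data[6] = {1, 2, 3, 4, 5, 6};

    // Wrap the existing buffer without copying. In column-major order,
    // stepping one row moves 1 element and stepping one column moves 2,
    // so the strides are {1, 2} rather than the row-major {3, 1}.
    torch::Tensor t = torch::from_blob(
        fortran_data, /*sizes=*/{2, 3}, /*strides=*/{1, 2}, torch::kFloat32);

    std::cout << t << std::endl;  // prints the matrix in its logical layout
    return 0;
}
```

Because the tensor is only a view, anything PyTorch reads comes straight from the Fortran array, with no copy in between.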

 

What was something that surprised you?

When we first tested our coupling approach, we found that it was three to four times faster than the TensorFlow neural network. However, we found that the original forpy-coupled PyTorch was actually a little bit faster than what we had implemented, which we didn't expect. It turned out that there was some clever code hidden away that makes it quicker to access the data when using forpy. It was only because our approach wasn't immediately faster that we dug down into the underlying code to find out why. We later used those insights to make our code even faster.

 

What advice would you give yourself if you were to do something like this again?

In general, you need to test things as much as possible. When things are working, I think it's beneficial to spend time growing the examples and explanations alongside the code. On the flip side, we were presenting this at the Climate Informatics conference while still trying to make our code faster than the original PyTorch model, which put pressure on us to get some good results. Without that time pressure, maybe we could have looked after things a little more carefully. In the end it was alright that the full documentation came a bit later, and we made sure to include good examples to follow.

 

Why is it important to communicate this project to the wider science community?

For me, it's important that we're getting out and communicating about the project, because this is something a lot of people will be wanting to do. And, like I said, there are a few different approaches. Some people have used clunky add-ons to bridge Python and Fortran code. Others are re-writing everything in Fortran, which isn't really ideal either. Telling people we've developed this and that it's publicly available as open source software is great, but we want to stress that we welcome people asking for help. If people want to use this and have questions, we'll try to help them with it. That's also why I think it's important to present this at conferences.

 

Find out more about Jack Atkinson and get access to slides presenting this work at the 2023 RSE conference here. This article was written about our collaboration with DataWave. Find out more about their work here.

About Us

Computational modelling is key to climate science. But models are becoming increasingly complex as we seek to understand our world in more depth and model it at higher fidelity. The Institute of Computing for Climate Science studies and supports the role of software engineering, computer science, artificial intelligence, and data science within climate science.

The institute comprises a collaboration between Cambridge Zero, the Departments of Applied Mathematics and Theoretical Physics (host department of the institute), Computer Science and Technology, and University Information Services at the University of Cambridge.
