Submitted by L.L. Grant on Wed, 26/06/2024 - 15:48
The following blog is written by Jack Atkinson. Jack is a senior research software engineer at the ICCS who spent two weeks in Boulder, Colorado, at NCAR, working closely with model developers in an effort to make a breakthrough in an ongoing project. The visit also gave Jack the opportunity to share his own expertise and details of ICCS projects with scientists there, to help them in their own work.
The lab at Table Mesa has been a personal Mecca ever since I began working in the atmospheric sciences. The iconic building, situated where the foothills of the Rockies meet the Great Plains, is beautifully captured by Julie Leidel's artwork [see image below].
The National Center for Atmospheric Research (NCAR) is home to the Community Earth System Model (CESM), which is used to run state-of-the-art simulations of the climate system, and it is what brought me there.
Our work at ICCS involves collaborating with several projects/users of CESM and developing tools that are of interest to NCAR scientists and developers.
The first objective of my visit was to make progress on the CAM-ML project from M2LInES: coupling a machine-learnt model for deep convection, trained from high-resolution convection-resolving simulations, to the lower-resolution CAM (Community Atmosphere Model). Whilst we had made progress, communication with CESM developers had been challenging, particularly as we entered the coupling phase of the project. This was an opportunity to have shoulder-to-shoulder contact for an intense period of work and get the model running (instead of crashing) within CESM.
My second goal was to follow up on preliminary work to incorporate the FTorch library into CESM.
FTorch is software developed by ICCS to make it easier for researchers to couple PyTorch machine learning models to their numerical simulations, reducing effort whilst improving reproducibility. It has already been successfully used for research with the MiMA and ICON atmospheric models as part of the DataWave VESRI project and beyond (Mansfield and Sheshadri, 2024; Heuer et al., 2023).
This discussion began as an ICCS Code Clinic with NCAR and M2LInES scientist Will Chapman, who was interested in using FTorch to improve computational performance with easier setup. In addition to being useful to Will, we are also planning to use this tool in another DataWave project, coupling gravity wave parameterisations trained in the high-resolution WRF model into CAM.
After a chilly start to the week I met with Will, who was an excellent host (thanks again for everything!). After showing me to the first of my offices (I had two!) we sat down to tackle building FTorch on Derecho (the new High Performance Computer (HPC) at NCAR) using the software that the HPC admins had installed in advance of my arrival. After a few teething problems we successfully managed to build and run the simple examples.
Later in the week I managed to meet with our CAM-ML collaborator Judith Berner. Sitting down with her and Will to trawl through the CESM source code together allowed us to make a great deal of progress. This level of communication, along with my growing understanding of the nuances of the CESM model (shoutout to the physics buffer, ptend routines, and hidden deallocations), was incredibly beneficial.
As my work continued I found it incredibly useful to be able to just wander across the building and speak to the person who wrote or maintains the code I had questions about. This sort of communication with other members of the Cambridge team was also really beneficial in my early days as a research software engineer and something I always encourage for new members.
Having successfully run FTorch standalone on Derecho, the next step was to make it more robust and integrate it properly with CESM.
After a useful meeting with lead software engineer Jim Edwards, Will and I were able to build on Jim's work to create an ICCS fork of the CESM build system that makes FTorch directly available to users when they build CESM.
This is now being used in our DataWave project, and in Will's work on nudging tendencies to reduce model biases. By publicising this to the wider communities we hope that scientists using CESM (any component) can more easily interface PyTorch ML with the model, speeding up research and shortening time-to-science. This sort of opportunity to have an impact across climate modelling and computing is what attracted me to ICCS in the first place.
You can learn more about this in the slides from my recent talk at the PASC24 Conference.
Another benefit of being able to talk with Jim - the author and maintainer of the CIME build system - was that he could answer many of my questions about porting CESM to new machines, and then discuss the follow-up questions that arose.
I have been wanting for some time to port CESM to CSD3 (the Cambridge HPC) as this will allow us to perform development without being dependent on collaborators having access to Derecho, or other suitable machines. It also means that ICCS can provide CESM access to all VESRI researchers and projects, especially those without access to these machines - something I know the FETCH4 project is interested in. Thanks to Jim I was able to make progress running the preparation parts of CIME that had previously perplexed me. I'm now tackling compiler/compatibility issues, but these are a task for me, Kacper, and CSD3!
Later in the week a project meeting for CAM-ML revealed that we had actually been modifying the wrong part of the CESM source.
However, the much better understanding of the code that I had gained in the previous week allowed me to pivot efficiently and achieve a coupling that could be run from the correct place, piggybacked onto the existing parameterisation.
Whilst the numbers and stability were not where we wanted them to be, this was a big step, setting me up for the process of debugging on my return to Cambridge.
[I note, whilst finalising this article in May, that following this we now have the CAM-ML parameterisation running stably in our CAM test-case producing plausible results!]
Outside of the coding, another great benefit of my visit was the many useful lunches, coffees, and beers over which I got to meet various people organically. It was great to hear about and discuss work at NCAR, all the way from hardware decisions and procurement up to development of a 'truly' open-source AI numerical weather forecasting model.
One such meeting has already delivered, with David John Gagne visiting ICCS in May to deliver a talk and discuss our shared interests and potential future collaborations.
These sorts of discussions are something I love about this work and something that the environment of Cambridge is especially geared towards. Sitting down to chat with a diverse array of people from completely different fields who have answers to problems you didn't even know you had is something our partnership with Queens' College is particularly useful in facilitating.
A key takeaway from my visit was that, for me, there really is no substitute for 2-3 people being sat around one screen or the ability to wander to someone's office to ask them a five-minute question over coffee. I'm looking forward to the X-VESRI meeting and summer school later this year for more of this style of working.
As a footnote, I was also incredibly lucky to have the Easter weekend to hike the Flatirons before exploring Boulder and Nederland.
References:
Heuer, Helge, Mierk Schwabe, Pierre Gentine, Marco A Giorgetta, and Veronika Eyring. 2023. “Interpretable Multiscale Machine Learning-Based Parameterizations of Convection for ICON.” arXiv Preprint arXiv:2311.03251.
Mansfield, Laura A, and Aditi Sheshadri. 2024. “Uncertainty Quantification of a Machine Learning Subgrid-Scale Parameterization for Atmospheric Gravity Waves.” Authorea Preprints.