Summer School course abstracts and prerequisites
RSE Skills
Description
Much of the code used in research is written to a basic standard to achieve an immediate goal. Further, it is often written in a fluid style as the research develops. Whilst this is fast in the short term, it does not lend itself to well-written, structured code, or to reuse by others (or even by the future author!).
The aim of this session is to introduce basic key tools and concepts of research software engineering, and how they can be applied in everyday use to write higher-quality code, reduce bugs, and facilitate re-use. The workshop will be taught using Python for familiarity, but the concepts map to other languages with pointers to equivalent tools where relevant.
Prerequisites
- A working installation of Python 3. This should come as standard on Linux and can be installed on Mac and Windows
- A working installation of pip for installing Python packages
- Basic programming skills: the ability to read and follow Python code, and an enthusiasm to learn better practices.
Introduction to Git and GitHub for Beginners
Description
This course teaches the general concept of version control as well as basic Git commands such as git init, git add, git status and git commit. It provides an introduction to using remote repositories on GitHub: cloning with git clone (and when to fork a repository on GitHub instead), pushing new content to the remote with git push, and creating a pull request.
Prerequisites
In order to follow along with the course the following has to be set up beforehand:
- Having git locally installed on your machine
- Having a GitHub account with MFA set up
- Having an SSH key associated with your GitHub account for authentication on your local machine.
Intermediate Git
Description
The key objective of this workshop is to provide you with knowledge of some of the higher-level functionalities within Git beyond basic usage.
We will achieve this through a presentation and accompanying hands-on exercises. You will work on the code enclosed in this repository, at each step utilising a new concept from git to make progress. We will present concepts first as theory before then applying them to the codebase in practice.
With regards to specific content we cover:
- A deeper understanding of how Git functions under the hood
- A quick recap of branching in Git
- Use of patched commits and amending
- Use of Git's stash feature
- Rebasing
- Merge conflicts and resolution
- Bisect to locate issue introduction points.
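The idea behind the final item, bisect, can be sketched in a few lines of Python. This is only an illustration of the underlying binary search, not Git's own implementation; the function name, the commit labels and the is_bad check are all hypothetical.

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, the first commit is good,
    the last commit is bad, and every commit after the first bad one is bad.
    """
    low, high = 0, len(commits) - 1  # low is known good, high is known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid  # the bug was introduced at mid or earlier
        else:
            low = mid  # the bug was introduced after mid
    return commits[high]


# Hypothetical history in which the bug first appears in commit "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
broken = {"c5", "c6", "c7"}
culprit = first_bad_commit(history, lambda commit: commit in broken)
print(culprit)
```

Each check halves the range of candidate commits, so a history of a thousand commits needs only around ten checks.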
Prerequisites
To get the most out of the session we assume a basic understanding in a few areas and for you to do some preparation in advance. This expected knowledge is outlined below, along with resources for reading if you are unfamiliar with any areas.
- We assume users are familiar with basic use of Git commands such as add, commit, push and pull, and have a basic knowledge of creating branches. If these are unfamiliar, we suggest attending or reviewing the ICCS introduction to Git repository and recording.
- The exercises will make use of a Python codebase, but no specialised knowledge is required.
- We assume users are familiar with the basics of Python, modular code, writing functions, and running scripts.
- For detailed preparation guidance, see the README in the course GitHub repository.
Differentiable Programming
Description
Derivatives are at the heart of scientific programming. From the Jacobian matrices used to solve nonlinear systems to the gradient vectors used for optimisation methods, from the backpropagation operation in machine learning to the data assimilation methods used in weather forecasting, all of these techniques rely on derivative information.
Differentiable programming (also known as automatic/algorithmic differentiation (AD)) provides a suite of tools for users to compute derivatives of quantities in their code without any manual encoding. In Session 1, we will learn about the history and mathematical background of differentiable programming and consider examples using the Autograd AD tool. In Session 2, we will learn about more advanced topics and consider examples using the JAX differentiable modelling framework.
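To give a flavour of what computing derivatives "without any manual encoding" means, here is a minimal sketch of forward-mode AD using dual numbers in plain Python. It uses neither Autograd nor JAX, and the names Dual, sin and derivative are hypothetical choices for this illustration only.

```python
import math


class Dual:
    """A number that carries its value and its derivative together."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return x * x + 3 * x + sin(x)


dfdx = derivative(f, 2.0)  # analytically 2x + 3 + cos(x)
```

The function f is written as ordinary code; the derivative falls out of applying the product and chain rules operation by operation. AD tools automate the same idea at much larger scale.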
Prerequisites
- Undergraduate level knowledge of linear algebra and calculus
- Basic knowledge of Python
- A GitHub account.
Generative AI for Software Engineering
Description
This training session is designed to equip research software engineers with a practical understanding of generative AI fundamentals and hands-on tooling skills.
The course begins with the basics of machine learning (loss functions, gradient descent, backpropagation) before diving into how large language models work under the hood – covering tokenisation, embeddings, the transformer inference pipeline, self-attention, and autoregressive generation. It then explains agentic AI, showing how LLMs are wrapped in scaffolding to reason, call tools, and maintain stateless "memory" via context windows.
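As a small illustration of the first of these ideas, the sketch below fits a straight line by gradient descent on a mean squared error loss, in plain Python. The data, learning rate and function names are hypothetical, and the gradients are written out by hand here; in a neural network, backpropagation is what computes them.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_step(w, b, xs, ys, learning_rate):
    """One gradient descent update using the analytic gradients of the loss."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0
initial_loss = mse_loss(w, b, xs, ys)
for _ in range(5000):
    w, b = gradient_step(w, b, xs, ys, learning_rate=0.02)
final_loss = mse_loss(w, b, xs, ys)
```

After the loop, w and b should sit close to the values 2 and 1 used to generate the data, and the loss should have fallen well below its starting value.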
The practical second half transitions to tooling, teaching participants to configure and use opencode (a CLI AI assistant), build custom MCP servers for tool calling (with a NetCDF example), create reusable Agent Skills via SKILL.md definitions, and set up specialised sub-agents – all with hands-on exercises throughout.
Prerequisites
- No technical knowledge of ML or AI/LLMs is assumed
- No programming knowledge is required
- Familiarity with terminal/bash will be helpful
- A laptop
- A litellm key (which will be issued by us on the day)
Introduction to HPC
Description
High Performance Computing (HPC) enables researchers and professionals to solve complex, data-intensive problems by harnessing the power of parallel processing. This session introduces the fundamentals of HPC, including its architecture, applications, and how to effectively access and utilise HPC systems for accelerated research and innovation. This session will include a number of practical hands-on exercises.
Prerequisites
- Charged laptop with web browser
- Installed SSH client.
Python Bindings to Compiled Languages
Description
Python is the backbone of modern scientific software. We have all seen it and used it. We are also vaguely familiar with the fact that its strength comes from tying together numerous highly optimised and powerful libraries that are often written in compiled languages such as C, C++, or Fortran. This is what lies behind our daily imports, such as numpy, torch or jax.
The goal of this course is to give you a glimpse of what interaction between Python and compiled languages looks like, with the broader aim of giving you a starting point if you wish to bind your own code to Python. To achieve this, we will cover a simple "Hello, World" example of calling C++ code from Python, both directly using the Python C API and in a more practical way using the pybind11 library.
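For a first taste of calling compiled code from Python before the course, the sketch below uses ctypes from the standard library to call strlen from the C standard library. Note that ctypes is a different route from the two the course covers (the Python C API and pybind11), and the load_c_library helper is a hypothetical name; how the C library is located varies by platform.

```python
import ctypes
import ctypes.util
import sys


def load_c_library():
    """Load the platform's C standard library."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")
    # find_library may return None, in which case CDLL(None) exposes the
    # symbols already loaded into the Python process, including libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_library()

# Declare the C signature: size_t strlen(const char *s).
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Hello, World")
print(length)
```

Declaring argtypes and restype tells ctypes how to convert between Python objects and C types, which is the same problem a binding library has to solve for your own code.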
Prerequisites
- Prior experience with Python is assumed. You should be comfortable writing and reading Python code
- Basic familiarity with C++, but no expert knowledge is required. Just enough to read C++ code will be sufficient
- A laptop and a GitHub account. The course will use GitHub Codespaces as the development environment.
Correctness and Testing
Description
Many of us will have had the experience of bugs in our code, that is, mistakes that impact the intended function and functioning of our software. Such mistakes slow down development, impinge on collaboration, reduce the likelihood of our code being used by others, and in the scientific context can lead to serious mistakes in publications.
Approaches to software verification are therefore useful to help reduce the occurrence of bugs and assess whether code implements its intended specification or model. One well-established lightweight technique for evaluating software correctness is testing, where additional code is written that provides a partial specification of program behaviour.
This workshop studies the foundations of software testing, including the use of tools to automate the deployment of tests. It specifically looks at the mechanics of, and best practices for, three kinds of tests: unit tests, integration tests, and property-based tests. The first two are more widely deployed already in science whereas the third technique (property-based testing) is still an underutilised, but powerful, tool.
Python is used for running examples and exercises, with the pytest framework, but the concepts can be applied in almost any programming language. We will point to resources for a few other languages popular in science such as Fortran. We assume that the attendees have some programming skills, but are not necessarily Python experts. There is some emphasis on scientific computing, with the example being a simple 0D Energy Balance Model (EBM). It would therefore be beneficial if you have some experience in this field, though not strictly necessary.
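To make the distinction between test kinds concrete, here is a minimal sketch of a unit test and a hand-rolled property check for a 0D energy balance calculation. The model shown is the textbook radiative balance and is an assumption, not necessarily the course's EBM; the function names are hypothetical. The tests follow pytest's naming convention but are also called directly so that the file runs on its own.

```python
import random

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def equilibrium_temperature(solar_constant, albedo, emissivity=1.0):
    """Equilibrium temperature (K) of a 0D energy balance model.

    Balances absorbed sunlight, S(1 - albedo)/4, against emitted
    radiation, emissivity * SIGMA * T^4.
    """
    if not 0.0 <= albedo <= 1.0:
        raise ValueError("albedo must lie between 0 and 1")
    absorbed = solar_constant * (1.0 - albedo) / 4.0
    return (absorbed / (emissivity * SIGMA)) ** 0.25


def test_known_value():
    # Unit test: S = 1361 W m^-2 and albedo 0.3 give roughly 255 K.
    assert abs(equilibrium_temperature(1361.0, 0.3) - 254.6) < 0.5


def test_brighter_planet_is_cooler():
    # Property: of any two albedos, the more reflective planet is not warmer.
    rng = random.Random(0)
    for _ in range(200):
        low, high = sorted(rng.uniform(0.0, 1.0) for _ in range(2))
        warmer = equilibrium_temperature(1361.0, low)
        cooler = equilibrium_temperature(1361.0, high)
        assert cooler <= warmer


test_known_value()
test_brighter_planet_is_cooler()
```

The unit test pins one known input to one expected output, whereas the property check states a relationship that should hold for every valid input. A dedicated property-based testing library would generate and minimise the inputs for you rather than relying on a fixed random seed.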
Prerequisites
- Basic programming knowledge
- At least beginner experience in Python, e.g., understanding of: basic mathematical operations; writing and running scripts/programs; writing and using functions.
Practical Machine Learning with PyTorch
Description
The key learning objective from this workshop could be simply summarised as: Provide the ability to develop ML models in PyTorch.
However, more specifically we aim to:
- provide an understanding of the structure of a PyTorch model and ML pipeline,
- introduce the different functionalities PyTorch might provide,
- encourage good research software engineering (RSE) practice, and
- exercise careful consideration and understanding of data used for training ML models.
With regards to specific ML content we cover:
- using ML for both classification and regression,
- artificial neural networks (ANNs) and convolutional neural networks (CNNs), and
- treatment of both tabular and image data.
Prerequisites
To get the most out of the session it would be helpful to review:
- Python3 (numpy, pandas, matplotlib)
- Maths (calculus, matrix algebra, regression)
- Neural networks: see the YouTube series by 3Blue1Brown, chapters 1-3.
FTorch
Description
Many projects seek to leverage machine-learnt components as part of larger models in a technique often known as "hybrid modelling". This often brings about the challenge of language interoperation - how can we run our ML models, often PyTorch-based, from within large-scale climate models written in compiled languages?
ICCS develops and maintains the FTorch library to facilitate the easy deployment of PyTorch-based models within Fortran codes. In this session we will introduce the library and follow a hands-on tutorial introducing key features, and going through how to save a net from Python and run it as part of a Fortran code. If time allows we will showcase recent features around autograd and online training.
Prerequisites
- A GitHub account
- Some previous experience with PyTorch and machine learning is useful but not essential
- Previous exposure to Fortran or a similar compiled language is useful but not essential.
PyTorch for Climate Modelling
Description
By the end of this practical, participants should be able to:
- Explain spatial downscaling and why it matters for climate applications
- Describe the U-Net architecture for image-to-image regression
- Prepare geospatial fields for CNN training in PyTorch
- Train and evaluate a U-Net for 1° to 0.25° temperature downscaling
- Explore extension ideas beyond the baseline model.
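The two resolutions in the objectives differ by a factor of four in each direction, so each 1° cell covers a 4 x 4 block of 0.25° cells. The sketch below shows that relationship by block-averaging a fine grid onto a coarse one in plain Python. It assumes, purely for illustration, that coarse inputs could be derived from a fine field in this way; the practical may prepare its data differently, and the coarsen function and the field values are hypothetical.

```python
def coarsen(field, factor=4):
    """Average non-overlapping factor x factor blocks of a 2D grid (list of rows)."""
    rows, cols = len(field), len(field[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = []
    for i in range(0, rows, factor):
        coarse_row = []
        for j in range(0, cols, factor):
            # Gather every fine cell that falls inside this coarse cell.
            block = [field[a][b]
                     for a in range(i, i + factor)
                     for b in range(j, j + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 8 x 8 temperature field (K) on a 0.25 degree grid.
fine = [[280.0 + row + col for col in range(8)] for row in range(8)]
coarse_field = coarsen(fine)  # 2 x 2 field on a 1 degree grid
```

Downscaling is the reverse, harder direction: recovering plausible fine-scale structure from the coarse field.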
Prerequisites
Participants should download data before attending. CDS downloads can take significant time depending on queue/server load; doing this during the class can consume most of the hands-on session.
Random Forests
Description
Decision trees, random forests, and related models are commonly used machine learning models due to their relative ease of implementation and training; however, they are just as often misused. The key learning objective here is to understand the fundamentals of random forests and related tree models from first principles, their strengths and weaknesses, and methods for understanding their sensitivity.
In the first half of the session, we will look at decision trees, random forests, and related models along with their formulation, training, and interpretability. This will also cover topics including, but not necessarily limited to, feature permutation, Platt scaling, and class imbalance. In the second half, we will look at the practical application of random forests along with other models, including model implementation, model appropriateness, and application to real-world datasets.
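As a taste of the "first principles" view, the sketch below finds the best single split of one feature by minimising Gini impurity, the basic step that a decision tree repeats recursively. It is plain Python rather than scikit-learn, Gini is only one common splitting criterion and the session's formulation may differ, and the data and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for a more mixed node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, gini(labels))  # start from the impurity with no split at all
    for threshold in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature values and class labels.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_threshold(values, labels)
```

A random forest trains many such trees on resampled data with random subsets of the features, then combines their predictions.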
Prerequisites
- Undergraduate level knowledge of linear algebra, calculus and probability theory
- A working Python 3 installation
- Download and install the scikit-learn library
- Basic Git and GitHub
- Attendance at the Practical Machine Learning with PyTorch session.
Collaboration with GitHub
Description
Version control (e.g. Git) is commonly used to back up research code, often to a remote repository on GitHub or GitLab. Functionalities extend far beyond this, however. Effective use of a version control system can reduce bugs, enhance collaboration, and speed up research.
In this session we explore how to use GitHub not only as a place to store your work, but also the additional tools and workflows it provides to aid software development and enhance collaboration. We will discuss preparing repositories for collaboration, as well as collaborative project practices such as issues, branches, pull requests and code review.
Prerequisites
The course will use some Python code, though an expert understanding is not required. Familiarity with basic Git commands (committing, pushing, pulling) and the ability to clone a repository from GitHub is expected.
See the course materials for further information.
Reproducibility in Computing
Description
The key learning objective of this workshop is to raise awareness of the importance of software reproducibility, and to give participants an understanding of it and basic tools to improve it. We will take a whistle-stop tour through the basics of scientific software reproducibility, and touch on topics such as Version Control, READMEs, Licenses, Automation, Testing and the FAIR principles, and how those apply to software reproducibility. Finally, we will also talk about various initiatives promoting research and software reproducibility.
Prerequisites
To get the most out of the session we assume a basic understanding of the research process, programming in Python and research software. The ICCS RSE Skills workshop gives you an overview of many of the topics mentioned in this course.
Preparation
It is helpful but not mandatory to do a mini ReproHack before the session to get a better understanding of the problem of software and research reproducibility.
If you want to follow along with the exercises, basic Python coding skills and a Python development environment are required.
Introduction to Dash
Description
The key learning objective from this workshop is to provide participants with the skills to build interactive web-based dashboards using Plotly Dash.
Specifically, by the end of this session you will be able to:
- Understand the architecture of a Dash application (layout + callbacks)
- Build layouts using Dash HTML and Core Components
- Create interactive callbacks that respond to user input
- Integrate Plotly figures into a Dash app
- Structure a multi-component dashboard for exploring data
Prerequisites
To get the most out of the session we assume:
- Basic familiarity with Python (running scripts, installing packages, using functions)
- The ability to use a terminal/command line
- Familiarity with git clone to obtain the repository – if you are new to Git, the ICCS Summer School Git intro provides the necessary background
- Basic knowledge of HTML is a plus.
Green Software Engineering Practices
Description
Research software engineering activities inherently involve the use of computational resources. That is, programs run on computers, servers and other pieces of hardware, which may be hosted on a machine in your office, in an HPC centre, or in the cloud.
Computational resource usage can be measured in terms of the corresponding energy consumption and, with some additional data on electricity generation, it’s possible to estimate the associated carbon emissions.
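The arithmetic behind such an estimate is simple: multiply the energy used by the carbon intensity of the electricity that supplied it. The sketch below shows this in Python. All of the numbers are hypothetical, the function names are illustrative, and the optional pue factor (power usage effectiveness, which accounts for overheads such as cooling) is an assumption about what a fuller estimate might include.

```python
def energy_kwh(power_watts, runtime_hours, pue=1.0):
    """Energy drawn in kWh; pue scales for data-centre overheads such as cooling."""
    return power_watts * runtime_hours * pue / 1000.0


def emissions_gco2e(energy, carbon_intensity):
    """Emissions in gCO2e from energy in kWh and grid intensity in gCO2e/kWh."""
    return energy * carbon_intensity


# Hypothetical job: 250 W for 12 hours on a grid emitting 200 gCO2e/kWh.
job_energy = energy_kwh(power_watts=250.0, runtime_hours=12.0)
job_emissions = emissions_gco2e(job_energy, carbon_intensity=200.0)
print(f"{job_energy:.1f} kWh, {job_emissions:.0f} gCO2e")
```

For this hypothetical job the result is 3.0 kWh and 600 gCO2e. Because carbon intensity varies with where and when the electricity is generated, the same job can have quite different footprints.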
Green Computing is focused on measuring such computational resource usage and finding ways to reduce it to lower the carbon footprint of your work. We will explain and demonstrate several useful approaches and tools for good green software engineering practices, including those related to:
- Green computing for testing and debugging
- Measuring the carbon emissions associated with HPC jobs
- Green use of AI.
Prerequisites
Basic knowledge of Python.
AI, Climate and Ethics Panel Session
Description
Closing the ICCS summer school, the AI, climate and ethics plenary panel session will convene leading experts in AI and environmental sustainability to critically examine the ethical implications of deploying AI in addressing global climate challenges.
Confirmed panellists include:
Dr Gabby Samuel, Lecturer in the Department of Global Health & Social Medicine, King's College London. Dr Samuel is an internationally acclaimed social scientist and ethicist whose work explores the complex intersections of technology, health, and the environment.
Ms Amelie Sophie Berz, PhD candidate at the University of Oxford. Ms Berz’s research examines the tort of negligence in the use of AI and the impact of deploying high-performing intelligent agents on the standard of care.
Dr Loïc Lannelongue, Assistant Research Professor in the Department of Public Health and Primary Care, University of Cambridge. Dr Lannelongue leads the Cambridge Sustainable Computing Lab, which examines the environmental impacts of computing technologies.
The session will be chaired by Dr Simon Driscoll, ICCS Early Career Advanced Fellow. Discussions will focus on themes such as the hidden costs of AI, questions of equity and accountability, and potential pathways towards responsible and sustainable innovation.
Mini Projects
The Summer School includes two hands-on mini-project sessions, which offer an open-ended, collaborative space to apply what you’ve learnt in real-world scenarios. Teamwork is highly encouraged, and our instructors will be on hand to support you!
There is a range of options to choose from:
ICCS Projects: Choose from a selection of open-ended challenges designed by ICCS. Form teams and get hacking on the suggested projects
Workshop Follow-up: Use the time to complete unfinished exercises from earlier in the week with tutor support
Bring Your Own Work: Apply the workshop skills directly to your own research projects with expert help on hand, or propose your own mini-project for the session.
Whatever you choose, you should aim to actively use the good software engineering principles that you have learnt over the course of the week in your work.
Code Clinic
Code Clinics are bookable on-demand software consultation/support sessions run by the ICCS RSE team.
At each session, members of the RSE team will be available to review code, advise, troubleshoot and suggest ways to improve your computational workflows. The sessions are open to all Summer School participants who are seeking help with climate-related programming problems or general advice on best practice.
Examples of code clinic requests include:
- Advice on best practices/approach
- Coding problems and debugging
- Performance profiling & optimisation
- Version control and GitHub workflows
- Testing and documentation
- Project scoping and resource allocation.
We will be able to look through code with you during the session but will be unable to spend time on it outside of the help sessions. The more you do to prepare, the more you will get out of the time you spend with our RSEs.
Book a Code Clinic session here. Please ensure you also DM the RSE(s) on Slack when you have booked.
To get the most out of your session, please sign up in advance with as much notice as possible.