(blog post co-authored by ICCS members Dominic Orchard, Marion Weinzierl, and Roly Perera, and by Alejandro Coca-Castro from The Alan Turing Institute)

Capsule

In 2024, the Climate Informatics conference introduced, for the first time, an Artifact Evaluation (AE) process following the standard peer review. This provided an opportunity to embed the values of reproducibility into the publication process in a lightweight, opt-in fashion, encouraging authors to make their software available and the results of their papers reproducible. The submitted artifacts were evaluated by a team of reviewers who provided feedback to the authors, helping them develop their artifacts towards a higher standard of computational reproducibility.

This blog post reports on the motivation behind this endeavour and some details of this year's process.

Motivation

In modern science, many communities and subfields now have software at their heart; most publications are underpinned by some novel piece of software, e.g., embodying a model, processing or analysing data, or producing a visualisation.

In order for such software artifacts to have the most impact, they should be available, functional, reusable, and understandable, so that other researchers can benefit from the work, verify the claims of any publications, and build upon the software to do more science. These ideals are summarised by the FAIR principles for data, which can also be applied to software: research software should be Findable, Accessible, Interoperable, and Reusable (FAIR).

The practicalities of achieving FAIR software are non-trivial, necessitating both a change in mindset and the development of skills. One mechanism for promoting and encouraging these principles within a particular community is to embed some kind of formal evaluation of software artifacts within the peer-review process. This practice is now widely deployed in various areas of computer science: authors are encouraged to submit their software artifacts alongside or after peer review, and a separate set of reviewers formally assesses the artifacts against a number of criteria, e.g., relating to the availability, functionality, and reusability of the software. Such 'artifact evaluation' procedures provide an opportunity to embed the values of reproducibility into the publication process in a lightweight, opt-in fashion, thus encouraging authors to make software available and the results of the paper at least repeatable.

Examples of existing work towards reproducible and accessible software (and related papers), developed through collaboration between reviewers and authors, are described below.

Climate Informatics 2024

The Climate Informatics community has, for many years, promoted reproducible computational research through activities at its conferences. For example, following the 2023 edition of the conference, a reproducibility challenge was initiated. Teams of 2-4 people each collaborated to create a notebook reproducing the key contributions of a published environmental data science paper; these notebooks were then integrated into the open-source Environmental Data Science (EDS) Book. This challenge resulted in three such interactive, reproducible notebooks: Deep learning and variational inversion to quantify and attribute climate change, Learning the underlying physics of a simulation model of the ocean's temperature, and Variational data assimilation with deep prior.

As a continuation of this philosophy and focus on FAIR and reproducible research, this year the Climate Informatics 2024 conference (CI2024) embarked, for the first time, on an optional Artifact Evaluation (AE) phase as a second part of the peer review process. The AE process was designed by the CI2024 reproducibility co-chairs Alejandro Coca-Castro (The Alan Turing Institute) and Dominic Orchard (Institute of Computing for Climate Science (ICCS), University of Cambridge, and University of Kent) and the CI2024 reproducibility working group comprising Marion Weinzierl (ICCS, University of Cambridge), Roly Perera (ICCS, University of Cambridge, and University of Bristol), Andrew Hyde (CUP), Cassandra Gould van Praag (The Alan Turing Institute), and Douglas Rao (North Carolina Institute for Climate Studies).

The Climate Informatics 2024 Artifact Evaluation website provides further details, some of which are summarised here.

Process

Full papers from the conference were published in a post-conference issue of Environmental Data Science (Cambridge University Press). Authors of these full papers were invited to submit artifacts following paper acceptance; publication of the papers was not made dependent upon the artifact evaluation.

Artifact submissions required the following three parts:

  1. An artifact overview document providing:
    1. A brief explanation of the purpose of the artifact and how it supports the paper;
    2. Hardware requirements for evaluating the artifact;
    3. A 'getting started' guide;
    4. Step-by-step instructions on how to evaluate the artifact with respect to the relevant sections of the paper, e.g., how to reproduce any figures or conclusions, and which claims of the paper are supported by the artifact;
    5. A reusability guide explaining how others may reuse the artifact.
  2. A non-institutional URL (or URLs) to the code and relevant data (e.g., Google Drive, Zenodo, GitHub).
  3. A checksum (hash) to certify that the correct version of the artifact has been downloaded.
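To illustrate the last item, a checksum can be computed over the downloaded artifact archive and compared against the value published by the authors. A minimal sketch in Python follows; the file name `artifact.zip` and the variable `published_checksum` are placeholders, not part of the actual CI2024 submission tooling:

```python
import hashlib

def sha256_checksum(path: str, chunk_size: int = 8192) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks
    so that large artifact archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# A reviewer recomputes the checksum on the downloaded archive and
# compares it with the value the authors published, e.g.:
#   assert sha256_checksum("artifact.zip") == published_checksum
```

The same check can be done on the command line with tools such as `sha256sum`; the point is simply that author and reviewer compare digests of the same bytes.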

Authors were not anonymous (since the accompanying papers had already been presented at CI2024), but the reviewers allocated to artifacts remained anonymous.

The artifacts were then evaluated against three categories of criteria: Available, Functional, and Reusable. We borrowed heavily from the ACM (Association for Computing Machinery) artifact reviewing and badging guidelines (v1.1).

In summary, the criteria were:

  • Available artifacts should have publicly available and relevant code and data, with a unique identifier for the relevant version of the artifact (e.g. a DOI), and an open-source licence.
  • Functional artifacts are documented, consistent with the paper, complete to the extent possible with respect to the paper's claims, and exercisable, i.e., they can be run by others to generate results.
  • Reusable artifacts exceed the minimum functionality requirements above but are also well-structured and well-documented so that others could reuse and repurpose the artifact for their own work.

Artifacts satisfying the majority of the requirements for a category were then awarded a corresponding 'badge' to be displayed in an addendum to the published paper in EDS.

Authors were permitted to iteratively improve their artifacts during the process, responding to comments provided by reviewers, e.g., to address any small technical problems. However, new material that would require substantially more reviewer effort was not permitted after the initial submission.

Timeline and Outcome

Plans for the Artifact Evaluation procedure were announced at the Climate Informatics 2024 conference in April 2024 (London). A subset of the reproducibility working group also took part in a panel on reproducibility during the conference to capture the overall motivation behind the process.

After the conference, the authors of accepted full papers were invited to submit to the Artifact Evaluation process. An information and Q&A session for authors about the AE process was held online on the 20th May.

A call for reviewers was also extended to the network of the reproducibility working group to form the review committee. A reviewer information session was held in late September. The reviewer committee comprised:

  • James Emberton (ICCS, University of Cambridge)
  • James Robinson (The Alan Turing Institute)
  • Etienne Roesch (University of Reading)
  • Bryn Noel Ubald (British Antarctic Survey)
  • Alexandra Udaltsova (Open Climate Fix)

Reviewing was carried out using the HotCRP review management tool.

The artifact deadline was October 4th, followed by 5 weeks of reviewing. The initial 2-3 weeks of reviewing were designated specifically as a 'kicking the tires' phase, in which the authors were expected to respond to any requests or questions from the reviewers, e.g., if there were problems building or running the artifacts.

Final decisions were intended to be agreed by the reviewers and sent to the authors on Friday 8th November. In the event, reviews and comments were collated and sent on that date, and authors were given a final week to respond to any comments and make changes accordingly. The reviewers then assessed the final versions from 15th November, with discussions and decisions finalised on 28th November and communicated to the authors.

Through this interactive process, the reviewers were able to spread good practice and provide guidance to the authors on how to develop their software towards higher levels of computational reproducibility.

Two artifacts were submitted (out of the 13 accepted full papers). Both were awarded Available and Functional badges, to be publicly announced soon alongside the publication in the Environmental Data Science journal.

Reflections from the working group

Given that this was the first time the community had embarked on such a procedure, we believe the experiment was a success. Whilst the number of submissions was modest, this was not unexpected and tallies with experiences from other communities where initial uptake has been low, but with increasing participation over time as the value is seen in the community, the benefits are more widely known, and the process is tightened up.

Now that we have a template, future iterations could be deployed more easily. A stronger focus on the timeline, and on the expectations of authors and reviewers alike during that timeline, would make the process tighter and more efficient in the future. The two artifacts accepted this round could serve as exemplars for authors who are unsure of how to proceed. Templates for the artifact overview may also help encourage authors and ensure they provide sufficient information to the artifact reviewers. Additional training for authors may help overcome remaining barriers, raising confidence in how to produce an artifact evaluation submission.

About Us

Computational modelling is key to climate science. But models are becoming increasingly complex as we seek to understand our world in more depth and model it at higher fidelity. The Institute of Computing for Climate Science studies and supports the role of software engineering, computer science, artificial intelligence, and data science within climate science.

The institute comprises a collaboration between Cambridge Zero, the Departments of Applied Mathematics and Theoretical Physics (host department of the institute), Computer Science and Technology, and University Information Services at the University of Cambridge.
