This morning at the EGU meeting, Dr. Amy McGovern presented the latest work on our ExtremeWeatherBench (EWB) project on behalf of a Brightband team including Taylor Mandelbaum and Daniel Rothenberg. This presentation coincides with some significant new milestones on the EWB project:
- ExtremeWeatherBench manuscript live on arXiv, featuring comprehensive evaluations comparing IFS-Ens, AIFS-Single, GraphCast, and Pangu-Weather
- Updated release of the ExtremeWeatherBench open-source toolkit with its own DOI and comprehensive documentation
- Supporting MLWP forecast model archive on the Earthmover Marketplace and Google Cloud Storage (in Zarr + Icechunk format)
EWB was created to address a challenge and opportunity in the general development of AI-based weather forecasting tools. Most of the headline benchmarks the community tracks (such as the 500-millibar geopotential anomaly correlation coefficient and the 850-millibar temperature root mean square error) are abstract and disconnected from the actual forecasts that end users consume daily. What most users really care about is how the weather affects them personally on a local basis – especially if that weather could turn severe.
To help shed light on this challenge, EWB measures the consistency and quality of AI weather forecasts used to anticipate extreme weather. We’ve compiled a collection of over 300 such extreme weather events between 2020 and 2024, divided across heat waves, freezes, severe convective outbreaks, atmospheric rivers, and tropical cyclones. In the EWB manuscript, we present evaluations of forecasts for these events across a selection of AI and NWP forecast models – and we provide turnkey code and input data so you can reproduce them on your own.
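For readers less familiar with those two headline scores, here is a minimal sketch of how they are commonly computed on a latitude-longitude grid. It is an illustration only and not code from the EWB toolkit: the function names are made up for this post, the anomaly correlation uses the simple uncentered form, and grid cells are weighted by the cosine of latitude, which is a common convention we are assuming here.

```python
import math


def lat_weights(lats_deg):
    """Cosine-of-latitude weights, normalized so their mean is 1 (assumed convention)."""
    w = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_w = sum(w) / len(w)
    return [x / mean_w for x in w]


def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted root mean square error over a [lat][lon] grid."""
    w = lat_weights(lats_deg)
    total, count = 0.0, 0
    for i, row in enumerate(forecast):
        for j, f in enumerate(row):
            total += w[i] * (f - truth[i][j]) ** 2
            count += 1
    return math.sqrt(total / count)


def weighted_acc(forecast, truth, climatology, lats_deg):
    """Uncentered anomaly correlation coefficient against a climatology grid."""
    w = lat_weights(lats_deg)
    num = f_var = t_var = 0.0
    for i, row in enumerate(forecast):
        for j, f in enumerate(row):
            fa = f - climatology[i][j]            # forecast anomaly
            ta = truth[i][j] - climatology[i][j]  # observed anomaly
            num += w[i] * fa * ta
            f_var += w[i] * fa * fa
            t_var += w[i] * ta * ta
    return num / math.sqrt(f_var * t_var)


# Toy 3x2 grid of 500 hPa geopotential height values (made-up numbers).
lats = [-45.0, 0.0, 45.0]
clim = [[5500.0, 5500.0], [5800.0, 5800.0], [5500.0, 5500.0]]
truth = [[5510.0, 5490.0], [5820.0, 5790.0], [5480.0, 5505.0]]
forecast = [[5508.0, 5494.0], [5815.0, 5792.0], [5486.0, 5503.0]]

print("RMSE:", weighted_rmse(forecast, truth, lats))
print("ACC:", weighted_acc(forecast, truth, clim, lats))
```

Both scores collapse an entire global field into a single number, which is exactly why they say little about how a forecast handled one heat wave or one landfalling cyclone.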

A unique aspect of EWB is that we’ve endeavored to present a truly global assessment of weather phenomena. Impactful, extreme weather happens everywhere, not just in the Global North. We’ve made a concerted effort to include significant weather events from the tropics as well as the Southern Hemisphere. One of the ways we’ve been able to accomplish this is by building an extremely flexible pattern for defining such weather events within the core EWB open-source library. It’s designed from the ground up to be extensible, so that the community can run with it and add both new cases of extreme weather and completely new event types, integrating existing and purpose-built evaluation metrics.

Releasing EWB to the public is just one part of the commitment that Brightband is making to ensure that AI weather forecasting tools are available to all. We’re preparing an interactive scorecard that will allow the community to assess how different AI weather models performed on some of the highest-profile events in EWB (following the example of the WeatherBench scorecards). Contributions to this scorecard will be open to anyone developing or running AI or NWP weather models, and we’ll continue to add models that we run in our own operational suite. We also plan to extend EWB to probabilistic forecasting, to better support the current generation of AI models that are trained to produce calibrated ensemble forecast distributions.