As part of its ongoing collaboration with the NOAA Physical Sciences Laboratory and with support from the NOAA Open Data Dissemination (NODD) Program, Brightband is announcing the official launch of the NNJA-AI dataset. This dataset is immediately available to any and all users interested in playing with a diverse collection of weather observations.
Comprising more than 30 TB of data from more than a dozen sensors and spanning decades of observations, the NNJA-AI dataset builds on the NOAA-NASA Joint Archive of Observations for Earth System Reanalysis by re-processing it into an AI-ready, cloud-native format. NNJA-AI is designed to support a wide variety of weather observation analysis and visualization tasks. Most importantly, Brightband hopes that NNJA-AI will play a foundational role in aiding the development of new AI-powered data assimilation techniques. That hope is inspired by the impact that ARCO-ERA5, a cloud-optimized and reprocessed version of the ECMWF ERA5 reanalysis product created by Google Research, has had on machine learning weather prediction (MLWP) development over the past few years.
As discussed in our preview release blog post, NNJA-AI improves on the original archive by re-processing it from its original BUFR format to a contemporary, cloud-native one, Apache Parquet. The dataset is written in a hierarchical, Hive-partitioned layout that lets users bring the full power of modern big data processing tools to bear, so they can efficiently retrieve and work with diverse weather observations at scale. Users can retrieve these data directly from a NODD-hosted bucket on Google Cloud Storage at gs://gcp-nnja-ai, or they can use a Python-based SDK to get started more quickly.
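To make the benefit of Hive partitioning concrete, here is a minimal sketch of how a partitioned layout lets a tool skip files it does not need. Only the bucket name comes from this post. The dataset folder, the `OBS_DATE` partition key, and the file names are hypothetical placeholders, and the helper functions are ours for illustration rather than part of any SDK.

```python
from urllib.parse import urlparse

# Hypothetical object paths: the bucket name is from the post, but the
# folder name, the OBS_DATE partition key, and the file names are assumptions.
EXAMPLE_PATHS = [
    "gs://gcp-nnja-ai/example-sensor/OBS_DATE=2021-01-01/part-0.parquet",
    "gs://gcp-nnja-ai/example-sensor/OBS_DATE=2021-01-02/part-0.parquet",
    "gs://gcp-nnja-ai/example-sensor/OBS_DATE=2021-01-03/part-0.parquet",
]


def parse_hive_partitions(path: str) -> dict:
    """Extract key=value partition segments from a Hive-style path."""
    segments = urlparse(path).path.strip("/").split("/")
    # Skip the final segment (the file name); keep only key=value directories.
    return dict(seg.split("=", 1) for seg in segments[:-1] if "=" in seg)


def prune(paths, **wanted):
    """Keep only paths whose partition values match every requested key."""
    return [
        p
        for p in paths
        if all(parse_hive_partitions(p).get(k) == v for k, v in wanted.items())
    ]


# Select a single day without opening any Parquet file.
selected = prune(EXAMPLE_PATHS, OBS_DATE="2021-01-02")
print(selected)
```

Libraries that understand Hive partitioning apply this kind of pruning automatically when a query filters on a partition column, which is why selecting a narrow time window does not require scanning the whole archive.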
What will you build with the NNJA-AI data? We’d love to learn more about projects leveraging these data from across the weather community and beyond. Have you created some cool visualizations of satellite weather data? Maybe you’ve built your own forecast model that processes observations directly, trained on NNJA-AI? Whatever it might be, or if you have questions or suggestions about the data itself, send us a note at hello@brightband.com.
Learn more about this dataset on the Brightband-NNJA AI datasets page.