Welcome to the anemoi-datasets documentation!

An Anemoi dataset is a thin wrapper around a Zarr store, optimised for training data-driven weather forecasting models. Anemoi datasets are organised so that I/O operations are minimised. The anemoi-datasets package is one of the packages within the Anemoi framework.

About Anemoi

Anemoi is a framework for developing machine learning weather forecasting models. It comprises packages for preparing training datasets and conducting ML model training, together with a registry for datasets and trained models. Anemoi also provides tools for operational inference, including interfaces to verification software. As a framework, it seeks to handle many of the complexities that meteorological organisations share, allowing them to train models from existing recipes with their own data.

Quick overview

The anemoi-datasets package provides a structured approach to preparing datasets for data-driven weather forecasting models, particularly those using deep learning. By optimising data access patterns, anemoi-datasets minimises I/O operations, improving efficiency when training machine learning models.
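To make the idea of an access pattern concrete, the sketch below counts how many chunks of a chunked array must be read to fetch one training sample. It is an illustration only: the array dimensions (dates, variables, ensemble members, grid points), the two chunk layouts, and the helper name chunks_touched are assumptions made for this example, not a description of how anemoi-datasets stores data internally.

```python
from math import prod


def chunks_touched(selection, chunks):
    """Count the chunks that intersect a rectangular selection.

    selection: one (start, stop) pair per dimension, stop exclusive.
    chunks: chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunks):
        first = start // size        # index of the first chunk touched
        last = (stop - 1) // size    # index of the last chunk touched
        per_dim.append(last - first + 1)
    return prod(per_dim)


# Hypothetical array: (dates, variables, ensemble members, grid points).
# One training sample = all 10 variables and all 40320 grid points for one date.
sample = ((100, 101), (0, 10), (0, 1), (0, 40320))

# Layout A (assumed): one chunk per date, holding every variable and grid point.
by_date = (1, 10, 1, 40320)

# Layout B (for contrast): one chunk per variable, holding 100 consecutive dates.
by_variable = (100, 1, 1, 40320)

print(chunks_touched(sample, by_date))      # 1
print(chunks_touched(sample, by_variable))  # 10
```

Under these assumptions, layout A serves a sample with a single read, while layout B needs ten reads, each of which also carries 99 dates the sample does not use.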

anemoi-datasets offers a simple high-level interface based on a YAML recipe file, which defines how datasets are processed and structured. The package allows you to:

  • Load and transform datasets from sources such as reanalyses or forecasts.

  • Interpolate data to a desired spatial resolution and temporal frequency to match model requirements.

  • Select and preprocess relevant meteorological variables for use in machine learning workflows.

  • Structure datasets for efficient access in training and inference, reducing unnecessary data operations.

The dataset definition is specified in a YAML file, which is then passed to the create command of the command-line tool to generate the dataset. The command-line tool also allows users to inspect datasets for compatibility with machine learning models.
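The workflow described above can be sketched as follows. The recipe text is an assumption written for illustration (the keys, the source, and the values may not match a valid recipe for your setup), and the file names recipe.yaml and dataset.zarr are hypothetical. The helper only builds the command lines for the create and inspect commands; it does not run them.

```python
import shlex

# Illustrative recipe only: every key and value here is an assumption.
# Save it as recipe.yaml and adapt it using the recipe documentation.
RECIPE = """\
dates:
  start: 2020-01-01 00:00:00
  end: 2020-01-31 18:00:00
  frequency: 6h

input:
  mars:
    class: ea
    levtype: sfc
    param: [2t, 10u, 10v]
    grid: 1.0/1.0
"""


def create_command(recipe, output):
    """Build the command that generates a dataset from a recipe file."""
    return ["anemoi-datasets", "create", str(recipe), str(output)]


def inspect_command(dataset):
    """Build the command that inspects an existing dataset."""
    return ["anemoi-datasets", "inspect", str(dataset)]


print(shlex.join(create_command("recipe.yaml", "dataset.zarr")))
print(shlex.join(inspect_command("dataset.zarr")))
```

Running the printed commands requires the package to be installed and access to whatever data source the recipe names.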

In the rest of this documentation, you will learn how to configure and create anemoi datasets using YAML files, as well as how to load and read existing ones. A full example of a dataset preparation process can be found in the Re-create the sample dataset section.
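As a first taste of reading an existing dataset, here is a minimal sketch built around open_dataset from anemoi.datasets. It assumes the package is installed and that a dataset exists at a path you supply. The helper names summarise and open_and_summarise are hypothetical and introduced only for this example.

```python
def summarise(ds):
    """Return a small summary of a dataset-like object.

    Only relies on len(ds), ds.variables and ds[0].shape.
    """
    first = ds[0]  # the sample for the first date
    return {
        "n_dates": len(ds),
        "variables": list(ds.variables),
        "sample_shape": tuple(first.shape),
    }


def open_and_summarise(path):
    """Open the dataset at `path` (a placeholder you provide) and summarise it."""
    # Imported here so that summarise() stays usable without the package.
    from anemoi.datasets import open_dataset

    return summarise(open_dataset(path))
```

Calling open_and_summarise with the path to a dataset you have created returns the number of dates, the variable names, and the shape of one sample.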

Installing

To install the package, you can use the following command:

pip install anemoi-datasets

See the Installing section for more information.

Contributing

git clone ...
cd anemoi-datasets
pip install ".[dev]"

You may also have to install pandoc on macOS:

brew install pandoc

Other Anemoi packages

License

Anemoi is available under the open-source Apache License.