Create a dataset from CF-compliant data

A CF-compliant dataset is one that follows the CF (Climate and Forecast) metadata conventions. Such datasets are usually stored in NetCDF or Zarr files, or served over the OpenDAP protocol. Internally, anemoi-datasets accesses them through the Xarray library.

NetCDF

(Coming soon)

dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h


input:
  netcdf:
    path: /path/to/input.nc
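The dates block describes a regular series of valid times. The sketch below shows how such a start/end/frequency triple can expand into a list of dates. It is a hypothetical helper, not the anemoi-datasets API, and it assumes that `end` is inclusive and that `frequency` is a whole number of hours such as `6h`.

```python
from datetime import datetime, timedelta


def expand_dates(start, end, frequency):
    """Expand a start/end/frequency triple into a list of datetimes.

    Hypothetical helper. Assumptions: ``end`` is inclusive and
    ``frequency`` is an integer number of hours followed by "h".
    """
    if not frequency.endswith("h"):
        raise ValueError(f"unsupported frequency: {frequency!r}")
    step = timedelta(hours=int(frequency[:-1]))
    current = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    dates = []
    while current <= stop:
        dates.append(current)
        current += step
    return dates


dates = expand_dates("2023-01-01T00:00:00", "2023-01-02T18:00:00", "6h")
print(len(dates))  # → 8 (00, 06, 12 and 18 UTC on each of the two days)
```

Under these assumptions, the recipe above requests eight fields per variable from the input file.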

Note

For all Xarray-based sources, the param and variable keywords are considered synonymous. This is also true for the level and levelist keywords.
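To illustrate what "synonymous" means here, the following sketch folds each alias onto a single canonical key. The alias table and the `normalise_keys` function are hypothetical and do not come from the anemoi-datasets code; choosing `param` and `level` as the canonical names is an arbitrary assumption.

```python
# Hypothetical alias table mirroring the note; not from anemoi-datasets.
SYNONYMS = {"variable": "param", "levelist": "level"}


def normalise_keys(request):
    """Return a copy of ``request`` with synonymous keywords merged."""
    result = {}
    for key, value in request.items():
        canonical = SYNONYMS.get(key, key)
        if canonical in result:
            # Giving both spellings at once is ambiguous, so reject it.
            raise ValueError(f"{key!r} duplicates {canonical!r}")
        result[canonical] = value
    return result


print(normalise_keys({"variable": ["2t", "msl"], "levelist": [500, 850]}))
# → {'param': ['2t', 'msl'], 'level': [500, 850]}
```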

The path keyword can also be a list, and paths can contain wildcards and patterns. See Reading GRIB messages from files that follow a pattern for more information.
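As a rough picture of how a list of paths with wildcards resolves to concrete files, here is a sketch built on Python's standard `glob` module. The `expand_paths` helper is hypothetical; anemoi-datasets has its own pattern handling, and this sketch does not reproduce it.

```python
import glob
import os
import tempfile


def expand_paths(path):
    """Expand one path or a list of paths, resolving shell-style wildcards.

    Hypothetical helper built on Python's glob module; it is not the
    pattern engine used by anemoi-datasets.
    """
    paths = [path] if isinstance(path, str) else list(path)
    expanded = []
    for p in paths:
        matches = sorted(glob.glob(p))
        if not matches:
            # Fail loudly rather than silently building an incomplete dataset.
            raise FileNotFoundError(f"no file matches {p!r}")
        expanded.extend(matches)
    return expanded


# Usage: create a few empty files in a temporary directory and match them.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("input_1.nc", "input_2.nc", "readme.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = expand_paths(os.path.join(tmp, "input_*.nc"))
    print([os.path.basename(p) for p in found])  # ['input_1.nc', 'input_2.nc']
```

Raising an error when a pattern matches nothing is a design choice of this sketch. It surfaces typos early instead of producing a dataset with missing dates.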

OpenDAP

OpenDAP is a protocol for accessing remote datasets over the internet. The OpenDAP source is identical to the NetCDF source, except that it takes a URL instead of a file path.

dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h

input:
  opendap:
    url: https://www.example.com/path/to/input.nc

The url keyword can also be a list, and URLs can contain patterns. See Reading GRIB messages from files that follow a pattern for more information.
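A date pattern in a URL lets one recipe cover many remote files. The sketch below conveys the idea with Python's `strftime` codes and a hypothetical one-file-per-day template. The template syntax is an assumption for illustration only. The pattern syntax that anemoi-datasets actually accepts is described in Reading GRIB messages from files that follow a pattern.

```python
from datetime import datetime, timedelta

# Hypothetical template using Python strftime codes (one file per day).
# anemoi-datasets uses its own pattern syntax; this only shows the idea.
TEMPLATE = "https://www.example.com/path/to/input_%Y%m%d.nc"


def urls_for_dates(template, start, end, step):
    """Return the distinct URLs needed to cover start..end (inclusive), in order."""
    urls = []
    current = start
    while current <= end:
        url = current.strftime(template)
        if url not in urls:  # several dates can share one daily file
            urls.append(url)
        current += step
    return urls


urls = urls_for_dates(
    TEMPLATE,
    datetime(2023, 1, 1, 0),
    datetime(2023, 1, 2, 18),
    timedelta(hours=6),
)
for url in urls:
    print(url)
# https://www.example.com/path/to/input_20230101.nc
# https://www.example.com/path/to/input_20230102.nc
```

Eight 6-hourly dates collapse onto two daily URLs here, so each remote file would be opened once and read for several dates.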

Zarr

To use a remotely hosted Zarr dataset as a source, use xarray-zarr.

dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h

input:
  xarray-zarr:
    url: https://www.example.com/path/to/input.zarr

To use a local Zarr dataset (such as one created by anemoi-datasets), use anemoi-dataset.

dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h

input:
  anemoi-dataset:
    dataset: /path/to/input.zarr

Handling data that is not 100% CF-compliant

(Coming soon)

Patching

Consider the following dataset:

<xarray.Dataset> Size: 21MB
Dimensions:   (y: 1207, x: 1442)
Dimensions without coordinates: y, x
Data variables:
   nav_lat   (y, x) float32 7MB ...
   nav_lon   (y, x) float32 7MB ...
   mask      (y, x) float32 7MB ...

Although the variables nav_lat and nav_lon are coordinates, they are not marked as such. This can be fixed with the patch keyword in the recipe file:

dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h

input:
  netcdf:
    path: /path/to/input.nc
    patch:
      coordinates: [ nav_lat, nav_lon ]

The resulting dataset will look like this:

<xarray.Dataset> Size: 21MB
Dimensions:   (y: 1207, x: 1442)
Coordinates:
   nav_lat   (y, x) float32 7MB ...
   nav_lon   (y, x) float32 7MB ...
Dimensions without coordinates: y, x
Data variables:
   mask      (y, x) float32 7MB ...

Note

Patching only happens in memory. The patched dataset is not saved and the original dataset is not modified.
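In plain Xarray, the coordinates patch has the same effect as `Dataset.set_coords`, which also returns a new object and leaves the original untouched. Whether anemoi-datasets calls this method internally is an assumption. The sketch uses a small synthetic stand-in for the 1207 x 1442 dataset shown above, with made-up values.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the dataset above (values are made up).
ny, nx = 3, 4
lat, lon = np.meshgrid(
    np.linspace(-10.0, 10.0, ny), np.linspace(0.0, 30.0, nx), indexing="ij"
)
original = xr.Dataset(
    {
        "nav_lat": (("y", "x"), lat.astype("float32")),
        "nav_lon": (("y", "x"), lon.astype("float32")),
        "mask": (("y", "x"), np.ones((ny, nx), dtype="float32")),
    }
)

# set_coords returns a new Dataset; `original` is not modified.
patched = original.set_coords(["nav_lat", "nav_lon"])

print(sorted(patched.coords))      # ['nav_lat', 'nav_lon']
print(list(patched.data_vars))     # ['mask']
print(sorted(original.data_vars))  # ['mask', 'nav_lat', 'nav_lon']
```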

Using a flavour

(Coming soon)

rules:
  latitude:
    name: grid_yt
  level:
    name: pfull
  longitude:
    name: grid_xt
  time:
    name: time

levtype: pl
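The rules tell the source which variable in the file plays each role (latitude, level, longitude, time). `levtype: pl` presumably marks the levels as pressure levels, following ECMWF MARS terminology. The sketch below writes the rules above as a Python dictionary and inverts them to label a dataset's coordinates. The `roles_from_flavour` helper is hypothetical and only matches on the `name` rule shown here.

```python
# The flavour rules above, written as a Python dict.
FLAVOUR = {
    "rules": {
        "latitude": {"name": "grid_yt"},
        "level": {"name": "pfull"},
        "longitude": {"name": "grid_xt"},
        "time": {"name": "time"},
    },
    "levtype": "pl",
}


def roles_from_flavour(flavour, coord_names):
    """Map each coordinate name found in the dataset to its role.

    Hypothetical helper: it only matches on the ``name`` rule.
    """
    by_name = {rule["name"]: role for role, rule in flavour["rules"].items()}
    missing = set(by_name) - set(coord_names)
    if missing:
        raise KeyError(f"coordinates not found in dataset: {sorted(missing)}")
    return {name: by_name[name] for name in coord_names if name in by_name}


roles = roles_from_flavour(FLAVOUR, ["time", "pfull", "grid_yt", "grid_xt"])
print(roles)
# → {'time': 'time', 'pfull': 'level', 'grid_yt': 'latitude', 'grid_xt': 'longitude'}
```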

Examples of flavours can be found in the following tests.