Create a dataset from CF-compliant data
A CF-compliant dataset is a dataset that follows the CF (Climate and Forecast) metadata conventions. Such datasets are usually stored in, or served through, formats and protocols that support the CF conventions, such as NetCDF, OpenDAP, or Zarr. Internally, anemoi-datasets accesses these datasets using the Xarray library.
NetCDF
(Coming soon)
dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h
input:
  netcdf:
    path: /path/to/input.nc
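The `dates` block above requests one date every 6 hours between `start` and `end`. The following minimal standard-library sketch lists the resulting dates. It assumes that both `start` and `end` are included. The helper `expand_dates` is hypothetical and is not part of anemoi-datasets.

```python
from datetime import datetime, timedelta
from typing import List


def expand_dates(start: str, end: str, hours: int) -> List[datetime]:
    """Hypothetical helper: every date from start to end (both inclusive,
    an assumption), stepping by the given number of hours."""
    current = datetime.fromisoformat(start)
    last = datetime.fromisoformat(end)
    step = timedelta(hours=hours)
    dates = []
    while current <= last:
        dates.append(current)
        current += step
    return dates


dates = expand_dates("2023-01-01T00:00:00", "2023-01-02T18:00:00", 6)
print(len(dates))  # 8
print(dates[0].isoformat(), dates[-1].isoformat())
```

Under this assumption, the recipe covers eight dates: 00, 06, 12 and 18 UTC on both 1 and 2 January 2023.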
Note
For all Xarray-based sources, the param and variable keywords
are considered synonymous. This is also true for the level and
levelist keywords.
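The following sketch illustrates the note above by folding synonymous keywords onto one canonical name before data is selected. The `SYNONYMS` table and the `normalise` helper are hypothetical illustrations. They are not anemoi-datasets code.

```python
# Hypothetical synonym table: each alias maps to one canonical keyword.
SYNONYMS = {"variable": "param", "levelist": "level"}


def normalise(request: dict) -> dict:
    """Hypothetical helper: return a copy of request with aliases replaced
    by their canonical keys, rejecting requests that use both forms."""
    result = {}
    for key, value in request.items():
        canonical = SYNONYMS.get(key, key)
        if canonical in result:
            raise ValueError(f"{key!r} duplicates {canonical!r}")
        result[canonical] = value
    return result


print(normalise({"variable": "2t", "levelist": [500, 850]}))
# {'param': '2t', 'level': [500, 850]}
```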
Please note that the path keyword can also be a list, and that paths
can contain wildcards and patterns. See "Reading GRIB messages from files that follow a pattern" for more
information.
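The following sketch covers only the wildcard part of path expansion. It uses Python's standard `glob` module and turns a list of paths containing shell-style wildcards into the set of matching files. It is an assumption that this mirrors how anemoi-datasets resolves wildcards. The sketch does not cover the patterns described in the page referenced above. `expand_paths` is a hypothetical helper.

```python
import glob
import os
import tempfile


def expand_paths(paths):
    """Hypothetical helper: accept one path or a list of paths, expand
    shell-style wildcards, and return the sorted, de-duplicated matches."""
    if isinstance(paths, str):
        paths = [paths]
    matches = set()
    for pattern in paths:
        matches.update(glob.glob(pattern))
    return sorted(matches)


# Create a few empty files in a temporary directory to expand against.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.nc", "b.nc", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = expand_paths([os.path.join(tmp, "*.nc")])
    print([os.path.basename(p) for p in found])  # ['a.nc', 'b.nc']
```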
OpenDAP
OpenDAP is a protocol for accessing remote datasets over the internet. The OpenDAP source is identical to the NetCDF source, except that it takes a URL (the url keyword) instead of a file path.
dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h
input:
  opendap:
    url: https://www.example.com/path/to/input.nc
Please note that the url keyword can also be a list, and that URLs
can contain patterns. See "Reading GRIB messages from files that follow a pattern" for more information.
Zarr
To use remotely hosted Zarr datasets as a source, use the xarray-zarr source.
dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h
input:
  xarray-zarr:
    url: https://www.example.com/path/to/input.zarr
To use local Zarr datasets (such as datasets created with anemoi-datasets), use the anemoi-dataset source.
dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h
input:
  anemoi-dataset:
    dataset: /path/to/input.zarr
Handling data that is not 100% CF-compliant
(Coming soon)
Patching
Consider the following dataset:
<xarray.Dataset> Size: 21MB
Dimensions:  (y: 1207, x: 1442)
Dimensions without coordinates: y, x
Data variables:
    nav_lat  (y, x) float32 7MB ...
    nav_lon  (y, x) float32 7MB ...
    mask     (y, x) float32 7MB ...
Although the variables nav_lat and nav_lon are coordinates,
they are not marked as such. This can be fixed by using the patch
keyword in the recipe file:
dates:
  start: 2023-01-01T00:00:00
  end: 2023-01-02T18:00:00
  frequency: 6h
input:
  netcdf:
    path: /path/to/input.nc
    patch:
      coordinates: [ nav_lat, nav_lon ]
The resulting dataset will look like this:
<xarray.Dataset> Size: 21MB
Dimensions:  (y: 1207, x: 1442)
Coordinates:
    nav_lat  (y, x) float32 7MB ...
    nav_lon  (y, x) float32 7MB ...
Dimensions without coordinates: y, x
Data variables:
    mask     (y, x) float32 7MB ...
Note
Patching only happens in memory. The patched dataset is not saved and the original dataset is not modified.
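In plain Xarray terms, this patch corresponds to promoting data variables to coordinates with `Dataset.set_coords`. That method likewise returns a new in-memory object and leaves the original untouched. The following minimal sketch assumes a tiny stand-in for the dataset above, filled with placeholder values. It shows the equivalent Xarray operation and is not necessarily how anemoi-datasets applies the patch internally.

```python
import numpy as np
import xarray as xr

# Small stand-in for the dataset above: same variable names, tiny grid,
# placeholder values.
ny, nx = 3, 4
ds = xr.Dataset(
    {
        "nav_lat": (("y", "x"), np.zeros((ny, nx), dtype="float32")),
        "nav_lon": (("y", "x"), np.zeros((ny, nx), dtype="float32")),
        "mask": (("y", "x"), np.ones((ny, nx), dtype="float32")),
    }
)

# Promote nav_lat and nav_lon from data variables to coordinates.
# set_coords returns a new Dataset; ds itself is not modified.
patched = ds.set_coords(["nav_lat", "nav_lon"])

print(sorted(patched.coords))   # ['nav_lat', 'nav_lon']
print(list(patched.data_vars))  # ['mask']
```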
Using a flavour
(Coming soon)
rules:
  latitude:
    name: grid_yt
  level:
    name: pfull
  longitude:
    name: grid_xt
  time:
    name: time
levtype: pl
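One way to read the flavour above is that each rule tells anemoi-datasets which coordinate name plays which role. The `levtype: pl` entry presumably marks the vertical coordinate as pressure levels. The following minimal standard-library sketch implements such a lookup and assumes that rules match on the coordinate `name` only. `RULES` mirrors the YAML above, and `role_of` is a hypothetical helper, not the library's flavour implementation.

```python
# Mirror of the flavour rules above, as a plain dict (assumption:
# rules are matched on the coordinate name only).
RULES = {
    "latitude": {"name": "grid_yt"},
    "level": {"name": "pfull"},
    "longitude": {"name": "grid_xt"},
    "time": {"name": "time"},
}
LEVTYPE = "pl"


def role_of(coord_name: str):
    """Hypothetical helper: return the role ('latitude', 'level', ...)
    whose rule matches coord_name, or None if no rule matches."""
    for role, rule in RULES.items():
        if rule.get("name") == coord_name:
            return role
    return None


# "phalf" is a hypothetical name that no rule matches.
for coord in ("grid_xt", "grid_yt", "pfull", "time", "phalf"):
    print(coord, "->", role_of(coord))
```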
You can see examples of the flavour in the following tests.