Create a dataset from GRIB data
A GRIB file is a file that contains several GRIB messages. Each message is a single 2D field. anemoi-datasets relies earthkit-data to read GRIB files, which itself relies on eccodes.
Reading GRIB messages from a file
To create a dataset from GRIB files, use the grib source.
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
grib:
path: /path/to/input.grib
This recipe will create a dataset with all the GRIB messages present in the file, whose valid date matches the requested dates. This means that for forecast data, the date at which the data is valid is usually the reference date of the forecast (starting date) plus the forecast step.
Please note that the path keyword can also be a list.
Reading GRIB messages from files that follow a pattern
Often, GRIB files are stored in a directory with a specific pattern. For
example, the files may be named with a date pattern, such as
YYYYMMDD_HHMM.grib. In this case, you can use the grib source.
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
grib:
path: /path/to/data-{param}-{date:strftime(%Y%m%d%H)}.grib
param: [2t, 10u, 10v]
Please note that the path keyword can also be a list.
Every pattern in the path that is enclosed in curly brackets
({}) is replaced by the requested value. For example, The path
/path/to/files/{param}_{level}.grib will be replaced by
/path/to/files/z_500.grib if the requested parameter is z and
the level is 500.
There is a special syntax for the date keyword:
The construct {date:strftime(%Y%m%d%H)} is replaced by the requested
date formatted according to the Python strftime method. For example, if
the requested date is 2023-01-01 00:00:00, the pattern will be
2023010100.grib.
You can also use strftimedelta to specify a date that is shifted by
an offset from the requested date. For example, if you want to read a
file that is one hour before the requested date, you can use the
following pattern {date:strftimedelta(-1h,%Y%m%d%H)}. This will be
replaced by 2023010113 if the requested date is 2023-01-01
14:00:00.
You can also use Unix wildcards to specify a pattern for the files. For
example, if the files are named with a date pattern, such as
YYYYMMDD_HHMM.grib, you can use the following pattern:
/path/to/files/*{date:strftime(%Y%m%d%H)}*.grib. The * wildcard
will match any number of characters, including none.
Using an index file
If you have a large number of GRIB files, it may be useful to create an index file. This file contains the list of all the GRIB messages in the files and allows quick access to the messages without having to read the entire file. The index file is created using the grib-index command and uses the grib-index source.
anemoi-datasets grib-index --index index.db /path/to/grib-files --match '*pattern*'
The index file can then be used in the recipe file. For example, if the
index file is named index.db, you can use the following recipe:
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
grib-index:
index: /path/to/index.db
after that, the parameters are the same as for the grib source.
Selecting GRIB messages
You can select GRIB messages using the MARS language and eccodes keys.
For example, to select all the GRIB messages with a specific parameter,
you can use the param keyword. For example, to select all the GRIB
messages with the parameters 2t, 10u and 10v, you can use
the following:
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
grib-index:
index: /path/to/index.db
param: [2t, 10u, 10v]
It is recommended to join several sources to differentiate between single-level and multi-level fields.
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
join:
- grib:
path: /path/to/input.grib
param: [z, t, u, v]
levelist: [1000, 850, 500]
levtype: pl
- grib:
path: /path/to/input2.grib
param: [2t, msl]
levtype: sfc
Note
You can use any eccodes keys to select the GRIB messages. If you are using an index, the keys must be present in the index file, and should have been provided at index creation time.
For example, to select variables by their non-integer topLevel
value, topLevel:d can be used. This instructs eccodes to retrieve
topLevel as a double instead of the default integer type.
The build.variable_naming option or the rename filter (see
anemoi-transform) can be used
to include the exact topLevel value in the anemoi-dataset variable
name. (Be cautious with exact matching of float values when using
:d.)
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
grib:
path: /path/to/input.grib
param: ["T_SO"]
topLevel:d: [0.0, 0.01, 0.8099999999999999, 7.290000000000001]
build:
variable_naming: "{param}_{topLevel:d}"
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
pipe:
- grib:
path: /path/to/input.grib
param: ["T_SO"]
topLevel:d: [0.0, 0.01, 0.8099999999999999, 7.290000000000001]
- rename:
param: "{param}_{topLevel:d}"
Using a flavour
GRIB from different organisations often have slightly different flavours, such as organisation-specific naming conventions or different ways of understanding single-level and multi-level fields.
A flavour is a list of pairs of dictionaries, where the first dictionary is a matching rule (condition) and the second one is an action (conclusion).
When looking up fields’ metadata, like the parameter name (param) or
the level (level), the first rule that matches the existing field
metadata is applied. The values listed in its second dictionary are then
used to override the actual metadata values of the field.
For example, the first rule in the example below will clear the
levelist metadata fields that have a levtype of sfc. This is
useful because the default naming of the variables in the resulting
dataset is the concatenation of the param and levelist fields.
If the level field is empty, the resulting variable name will be
just the param. This is useful to avoid having a variable name like
2t_2 or 10u_10.
- - levtype: sfc
- levelist: null
- - { discipline: 0, parameterCategory: 1, parameterNumber: 201 }
- param: csf
- - { discipline: 0, parameterCategory: 1, parameterNumber: 64 }
- param: tcwv
The second and third rules will allow a user to define a param name
if the field is not recognised by eccodes.
In a recipe file, the flavour can be either defined by giving a path to a YAML or JSON file:
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
grib:
path: /path/to/input.grib
flavour: /path/to/flavour.yaml
or can be given inline in the recipe file.
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
input:
grib:
path: /path/to/input.grib
flavour:
- - levtype: sfc
- levelist: null
You can make use of YAML anchors to avoid repeating the same rules in multiple places:
dates:
start: 2023-01-01T00:00:00
end: 2023-01-02T18:00:00
frequency: 6h
flavour: &flavour
- - levtype: sfc
- levelist: null
input:
join:
- grib:
path: /path/to/input.grib
flavour: *flavour
param: [ z, t, u, v ]
levelist: [ 1000, 850, 500 ]
levtype: pl
- grib:
path: /path/to/input2.grib
flavour: *flavour
param: [ 2t, msl ]
levtype: sfc