Create a dataset using a filter

A filter is a software component that takes the output of a source, or of another filter, as its input and modifies the fields, their metadata, or both. Typical filters interpolate fields or rename variables. Filters are provided by the anemoi-transform package.
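As a rough mental model, a filter can be pictured as a function that maps a sequence of fields to a new sequence of fields. The sketch below is plain Python and does not use the anemoi-transform API; the `Field` class and the `rename_param` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a field: a variable name plus its values."""

    param: str
    values: tuple


def rename_param(fields: Iterable[Field], mapping: dict) -> Iterator[Field]:
    """Toy filter: change the metadata (the name) and leave the values untouched."""
    for field in fields:
        yield replace(field, param=mapping.get(field.param, field.param))


# Pretend this list is what a source produced.
source_output = [Field("2t", (280.0, 281.5)), Field("tp", (0.0, 0.001))]

filtered = list(rename_param(source_output, {"tp": "tp_era5"}))
print([f.param for f in filtered])
```

Because the toy filter consumes and produces the same kind of object, its output could be fed straight into another filter, which is the property the recipes rely on.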

Using a filter

The recipe below creates a dataset from MARS data and applies a rename filter so that tp is stored as tp_era5. To use the filter, add it as the second step of the pipe, after the join step that gathers the data.

dates:
  start: 2020-12-12 00:00:00
  end: 2020-12-13 12:00:00
  frequency: 12h

input:
  pipe:
    - join:
      - mars:
          expver: "0001"
          class: ea
          grid: 20./20.
          param: [2t]
          levtype: sfc
      - mars:
          expver: "0001"
          class: ea
          grid: 20./20.
          param: [q, t, w]
          levtype: pl
          level: [50, 100]
          stream: oper
          type: an
      - accumulations:
          expver: "0001"
          class: ea
          grid: 20./20.
          param: [cp, tp]
    - rename:
        param:
          tp: tp_era5

This recipe generates the following dataset:

Dataset Summary
===============

📦 Path          : recipe1.zarr
🔢 Format version: 0.30.0

📅 Start      : 2020-12-12 00:00
📅 End        : 2020-12-13 12:00
⏰ Frequency  : 12h
🚫 Missing    : 0
🌎 Resolution : 20.0
🌎 Field shape: [9, 18]

📐 Shape      : 4 × 9 × 1 × 162 (22.8 KiB)
💽 Size       : 40.7 KiB (40.7 KiB)
📁 Files      : 48

   Index │ Variable │         Min │         Max │         Mean │       Stdev
   ──────┼──────────┼─────────────┼─────────────┼──────────────┼────────────
       0 │ 2t       │     226.496 │     309.946 │       278.03 │     19.2561
       1 │ cp       │           0 │  0.00739765 │   0.00014582 │ 0.000527194
       2 │ q_100    │ 1.38935e-06 │ 4.20381e-06 │  2.68779e-06 │ 5.59043e-07
       3 │ q_50     │ 1.26881e-06 │ 3.20919e-06 │  2.74525e-06 │ 4.35595e-07
       4 │ t_100    │     189.787 │     226.929 │      207.052 │     9.26841
       5 │ t_50     │      189.14 │      236.51 │       212.79 │      9.5502
       6 │ tp_era5  │           0 │  0.00823116 │  0.000326814 │  0.00078008
       7 │ w_100    │  -0.0114685 │   0.0129402 │ -0.000355278 │  0.00448272
       8 │ w_50     │ -0.00815806 │   0.0126816 │ -0.000267674 │  0.00331866
   ──────┴──────────┴─────────────┴─────────────┴──────────────┴────────────
🔋 Dataset ready, last update 1 minute ago.
📊 Statistics ready.
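The numbers in the summary can be cross-checked with a little arithmetic. The sketch below reads the four dimensions of the shape as dates, variables, ensemble members and grid points, and assumes the values are stored as 4-byte floats; both are assumptions, though they are consistent with the reported 22.8 KiB.

```python
from datetime import datetime, timedelta
from math import prod

# Values copied from the summary above.
start = datetime(2020, 12, 12, 0)
end = datetime(2020, 12, 13, 12)
frequency = timedelta(hours=12)
n_variables = 9          # rows in the statistics table
n_members = 1
field_shape = (9, 18)

# Both end points are included, hence the "+ 1".
n_dates = (end - start) // frequency + 1
n_points = prod(field_shape)

shape = (n_dates, n_variables, n_members, n_points)

# Assumption: float32 values, 4 bytes each.
size_kib = prod(shape) * 4 / 1024
print(shape, f"{size_kib:.1f} KiB")
```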

Creating a new filter

New filters should be defined in the anemoi-transform package. The available filters can be found in anemoi/transform/filters or listed by running the command anemoi-transform filters list. For details on how to write a filter, refer to the anemoi-transform documentation.

Using multiple filters

Multiple filters can be stacked one after the other. The recipe below extends the previous one: it applies the rename filter and then the VerticalVelocity filter, which appears in the recipe as w_to_wz.

dates:
  start: 2020-12-12 00:00:00
  end: 2020-12-13 12:00:00
  frequency: 12h

input:
  pipe:
    - join:
      - mars:
          expver: "0001"
          class: ea
          grid: 20./20.
          param: [2t]
          levtype: sfc
      - mars:
          expver: "0001"
          class: ea
          grid: 20./20.
          param: [q, t, w]
          levtype: pl
          level: [50, 100]
          stream: oper
          type: an
      - accumulations:
          expver: "0001"
          class: ea
          grid: 20./20.
          param: [cp, tp]
    - rename:
        param:
          tp: tp_era5
    - w_to_wz:
        w_component: w
        temperature: t
        humidity: q

This recipe generates the following dataset, in which w_100 and w_50 are replaced by wz_100 and wz_50:

Dataset Summary
===============

📦 Path          : recipe2.zarr
🔢 Format version: 0.30.0

📅 Start      : 2020-12-12 00:00
📅 End        : 2020-12-13 12:00
⏰ Frequency  : 12h
🚫 Missing    : 0
🌎 Resolution : 20.0
🌎 Field shape: [9, 18]

📐 Shape      : 4 × 9 × 1 × 162 (22.8 KiB)
💽 Size       : 41.1 KiB (41.1 KiB)
📁 Files      : 48

   Index │ Variable │         Min │         Max │        Mean │       Stdev
   ──────┼──────────┼─────────────┼─────────────┼─────────────┼────────────
       0 │ 2t       │     226.496 │     309.946 │      278.03 │     19.2561
       1 │ cp       │           0 │  0.00739765 │  0.00014582 │ 0.000527194
       2 │ q_100    │ 1.38935e-06 │ 4.20381e-06 │ 2.68779e-06 │ 5.59043e-07
       3 │ q_50     │ 1.26881e-06 │ 3.20919e-06 │ 2.74525e-06 │ 4.35595e-07
       4 │ t_100    │     189.787 │     226.929 │     207.052 │     9.26841
       5 │ t_50     │      189.14 │      236.51 │      212.79 │      9.5502
       6 │ tp_era5  │           0 │  0.00823116 │ 0.000326814 │  0.00078008
       7 │ wz_100   │ -0.00798191 │  0.00721723 │ 0.000224189 │  0.00277693
       8 │ wz_50    │    -0.01549 │   0.0103844 │ 0.000341309 │  0.00417065
   ──────┴──────────┴─────────────┴─────────────┴─────────────┴────────────
🔋 Dataset ready, last update 11 seconds ago.
📊 Statistics ready.
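To make the stacking concrete, the sketch below chains two toy filters in plain Python. It does not use the anemoi-transform API: the functions `rename`, `w_to_wz` and `pipe` are hypothetical stand-ins, and the conversion assumes that w is a pressure vertical velocity in Pa/s which is turned into a vertical wind speed in m/s with the hydrostatic approximation wz = -w / (rho g). Whether the real filter uses exactly this formula is an assumption. The input values are illustrative, with magnitudes borrowed from the first summary table.

```python
from functools import reduce

R_DRY = 287.05  # J kg-1 K-1, gas constant for dry air
G = 9.80665     # m s-2, standard gravity


def rename(fields, mapping):
    """Toy rename step; fields are keyed by (param, pressure level in hPa)."""
    return {(mapping.get(p, p), lev): v for (p, lev), v in fields.items()}


def w_to_wz(fields, w="w", t="t", q="q"):
    """Toy conversion of w (Pa/s) to wz (m/s), hydrostatic approximation."""
    out = {key: value for key, value in fields.items() if key[0] != w}
    for (param, level), omega in fields.items():
        if param != w:
            continue
        # Virtual temperature accounts for the moisture in the air.
        t_virtual = fields[(t, level)] * (1.0 + 0.608 * fields[(q, level)])
        # Ideal gas law; the level is converted from hPa to Pa.
        rho = level * 100.0 / (R_DRY * t_virtual)
        out[("wz", level)] = -omega / (rho * G)
    return out


def pipe(fields, steps):
    """Apply each step to the output of the previous one."""
    return reduce(lambda acc, step: step(acc), steps, fields)


fields = {
    ("tp", None): 0.001,
    ("t", 100): 207.052,
    ("q", 100): 2.68779e-06,
    ("w", 100): 0.0129402,
}

result = pipe(fields, [lambda f: rename(f, {"tp": "tp_era5"}), w_to_wz])
print(sorted(param for param, _ in result))
print(result[("wz", 100)])
```

The order of the steps matters in the same way as in the recipe: each step only sees what the previous step produced, so a filter that needs w, t and q has to run before anything renames or drops them.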