planetary-computer

This source is for building datasets from STAC collections on the open Microsoft Planetary Computer.

It’s intended for two types of collections:

1. Collections with a collection-level dataset asset under the zarr-abfs key, which points to a single Zarr store containing all the data.

2. Collections containing multiple items and, potentially, multiple assets per item.

Below is an example recipe that builds a dataset from the near-surface level collection of the Met Office global deterministic 10km forecast.

dates:
  start: 2020-01-01T00:00:00+00:00
  end: 2020-01-02T00:00:00+00:00
  frequency: 6h

input:
  planetary-computer:
    data_catalog_id: met-office-global-deterministic-near-surface
    param: [rainfall_rate, lwe_snowfall_rate]
    search_params:
      datetime: 2020-01-01T00:00:00+00:00/2020-01-02T00:00:00+00:00
      variable_key_map:
        lwe_snowfall_rate: snowfall_rate
      filter:
        op: and
        args:
          - op: "="
            args:
              - property: forecast:horizon
              - PT0000H00M
          - op: "="
            args:
              - property: met_office_deterministic:model
              - global

The following applies only to collections with multiple items and assets.

The search_params config section enables specification of mappings and filters for the STAC items and assets to include in the dataset. Supported parameters include:

  • datetime: passed to the STAC API to filter items by their datetime field(s).

  • variable_key_map: a mapping of data variable names to STAC asset keys for collections where they differ.

  • filter: a CQL2 filter (dict for cql2-json, string for cql2-text) passed directly to the STAC API to filter items server-side.
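As a rough illustration of how these three parameters relate, the sketch below mirrors the search_params section of the recipe above in plain Python. The helpers resolve_asset_key and build_search_kwargs are hypothetical and not part of this source's API. The keyword names collections, datetime, and filter are those accepted by pystac_client.Client.search; how the source passes them internally is an assumption.

```python
from typing import Any


def resolve_asset_key(variable: str, variable_key_map: dict[str, str]) -> str:
    """Return the STAC asset key for a data variable (hypothetical helper).

    Falls back to the variable name when the map has no entry for it.
    """
    return variable_key_map.get(variable, variable)


def build_search_kwargs(collection_id: str, search_params: dict[str, Any]) -> dict[str, Any]:
    """Collect keyword arguments for a STAC item search (hypothetical helper).

    Only `datetime` and `filter` are forwarded; `variable_key_map` is used
    client-side to pick assets and is assumed not to be sent to the API.
    """
    kwargs: dict[str, Any] = {"collections": [collection_id]}
    for key in ("datetime", "filter"):
        if key in search_params:
            kwargs[key] = search_params[key]
    return kwargs


# Same values as the search_params section of the example recipe.
search_params = {
    "datetime": "2020-01-01T00:00:00+00:00/2020-01-02T00:00:00+00:00",
    "variable_key_map": {"lwe_snowfall_rate": "snowfall_rate"},
    "filter": {  # cql2-json form, expressed as a dict
        "op": "and",
        "args": [
            {"op": "=", "args": [{"property": "forecast:horizon"}, "PT0000H00M"]},
            {"op": "=", "args": [{"property": "met_office_deterministic:model"}, "global"]},
        ],
    },
}

kwargs = build_search_kwargs("met-office-global-deterministic-near-surface", search_params)
asset_keys = [
    resolve_asset_key(name, search_params["variable_key_map"])
    for name in ["rainfall_rate", "lwe_snowfall_rate"]
]
print(asset_keys)
```

With pystac_client installed and network access available, kwargs could then be unpacked into a call such as client.search(**kwargs), where client is an already opened pystac_client.Client.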

Tip

While not required, it is recommended to include a datetime filter under search_params.datetime to reduce query time and the number of results to filter. See pystac_client.Client.search for accepted formats.
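For instance, an interval string covering the recipe's date range could be assembled as follows. This is a minimal sketch: to_stac_interval is a hypothetical helper, and the start/end form it produces is only one of the formats a STAC datetime filter accepts.

```python
from datetime import datetime, timezone


def to_stac_interval(start: datetime, end: datetime) -> str:
    """Format two timezone-aware datetimes as a closed 'start/end' interval."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end < start:
        raise ValueError("end must not be earlier than start")
    return f"{start.isoformat()}/{end.isoformat()}"


interval = to_stac_interval(
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 2, tzinfo=timezone.utc),
)
print(interval)  # 2020-01-01T00:00:00+00:00/2020-01-02T00:00:00+00:00
```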

Tip

To identify a collection’s queryable fields, visit its queryables endpoint (e.g., ERA5 - PDS queryables) or use the Python equivalent pystac_client.CollectionClient(...).get_queryables.
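A queryables endpoint returns a JSON Schema document whose properties object lists the fields that may appear in a filter. The sketch below pulls those field names out of such a document. queryable_names is a hypothetical helper, and the sample payload is abbreviated and illustrative rather than a real response.

```python
from typing import Any


def queryable_names(queryables: dict[str, Any]) -> list[str]:
    """Return the sorted property names declared in a queryables document."""
    return sorted(queryables.get("properties", {}))


# Illustrative payload only; a real response carries more fields and metadata.
sample = {
    "type": "object",
    "properties": {
        "forecast:horizon": {"type": "string"},
        "datetime": {"type": "string", "format": "date-time"},
    },
}

names = queryable_names(sample)
print(names)  # ['datetime', 'forecast:horizon']
```

Assuming get_queryables returns the parsed JSON document as a dict, its result could be passed to queryable_names in place of sample.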

See other example recipes in the tests.