Rule retrieve_databundle_light


Not all data dependencies are shipped with the git repository, since git is not suited for handling large, changing files. Instead, we provide separate data bundles, which can be obtained using the retrieve_databundle_light rule. The rule is included in the workflow only when the retrieve_databundle flag in the configuration file is enabled. The common data needed to run the model are then loaded according to the settings in config_default.yaml or config_tutorial.yaml, depending on the tutorial flag.

https://zenodo.org/badge/DOI/10.5281/zenodo.5894972.svg

The data bundles contain common GIS datasets such as EEZ shapes, Copernicus Landcover and Hydrobasins, as well as electricity-specific summary statistics such as historic per-country yearly totals of hydro generation, GDP and population at NUTS3 level, and per-country load time series.

This rule downloads the data bundles from Zenodo or Google Drive and extracts them into the data, resources and cutouts sub-directories. The downloaded bundle archives are deleted once they have been unzipped.
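The extract-then-delete step described above can be sketched in Python as follows. This is only a minimal illustration using the standard library, not the rule's actual implementation; the function name unzip_and_remove is hypothetical and the archive is assumed to be a zip file.

```python
import zipfile
from pathlib import Path


def unzip_and_remove(archive_path, destination):
    """Extract a zip archive into destination, then delete the archive.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of the workflow.
    """
    archive_path = Path(archive_path)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
        extracted = archive.namelist()
    # The archive is closed at this point, so it can be removed safely
    archive_path.unlink()
    return extracted
```

Returning the list of extracted members makes it easy to compare what was unpacked against the outputs a bundle is expected to provide.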

The Model customization uses a smaller data bundle than the one required for the full model (around 500 MB).

The required bundles are downloaded automatically, by name, from the list of data bundles specified in the bundle configuration file, which is typically located in the config folder. Each data bundle entry has the following structure:

bundle_name:  # name of the bundle
  countries: [country code, region code or country list]  # list of countries represented in the databundle
  [tutorial: true/false]  # (optional, default false) whether the bundle is a tutorial or not
  category: common/resources/data/cutouts  # category of data contained in the bundle
  destination: "."  # folder where to unzip the files with respect to the repository root ("" or ".")
  urls:  # list of urls by source, e.g. zenodo or google
    zenodo: {zenodo url}  # key to download data from zenodo
    gdrive: {google url}  # key to download data from google drive
    protectedplanet: {url}  # key to download data from protected planet; the url can contain {month:s} and {year:d} to let the workflow specify the current month and year
    direct: {url}  # key to download data directly from a url; if unzip option is enabled data are unzipped
    post:  # key to download data using an url post request; if unzip option is enabled data are unzipped
      url: {url}
      [post arguments]
  [unzip: true/false]  # (optional, default false) used in direct download technique to automatically unzip files
  output: [...]  # list of outputs of the databundle
  [disable_by_opt:]  # option to disable outputs from the bundle; it contains a dictionary of options,
                     # each one with its outputs. When "all" is specified, the entire bundle is not executed
    [{option}: [outputs,...,/all]]  # list of options and the outputs to remove, or "all" to ignore the whole bundle

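Two of the fields above lend themselves to a short Python illustration: the {month:s} and {year:d} placeholders in the protectedplanet url, and the disable_by_opt dictionary. The sketch below is an interpretation of the comments in the structure, not the workflow's code. The function names, the example url and the option name skip_b are hypothetical, and passing the month as a string is an assumption based on the {month:s} format code.

```python
def fill_url_template(url, month, year):
    """Fill the {month:s} and {year:d} placeholders of a bundle url.

    Assumption: month is passed as a string and year as an integer,
    matching the format codes shown in the structure above.
    """
    return url.format(month=month, year=year)


def filter_outputs(outputs, disable_by_opt, active_options):
    """Return the outputs left after applying disable_by_opt.

    disable_by_opt maps an option name to the outputs it removes, or to
    a list containing "all" to skip the whole bundle (assumed semantics).
    """
    remaining = list(outputs)
    for option, disabled in disable_by_opt.items():
        if option not in active_options:
            continue  # option not enabled, nothing to remove
        if "all" in disabled:
            return []  # the entire bundle is ignored
        remaining = [out for out in remaining if out not in disabled]
    return remaining


# Hypothetical values, for illustration only
example_url = "https://example.org/archive_{month:s}{year:d}.zip"
print(fill_url_template(example_url, "Jan", 2024))

outputs = ["data/a.csv", "data/b.csv"]
print(filter_outputs(outputs, {"skip_b": ["data/b.csv"]}, {"skip_b"}))
```

Returning an empty list for "all" lets the caller treat a fully disabled bundle the same way as a bundle with nothing left to produce.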
Depending on the requested list of countries, all required databundles are downloaded according to the following rules:

  • The databundle shall match the tutorial configuration: when the tutorial configuration is running, only the databundles with the tutorial flag set to true are downloaded

  • For every data category, the most suitable bundles are downloaded in order of the number of countries matched: among the bundles matching the category, the algorithm sorts the bundles by the number of requested countries they cover and downloads them starting from those covering the most countries, until all countries are matched or no more bundles are available

  • For every bundle to download, priority is given to the first source listed in the urls option of the bundle configuration; when a source fails, the next source is tried, and so on
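The three rules above can be sketched in Python as follows. This is a simplified reading of the rules, not the workflow's implementation. The function names and the example bundles are hypothetical, countries are assumed to be plain lists of country codes (region codes are not expanded), and two details are assumptions: non-tutorial runs skip tutorial bundles, and a bundle that adds no new country is skipped.

```python
def select_bundles(bundles, countries, category, tutorial=False):
    """Greedily pick bundle names for one data category."""
    requested = set(countries)
    # Rules 1 and 2: keep bundles of the right category and tutorial flag
    candidates = [
        (name, set(entry["countries"]) & requested)
        for name, entry in bundles.items()
        if entry["category"] == category
        and entry.get("tutorial", False) == tutorial
    ]
    # Bundles matching more requested countries come first
    candidates.sort(key=lambda item: len(item[1]), reverse=True)
    selected, pending = [], set(requested)
    for name, matched in candidates:
        if not pending:
            break  # all countries are matched
        if matched & pending:
            selected.append(name)
            pending -= matched
    return selected


def download_with_fallback(urls, downloaders):
    """Rule 3: try sources in the order listed under urls.

    downloaders maps a source name to a callable returning True on
    success; returns the source that worked, or None if all failed.
    """
    for source, url in urls.items():
        downloader = downloaders.get(source)
        if downloader is not None and downloader(url):
            return source
    return None


# Hypothetical bundle entries, for illustration only
bundles = {
    "bundle_multi": {"countries": ["NG", "BJ", "GH"], "category": "data"},
    "bundle_ng": {"countries": ["NG"], "category": "data"},
    "bundle_ke": {"countries": ["KE"], "category": "data"},
    "bundle_tutorial": {"countries": ["NG", "BJ"], "category": "data", "tutorial": True},
}
print(select_bundles(bundles, ["NG", "BJ", "KE"], "data"))
```

In this example bundle_ng is never selected, because bundle_multi already covers NG by the time it is considered.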

https://zenodo.org/badge/DOI/10.5281/zenodo.3517921.svg

Relevant Settings

tutorial:  # configuration stating whether the tutorial is needed

See also

Documentation of the configuration file config.yaml at Top-level configuration

Outputs

  • data: input data unzipped into the data folder

  • resources: input data unzipped into the resources folder

  • cutouts: input data unzipped into the cutouts folder