Rule retrieve_databundle_light
Not all data dependencies are shipped with the git repository, since git is not suited for handling large, changing files. Instead, we provide separate data bundles, which can be obtained using the retrieve_databundle_light rule when the retrieve_databundle flag in the configuration file is on. In that case, the retrieve_databundle_light rule is included in the workflow. The common data needed to run the model are loaded according to the settings of config_default.yaml or config_tutorial.yaml, depending on the tutorial flag.
The data bundles contain common GIS datasets like EEZ shapes, Copernicus Landcover and Hydrobasins, as well as electricity-specific summary statistics like historic per-country yearly totals of hydro generation, GDP and POP on NUTS3 levels, and per-country load time-series.
This rule downloads the data bundle from Zenodo or Google Drive and extracts it in the data, resources and cutouts sub-directories. The downloaded bundle archives are deleted once they have been unzipped.
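The extract-then-delete step described above can be sketched with the Python standard library. This is a minimal illustration, not the rule's actual implementation: the function name unzip_and_remove and the paths in the usage note are hypothetical, and the download itself is left out.

```python
import zipfile
from pathlib import Path


def unzip_and_remove(archive, destination):
    """Extract a zip archive into `destination`, then delete the archive.

    Hypothetical helper for illustration; returns the names of the
    extracted members.
    """
    archive = Path(archive)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)

    # Extract everything, keeping the folder layout stored in the archive
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        names = zf.namelist()

    # The archive is only needed for extraction, so remove it afterwards
    archive.unlink()
    return names
```

Calling unzip_and_remove("bundle.zip", ".") would, under these assumptions, unpack the archive relative to the current folder and leave only the extracted files behind.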
The Model customization uses a smaller data bundle than required for the full model (around 500 MB).
The required bundles are downloaded automatically according to the list names, in agreement with the data bundles specified in the bundle configuration file, typically located in the config folder.
Each data bundle entry has the following structure:
    bundle_name:  # name of the bundle
      countries: [country code, region code or country list]  # list of countries represented in the databundle
      [tutorial: true/false]  # (optional, default false) whether the bundle is a tutorial or not
      category: common/resources/data/cutouts  # category of data contained in the bundle
      destination: "."  # folder where to unzip the files with respect to the repository root ("" or ".")
      urls:  # list of urls by source, e.g. zenodo or google
        zenodo: {zenodo url}  # key to download data from zenodo
        gdrive: {google url}  # key to download data from google drive
        protectedplanet: {url}  # key to download data from protected planet; the url can contain {month:s} and {year:d} to let the workflow specify the current month and year
        direct: {url}  # key to download data directly from a url; if unzip option is enabled data are unzipped
        post:  # key to download data using an url post request; if unzip option is enabled data are unzipped
          url: {url}
          [post arguments]
      [unzip: true/false]  # (optional, default false) used in direct download technique to automatically unzip files
      output: [...]  # list of outputs of the databundle
      [disable_by_opt:]  # option to disable outputs from the bundle; it contains a dictionary of options,
                         # each one with its output. When "all" is specified, the entire bundle is not executed
        [{option}: [outputs,...,/all]]  # list of options and the outputs to remove, or "all" corresponding to ignore everything
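The {month:s} and {year:d} placeholders allowed in the protectedplanet url are ordinary Python format fields, so they can be filled with str.format. The sketch below shows the idea under stated assumptions: the function name fill_url and the example url are hypothetical, and the month is assumed to be passed as an abbreviated English month name, which the text above does not specify.

```python
from datetime import date

# Assumption: the month is written as an abbreviated English name
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def fill_url(template, today=None):
    """Fill {month:s} and {year:d} in a url template (hypothetical helper)."""
    today = today or date.today()
    # {month:s} expects a string, {year:d} expects an integer
    return template.format(month=MONTH_ABBR[today.month - 1], year=today.year)


# Hypothetical url, for illustration only
example = fill_url("https://example.org/WDPA_{month:s}{year:d}.zip", date(2024, 3, 5))
```

A template without placeholders is returned unchanged, so the same helper could be applied to every url regardless of its source.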
Depending on the list of countries requested, all needed databundles are downloaded according to the following rules:
The databundles shall adhere to the tutorial configuration: when the tutorial configuration is running, only the databundles with the tutorial flag set to true are downloaded.
For every data category, the most suitable bundles are downloaded in order of the number of countries matched: among the bundles matching the category, the algorithm sorts them by the number of requested countries they match and downloads them starting from those matching the most countries, until all countries are matched or no more bundles are available.
For every bundle to download, priority is given to the first bundle source listed in the urls option of the bundle configuration; when a source fails, the next source is used, and so on.
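The selection and fallback rules above can be sketched in Python as follows. This is a simplified illustration, not the rule's actual code. The function names and the downloaders mapping are hypothetical, region codes in the countries list are not expanded, non-tutorial runs are assumed to skip tutorial bundles, and bundles that add no new country are assumed to be skipped.

```python
def select_bundles(bundles, countries, category, tutorial=False):
    """Pick bundles of one category, preferring those matching more countries.

    Returns the selected bundle names and the countries left unmatched.
    """
    wanted = set(countries)
    # Keep bundles of the right category whose tutorial flag matches the run
    candidates = {
        name: set(cfg["countries"]) & wanted
        for name, cfg in bundles.items()
        if cfg["category"] == category and cfg.get("tutorial", False) == tutorial
    }
    # Sort by number of matched countries, largest first
    ordered = sorted(candidates.items(), key=lambda kv: len(kv[1]), reverse=True)

    selected, missing = [], set(wanted)
    for name, matched in ordered:
        if not missing:
            break  # all countries are matched
        if matched & missing:  # assumption: skip bundles adding nothing new
            selected.append(name)
            missing -= matched
    return selected, missing


def download_with_fallback(urls, downloaders):
    """Try the sources in the order listed under urls; return the first that works.

    `downloaders` maps a source name to a callable returning True on success.
    """
    for source, url in urls.items():  # dicts preserve the configured order
        fetch = downloaders.get(source)
        if fetch is None:
            continue
        try:
            if fetch(url):
                return source
        except Exception:
            pass  # this source failed; move on to the next one
    return None
```

Returning the unmatched countries alongside the selection makes it easy for a caller to warn when no bundle covers part of the requested country list.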
Relevant Settings
tutorial: # configuration stating whether the tutorial is needed
See also
Documentation of the configuration file config.yaml at Top-level configuration
Outputs
data: input data unzipped into the data folder
resources: input data unzipped into the resources folder
cutouts: input data unzipped into the cutouts folder