API Reference (automated)#

_helpers#

_helpers.configure_logging(snakemake, skip_handlers=False)#

Configure the basic behaviour for the logging module.

Note: Must only be called once from the __main__ section of a script.

The setup includes printing log messages to STDERR and to a log file defined by either (in priority order): snakemake.log.python, snakemake.log[0] or “logs/{rulename}.log”. Additional keywords from logging.basicConfig are accepted via the snakemake configuration file under snakemake.config.logging.

Parameters:
  • snakemake (snakemake object) – Your snakemake object containing a snakemake.config and snakemake.log.

  • skip_handlers (True | False (default)) – Do (not) skip the default handlers created for redirecting output to STDERR and file.
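
A minimal usage sketch (assuming the script is executed by snakemake, which injects the snakemake object):

    if __name__ == "__main__":
        # configure logging once, before any other work in the script
        configure_logging(snakemake)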

_helpers.country_name_2_two_digits(country_name)#

Convert a full country name to a 2-letter country code.

Parameters:

country_name (str) – country name

Returns:

two_code_country – 2-letter country code

Return type:

str

_helpers.create_country_list(input, iso_coding=True)#

Create a country list for defined regions.

Parameters:

input (str) – Any two-letter country code, regional name, or continent given in the regions config file. Country code duplications won’t distort the result. Examples:
  • [“NG”,”ZA”]: download OSM data for Nigeria and South Africa

  • [“africa”]: download data for Africa

  • [“NAR”]: download data for the North African Power Pool

  • [“TEST”]: download data for a customized test set

  • [“NG”,”ZA”,”NG”]: won’t distort the result

Returns:

full_codes_list – Example [“NG”,”ZA”]

Return type:

list
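
An illustrative call (the expanded codes depend on the regions config file):

    create_country_list(["africa"])          # expands to all African ISO codes
    create_country_list(["NG", "ZA", "NG"])  # duplicates dropped -> ["NG", "ZA"]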

_helpers.create_logger(logger_name, level=20)#

Create a logger for a module and add a handler needed to capture tracebacks in the logs from exceptions raised during the workflow.

_helpers.get_aggregation_strategies(aggregation_strategies)#

Default aggregation strategies that cannot be defined in .yaml format must be specified within the function, otherwise (when defaults are passed in the function’s definition) they get lost when custom values are specified in the config.

_helpers.get_last_commit_message(path)#

Function to get the last PyPSA-Earth Git commit message.

Returns:

result

Return type:

string

_helpers.handle_exception(exc_type, exc_value, exc_traceback)#

Customise the traceback of errors.

_helpers.load_network(import_name=None, custom_components=None)#

Helper for importing a pypsa.Network with additional custom components.

Parameters:
  • import_name (str) – As in pypsa.Network(import_name)

  • custom_components (dict) –

    Dictionary listing custom components. To use snakemake.params.override_components in config.yaml, define:

    override_components:
        ShadowPrice:
            component: ["shadow_prices","Shadow price for a global constraint.",np.nan]
            attributes:
                name: ["string","n/a","n/a","Unique name","Input (required)"]
                value: ["float","n/a",0.,"shadow value","Output"]
    

Return type:

pypsa.Network
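
A minimal sketch mirroring the config snippet above (the network file name is illustrative):

    import numpy as np

    # hypothetical override matching the override_components config snippet
    override_components = {
        "ShadowPrice": {
            "component": ["shadow_prices", "Shadow price for a global constraint.", np.nan],
            "attributes": {
                "name": ["string", "n/a", "n/a", "Unique name", "Input (required)"],
                "value": ["float", "n/a", 0.0, "shadow value", "Output"],
            },
        }
    }
    n = load_network("networks/elec.nc", custom_components=override_components)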

_helpers.mock_snakemake(rulename, **wildcards)#

This function is expected to be executed from the “scripts” directory of the snakemake project. It returns a snakemake.script.Snakemake object, based on the Snakefile.

If a rule has wildcards, you have to specify them in wildcards.

Parameters:
  • rulename (str) – name of the rule for which the snakemake object should be generated

  • wildcards – keyword arguments fixing the wildcards. Only necessary if wildcards are needed.
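
A typical usage pattern at the bottom of a script (rule name and wildcard are illustrative):

    if __name__ == "__main__":
        if "snakemake" not in globals():
            from _helpers import mock_snakemake

            # wildcards are passed as keyword arguments
            snakemake = mock_snakemake("build_renewable_profiles", technology="solar")
        # snakemake.input, snakemake.output, snakemake.config are now available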

_helpers.progress_retrieve(url, file, data=None, headers=None, disable_progress=False, roundto=1.0)#

Download data from a URL to a file, showing a progress bar while retrieving the data.

Parameters:
  • url (str) – Url to download data from

  • file (str) – File where to save the output

  • data (dict) – Data for the request (default None); when not None, the POST method is used

  • headers (dict) – Headers for the request (default None)

  • disable_progress (bool) – When true, no progress bar is shown

  • roundto (float) – (default 1.0) Precision used to report the progress, e.g. 0.1 reports 88.1, 10 reports 90, 80
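
An illustrative call (URL and destination path are hypothetical):

    progress_retrieve(
        "https://example.com/data.zip",
        "data/data.zip",
        disable_progress=False,
        roundto=0.1,  # report progress in 0.1% steps, e.g. 88.1
    )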

_helpers.read_csv_nafix(file, **kwargs)#

Function to open a csv file as a pandas DataFrame and standardize the NA values.

_helpers.read_geojson(fn, cols=[], dtype=None, crs='EPSG:4326')#

Function to read a geojson file fn. When the file is empty, an empty GeoDataFrame is returned having the columns cols, the specified crs and, if dtype is not None, the column types specified by the dtype dictionary.

Parameters:
  • fn (str) – Path to the file to read

  • cols (list) – List of columns of the GeoDataFrame

  • dtype (dict) – Dictionary of the type of the object by column

  • crs (str) – CRS of the GeoDataFrame
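
An illustrative call (file path and column names are hypothetical):

    gdf = read_geojson(
        "resources/regions_onshore.geojson",
        cols=["name", "geometry"],
        dtype={"name": str},
        crs="EPSG:4326",
    )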

_helpers.read_osm_config(*args)#

Read values from the regions config file based on provided key arguments.

Parameters:

*args (str) – One or more key arguments corresponding to the values to retrieve from the config file. Typical arguments include “world_iso”, “continent_regions”, “iso_to_geofk_dict”, and “osm_clean_columns”.

Returns:

If a single key is provided, returns the corresponding value from the regions config file. If multiple keys are provided, returns a tuple containing values corresponding to the provided keys.

Return type:

tuple or str or dict

Examples

>>> values = read_osm_config("key1", "key2")
>>> print(values)
('value1', 'value2')
>>> world_iso = read_osm_config("world_iso")
>>> print(world_iso)
{"Africa": {"DZ": "algeria", ...}, ...}

_helpers.sets_path_to_root(root_directory_name)#

Search for and set the path to the given root directory (root/path/file).

Parameters:
  • root_directory_name (str) – Name of the root directory.

  • n (int) – Number of parent folders the function will check upwards for the root directory.

_helpers.three_2_two_digits_country(three_code_country)#

Convert a 3-letter to a 2-letter country code.

Parameters:

three_code_country (str) – 3-letter country code

Returns:

two_code_country – 2-letter country code

Return type:

str

_helpers.two_2_three_digits_country(two_code_country)#

Convert a 2-letter to a 3-letter country code.

Parameters:

two_code_country (str) – 2-letter country code

Returns:

three_code_country – 3-letter country code

Return type:

str

_helpers.two_digits_2_name_country(two_code_country, nocomma=False, remove_start_words=[])#

Convert a 2-letter country code to the full country name.

Parameters:
  • two_code_country (str) – 2-letter country code

  • nocomma (bool (optional, default False)) – When true, country names containing a comma are rewritten to remove it, e.g. CD: “Congo, The Democratic Republic of” -> “The Democratic Republic of Congo”

  • remove_start_words (list (optional, default empty)) – When a name starts with any of the provided words, that beginning is removed, e.g. “The Democratic Republic of Congo” -> “Democratic Republic of Congo” (remove_start_words=[“The”])

Returns:

full_name – full country name

Return type:

str
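
Illustrative round trips through the conversion helpers (outputs shown as comments):

    two_2_three_digits_country("NG")   # -> "NGA"
    three_2_two_digits_country("NGA")  # -> "NG"
    two_digits_2_name_country("NG")    # -> "Nigeria"
    two_digits_2_name_country("CD", nocomma=True, remove_start_words=["The"])
    # -> "Democratic Republic of Congo"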

add_electricity#

Adds electrical generators, load and existing hydro storage units to a base network.

Relevant Settings#

costs:
    year:
    version:
    rooftop_share:
    USD2013_to_EUR2013:
    discountrate:
    emission_prices:

electricity:
    max_hours:
    marginal_cost:
    capital_cost:
    conventional_carriers:
    co2limit:
    extendable_carriers:
    include_renewable_capacities_from_OPSD:
    estimate_renewable_capacities_from_capacity_stats:

renewable:
    hydro:
        carriers:
        hydro_max_hours:
        hydro_max_hours_default:
        hydro_capital_cost:

lines:
    length_factor:

See also

Documentation of the configuration file config.yaml at costs, electricity, load_options, renewable, lines

Inputs#

  • resources/costs.csv: The database of cost assumptions for all included technologies for specific years from various sources; e.g. discount rate, lifetime, investment (CAPEX), fixed operation and maintenance (FOM), variable operation and maintenance (VOM), fuel costs, efficiency, carbon-dioxide intensity.

  • data/bundle/hydro_capacities.csv: Hydropower plant store/discharge power capacities, energy storage capacity, and average hourly inflow by country. Not currently used!

    _images/hydrocapacities.png
  • data/geth2015_hydro_capacities.csv: alternative to capacities above; not currently used!

  • resources/demand_profiles.csv: a csv file containing the demand profile associated with buses

  • resources/shapes/gadm_shapes.geojson: confer Rule build_shapes

  • resources/powerplants.csv: confer Rule build_powerplants

  • resources/profile_{}.nc: all technologies in config["renewables"].keys(), confer Rule build_renewable_profiles

  • networks/base.nc: confer Rule base_network

Outputs#

  • networks/elec.nc:

Description#

The rule add_electricity ties all the different data inputs from the preceding rules together into a detailed PyPSA network that is stored in networks/elec.nc. It includes:

  • today’s transmission topology and transfer capacities (in future, optionally including lines which are under construction according to the config settings lines: under_construction and links: under_construction),

  • today’s thermal and hydro power generation capacities (for the technologies listed in the config setting electricity: conventional_carriers), and

  • today’s load time-series (upsampled in a top-down approach according to population and gross domestic product)

It further adds extendable generators with zero capacity for

  • photovoltaic, onshore and AC- as well as DC-connected offshore wind installations with today’s locational, hourly wind and solar capacity factors (but no current capacities),

  • additional open- and combined-cycle gas turbines (if OCGT and/or CCGT is listed in the config setting electricity: extendable_carriers)

add_electricity.attach_load(n, demand_profiles)#

Add load profiles to network buses.

Parameters:
  • n (pypsa network)

  • demand_profiles (str) – Path to csv file of electric demand time series, e.g. “resources/demand_profiles.csv”. The demand profile has snapshots as rows and bus names as columns.

Returns:

n – Now attached with load time series

Return type:

pypsa network

add_electricity.calculate_annuity(n, r)#

Calculate the annuity factor for an asset with lifetime n years and discount rate of r, e.g. annuity(20, 0.05) * 20 = 1.6.
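
A minimal sketch of the standard annuity formula implied by the example (the helper name is hypothetical):

    def annuity_factor(n, r):
        # annuity factor: r / (1 - (1 + r) ** -n)
        return r / (1.0 - 1.0 / (1.0 + r) ** n)

    annuity_factor(20, 0.05) * 20  # ~1.6, matching the example above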

add_electricity.load_costs(tech_costs, config, elec_config, Nyears=1)#

Set all asset costs and other parameters.

base_network#

Creates the network topology from OpenStreetMap data.

Relevant Settings#

snapshots:

countries:

electricity:
    voltages:

lines:
    types:
    s_max_pu:
    under_construction:

links:
    p_max_pu:
    p_nom_max:
    under_construction:

transformers:
    x:
    s_nom:
    type:

See also

Documentation of the configuration file config.yaml at snapshots, Top-level configuration, electricity, load_options, lines, links, transformers

Inputs#

Outputs#

  • networks/base.nc

Description#

build_bus_regions#

Creates Voronoi shapes for each bus representing both onshore and offshore regions.

Relevant Settings#

countries:

See also

Documentation of the configuration file config.yaml at Top-level configuration

Inputs#

Outputs#

  • resources/regions_onshore.geojson:

  • resources/regions_offshore.geojson:

Description#

build_bus_regions.custom_voronoi_partition_pts(points, outline, add_bounds_shape=True, multiplier=5)#

Compute the polygons of a Voronoi partition of points within the polygon outline.

Parameters:
  • points (Nx2 ndarray[dtype=float])

  • outline (Polygon)

Returns:

polygons

Return type:

N ndarray[dtype=Polygon|MultiPolygon]

build_cutout#

Create cutouts with atlite.

For this rule to work you must have registered at the Copernicus Climate Data Store (CDS) and set up your cdsapi key as described in the atlite documentation.

See also

For details on the weather data read the atlite documentation. If you need help specifically for creating cutouts the corresponding section in the atlite documentation should be helpful.

Relevant Settings#

atlite:
    nprocesses:
    cutouts:
        {cutout}:

See also

Documentation of the configuration file config.yaml at atlite

Inputs#

None

Outputs#

  • cutouts/{cutout}: weather data from either the ERA5 reanalysis weather dataset or SARAH-2 satellite-based historic weather data with the following structure:

ERA5 cutout (field, dimensions, unit, description):

  • pressure (time, y, x) [Pa] – Surface pressure

  • temperature (time, y, x) [K] – Air temperature 2 meters above the surface

  • soil temperature (time, y, x) [K] – Soil temperature between 1 and 3 meters depth (layer 4)

  • influx_toa (time, y, x) [Wm**-2] – Top of Earth’s atmosphere (TOA) incident solar radiation

  • influx_direct (time, y, x) [Wm**-2] – Total sky direct solar radiation at surface

  • runoff (time, y, x) [m] – Runoff (volume per area)

  • roughness (y, x) [m] – Forecast surface roughness (roughness length)

  • height (y, x) [m] – Surface elevation above sea level

  • albedo (time, y, x) [–] – Albedo, a measure of the diffuse reflection of solar radiation, calculated from the relation between surface solar radiation downwards (Jm**-2) and surface net solar radiation (Jm**-2); takes values between 0 and 1

  • influx_diffuse (time, y, x) [Wm**-2] – Diffuse solar radiation at surface; surface solar radiation downwards minus direct solar radiation

  • wnd100m (time, y, x) [ms**-1] – Wind speeds at 100 meters (regardless of direction)

_images/era5.png

A SARAH-2 cutout can be used to amend the fields temperature, influx_toa, influx_direct, albedo, influx_diffuse of ERA5 using satellite-based radiation observations.

Description#

build_demand_profiles#

Creates electric demand profile csv.

Relevant Settings#

load:
    scale:
    ssp:
    weather_year:
    prediction_year:
    region_load:

Inputs#

  • networks/base.nc: confer Rule base_network, a base PyPSA Network

  • resources/bus_regions/regions_onshore.geojson: confer build_bus_regions

  • load_data_paths: paths to load profiles, e.g. hourly country load profiles produced by GEGIS

  • resources/shapes/gadm_shapes.geojson: confer Rule build_shapes, file containing the gadm shapes

Outputs#

  • resources/demand_profiles.csv: the content of the file is the electric demand profile associated to each bus. The file has the snapshots as rows and the buses of the network as columns.

Description#

The rule build_demand_profiles creates load demand profiles for the buses of the network. It builds the load paths for GEGIS outputs by combining the input parameters for the countries, weather year, prediction year, and SSP scenario. A function then takes the PyPSA network “base.nc”, region and gadm shape data, the countries of interest, a scale factor, and the snapshots, and returns a csv file called “demand_profiles.csv” that allocates the load to the buses of the network according to GDP and population.

build_demand_profiles.build_demand_profiles(n, load_paths, regions, admin_shapes, countries, scale, start_date, end_date, out_path)#

Create csv file of electric demand time series.

Parameters:
  • n (pypsa network)

  • load_paths (paths of the load files)

  • regions (.geojson) – Contains bus_id of low voltage substations and bus region shapes (voronoi cells)

  • admin_shapes (.geojson) – contains subregional gdp, population and shape data

  • countries (list) – List of countries that is config input

  • scale (float) – The scale factor is multiplied with the load (1.3 = 30% more load)

  • start_date – The first hour of the first day of the snapshots

  • end_date – The last hour of the last day of the snapshots

Returns:

demand_profiles.csv

Return type:

csv file containing the electric demand time series

build_demand_profiles.get_gegis_regions(countries)#

Get the GEGIS region from the config file.

Parameters:

region (str) – The region of the bus

Returns:

The GEGIS region

Return type:

str

build_demand_profiles.get_load_paths_gegis(ssp_parentfolder, config)#

Create load paths for GEGIS outputs.

The paths are created automatically according to the included countries, weather year, prediction year and SSP scenario.

Example

[“/data/ssp2-2.6/2030/era5_2013/Africa.nc”, “/data/ssp2-2.6/2030/era5_2013/Africa.nc”]
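
A sketch of how such a path is assembled from the config parameters (variable names are hypothetical):

    ssp = "ssp2-2.6"
    prediction_year = 2030
    weather_year = 2013
    region = "Africa"
    load_path = f"/data/{ssp}/{prediction_year}/era5_{weather_year}/{region}.nc"
    # -> "/data/ssp2-2.6/2030/era5_2013/Africa.nc"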

build_demand_profiles.shapes_to_shapes(orig, dest)#

Adapted from vresutils.transfer.Shapes2Shapes().

build_natura_raster#

Converts vector data (shapefiles, as used by geopandas/shapely) to rasters matching our cutouts. The Protected Planet data on protected areas is aggregated over all cutout regions.

Relevant Settings#

renewable:
    {technology}:
        cutout:

See also

Documentation of the configuration file config.yaml at renewable

Inputs#

Outputs#

  • resources/natura/natura.tiff: Rasterized version of the world protected areas, such as WDPA natural protection areas to reduce computation times.

Description#

To operate the script you need all input files.

This script collects all shapefiles available in the folder data/landcover/* describing regions of protected areas, merges them into one shapefile, and creates a rasterized version covering the region described by the cutout. The output is a raster file named natura.tiff in the folder resources/natura/.

build_natura_raster.get_fileshapes(list_paths, accepted_formats=('.shp',))#

Function to parse the list of paths and also include shapes contained in folders, if any.

build_natura_raster.unify_protected_shape_areas(inputs, natura_crs, out_logging)#

Iterates through all snakemake rule inputs and unifies shapefiles (.shp) only.

The input is given in the Snakefile; shapefiles are identified by the .shp extension.

Returns:

unified_shape

Return type:

GeoDataFrame with a unified “multishape”

build_renewable_profiles#

Calculates for each network node the (i) installable capacity (based on land use), (ii) the available generation time series (based on weather data), and (iii) the average distance from the node for onshore wind, AC-connected offshore wind, DC-connected offshore wind and solar PV generators. For hydro generators, it calculates the expected inflows. In addition, for offshore wind it calculates the fraction of the grid connection which is under water.

Relevant settings#

snapshots:

atlite:
    nprocesses:

renewable:
    {technology}:
        cutout:
        copernicus:
            grid_codes:
            distance:
            distance_grid_codes:
        natura:
        max_depth:
        max_shore_distance:
        min_shore_distance:
        capacity_per_sqkm:
        correction_factor:
        potential:
        min_p_max_pu:
        clip_p_max_pu:
        resource:
        clip_min_inflow:

See also

Documentation of the configuration file config.yaml at snapshots, atlite, renewable

Inputs#

Outputs#

  • resources/profile_{technology}.nc (all technologies except hydro), with the following structure:

    • profile (bus, time) – the per unit hourly availability factors for each node

    • weight (bus) – sum of the layout weighting for each node

    • p_nom_max (bus) – maximal installable capacity at the node (in MW)

    • potential (y, x) – layout of generator units at cutout grid cells inside the Voronoi cell (maximal installable capacity at each grid cell multiplied by capacity factor)

    • average_distance (bus) – average distance of units in the Voronoi cell to the grid node (in km)

    • underwater_fraction (bus) – fraction of the average connection distance which is under water (only for offshore)

  • resources/profile_hydro.nc for the hydro technology:

    • inflow (plant, time) – Inflow to the state of charge (in MW), e.g. due to river inflow into a hydro reservoir.

    • profile

    _images/profile_ts.png
    • p_nom_max

    _images/p_nom_max_hist.png
    • potential

    _images/potential_heatmap.png
    • average_distance

    _images/distance_hist.png
    • underwater_fraction

    _images/underwater_hist.png

Description#

This script leverages atlite functions to derive hourly time series for an entire year for solar, wind (onshore and offshore), and hydro data.

This script functions at two main spatial resolutions: the resolution of the network nodes and their Voronoi cells, and the resolution of the cutout grid cells for the weather data. Typically the weather data grid is finer than the network nodes, so we have to work out the distribution of generators across the grid cells within each Voronoi cell. This is done by taking account of a combination of the available land at each grid cell and the capacity factor there.

This uses the Copernicus land use data, Natura2000 nature reserves and GEBCO bathymetry data.

_images/eligibility.png

To compute the layout of generators in each node’s Voronoi cell, the installable potential in each grid cell is multiplied with the capacity factor at each grid cell. This is done since we assume more generators are installed at cells with a higher capacity factor.

_images/offwinddc-gridcell.png _images/offwindac-gridcell.png _images/onwind-gridcell.png _images/solar-gridcell.png

This layout is then used to compute the generation availability time series from the weather data cutout from atlite.

Two methods are available to compute the maximal installable potential for the node (p_nom_max): simple and conservative:

  • simple adds up the installable potentials of the individual grid cells. If the model comes close to this limit, then the time series may slightly overestimate production since it is assumed the geographical distribution is proportional to capacity factor.

  • conservative ascertains the nodal limit by increasing capacities proportional to the layout until the limit of an individual grid cell is reached.
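
A numeric sketch of the two aggregation rules (function-free, values illustrative; the actual implementation may differ):

    import numpy as np

    potential = np.array([10.0, 20.0, 30.0])  # installable MW per grid cell
    layout = np.array([1.0, 1.0, 4.0])        # layout weighting per grid cell

    # simple: add up the installable potentials of the individual grid cells
    p_nom_max_simple = potential.sum()  # 60.0

    # conservative: scale capacities proportional to the layout until the
    # first grid cell hits its own potential limit
    scale = np.min(potential / layout)             # min(10, 20, 7.5) = 7.5
    p_nom_max_conservative = scale * layout.sum()  # 7.5 * 6 = 45.0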

build_renewable_profiles.check_cutout_completness(cf)#

Check if a cutout contains missing values.

That may be the case due to issues with the accessibility of ERA5 data; see https://confluence.ecmwf.int/display/CUSF/Missing+data+in+ERA5T for details. Returns the share of cutout cells with missing data.

build_renewable_profiles.estimate_bus_loss(data_column, tech)#

Calculate the share of buses with data loss due to flaws in the cutout data.

Returns the share of buses with missing data.

build_renewable_profiles.filter_cutout_region(cutout, regions)#

Filter the cutout to focus on the region of interest.

build_renewable_profiles.rescale_hydro(plants, runoff, normalize_using_yearly, normalization_year)#

Function used to rescale the inflows of the hydro capacities to match country statistics.

Parameters:
  • plants (DataFrame) – Run-of-river plants or dams with lon, lat, countries, installed_hydro columns. The countries and installed_hydro columns are only used with normalize_using_yearly; the installed_hydro column shall be a boolean vector specifying whether the plant is currently installed and used to normalize the inflows.

  • runoff (xarray object) – Runoff at each bus

  • normalize_using_yearly (DataFrame) – Dataframe that specifies for every country the total hydro production

  • normalization_year (int) – Year used for normalization

build_shapes#

build_shapes.add_gdp_data(df_gadm, year=2020, update=False, out_logging=False, name_file_nc='GDP_PPP_1990_2015_5arcmin_v2.nc', nprocesses=2, disable_progressbar=False)#

Function to add GDP data to an arbitrary number of shapes in a country.

Inputs:#

df_gadm: Geodataframe with one Multipolygon per row
  • Essential columns [“country”, “geometry”]

  • Non-essential column [“GADM_ID”]

Outputs:#

df_gadm: Geodataframe with one Multipolygon per row
  • Same columns as input

  • Includes a new column [“gdp”]

build_shapes.add_population_data(df_gadm, country_codes, worldpop_method, year=2020, update=False, out_logging=False, mem_read_limit_per_process=1024, nprocesses=2, disable_progressbar=False)#

Function to add population data to an arbitrary number of shapes in a country. It loads data from WorldPop raster files, where each pixel represents the population in that square region. Each square polygon (or pixel) is then mapped to the corresponding GADM shape. The population of a GADM shape is then identified by summing over all pixels mapped to that region.

This is performed with an iterative approach:

  1. All necessary WorldPop data tiff files are downloaded

  2. The so-called windows are created to handle RAM limitations related to large WorldPop files. Large WorldPop files require significant RAM to handle, which may not be available; hence, the entire activity is decomposed into multiple windows (or tasks). Each window represents a subset of a raster file on which the following algorithm is applied. Note: when enough RAM is available, only one window is created for efficiency purposes.

  3. Execute all tasks by summing the values of the pixels mapped into each GADM shape. Parallelization applies in this task.

Inputs:#

df_gadm: Geodataframe with one Multipolygon per row
  • Essential columns [“country”, “geometry”]

  • Non-essential column [“GADM_ID”]

Outputs:#

df_gadm: Geodataframe with one Multipolygon per row
  • Same columns as input

  • Includes a new column [“pop”]

build_shapes.calculate_transform_and_coords_for_window(current_transform, window_dimensions, original_window=False)#

Function which calculates the [lat, long] corners of the window given window_dimensions; if not original_window, it also changes the affine transform to match the window.

Inputs:#

  • current_transform: affine transform of source image

  • window_dimensions: dimensions of window used when reading file

  • original_window: boolean to track if window covers entire country

Outputs:#

A list of:

  • adjusted_transform: affine transform adjusted to the window

  • coordinate_topleft: [latitude, longitude] of the top left corner of the window

  • coordinate_botright: [latitude, longitude] of the bottom right corner of the window

build_shapes.compute_geomask_region(country_rows, affine_transform, window_dimensions, latlong_topleft, latlong_botright)#

Function to mask geometries into np_map_ID using an incrementing counter.

Inputs:#

  • country_rows: geoDataFrame filled with geometries and their GADM_ID

  • affine_transform: affine transform of the current window

  • window_dimensions: dimensions of the window used when reading the file

  • latlong_topleft: [latitude, longitude] of the top left corner of the window

  • latlong_botright: [latitude, longitude] of the bottom right corner of the window

Outputs:#

  • np_map_ID.astype(“H”): np_map_ID contains an ID for each location (undefined is 0); dimensions are taken from window_dimensions, .astype(“H”) for memory savings

  • id_result: DataFrame of the mapping from id (from the counter) to GADM_ID

build_shapes.convert_GDP(name_file_nc, year=2015, out_logging=False)#

Function to convert the nc database of the GDP to tif, based on the work at https://doi.org/10.1038/sdata.2018.4. The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.

build_shapes.countries(countries, geo_crs, contended_flag, update=False, out_logging=False)#

Create country shapes

build_shapes.download_GADM(country_code, update=False, out_logging=False)#

Download gpkg file from GADM for a given country code.

Parameters:
  • country_code (str) – Two letter country codes of the downloaded files

  • update (bool) – Update = true, forces re-download of files

Return type:

gpkg file per country

build_shapes.download_WorldPop(country_code, worldpop_method, year=2020, update=False, out_logging=False, size_min=300)#

Download Worldpop using either the standard method or the API method.

Parameters:
  • worldpop_method (str) – worldpop_method = “api” will use the API method to access the WorldPop 100mx100m dataset. worldpop_method = “standard” will use the standard method to access the WorldPop 1KMx1KM dataset.

  • country_code (str) – Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted

  • year (int) – Year of the data to download

  • update (bool) – Update = true, forces re-download of files

  • size_min (int) – Minimum size of each file to download

build_shapes.download_WorldPop_API(country_code, year=2020, update=False, out_logging=False, size_min=300)#

Download tiff file for each country code using the api method from worldpop API with 100mx100m resolution.

Parameters:
  • country_code (str) – Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted

  • year (int) – Year of the data to download

  • update (bool) – Update = true, forces re-download of files

  • size_min (int) – Minimum size of each file to download

Returns:

  • WorldPop_inputfile (str) – Path of the file

  • WorldPop_filename (str) – Name of the file

build_shapes.download_WorldPop_standard(country_code, year=2020, update=False, out_logging=False, size_min=300)#

Download tiff file for each country code using the standard method from worldpop datastore with 1kmx1km resolution.

Parameters:
  • country_code (str) – Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted

  • year (int) – Year of the data to download

  • update (bool) – Update = true, forces re-download of files

  • size_min (int) – Minimum size of each file to download

Returns:

  • WorldPop_inputfile (str) – Path of the file

  • WorldPop_filename (str) – Name of the file

build_shapes.eez(countries, geo_crs, country_shapes, EEZ_gpkg, out_logging=False, distance=0.01, minarea=0.01, tolerance=0.01)#

Creates offshore shapes by buffering a smoothed country shape (= offset country shape) and taking the difference with the offshore shape, which leads to, for instance, a 100 m non-build coastline.

build_shapes.generalized_mask(src, geom, **kwargs)#

Generalize mask function to account for Polygon and MultiPolygon

build_shapes.generate_df_tasks(c_code, mem_read_limit_per_process, WorldPop_inputfile)#

Function to generate a list of tasks based on the memory constraints.

One task represents a single window of the image

Inputs:#

  • c_code: country code

  • mem_read_limit_per_process: memory limit for the src.read() operation

  • WorldPop_inputfile: file location of the WorldPop file

Outputs:#

Dataframe of task_list

build_shapes.get_GADM_filename(country_code)#

Function to get the GADM filename given the country code.

build_shapes.get_GADM_layer(country_list, layer_id, geo_crs, contended_flag, update=False, outlogging=False)#

Function to retrieve a specific layer id of a geopackage for a selection of countries.

Parameters:
  • country_list (str) – List of the countries

  • layer_id (int) – Layer to consider in the format GID_{layer_id}. When the requested layer_id is greater than the last available layer, the last layer is selected. When a negative value is requested, the last layer is selected as well

build_shapes.get_worldpop_val_xy(WorldPop_inputfile, window_dimensions)#

Function to extract data from .tif input file.

Inputs:#

  • WorldPop_inputfile: file location of the WorldPop file

  • window_dimensions: dimensions of the window used when reading the file

Outputs:#

  • np_pop_valid: array filled with values for each nonzero pixel in the WorldPop file

  • np_pop_xy: array with [x, y] coordinates of the corresponding nonzero values in np_pop_valid

build_shapes.load_EEZ(countries_codes, geo_crs, EEZ_gpkg='./data/eez/eez_v11.gpkg')#

Function to load the database of the Exclusive Economic Zones.

The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.

build_shapes.load_GDP(countries_codes, year=2015, update=False, out_logging=False, name_file_nc='GDP_PPP_1990_2015_5arcmin_v2.nc')#

Function to load the database of the GDP, based on the work at https://doi.org/10.1038/sdata.2018.4. The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.

build_shapes.loop_and_extact_val_x_y(np_pop_count, np_pop_val, np_pop_xy, region_geomask, dict_id)#

Function that will be compiled using @njit (numba). It takes all the population values from np_pop_val and stores them in np_pop_count, where each location in np_pop_count is mapped to a GADM_ID through dict_id (and id_mapping by extension).

Inputs:#

  • np_pop_count: np.zeros array, which will store the population counts

  • np_pop_val: array filled with values for each nonzero pixel in the WorldPop file

  • np_pop_xy: array with [x, y] coordinates of the corresponding nonzero values in np_pop_valid

  • region_geomask: array with the dimensions of the window; values are keys that map to a GADM_ID using id_mapping

  • dict_id: numba typed.Dict containing id_mapping.index -> location in np_pop_count

Outputs:#

np_pop_count: np.array containing population counts

build_shapes.process_function_population(row_id)#

Function that reads a task from df_tasks and executes all the methods needed to obtain population values for the specified region.

Inputs:#

row_id: integer which indicates a specific row of df_tasks

Outputs:#

windowed_pop_count: Dataframe containing “GADM_ID” and “pop” columns. It represents the population per region (GADM_ID) for the settings given by the row in df_tasks

build_shapes.sum_values_using_geomask(np_pop_val, np_pop_xy, region_geomask, id_mapping)#

Function that sums all the population values in np_pop_val into the correct GADM_ID. It uses np_pop_xy to access the key stored in region_geomask[x][y]; the relation of this key to the GADM_ID is stored in id_mapping.

Inputs:#

  • np_pop_val: array filled with values for each nonzero pixel in the WorldPop file

  • np_pop_xy: array with [x, y] coordinates of the corresponding nonzero values in np_pop_valid

  • region_geomask: array with the dimensions of the window; values are keys that map to a GADM_ID using id_mapping

  • id_mapping: Dataframe that contains mappings of region_geomask values to GADM_IDs

Outputs:#

df_pop_count: Dataframe with columns
  • “GADM_ID”

  • “pop” containing population of GADM_ID region

cluster_network#

Creates networks clustered to {cluster} number of zones with aggregated buses, generators and transmission corridors.

Relevant Settings#

clustering:
    aggregation_strategies:

focus_weights:

solving:
    solver:
        name:

lines:
    length_factor:

See also

Documentation of the configuration file config.yaml at Top-level configuration, renewable, solving, lines

Inputs#

Outputs#

  • resources/regions_onshore_elec_s{simpl}_{clusters}.geojson:

  • resources/regions_offshore_elec_s{simpl}_{clusters}.geojson:

  • resources/busmap_elec_s{simpl}_{clusters}.csv: Mapping of buses from networks/elec_s{simpl}.nc to networks/elec_s{simpl}_{clusters}.nc;

  • resources/linemap_elec_s{simpl}_{clusters}.csv: Mapping of lines from networks/elec_s{simpl}.nc to networks/elec_s{simpl}_{clusters}.nc;

  • networks/elec_s{simpl}_{clusters}.nc:

Description#

Note

Why is clustering used both in simplify_network and cluster_network?

Consider for example a network networks/elec_s100_50.nc in which simplify_network clusters the network to 100 buses and in a second step cluster_network reduces it down to 50 buses.

In preliminary tests it turns out that the principal effect of changing spatial resolution is actually only partially due to the transmission network. It is more important to differentiate between wind generators with higher capacity factors and those with lower capacity factors, i.e. to have a higher spatial resolution in the renewable generation than in the number of buses.

The two-step clustering allows one to study this effect by looking at networks like networks/elec_s100_50m.nc. Note the additional m in the {cluster} wildcard. In the example network there are thus still up to 100 different wind generators.

In combination these two features allow you to study the spatial resolution of the transmission network separately from the spatial resolution of renewable generators.

Is it possible to run the model without the simplify_network rule?

No, the network clustering methods in the PyPSA module pypsa.clustering.spatial do not work reliably with multiple voltage levels and transformers.

Tip

The rule cluster_all_networks runs the rule cluster_network for all scenarios in the configuration file.

Exemplary unsolved network clustered to 512 nodes:

_images/elec_s_512.png

Exemplary unsolved network clustered to 256 nodes:

_images/elec_s_256.png

Exemplary unsolved network clustered to 128 nodes:

_images/elec_s_128.png

Exemplary unsolved network clustered to 37 nodes:

_images/elec_s_37.png
cluster_network.distribute_clusters(inputs, build_shape_options, country_list, distribution_cluster, n, n_clusters, focus_weights=None, solver_name=None)#

Determine the number of clusters per country.

build_osm_network#

build_osm_network.add_buses_to_empty_countries(country_list, fp_country_shapes, buses)#

Function to add a bus for countries missing substation data.

build_osm_network.connect_stations_same_station_id(lines, buses)#

Function to create fake links between substations with the same substation_id.

build_osm_network.fix_overpassing_lines(lines, buses, distance_crs, tol=1)#

Function to avoid buses overpassing lines without a connection when the bus is within a given tolerance of the line.

Parameters:
  • lines (GeoDataFrame) – Geodataframe of lines

  • buses (GeoDataFrame) – Geodataframe of substations

  • tol (float) – Tolerance in meters of the distance between the substation and the line below which the line will be split

build_osm_network.force_ac_lines(df, col='tag_frequency')#

Function that forces all PyPSA lines to be AC lines.

A network can contain AC and DC power lines that are modelled as the PyPSA “Line” component. When DC lines are available, their power flow can be controlled by their converter. When a DC line is artificially converted into AC, this feature is lost. However, for debugging and preliminary analysis, it can be useful to bypass such problems.

build_osm_network.get_ac_frequency(df, fr_col='tag_frequency')#

Function to define a default frequency value.

Attempts to find the most common non-zero frequency across the dataframe; 50 Hz is assumed as a backup value

build_osm_network.get_converters(buses, lines)#

Function to create fake converter lines that connect buses of the same station_id of different polarities.

build_osm_network.get_transformers(buses, lines)#

Function to create fake transformer lines that connect buses of the same station_id at different voltage.

build_osm_network.merge_stations_lines_by_station_id_and_voltage(lines, buses, geo_crs, distance_crs, tol=2000)#

Function to merge close stations and adapt the line datasets to adhere to the merged dataset.

build_osm_network.merge_stations_same_station_id(buses, delta_lon=0.001, delta_lat=0.001, precision=4)#

Function to merge buses with the same voltage and station_id. This function iterates over all substation ids and creates a bus_id for every substation and voltage level.

Therefore, a substation with multiple voltage levels is represented by different buses, one per voltage level

build_osm_network.set_lines_ids(lines, buses, distance_crs)#

Function to set line buses ids to the closest bus in the list.

build_osm_network.set_lv_substations(buses)#

Function to set which nodes are LV, thereby setting substation_lv. The current methodology is to set LV nodes at buses where multiple voltage levels are found, hence where the station_id is duplicated.

build_osm_network.set_substations_ids(buses, distance_crs, tol=2000)#

Function to set substations ids to buses, accounting for location tolerance.

The algorithm is as follows:

  1. initialize all substation ids to -1

  2. if the current substation has been already visited [substation_id < 0], then skip the calculation

  3. otherwise:
    1. identify the substations within the specified tolerance (tol)

    2. when all the substations in tolerance have substation_id < 0, then specify a new substation_id

    3. otherwise, if one of the substations within tolerance has a substation_id >= 0, then set that substation_id for all the others; in case of multiple substations with substation_ids >= 0, the first value is picked for all
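
A minimal sketch of this algorithm (assuming buses is a GeoDataFrame projected to a metric CRS; names are hypothetical):

    import numpy as np

    def assign_substation_ids(buses, tol=2000):
        station_id = np.full(len(buses), -1)  # 1. initialize all ids to -1
        next_id = 0
        for i, geom in enumerate(buses.geometry):
            if station_id[i] >= 0:
                continue  # 2. already visited
            close = (buses.distance(geom) <= tol).values  # 3.1 within tolerance
            existing = station_id[close][station_id[close] >= 0]
            if len(existing):
                station_id[close] = existing[0]  # 3.3 reuse the first existing id
            else:
                station_id[close] = next_id      # 3.2 open a new id
                next_id += 1
        return station_id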

clean_osm_data#

clean_osm_data.clean_cables(df)#

Function to clean the raw cables column: manual fixing and dropping undesired values

clean_osm_data.clean_circuits(df)#

Function to clean the raw circuits column: manual fixing and cleaning NaN values

clean_osm_data.clean_frequency(df, default_frequency='50')#

Function to clean the raw frequency column: manual fixing and filling NaN values

clean_osm_data.clean_voltage(df)#

Function to clean the raw voltage column: manual fixing and dropping NaN values

clean_osm_data.create_extended_country_shapes(country_shapes, offshore_shapes, tolerance=0.01)#

Obtain the extended country shape by merging on- and off-shore shapes.

clean_osm_data.explode_rows(df, cols)#

Function that explodes the rows as specified in cols, including warning alerts for unexpected values.

Example

row 1: [50, 50], [33000, 110000]

after applying explode_rows on the two columns becomes:

row 1: 50, 33000
row 2: 50, 110000

clean_osm_data.fill_circuits(df)#

This function fills the rows of the circuits column so that the size of each list element matches the size of the list in the frequency column.

Multiple procedures are adopted:

  1. In the rows of circuits where the number of elements matches the number in the frequency column, nothing is done.

  2. Where the number of elements in the cables column matches the number in the frequency column, the values of cables are used.

  3. Where the number of elements in cables exceeds those in frequency, the cables elements are downscaled and the last values of cables are summed. Let’s assume that cables is [3,3,3] but frequency is [50,50]. With this procedure, cables is treated as [3,6] and used for calculating the circuits.

  4. Where cables has a single value, e.g. [‘6’], but frequency does not, e.g. [‘50’, ‘50’], the cables are distributed proportionally across the values, as sketched below. Note: the distribution accounts for the frequency type; when the frequency is 50 or 60, a circuit requires 3 cables, when DC (0 frequency) is used, a circuit requires 2 cables.

  5. Where no information on cables or circuits is available, one circuit is assumed for every frequency entry.
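
A minimal sketch of the proportional rule in step 4 (function name hypothetical; the real implementation may differ):

    def distribute_circuits(cables, frequencies):
        # a circuit requires 3 cables when AC (50/60 Hz) and 2 cables when DC (0 Hz)
        needs = [2 if float(f) == 0 else 3 for f in frequencies]
        scale = float(cables) / sum(needs)
        return [scale for _ in frequencies]

    distribute_circuits("6", ["50", "50"])  # -> [1.0, 1.0]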

clean_osm_data.filter_circuits(df, min_value_circuit=0.1)#

Filters df to contain only lines with circuit value above min_value_circuit.

clean_osm_data.filter_frequency(df, accepted_values=[50, 60, 0], threshold=0.1)#

Filters df to contain only lines whose frequency is among accepted_values.

clean_osm_data.filter_voltage(df, threshold_voltage=35000)#

Filters df to contain only lines with voltage above threshold_voltage.

clean_osm_data.finalize_lines_type(df_lines)#

This function is aimed at finalizing the type of the columns of the dataframe.

clean_osm_data.finalize_substation_types(df_all_substations)#

Specify bus_id and voltage columns as integer.

clean_osm_data.find_first_overlap(geom, country_geoms, default_name)#

Return the first index whose shape intersects the geometry.

clean_osm_data.integrate_lines_df(df_all_lines, distance_crs)#

Function to add underground, under_construction, frequency and circuits.

clean_osm_data.load_network_data(network_asset, data_options)#

Function to check if OSM or custom data should be considered.

The network_asset should be a string named “lines”, “cables” or “substations”.

clean_osm_data.prepare_generators_df(df_all_generators)#

Prepare the dataframe for generators.

clean_osm_data.prepare_lines_df(df_lines)#

This function prepares the dataframe for lines and cables.

Parameters:

df_lines (dataframe) – Raw lines or cables dataframe as downloaded from OpenStreetMap

clean_osm_data.prepare_substation_df(df_all_substations)#

Prepare the raw substations dataframe to a structure compatible with PyPSA-Eur.

Parameters:

df_all_substations (dataframe) – Raw substations dataframe as downloaded from OpenStreetMap

clean_osm_data.set_countryname_by_shape(df, ext_country_shapes, exclude_external=True, col_country='country')#

Set the country name based on the country shapes

clean_osm_data.set_name_by_closestcity(df_all_generators, colname='name')#

Function to set the name column equal to the name of the closest city.

clean_osm_data.set_unique_id(df, col)#

Create unique ids, where the id is specified by the column “col”. The steps below create unique bus ids without losing the original OSM bus_id.

Unique bus_ids are created by simply appending -1, -2, -3 to the original bus_id. Every unique id gets a -1; if a bus_id exists, e.g., three times, it is counted by cumcount as -1, -2, -3, making the id unique.

Parameters:
  • df (dataframe) – Dataframe considered for the analysis

  • col (str) – Column name for the analyses; examples: “bus_id” for substations or “line_id” for lines

clean_osm_data.split_and_match_voltage_frequency_size(df)#

Function to match the length of the columns in subset by duplicating the last value in the column.

The function does as follows:

  1. First, it splits the voltage and frequency columns by semicolon. For example, the following rows

     row 1: ‘50’, ‘220000’
     row 2: ‘50;50;50’, ‘220000;380000’

     become:

     row 1: [‘50’], [‘220000’]
     row 2: [‘50’,’50’,’50’], [‘220000’,’380000’]

  2. Then, it harmonizes each row to match the length of the lists by filling the missing values with the last element of each list. Following the example above, after the cleaning:

     row 1: [‘50’], [‘220000’]
     row 2: [‘50’,’50’,’50’], [‘220000’,’380000’,’380000’]

clean_osm_data.split_cells(df, cols=['voltage'])#

Split semicolon-separated cells, i.e. [66000;220000], and create new identical rows.

Parameters:
  • df (dataframe) – Dataframe under analysis

  • cols (list) – List of target columns over which to perform the analysis

Example

Original data:

row 1: ‘66000;220000’, ‘50’

After applying split_cells():

row 1: ‘66000’, ‘50’
row 2: ‘220000’, ‘50’

download_osm_data#

Python interface to download OpenStreetMap data. Documented at pypsa-meets-earth/earth-osm.

Relevant Settings#

None # multiprocessing & infrastructure selection can be an option in future

Inputs#

None

Outputs#

  • data/osm/pbf: Raw OpenStreetMap data as .pbf files per country

  • data/osm/power: Filtered power data as .json files per country

  • data/osm/out: Prepared power data as .geojson and .csv files per country

  • resources/osm/raw: Prepared and per type (e.g. cable/lines) aggregated power data as .geojson and .csv files

download_osm_data.convert_iso_to_geofk(iso_code, iso_coding=True, convert_dict={'AE': 'QA-AE-OM-BH-KW', 'AG': 'central-america', 'AS': 'american-oceania', 'AW': 'central-america', 'AX': 'finland', 'BB': 'central-america', 'BH': 'QA-AE-OM-BH-KW', 'BM': 'north-america', 'BN': 'MY', 'BS': 'bahamas', 'CP': 'ile-de-clipperton', 'CU': 'cuba', 'CW': 'central-america', 'DM': 'central-america', 'DO': 'haiti-and-domrep', 'EH': 'MA', 'FK': 'south-america', 'GD': 'central-america', 'GF': 'south-america', 'GG': 'guernsey-jersey', 'GM': 'SN-GM', 'GP': 'guadeloupe', 'GU': 'american-oceania', 'HK': 'china', 'HT': 'haiti-and-domrep', 'IC': 'canary-islands', 'IL': 'PS-IL', 'IM': 'isle-of-man', 'JE': 'guernsey-jersey', 'JM': 'jamaica', 'KM': 'comores', 'KN': 'central-america', 'KW': 'QA-AE-OM-BH-KW', 'KY': 'central-america', 'LC': 'central-america', 'MH': 'marshall-islands', 'MO': 'china', 'MP': 'american-oceania', 'NF': 'AU', 'OM': 'QA-AE-OM-BH-KW', 'PA': 'panama', 'PF': 'polynesie-francaise', 'PN': 'pitcairn-islands', 'PS': 'PS-IL', 'QA': 'QA-AE-OM-BH-KW', 'RE': 'reunion', 'SA': 'QA-AE-OM-BH-KW', 'SG': 'MY', 'SM': 'IT', 'SN': 'SN-GM', 'SX': 'central-america', 'TC': 'central-america', 'TK': 'tokelau', 'TT': 'central-america', 'VA': 'IT', 'VC': 'central-america', 'VU': 'vanuatu', 'WF': 'wallis-et-futuna', 'XK': 'RS-KM', 'YT': 'mayotte'})#

Function to convert the ISO code of a country into the corresponding Geofabrik code. In Geofabrik, some countries are aggregated, thus if a single country is requested, the whole aggregation must be downloaded. For example, Senegal (SN) and Gambia (GM) cannot be found alone in Geofabrik, but they can be downloaded as a whole (SN-GM).

The conversion dictionary, initialized to iso_to_geofk_dict, is used to perform this conversion. When a two-letter country code is found in convert_dict and iso_coding is enabled, that two-letter code is converted into the corresponding value of the dictionary

Parameters:
  • iso_code (str) – Two-code country code to be converted

  • iso_coding (bool) – When true, the iso-to-geofk conversion is performed

  • convert_dict (dict) – Dictionary used to apply the conversion from iso to geofk. The keys correspond to the iso codes of countries that need a different region to be downloaded

download_osm_data.country_list_to_geofk(country_list)#

Convert the requested country list into geofk norm.

Parameters:

input (str) – Any two-letter country code or aggregation of countries given in the regions config file. Country code duplications won’t distort the result. Examples:
  • [“NG”,”ZA”]: download OSM data for Nigeria and South Africa

  • [“SNGM”]: download data for the Senegal & Gambia shape

  • [“NG”,”ZA”,”NG”]: won’t distort the result

Returns:

full_codes_list – Example [“NG”,”ZA”]

Return type:

list
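
An illustrative call (the resulting codes depend on the regions config file):

    country_list_to_geofk(["SN", "GM"])
    # e.g. -> ["SN-GM"], since Senegal and Gambia are merged in Geofabrik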

simplify_network#

make_statistics#

Create statistics for a given scenario run.

This script contains functions to create statistics of the workflow for the current execution

Relevant statistics that are created are:

  • For clean_osm_data and download_osm_data, the number of elements, length of the lines and length of dc lines are stored

  • For build_shapes, the surface, total GDP, total population and number of shapes are collected

  • For build_renewable_profiles, total available potential and average production are collected

  • For network rules (base_network, add_electricity, simplify_network and solve_network), length of lines, number of buses and total installed capacity by generation technology

  • Execution time for the rules, when benchmark is available

Outputs#

This rule creates a dataframe containing in the columns the relevant statistics for the current run.

make_statistics.add_computational_stats(df, snakemake, column_name=None)#

Add the major computational information of a given rule into the existing dataframe.

make_statistics.aggregate_computational_stats(name, dict_dfs)#

Function to aggregate the total computational statistics of the rules.

make_statistics.calculate_stats(scenario_config, renewable_config, renewable_carriers_config, metric_crs='EPSG:3857', area_crs='ESRI:54009')#

Function to collect all statistics

make_statistics.collect_basic_osm_stats(path, rulename, header)#

Collect basic statistics on OSM data: number of items

make_statistics.collect_bus_regions_stats(bus_region_rule='build_bus_regions')#

Collect statistics on bus regions.

  • number of onshore regions

  • number of offshore regions

make_statistics.collect_clean_osm_stats(rulename='clean_osm_data', metric_crs='EPSG:3857')#

Collect statistics on OSM data; used for clean OSM data.

make_statistics.collect_network_osm_stats(path, rulename, header, metric_crs='EPSG:3857')#

Collect statistics on OSM network data:

  • number of items

  • length of the stored shapes

  • length of objects with tag_frequency == 0 (DC elements)

make_statistics.collect_network_stats(network_rule, scenario_config)#

Collect statistics on pypsa networks:

  • installed capacity by carrier

  • lines total length (accounting for parallel lines)

  • lines total capacity

make_statistics.collect_only_computational(rulename)#

Rule to create only computational statistics of rule rulename.

make_statistics.collect_osm_stats(rulename, **kwargs)#

Collect statistics on OSM data.

When lines and cables are considered, then network-related statistics are collected (collect_network_osm_stats), otherwise basic statistics are (collect_basic_osm_stats)

make_statistics.collect_raw_osm_stats(rulename='download_osm_data', metric_crs='EPSG:3857')#

Collect basic statistics on OSM data; used for raw OSM data.

make_statistics.collect_renewable_stats(rulename, technology)#

Collect statistics on the renewable time series generated by the workflow:

  • potential

  • average production by plant (hydro) or bus (other RES)

make_statistics.collect_shape_stats(rulename='build_shapes', area_crs='ESRI:54009')#

Collect statistics on the shapes created by the workflow:

  • area

  • number of gadm shapes

  • percentage of shapes having a country flag matching the gadm file

  • total population

  • total gdp

make_statistics.collect_snakemake_stats(name, dict_dfs, renewable_config, renewable_carriers_config)#

Collect statistics on what rules have been successful.

make_statistics.generate_scenario_by_country(path_base, country_list, out_dir='configs/scenarios', pre='config.')#

Utility function to create copies of a standard yaml file available in path_base for every country in country_list. Copies are saved into the output directory out_dir.

Note:

  • the clusters are automatically modified for selected countries with limited data

  • for landlocked countries, offwind technologies are removed (solar, onwind and hydro are forced)

Parameters:
  • path_base (str) – Path to the standard yaml file used as default

  • country_list (list) – List of countries. Note: the input is parsed using download_osm_data.create_country_list

  • out_dir (str (optional)) – Output directory where the output configuration files are stored

monte_carlo#

Prepares network files with monte-carlo parameter sweeps for solving process.

Relevant Settings#

monte_carlo:
    options:
        add_to_snakefile: false # When set to true, enables Monte Carlo sampling
        samples: 9 # number of optimizations. Note that the number of samples when using scipy has to be the square of a prime number
        sampling_strategy: "chaospy"  # "pydoe2", "chaospy", "scipy", packages that are supported
        seed: 42 # set the seed for reproducibility
    uncertainties:
        loads_t.p_set:
            type: uniform
            args: [0, 1]
        generators_t.p_max_pu.loc[:, n.generators.carrier == "onwind"]:
            type: lognormal
            args: [1.5]
        generators_t.p_max_pu.loc[:, n.generators.carrier == "solar"]:
            type: beta
            args: [0.5, 2]

See also

Documentation of the configuration file config.yaml at monte_carlo

Inputs#

  • networks/elec_s_10_ec_lcopt_Co2L-24H.nc

Outputs#

  • networks/elec_s_10_ec_lcopt_Co2L-24H_{unc}.nc

e.g. networks/elec_s_10_ec_lcopt_Co2L-24H_m0.nc

networks/elec_s_10_ec_lcopt_Co2L-24H_m1.nc …

Description#

PyPSA-Earth is deterministic, which means that a set of inputs gives a set of outputs. Parameter sweeps can help to explore the uncertainty of the outputs caused by parameter changes. Many are familiar with the classical “sensitivity analysis”, which varies the input of only one feature while exploring the changes in its outputs. Implemented here is a “global sensitivity analysis” that can help to explore the multi-dimensional uncertainty space when more than one feature is changed at the same time.

To do so, the script is separated into two building blocks: one creates the experimental design, the other modifies and outputs the network file. Building the experimental design is currently supported by the packages pyDOE2, chaospy and scipy. This should give users the freedom to explore alternative approaches. Orthogonal Latin hypercube sampling was found to be the most performant and is hence implemented here. Sampling the multi-dimensional uncertainty space is relatively easy. It only requires two things: the number of samples (which defines the number of total networks to be optimized) and the features (PyPSA network objects, e.g. loads_t.p_set or generators_t.p_max_pu). This results in an experimental design of dimension (samples x features).

The experimental design lh (dimension: samples x features) is used to modify the PyPSA networks. Thereby, this script creates as many networks as there are samples. The iterator comes from the wildcard {unc}, which is described in the config.yaml and created in the Snakefile as a range from 0 to the total number of SAMPLES.

monte_carlo.monte_carlo_sampling_chaospy(N_FEATURES: int, SAMPLES: int, uncertainties_values: dict, seed: int, rule: str = 'latin_hypercube') ndarray#

Creates Latin Hypercube Sample (LHS) implementation from chaospy.

Documentation on chaospy Latin hypercube sampling (quasi-Monte Carlo method): https://chaospy.readthedocs.io/en/master/user_guide/fundamentals/quasi_random_samples.html#Quasi-random-samples

monte_carlo.monte_carlo_sampling_pydoe2(N_FEATURES: int, SAMPLES: int, uncertainties_values: dict, random_state: int, criterion: str = None, iteration: int = None, correlation_matrix: ndarray = None) ndarray#

Creates Latin Hypercube Sample (LHS) implementation from PyDOE2 with various options. Additionally, all “corners” are simulated.

Adapted from Dispa-SET: energy-modelling-toolkit/Dispa-SET. Documentation on PyDOE2: clicumu/pyDOE2 (fixes latin_cube errors)

monte_carlo.monte_carlo_sampling_scipy(N_FEATURES: int, SAMPLES: int, uncertainties_values: dict, seed: int, strength: int = 2, optimization: str = None) ndarray#

Creates Latin Hypercube Sample (LHS) implementation from SciPy with various options:

  • Center the point within the multi-dimensional grid, centered=True

  • Optimization scheme, optimization=”random-cd”

  • Strength=1, classical LHS

  • Strength=2, performant orthogonal LHS, requires the number of samples to be the square of a prime, e.g. 11**2 = 121

Options can be combined to produce an optimized, centered, orthogonal-array-based LHS. After optimization, the result is no longer guaranteed to be of strength 2.

Documentation for Quasi-Monte Carlo approaches: https://docs.scipy.org/doc/scipy/reference/stats.qmc.html Documentation for Latin Hypercube: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.LatinHypercube.html#scipy.stats.qmc.LatinHypercube Orthogonal LHS is better than basic LHS: scipy/scipy#files, https://en.wikipedia.org/wiki/Latin_hypercube_sampling
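
A minimal sketch of the underlying SciPy call (parameter values are illustrative):

    from scipy.stats import qmc

    # strength=2 requires the number of samples to be the square of a prime,
    # here 121 = 11**2
    sampler = qmc.LatinHypercube(d=3, strength=2, seed=42)
    lh = sampler.random(n=121)  # shape (121, 3), values in [0, 1)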

monte_carlo.rescale_distribution(latin_hypercube: ndarray, uncertainties_values: dict) ndarray#

Rescales a Latin hypercube sampling (LHS) using specified distribution parameters. More information on the distributions can be found here https://docs.scipy.org/doc/scipy/reference/stats.html

Parameters:

  • latin_hypercube (np.array): The Latin hypercube sampling to be rescaled.

  • uncertainties_values (list): List of dictionaries containing distribution information.

Each dictionary should have ‘type’ key specifying the distribution type and ‘args’ key containing parameters specific to the chosen distribution.

Returns:

  • np.array: Rescaled Latin hypercube sampling with values in the range [0, 1].

Supported Distributions:

  • “uniform”: Rescaled to the specified lower and upper bounds.

  • “normal”: Rescaled using the inverse of the normal distribution function with specified mean and std.

  • “lognormal”: Rescaled using the inverse of the log-normal distribution function with specified mean and std.

  • “triangle”: Rescaled using the inverse of the triangular distribution function with mean calculated from given parameters.

  • “beta”: Rescaled using the inverse of the beta distribution function with specified shape parameters.

  • “gamma”: Rescaled using the inverse of the gamma distribution function with specified shape and scale parameters.

Note:

  • The function supports rescaling for uniform, normal, lognormal, triangle, beta, and gamma distributions.

  • The rescaled samples will have values in the range [0, 1].
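
A sketch of the inverse-CDF rescaling for the normal case (mean and std are illustrative):

    import numpy as np
    from scipy.stats import norm

    u = np.array([[0.025], [0.5], [0.975]])     # LHS samples in [0, 1]
    rescaled = norm.ppf(u, loc=1.0, scale=0.1)  # inverse normal CDF
    # -> approximately [[0.804], [1.0], [1.196]]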

monte_carlo.validate_parameters(sampling_strategy: str, samples: int, uncertainties_values: dict) None#

Validates the parameters for a given probability distribution. Inputs from the user through the config file need to be validated before proceeding with the Monte Carlo simulations.

Parameters:

  • sampling_strategy: str

    The chosen sampling strategy from chaospy, scipy and pydoe2

  • samples: int

    The number of samples to generate for the simulation

  • distribution: str

    The name of the probability distribution.

  • distribution_params: list

    The parameters associated with the probability distribution.

Raises:

  • ValueError: If the parameters are invalid for the specified distribution.