Skip to content

Rule build_shapes

Workflow Diagram

See the complete workflow in the repository.

digraph snakemake_dag { graph [bgcolor=white, margin=0, size="8,5" ]; node [fontname=sans, fontsize=10, penwidth=2, shape=box, style=rounded ]; edge [color=grey, penwidth=2 ]; 4 [color="0.61 0.6 0.85", label=add_electricity]; 5 [color="0.19 0.6 0.85", label=build_bus_regions]; 6 [color="0.17 0.6 0.85", label=base_network]; 8 [color="0.00 0.6 0.85", fillcolor=gray, label=build_shapes, style=filled]; 8 -> 4; 8 -> 5; 8 -> 6; 9 [color="0.22 0.6 0.85", label=build_renewable_profiles]; 8 -> 9; 10 [color="0.11 0.6 0.85", label=build_hydro_profile]; 8 -> 10; }

Script Documentation

get_GADM_filename(country_code)

Function to get the GADM filename given the country code.

Parameters

country_code : str Two letter country codes of the downloaded files

Returns

str GADM filename corresponding to the country code

download_GADM(country_code, update=False, out_logging=False)

Download gpkg file from GADM for a given country code.

Parameters

country_code : str Two letter country codes of the downloaded files update : bool Update = true, forces re-download of files

Returns

GADM_inputfile_gpkg : str Path of the downloaded gpkg file GADM_filename : str Name of the gpkg file per country

filter_gadm(geodf, layer, cc, contended_flag, output_nonstd_to_csv=False)

Function to filter the GADM geodataframe according to the contented_flag option.

Parameters

geodf : gpd.GeoDataFrame Geodataframe to filter layer : int Layer of the geodataframe to filter cc : str Country code to filter contended_flag : str Option to treat contended areas, i.e. areas that are not assigned to the country in the GADM layer but are part of the country according to the country code (GID_0) of the geodataframe. The options are: - "drop": drop contended areas from the geodataframe - "set_by_country": set GID_0 of contended areas to the country code of the country (default) output_nonstd_to_csv : bool If True, outputs the non-standard rows to a csv file for debugging purposes (default False)

Returns

gpd.GeoDataFrame Filtered GADM geodataframe

get_GADM_layer(country_list, layer_id, geo_crs='EPSG:4326', contended_flag='set_by_country', update=False, outlogging=False)

Function to retrieve a specific layer id of a geopackage for a selection of countries.

Parameters

country_list : list[str] List of the countries layer_id : int Layer to consider in the format GID_{layer_id}. When the requested layer_id is greater than the last available layer, then the last layer is selected. When a negative value is requested, then, the last layer is requested geo_crs : str CRS used for geographic projection, passed to GeoPandas (e.g. "EPSG:4326") contended_flag : str Option to treat contended areas, i.e. areas that are not assigned to the country in the GADM layer but are part of the country according to the country code (GID_0) of the geodataframe. The options are: - "drop": drop contended areas from the geodataframe - "set_by_country": set GID_0 of contended areas to the country code of the country (default) update : bool Update = true, forces re-download of files outlogging : bool If True, emits progress information via the module logger.

Returns

gpd.GeoDataFrame Geodataframe with the requested GADM layer for the selected countries

countries(countries, geo_crs, contended_flag, update=False, out_logging=False, tolerance=0.01, minarea=0.01)

Create country shapes

Parameters

countries : list[str] List of the countries geo_crs : str CRS used for geographic projection, passed to GeoPandas (e.g. "EPSG:4326") contended_flag : str Flag indicating whether to include contended areas update : bool Update = true, forces re-download of files out_logging : bool If True, emits progress information via the module logger. tolerance : float Tolerance for the simplification (default 0.01) minarea : float Minimum area of the polygons to keep (default 0.01)

Returns

gpd.GeoSeries Geoseries with the country shapes

country_cover(country_shapes, eez_shapes=None, out_logging=False, distance=0.02)

Create a continent shape by merging the country shapes and the EEZ shapes (if provided) with a buffer.

Parameters

country_shapes : gpd.GeoSeries Geoseries with the country shapes eez_shapes : gpd.GeoSeries, optional Geoseries with the EEZ shapes out_logging : bool If True, emits progress information via the module logger. distance : float Distance for the buffer (default 0.02)

Returns

gpd.GeoSeries Geoseries with the continent shape

load_EEZ(countries_codes, geo_crs, EEZ_gpkg='./data/eez/eez_v11.gpkg')

Function to load the database of the Exclusive Economic Zones.

The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.

Parameters

countries_codes : list[str] List of two-letter ISO country codes. geo_crs : str CRS used for geographic projection, passed to GeoPandas (e.g. "EPSG:4326"). EEZ_gpkg : str, default "./data/eez/eez_v11.gpkg" Path to the Marine Regions World EEZ v11 geopackage.

Returns

gpd.GeoDataFrame GeoDataFrame with the EEZ geometries.

eez(countries, geo_crs, country_shapes, EEZ_gpkg, out_logging=False, distance=0.0, minarea=0.01, tolerance=0.01, simplify_gadm=True)

Build offshore (EEZ) shapes for the requested countries.

The function loads EEZ geometries, unions them per country, and optionally simplifies them. If distance is non-zero, the onshore country shapes are buffered and subtracted from EEZ geometries to create an "offshore-only" region (e.g. to enforce a non-build coastal strip).

Parameters

countries : list[str] Two-letter ISO country codes to process (e.g. ["DE", "FR"]). geo_crs : str CRS used for geographic projection, passed to GeoPandas (e.g."EPSG:4326"). Note: buffering distances depend on the CRS units. country_shapes : geopandas.GeoSeries or geopandas.GeoDataFrame Country geometries indexed by the same two-letter ISO codes. Must be in geo_crs. EEZ_gpkg : str Path to the Marine Regions World EEZ v11 geopackage. out_logging : bool, default False If True, emits progress information via the module logger. distance : float, default 0.0 Buffer distance applied to country_shapes before subtraction from the EEZ geometries. Units are CRS-dependent. minarea : float, default 0.01 Minimum polygon area threshold used by the simplification routine. tolerance : float, default 0.01 Tolerance passed to Shapely simplify via _simplify_polys. simplify_gadm : bool, default True If True, simplify and validate geometries before and after the coastal subtraction.

Returns

geopandas.GeoDataFrame Offshore EEZ geometries indexed by country code (column name as index, geometry column geometry). Empty/invalid geometries are removed.

download_WorldPop(country_code, worldpop_method, year=2020, update=False, out_logging=False, size_min=300)

Download Worldpop using either the standard method or the API method.

Parameters

worldpop_method: str worldpop_method = "api" will use the API method to access the WorldPop 100mx100m dataset. worldpop_method = "standard" will use the standard method to access the WorldPop 1KMx1KM dataset. country_code : str Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted year : int Year of the data to download update : bool Update = true, forces re-download of files size_min : int Minimum size of each file to download

Returns

WorldPop_inputfile : str Path of the file WorldPop_filename : str Name of the file

download_WorldPop_standard(country_code, year=2020, update=False, out_logging=False, size_min=300)

Download tiff file for each country code using the standard method from worldpop datastore with 1kmx1km resolution.

Parameters

country_code : str Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted year : int Year of the data to download update : bool Update = true, forces re-download of files size_min : int Minimum size of each file to download

Returns

WorldPop_inputfile : str Path of the file WorldPop_filename : str Name of the file

download_WorldPop_API(country_code, year=2020, update=False, out_logging=False, size_min=300)

Download tiff file for each country code using the api method from worldpop API with 100mx100m resolution.

Parameters

country_code : str Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted year : int Year of the data to download update : bool Update = true, forces re-download of files size_min : int Minimum size of each file to download

Returns

WorldPop_inputfile : str Path of the file WorldPop_filename : str Name of the file

convert_GDP(name_file_nc, year=2015, out_logging=False)

Function to convert the nc database of the GDP to tif, based on the work at https://doi.org/10.1038/sdata.2018.4. The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.

Parameters

name_file_nc : str Name of the nc file containing the GDP data (e.g. "GDP_PPP_1990_2015_5arcmin_v2.nc") year : int Year of the data to convert out_logging : bool If True, emits progress information via the module logger.

Returns

GDP_tif : str Path of the converted tif file name_file_tif : str Name of the converted tif file

load_GDP(year=2015, update=False, out_logging=False, name_file_nc='GDP_PPP_1990_2015_5arcmin_v2.nc')

Function to load the database of the GDP, based on the work at https://doi.org/10.1038/sdata.2018.4. The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.

Parameters

year : int Year of the data to load update : bool Update = true, forces re-download of files out_logging : bool If True, emits progress information via the module logger. name_file_nc : str Name of the nc file containing the GDP data (e.g. "GDP_PPP_1990_2015_5arcmin_v2.nc")

Returns

GDP_tif : str Path of the converted tif file name_file_tif : str Name of the converted tif file

generalized_mask(src, geom, **kwargs)

Generalize mask function to account for Polygon and MultiPolygon

Parameters

src : rasterio.io.DatasetReader Rasterio dataset reader object geom : shapely.geometry.base.BaseGeometry Geometry to mask the raster with. Can be a Polygon or a MultiPolygon. **kwargs : dict Additional keyword arguments to pass to rasterio.mask.mask function.

Returns

rasterio.io.DatasetReader Masked rasterio dataset reader object

add_gdp_data(df_gadm, year=2020, update=False, out_logging=False, name_file_nc='GDP_PPP_1990_2015_5arcmin_v2.nc', nprocesses=2, disable_progressbar=False)

Function to add gdp data to arbitrary number of shapes in a country.

Parameters

df_gadm: Geodataframe with one Multipolygon per row - Essential column ["country", "geometry"] - Non-essential column ["GADM_ID"]

Returns

df_gadm: Geodataframe with one Multipolygon per row - Same columns as input - Includes a new column ["gdp"]

process_function_population(row_id)

Function that reads the task from df_tasks and executes all the methods.

to obtain population values for the specified region

Parameters

row_id: int integer which indicates a specific row of df_tasks

Returns

windowed_pop_count: pd.DataFrame Dataframe containing "GADM_ID" and "pop" columns It represents the amount of population per region (GADM_ID), for the settings given by the row in df_tasks

get_worldpop_val_xy(WorldPop_inputfile, window_dimensions)

Function to extract data from .tif input file.

Parameters

WorldPop_inputfile: str file location of worldpop file window_dimensions: tuple dimensions of window used when reading file

Returns

np_pop_valid: np.ndarray array filled with values for each nonzero pixel in the worldpop file np_pop_xy: np.ndarray array with [x,y] coordinates of the corresponding nonzero values in np_pop_valid

compute_geomask_region(country_rows, affine_transform, window_dimensions, latlong_topleft, latlong_botright)

Function to mask geometries into np_map_ID using an incrementing counter.

Parameters

country_rows: gpd.GeoDataFrame geoDataFrame filled with geometries and their GADM_ID affine_transform: rasterio.transform.Affine affine transform of current window window_dimensions: tuple dimensions of window used when reading file latlong_topleft: list [latitude, longitude] of top left corner of the window latlong_botright: list [latitude, longitude] of bottom right corner of the window

Returns

np_map_ID.astype("H"): np.ndarray np_map_ID contains an ID for each location (undefined is 0) dimensions are taken from window_dimensions, .astype("H") for memory savings id_result: pd.DataFrame DataFrame of the mapping from id (from counter) to GADM_ID

sum_values_using_geomask(np_pop_val, np_pop_xy, region_geomask, id_mapping)

Function that sums all the population values in np_pop_val into the correct GADM_ID It uses np_pop_xy to access the key stored in region_geomask[x][y]

The relation of this key to GADM_ID is stored in id_mapping

Parameters

np_pop_val: np.ndarray array filled with values for each nonzero pixel in the worldpop file np_pop_xy: np.ndarray array with [x,y] coordinates of the corresponding nonzero values in np_pop_valid region_geomask: np.ndarray array with dimensions of window, values are keys that map to GADM_ID using id_mapping id_mapping: pd.DataFrame Dataframe that contains mappings of region_geomask values to GADM_IDs

Returns

df_pop_count: pd.DataFrame Dataframe with columns - "GADM_ID" - "pop" containing population of GADM_ID region

loop_and_extact_val_x_y(np_pop_count, np_pop_val, np_pop_xy, region_geomask, dict_id)

Function that will be compiled using @njit (numba) It takes all the population values from np_pop_val and stores them in np_pop_count.

where each location in np_pop_count is mapped to a GADM_ID through dict_id (id_mapping by extension)

Parameters

np_pop_count: np.ndarray np.zeros array, which will store population counts np_pop_val: np.ndarray array filled with values for each nonzero pixel in the worldpop file np_pop_xy: np.ndarray array with [x,y] coordinates of the corresponding nonzero values in np_pop_valid region_geomask: np.ndarray array with dimensions of window, values are keys that map to GADM_ID using id_mapping dict_id: dict numba typed.dict containing id_mapping.index -> location in np_pop_count

Returns

np_pop_count: np.ndarray np.array containing population counts

calculate_transform_and_coords_for_window(current_transform, window_dimensions, original_window=False)

Function which calculates the [lat,long] corners of the window given window_dimensions, if not(original_window) it also changes the affine transform to match the window.

Parameters

current_transform: rasterio.transform.Affine affine transform of source image window_dimensions: tuple dimensions of window used when reading file original_window: bool boolean to track if window covers entire country

Returns

list A list of: [ adjusted_transform: affine transform adjusted to window coordinate_topleft: [latitude, longitude] of top left corner of the window coordinate_botright: [latitude, longitude] of bottom right corner of the window ]

generate_df_tasks(c_code, mem_read_limit_per_process, WorldPop_inputfile)

Function to generate a list of tasks based on the memory constraints.

One task represents a single window of the image

Parameters

c_code: str country code mem_read_limit_per_process: int memory limit for src.read() operation WorldPop_inputfile: str file location of worldpop file

Returns

pd.DataFrame Dataframe of task_list

add_population_data(df_gadm, country_codes, worldpop_method, year=2020, update=False, out_logging=False, mem_read_limit_per_process=1024, nprocesses=2, disable_progressbar=False)

Function to add population data to arbitrary number of shapes in a country. It loads data from WorldPop raster files where each pixel represents the population in that square region. Each square polygon (or pixel) is then mapped into the corresponding GADM shape. Then the population in a GADM shape is identified by summing over all pixels mapped to that region.

This is performed with an iterative approach:

  1. All necessary WorldPop data tiff file are downloaded
  2. The so-called windows are created to handle RAM limitations related to large WorldPop files. Large WorldPop files require significant RAM to handle, which may not be available, hence, the entire activity is decomposed into multiple windows (or tasks). Each window represents a subset of a raster file on which the following algorithm is applied. Note: when enough RAM is available only a window is created for efficiency purposes.
  3. Execute all tasks by summing the values of the pixels mapped into each GADM shape. Parallelization applies in this task.

Parameters

df_gadm: gpd.GeoDataFrame Geodataframe with one Multipolygon per row - Essential column ["country", "geometry"] - Non-essential column ["GADM_ID"] country_codes: list List of country codes to download and process worldpop_method: str Method to download worldpop data, either "api" or "ftp" year: int Year of the data to download update: bool Update = true, forces re-download of files out_logging: bool If True, emits progress information via the module logger. mem_read_limit_per_process: int Memory limit for src.read() operation in MB, used to determine window size nprocesses: int Number of processes to use for parallelization, default is 2 disable_progressbar: bool If True, disables the progress bar, default is False

Returns

df_gadm: gpd.GeoDataFrame Geodataframe with one Multipolygon per row - Same columns as input - Includes a new column ["pop"]

gadm(worldpop_method, gdp_method, countries, geo_crs, contended_flag, mem_mb, layer_id=2, update=False, out_logging=False, year=2020, nprocesses=None, simplify_gadm=True, tolerance=0.01, minarea=0.01)

Function to create a GeoDataFrame with GADM shapes and population and gdp data.

Parameters

worldpop_method: str Method to download worldpop data, either "api" or "ftp" gdp_method: str Method to download gdp data, either "api" or "ftp" countries: list List of country codes to download and process geo_crs: str CRS used for geographic projection, passed to GeoPandas contended_flag: bool If True, includes contended territories in the GADM layer, which may have overlapping geometries. If False, contended territories are excluded. mem_mb: int Memory limit in megabytes for processing WorldPop data, used to determine window size for population data processing layer_id: int GADM layer ID to download, default is 2 (admin level 2) update: bool Update = true, forces re-download of files out_logging: bool If True, emits progress information via the module logger. year: int Year of the data to download nprocesses: int Number of processes to use for parallelization, default is None (uses os.cpu_count()) simplify_gadm: bool If True, simplifies the geometries in the GADM GeoDataFrame to reduce file size and speed up processing, default is True tolerance: float Tolerance parameter for geometry simplification, default is 0.01 (units depend on the CRS of the geometries) minarea: float Minimum area threshold for geometry simplification, geometries smaller than this area will be removed after simplification, default is 0.01 (units depend on the CRS of the geometries)

Returns

gpd.GeoDataFrame GeoDataFrame with one Multipolygon per row, indexed by "GADM_ID" - Essential columns: ["country", "geometry"] - Optional columns: ["gdp", "pop"] (if gdp_method and worldpop_method are not False, respectively)

crop_country(gadm_shapes, subregion_config)

Merge GADM administrative units into custom subregions and aggregate remaining units at the country level, returning combined geometries.

Parameters

gadm_shapes : GeoDataFrame GeoDataFrame of GADM units indexed by administrative ID, containing a country column and a valid CRS. subregion_config : dict[str, list[str]] Mapping of subregion names to lists of GADM index values to be merged into each subregion.

Returns

GeoDataFrame GeoDataFrame indexed by name with merged subregion and country geometries, preserving the input CRS.

generate_points_every_km(gdf, distance_crs='EPSG:3857', interval_km=100)

Generate perimeter points for each Polygon/MultiPolygon in a GeoDataFrame

Parameters

gdf : GeoDataFrame GeoDataFrame containing Polygon or MultiPolygon geometries. distance_crs : str Projected CRS in meters for distance calculation. interval_km : float Target spacing between consecutive points in kilometers.

Returns

GeoDataFrame GeoDataFrame of sampled points with a column 'name' preserving the original index of the input geometry.

determine_subregion_country(subregion_shapes, country_shapes, distance_crs)

Assign each subregion to the country with which it has the largest spatial overlap.

Parameters

subregion_shapes : GeoDataFrame GeoDataFrame indexed by subregion name containing onshore geometries with a defined CRS. country_shapes : GeoDataFrame GeoDataFrame of country geometries used to determine which country each subregion belongs to, with a defined CRS. distance_crs : str Projected CRS used for accurate area calculations, passed to GeoPandas for area computation.

Returns

dict[str, str] Mapping of subregion name to country name, where each subregion is assigned to the country with which it has the largest spatial overlap.

crop_offshore(subregion_shapes, country_shapes, offshore_shapes, distance_crs='EPSG:3857')

Split offshore (EEZ) geometries among subregions of the same country, assigning each subregion a portion of the country's offshore area.

Parameters

subregion_shapes : GeoDataFrame GeoDataFrame indexed by subregion name containing onshore geometries with a defined CRS. country_shapes : GeoDataFrame GeoDataFrame of country geometries used to determine which country each subregion belongs to. offshore_shapes : GeoSeries GeoSeries indexed by name containing offshore (e.g., EEZ) geometries to be partitioned. distance_crs : str, default "EPSG:3857" Projected CRS used for distance-based operations and Voronoi partitioning.

Returns

GeoDataFrame GeoDataFrame indexed by subregion name containing offshore geometries, either equal to the full country offshore area (single subregion) or Voronoi-partitioned among multiple subregions.