Rule build_shapes¶
Workflow Diagram
See the complete workflow in the repository.
digraph snakemake_dag { graph [bgcolor=white, margin=0, size="8,5" ]; node [fontname=sans, fontsize=10, penwidth=2, shape=box, style=rounded ]; edge [color=grey, penwidth=2 ]; 4 [color="0.61 0.6 0.85", label=add_electricity]; 5 [color="0.19 0.6 0.85", label=build_bus_regions]; 6 [color="0.17 0.6 0.85", label=base_network]; 8 [color="0.00 0.6 0.85", fillcolor=gray, label=build_shapes, style=filled]; 8 -> 4; 8 -> 5; 8 -> 6; 9 [color="0.22 0.6 0.85", label=build_renewable_profiles]; 8 -> 9; 10 [color="0.11 0.6 0.85", label=build_hydro_profile]; 8 -> 10; }
Script Documentation¶
get_GADM_filename(country_code)
¶
download_GADM(country_code, update=False, out_logging=False)
¶
Download gpkg file from GADM for a given country code.
Parameters¶
country_code : str Two letter country codes of the downloaded files update : bool Update = true, forces re-download of files
Returns¶
GADM_inputfile_gpkg : str Path of the downloaded gpkg file GADM_filename : str Name of the gpkg file per country
filter_gadm(geodf, layer, cc, contended_flag, output_nonstd_to_csv=False)
¶
Function to filter the GADM geodataframe according to the contented_flag option.
Parameters¶
geodf : gpd.GeoDataFrame Geodataframe to filter layer : int Layer of the geodataframe to filter cc : str Country code to filter contended_flag : str Option to treat contended areas, i.e. areas that are not assigned to the country in the GADM layer but are part of the country according to the country code (GID_0) of the geodataframe. The options are: - "drop": drop contended areas from the geodataframe - "set_by_country": set GID_0 of contended areas to the country code of the country (default) output_nonstd_to_csv : bool If True, outputs the non-standard rows to a csv file for debugging purposes (default False)
Returns¶
gpd.GeoDataFrame Filtered GADM geodataframe
get_GADM_layer(country_list, layer_id, geo_crs='EPSG:4326', contended_flag='set_by_country', update=False, outlogging=False)
¶
Function to retrieve a specific layer id of a geopackage for a selection of countries.
Parameters¶
country_list : list[str] List of the countries layer_id : int Layer to consider in the format GID_{layer_id}. When the requested layer_id is greater than the last available layer, then the last layer is selected. When a negative value is requested, then, the last layer is requested geo_crs : str CRS used for geographic projection, passed to GeoPandas (e.g. "EPSG:4326") contended_flag : str Option to treat contended areas, i.e. areas that are not assigned to the country in the GADM layer but are part of the country according to the country code (GID_0) of the geodataframe. The options are: - "drop": drop contended areas from the geodataframe - "set_by_country": set GID_0 of contended areas to the country code of the country (default) update : bool Update = true, forces re-download of files outlogging : bool If True, emits progress information via the module logger.
Returns¶
gpd.GeoDataFrame Geodataframe with the requested GADM layer for the selected countries
countries(countries, geo_crs, contended_flag, update=False, out_logging=False, tolerance=0.01, minarea=0.01)
¶
Create country shapes
Parameters¶
countries : list[str] List of the countries geo_crs : str CRS used for geographic projection, passed to GeoPandas (e.g. "EPSG:4326") contended_flag : str Flag indicating whether to include contended areas update : bool Update = true, forces re-download of files out_logging : bool If True, emits progress information via the module logger. tolerance : float Tolerance for the simplification (default 0.01) minarea : float Minimum area of the polygons to keep (default 0.01)
Returns¶
gpd.GeoSeries Geoseries with the country shapes
country_cover(country_shapes, eez_shapes=None, out_logging=False, distance=0.02)
¶
Create a continent shape by merging the country shapes and the EEZ shapes (if provided) with a buffer.
Parameters¶
country_shapes : gpd.GeoSeries Geoseries with the country shapes eez_shapes : gpd.GeoSeries, optional Geoseries with the EEZ shapes out_logging : bool If True, emits progress information via the module logger. distance : float Distance for the buffer (default 0.02)
Returns¶
gpd.GeoSeries Geoseries with the continent shape
load_EEZ(countries_codes, geo_crs, EEZ_gpkg='./data/eez/eez_v11.gpkg')
¶
Function to load the database of the Exclusive Economic Zones.
The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.
Parameters¶
countries_codes : list[str] List of two-letter ISO country codes. geo_crs : str CRS used for geographic projection, passed to GeoPandas (e.g. "EPSG:4326"). EEZ_gpkg : str, default "./data/eez/eez_v11.gpkg" Path to the Marine Regions World EEZ v11 geopackage.
Returns¶
gpd.GeoDataFrame GeoDataFrame with the EEZ geometries.
eez(countries, geo_crs, country_shapes, EEZ_gpkg, out_logging=False, distance=0.0, minarea=0.01, tolerance=0.01, simplify_gadm=True)
¶
Build offshore (EEZ) shapes for the requested countries.
The function loads EEZ geometries, unions them per country, and optionally
simplifies them. If distance is non-zero, the onshore country
shapes are buffered and subtracted from EEZ geometries to create an
"offshore-only" region (e.g. to enforce a non-build coastal strip).
Parameters¶
countries : list[str]
Two-letter ISO country codes to process (e.g. ["DE", "FR"]).
geo_crs : str
CRS used for geographic projection, passed to GeoPandas (e.g."EPSG:4326").
Note: buffering distances depend on the CRS units.
country_shapes : geopandas.GeoSeries or geopandas.GeoDataFrame
Country geometries indexed by the same two-letter ISO codes. Must be in geo_crs.
EEZ_gpkg : str
Path to the Marine Regions World EEZ v11 geopackage.
out_logging : bool, default False
If True, emits progress information via the module logger.
distance : float, default 0.0
Buffer distance applied to country_shapes before subtraction from
the EEZ geometries. Units are CRS-dependent.
minarea : float, default 0.01
Minimum polygon area threshold used by the simplification routine.
tolerance : float, default 0.01
Tolerance passed to Shapely simplify via _simplify_polys.
simplify_gadm : bool, default True
If True, simplify and validate geometries before and after the coastal
subtraction.
Returns¶
geopandas.GeoDataFrame
Offshore EEZ geometries indexed by country code (column name as
index, geometry column geometry). Empty/invalid geometries are
removed.
download_WorldPop(country_code, worldpop_method, year=2020, update=False, out_logging=False, size_min=300)
¶
Download Worldpop using either the standard method or the API method.
Parameters¶
worldpop_method: str worldpop_method = "api" will use the API method to access the WorldPop 100mx100m dataset. worldpop_method = "standard" will use the standard method to access the WorldPop 1KMx1KM dataset. country_code : str Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted year : int Year of the data to download update : bool Update = true, forces re-download of files size_min : int Minimum size of each file to download
Returns¶
WorldPop_inputfile : str Path of the file WorldPop_filename : str Name of the file
download_WorldPop_standard(country_code, year=2020, update=False, out_logging=False, size_min=300)
¶
Download tiff file for each country code using the standard method from worldpop datastore with 1kmx1km resolution.
Parameters¶
country_code : str Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted year : int Year of the data to download update : bool Update = true, forces re-download of files size_min : int Minimum size of each file to download
Returns¶
WorldPop_inputfile : str Path of the file WorldPop_filename : str Name of the file
download_WorldPop_API(country_code, year=2020, update=False, out_logging=False, size_min=300)
¶
Download tiff file for each country code using the api method from worldpop API with 100mx100m resolution.
Parameters¶
country_code : str Two letter country codes of the downloaded files. Files downloaded from https://data.worldpop.org/ datasets WorldPop UN adjusted year : int Year of the data to download update : bool Update = true, forces re-download of files size_min : int Minimum size of each file to download
Returns¶
WorldPop_inputfile : str Path of the file WorldPop_filename : str Name of the file
convert_GDP(name_file_nc, year=2015, out_logging=False)
¶
Function to convert the nc database of the GDP to tif, based on the work at https://doi.org/10.1038/sdata.2018.4. The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.
Parameters¶
name_file_nc : str Name of the nc file containing the GDP data (e.g. "GDP_PPP_1990_2015_5arcmin_v2.nc") year : int Year of the data to convert out_logging : bool If True, emits progress information via the module logger.
Returns¶
GDP_tif : str Path of the converted tif file name_file_tif : str Name of the converted tif file
load_GDP(year=2015, update=False, out_logging=False, name_file_nc='GDP_PPP_1990_2015_5arcmin_v2.nc')
¶
Function to load the database of the GDP, based on the work at https://doi.org/10.1038/sdata.2018.4. The dataset shall be downloaded independently by the user (see guide) or together with pypsa-earth package.
Parameters¶
year : int Year of the data to load update : bool Update = true, forces re-download of files out_logging : bool If True, emits progress information via the module logger. name_file_nc : str Name of the nc file containing the GDP data (e.g. "GDP_PPP_1990_2015_5arcmin_v2.nc")
Returns¶
GDP_tif : str Path of the converted tif file name_file_tif : str Name of the converted tif file
generalized_mask(src, geom, **kwargs)
¶
Generalize mask function to account for Polygon and MultiPolygon
Parameters¶
src : rasterio.io.DatasetReader Rasterio dataset reader object geom : shapely.geometry.base.BaseGeometry Geometry to mask the raster with. Can be a Polygon or a MultiPolygon. **kwargs : dict Additional keyword arguments to pass to rasterio.mask.mask function.
Returns¶
rasterio.io.DatasetReader Masked rasterio dataset reader object
add_gdp_data(df_gadm, year=2020, update=False, out_logging=False, name_file_nc='GDP_PPP_1990_2015_5arcmin_v2.nc', nprocesses=2, disable_progressbar=False)
¶
Function to add gdp data to arbitrary number of shapes in a country.
Parameters¶
df_gadm: Geodataframe with one Multipolygon per row - Essential column ["country", "geometry"] - Non-essential column ["GADM_ID"]
Returns¶
df_gadm: Geodataframe with one Multipolygon per row - Same columns as input - Includes a new column ["gdp"]
process_function_population(row_id)
¶
Function that reads the task from df_tasks and executes all the methods.
to obtain population values for the specified region
Parameters¶
row_id: int integer which indicates a specific row of df_tasks
Returns¶
windowed_pop_count: pd.DataFrame Dataframe containing "GADM_ID" and "pop" columns It represents the amount of population per region (GADM_ID), for the settings given by the row in df_tasks
get_worldpop_val_xy(WorldPop_inputfile, window_dimensions)
¶
Function to extract data from .tif input file.
Parameters¶
WorldPop_inputfile: str file location of worldpop file window_dimensions: tuple dimensions of window used when reading file
Returns¶
np_pop_valid: np.ndarray array filled with values for each nonzero pixel in the worldpop file np_pop_xy: np.ndarray array with [x,y] coordinates of the corresponding nonzero values in np_pop_valid
compute_geomask_region(country_rows, affine_transform, window_dimensions, latlong_topleft, latlong_botright)
¶
Function to mask geometries into np_map_ID using an incrementing counter.
Parameters¶
country_rows: gpd.GeoDataFrame geoDataFrame filled with geometries and their GADM_ID affine_transform: rasterio.transform.Affine affine transform of current window window_dimensions: tuple dimensions of window used when reading file latlong_topleft: list [latitude, longitude] of top left corner of the window latlong_botright: list [latitude, longitude] of bottom right corner of the window
Returns¶
np_map_ID.astype("H"): np.ndarray np_map_ID contains an ID for each location (undefined is 0) dimensions are taken from window_dimensions, .astype("H") for memory savings id_result: pd.DataFrame DataFrame of the mapping from id (from counter) to GADM_ID
sum_values_using_geomask(np_pop_val, np_pop_xy, region_geomask, id_mapping)
¶
Function that sums all the population values in np_pop_val into the correct GADM_ID It uses np_pop_xy to access the key stored in region_geomask[x][y]
The relation of this key to GADM_ID is stored in id_mapping
Parameters¶
np_pop_val: np.ndarray array filled with values for each nonzero pixel in the worldpop file np_pop_xy: np.ndarray array with [x,y] coordinates of the corresponding nonzero values in np_pop_valid region_geomask: np.ndarray array with dimensions of window, values are keys that map to GADM_ID using id_mapping id_mapping: pd.DataFrame Dataframe that contains mappings of region_geomask values to GADM_IDs
Returns¶
df_pop_count: pd.DataFrame Dataframe with columns - "GADM_ID" - "pop" containing population of GADM_ID region
loop_and_extact_val_x_y(np_pop_count, np_pop_val, np_pop_xy, region_geomask, dict_id)
¶
Function that will be compiled using @njit (numba) It takes all the population values from np_pop_val and stores them in np_pop_count.
where each location in np_pop_count is mapped to a GADM_ID through dict_id (id_mapping by extension)
Parameters¶
np_pop_count: np.ndarray np.zeros array, which will store population counts np_pop_val: np.ndarray array filled with values for each nonzero pixel in the worldpop file np_pop_xy: np.ndarray array with [x,y] coordinates of the corresponding nonzero values in np_pop_valid region_geomask: np.ndarray array with dimensions of window, values are keys that map to GADM_ID using id_mapping dict_id: dict numba typed.dict containing id_mapping.index -> location in np_pop_count
Returns¶
np_pop_count: np.ndarray np.array containing population counts
calculate_transform_and_coords_for_window(current_transform, window_dimensions, original_window=False)
¶
Function which calculates the [lat,long] corners of the window given window_dimensions, if not(original_window) it also changes the affine transform to match the window.
Parameters¶
current_transform: rasterio.transform.Affine affine transform of source image window_dimensions: tuple dimensions of window used when reading file original_window: bool boolean to track if window covers entire country
Returns¶
list A list of: [ adjusted_transform: affine transform adjusted to window coordinate_topleft: [latitude, longitude] of top left corner of the window coordinate_botright: [latitude, longitude] of bottom right corner of the window ]
generate_df_tasks(c_code, mem_read_limit_per_process, WorldPop_inputfile)
¶
Function to generate a list of tasks based on the memory constraints.
One task represents a single window of the image
Parameters¶
c_code: str country code mem_read_limit_per_process: int memory limit for src.read() operation WorldPop_inputfile: str file location of worldpop file
Returns¶
pd.DataFrame Dataframe of task_list
add_population_data(df_gadm, country_codes, worldpop_method, year=2020, update=False, out_logging=False, mem_read_limit_per_process=1024, nprocesses=2, disable_progressbar=False)
¶
Function to add population data to arbitrary number of shapes in a country. It loads data from WorldPop raster files where each pixel represents the population in that square region. Each square polygon (or pixel) is then mapped into the corresponding GADM shape. Then the population in a GADM shape is identified by summing over all pixels mapped to that region.
This is performed with an iterative approach:
- All necessary WorldPop data tiff file are downloaded
- The so-called windows are created to handle RAM limitations related to large WorldPop files. Large WorldPop files require significant RAM to handle, which may not be available, hence, the entire activity is decomposed into multiple windows (or tasks). Each window represents a subset of a raster file on which the following algorithm is applied. Note: when enough RAM is available only a window is created for efficiency purposes.
- Execute all tasks by summing the values of the pixels mapped into each GADM shape. Parallelization applies in this task.
Parameters¶
df_gadm: gpd.GeoDataFrame Geodataframe with one Multipolygon per row - Essential column ["country", "geometry"] - Non-essential column ["GADM_ID"] country_codes: list List of country codes to download and process worldpop_method: str Method to download worldpop data, either "api" or "ftp" year: int Year of the data to download update: bool Update = true, forces re-download of files out_logging: bool If True, emits progress information via the module logger. mem_read_limit_per_process: int Memory limit for src.read() operation in MB, used to determine window size nprocesses: int Number of processes to use for parallelization, default is 2 disable_progressbar: bool If True, disables the progress bar, default is False
Returns¶
df_gadm: gpd.GeoDataFrame Geodataframe with one Multipolygon per row - Same columns as input - Includes a new column ["pop"]
gadm(worldpop_method, gdp_method, countries, geo_crs, contended_flag, mem_mb, layer_id=2, update=False, out_logging=False, year=2020, nprocesses=None, simplify_gadm=True, tolerance=0.01, minarea=0.01)
¶
Function to create a GeoDataFrame with GADM shapes and population and gdp data.
Parameters¶
worldpop_method: str Method to download worldpop data, either "api" or "ftp" gdp_method: str Method to download gdp data, either "api" or "ftp" countries: list List of country codes to download and process geo_crs: str CRS used for geographic projection, passed to GeoPandas contended_flag: bool If True, includes contended territories in the GADM layer, which may have overlapping geometries. If False, contended territories are excluded. mem_mb: int Memory limit in megabytes for processing WorldPop data, used to determine window size for population data processing layer_id: int GADM layer ID to download, default is 2 (admin level 2) update: bool Update = true, forces re-download of files out_logging: bool If True, emits progress information via the module logger. year: int Year of the data to download nprocesses: int Number of processes to use for parallelization, default is None (uses os.cpu_count()) simplify_gadm: bool If True, simplifies the geometries in the GADM GeoDataFrame to reduce file size and speed up processing, default is True tolerance: float Tolerance parameter for geometry simplification, default is 0.01 (units depend on the CRS of the geometries) minarea: float Minimum area threshold for geometry simplification, geometries smaller than this area will be removed after simplification, default is 0.01 (units depend on the CRS of the geometries)
Returns¶
gpd.GeoDataFrame GeoDataFrame with one Multipolygon per row, indexed by "GADM_ID" - Essential columns: ["country", "geometry"] - Optional columns: ["gdp", "pop"] (if gdp_method and worldpop_method are not False, respectively)
crop_country(gadm_shapes, subregion_config)
¶
Merge GADM administrative units into custom subregions and aggregate remaining units at the country level, returning combined geometries.
Parameters¶
gadm_shapes : GeoDataFrame
GeoDataFrame of GADM units indexed by administrative ID, containing a country column and a valid CRS.
subregion_config : dict[str, list[str]]
Mapping of subregion names to lists of GADM index values to be merged into each subregion.
Returns¶
GeoDataFrame GeoDataFrame indexed by name with merged subregion and country geometries, preserving the input CRS.
generate_points_every_km(gdf, distance_crs='EPSG:3857', interval_km=100)
¶
Generate perimeter points for each Polygon/MultiPolygon in a GeoDataFrame
Parameters¶
gdf : GeoDataFrame GeoDataFrame containing Polygon or MultiPolygon geometries. distance_crs : str Projected CRS in meters for distance calculation. interval_km : float Target spacing between consecutive points in kilometers.
Returns¶
GeoDataFrame GeoDataFrame of sampled points with a column 'name' preserving the original index of the input geometry.
determine_subregion_country(subregion_shapes, country_shapes, distance_crs)
¶
Assign each subregion to the country with which it has the largest spatial overlap.
Parameters¶
subregion_shapes : GeoDataFrame GeoDataFrame indexed by subregion name containing onshore geometries with a defined CRS. country_shapes : GeoDataFrame GeoDataFrame of country geometries used to determine which country each subregion belongs to, with a defined CRS. distance_crs : str Projected CRS used for accurate area calculations, passed to GeoPandas for area computation.
Returns¶
dict[str, str] Mapping of subregion name to country name, where each subregion is assigned to the country with which it has the largest spatial overlap.
crop_offshore(subregion_shapes, country_shapes, offshore_shapes, distance_crs='EPSG:3857')
¶
Split offshore (EEZ) geometries among subregions of the same country, assigning each subregion a portion of the country's offshore area.
Parameters¶
subregion_shapes : GeoDataFrame GeoDataFrame indexed by subregion name containing onshore geometries with a defined CRS. country_shapes : GeoDataFrame GeoDataFrame of country geometries used to determine which country each subregion belongs to. offshore_shapes : GeoSeries GeoSeries indexed by name containing offshore (e.g., EEZ) geometries to be partitioned. distance_crs : str, default "EPSG:3857" Projected CRS used for distance-based operations and Voronoi partitioning.
Returns¶
GeoDataFrame GeoDataFrame indexed by subregion name containing offshore geometries, either equal to the full country offshore area (single subregion) or Voronoi-partitioned among multiple subregions.