thelper.data.geo package

Geospatial dataset parsing/loading package.

This package contains classes and functions whose role is to fetch the data required to train, validate, and test a model on geospatial data. Importing the modules inside this package requires GDAL.
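Because of the GDAL requirement, it can be useful to check for the bindings before importing anything from this package. A minimal sketch, assuming the GDAL Python bindings are installed under the usual `osgeo` package name (not stated above); `gdal_available` is a hypothetical helper, not part of thelper:

```python
import importlib.util


def gdal_available() -> bool:
    """Return True if the GDAL Python bindings appear to be installed."""
    # find_spec only locates the package; it does not import it.
    return importlib.util.find_spec("osgeo") is not None


if gdal_available():
    print("GDAL bindings found; thelper.data.geo modules should be importable.")
else:
    print("GDAL bindings not found; install GDAL before importing thelper.data.geo.")
```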

Submodules

thelper.data.geo.agrivis module

Agricultural Semantic Segmentation Challenge Dataset Interface

Original author: David Landry (david.landry@crim.ca). Updated by Pierre-Luc St-Charles (April 2020).

class thelper.data.geo.agrivis.Hdf5AgricultureDataset(hdf5_path: AnyStr, group_name: AnyStr, transforms: Any = None, use_global_normalization: bool = True, keep_file_open: bool = False, load_meta_keys: bool = False, copy_to_slurm_tmpdir: bool = False)[source]

Bases: thelper.data.parsers.Dataset

__getitem__(idx)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(hdf5_path: AnyStr, group_name: AnyStr, transforms: Any = None, use_global_normalization: bool = True, keep_file_open: bool = False, load_meta_keys: bool = False, copy_to_slurm_tmpdir: bool = False)[source]

Dataset parser constructor.

In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.

Parameters:
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
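The ‘transforms’ convention and the dictionary-based samples can be pictured with a small stand-in class. This is only a sketch: `ToyDataset` and `double_value` are hypothetical names, the class does not derive from thelper.data.parsers.Dataset, and it assumes the transform takes and returns the whole sample dictionary.

```python
from typing import Any, Callable, Dict, List, Optional

Sample = Dict[str, Any]


class ToyDataset:
    """Hypothetical stand-in for a dataset parser; not part of thelper."""

    def __init__(self, values: List[float],
                 transforms: Optional[Callable[[Sample], Sample]] = None):
        self.values = list(values)
        self.transforms = transforms  # applied to every loaded sample

    def __len__(self) -> int:
        return len(self.values)

    def __getitem__(self, idx: int) -> Sample:
        # Samples are dictionaries, indexed from 0.
        sample = {"value": self.values[idx], "idx": idx}
        if self.transforms is not None:
            sample = self.transforms(sample)
        return sample


def double_value(sample: Sample) -> Sample:
    """Toy transform: return a copy of the sample with its value doubled."""
    return {**sample, "value": sample["value"] * 2}


dataset = ToyDataset([1.0, 2.5, 4.0], transforms=double_value)
print(dataset[1])
```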

thelper.data.geo.bigearthnet module

thelper.data.geo.gdl module

Data parsers & utilities for cross-framework compatibility with Geo Deep Learning (GDL).

Geo Deep Learning (GDL) is a machine learning framework initiative for geospatial projects led by the wonderful folks at NRCan’s CCMEO. See https://github.com/NRCan/geo-deep-learning for more information.

The classes and functions defined here were used for the exploration of research topics and for the validation and testing of new software components.

class thelper.data.geo.gdl.MetaSegmentationDataset(class_names, work_folder, dataset_type, meta_map, max_sample_count=None, dontcare=None, transforms=None)[source]

Bases: thelper.data.geo.gdl.SegmentationDataset

Semantic segmentation dataset interface that appends metadata under new tensor layers.
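As an illustration of appending metadata as an extra layer, the sketch below adds one channel filled with a scalar value to a height x width x channels image. It is one plausible reading of the 'const_channel' mode name, not the class's confirmed behaviour; `append_const_channel` is a hypothetical helper and plain nested lists stand in for tensors.

```python
from typing import List

Image = List[List[List[float]]]  # height x width x channels


def append_const_channel(image: Image, value: float) -> Image:
    """Return a copy of `image` with one extra channel filled with `value`."""
    return [[pixel + [float(value)] for pixel in row] for row in image]


image = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
]
# 30.0 stands in for a scalar metadata value attached to the sample.
with_meta = append_const_channel(image, 30.0)
print(len(image[0][0]), "->", len(with_meta[0][0]), "channels per pixel")
```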

__getitem__(index)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(class_names, work_folder, dataset_type, meta_map, max_sample_count=None, dontcare=None, transforms=None)[source]

Segmentation dataset parser constructor.

This constructor receives all extra arguments necessary to build a segmentation task object.

Parameters:
  • class_names – list of all class names (or labels) that must be predicted in the image.
  • input_key – key used to index the input image in the loaded samples.
  • label_map_key – key used to index the label map in the loaded samples.
  • meta_keys – list of extra keys that will be available in the loaded samples.
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.

static get_meta_value(map, key)[source]

metadata_handling_modes = ['const_channel', 'scaled_channel']

class thelper.data.geo.gdl.SegmentationDataset(class_names, work_folder, dataset_type, max_sample_count=None, dontcare=None, transforms=None)[source]

Bases: thelper.data.parsers.SegmentationDataset

Semantic segmentation dataset interface for GDL-based HDF5 parsing.

__getitem__(index)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(class_names, work_folder, dataset_type, max_sample_count=None, dontcare=None, transforms=None)[source]

Segmentation dataset parser constructor.

This constructor receives all extra arguments necessary to build a segmentation task object.

Parameters:
  • class_names – list of all class names (or labels) that must be predicted in the image.
  • input_key – key used to index the input image in the loaded samples.
  • label_map_key – key used to index the label map in the loaded samples.
  • meta_keys – list of extra keys that will be available in the loaded samples.
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.

thelper.data.geo.infer module

class thelper.data.geo.infer.SlidingWindowTester(session_name, session_dir, model, task, loaders, config, ckptdata=None)[source]

Bases: thelper.infer.base.Tester

Tester implementation that satisfies the requirements of the base Tester interface in order to run classification inference.

__init__(session_name, session_dir, model, task, loaders, config, ckptdata=None)[source]

Receives the trainer configuration dictionary, parses it, and sets up the session.

eval_epoch(model, epoch, dev, loader, metrics, output_path)[source]

Computes the pixelwise prediction on an image.

Predictions are computed in batches of N pixels. For each pixel, the method returns the predicted class and its probability. The results are saved to two images created with the same size and projection information as the input rasters.

The class image stores the class id of each pixel, a number between 1 and the number of classes. Class id 0 is reserved for nodata.

The probs image contains one channel per class, holding the probability of each pixel for that class. The probabilities are normalised by default.

Also, a config-classes.json file is created listing the name-to-class-id mapping that was used to generate the values in the class image (i.e., the class names defined by the pre-trained model).
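The class id convention and the name-to-class-id mapping can be sketched for a single pixel as follows. This is an illustration under assumptions: softmax is used as the normalisation, the mapping is a flat name-to-id JSON object, and the class names are made up; none of these details are confirmed by the description above.

```python
import json
import math
from typing import Dict, List, Tuple


def predict_pixel(scores: List[float]) -> Tuple[int, List[float]]:
    """Return (class_id, probabilities) for one pixel's raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # shift by the peak for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    class_id = probs.index(max(probs)) + 1  # ids start at 1; 0 is kept for nodata
    return class_id, probs


def class_mapping(class_names: List[str]) -> Dict[str, int]:
    """Map each class name to its 1-based class id."""
    return {name: idx + 1 for idx, name in enumerate(class_names)}


class_id, probs = predict_pixel([0.2, 2.0, -1.0])
mapping_json = json.dumps(class_mapping(["forest", "water", "urban"]), indent=2)
print(class_id, round(sum(probs), 6))
print(mapping_json)
```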

Parameters:
  • model – the model with which to run inference that is already uploaded to the target device(s).
  • epoch – the epoch index we are training for (0-based, and should normally only be 0 for single test pass).
  • dev – the target device that tensors should be uploaded to (corresponding to model’s device(s)).
  • loader – the data loader used to get transformed test samples.
  • metrics – the dictionary of metrics/consumers to report inference results (mostly loggers and basic report generator in this case since there shouldn’t be ground truth labels to validate against).
  • output_path – directory where output files should be written, if necessary.

supports_classification = True

thelper.data.geo.ogc module

Data parsers & utilities module for OGC-related projects.

class thelper.data.geo.ogc.TB15D104[source]

Bases: object

Wrapper class for OGC Testbed-15 (D104) identifiers.

BACKGROUND_ID = 0

LAKE_ID = 1

TYPECE_LAKE = '21'

TYPECE_RIVER = '10'

class thelper.data.geo.ogc.TB15D104Dataset(raster_path, vector_path, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, lake_river_max_dist=inf, feature_buffer=1000, master_roi=None, focus_lakes=True, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)[source]

Bases: thelper.data.geo.parsers.VectorCropDataset

OGC Testbed-15 dataset parser for D104 (lake/river) segmentation task.

__getitem__(idx)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(raster_path, vector_path, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, lake_river_max_dist=inf, feature_buffer=1000, master_roi=None, focus_lakes=True, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)[source]

Dataset parser constructor.

In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.

Parameters:
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.

static lake_cleaner(features, area_min, area_max, lake_river_max_dist, parallel=False)[source]

Flags geometric features as ‘clean’ based on type and distance to nearest river.

static lake_cropper(features, rasters_data, coverage, srs_target, px_size, skew, feature_buffer, parallel=False)[source]

Returns the ROI information for a given feature (may be modified in derived classes).

class thelper.data.geo.ogc.TB15D104DetectLogger(conf_threshold=0.5)[source]

Bases: thelper.train.utils.DetectLogger

__init__(conf_threshold=0.5)[source]

Receives the logging parameters & the optional class label names used to decorate the log.

report_geojson()[source]

class thelper.data.geo.ogc.TB15D104TileDataset(raster_path, vector_path, tile_size, tile_overlap, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, master_roi=None, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)[source]

Bases: thelper.data.geo.parsers.TileDataset

OGC Testbed-15 dataset parser for D104 (lake/river) segmentation task.

__getitem__(idx)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(raster_path, vector_path, tile_size, tile_overlap, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, master_roi=None, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)[source]

Dataset parser constructor.

In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.

Parameters:
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.

thelper.data.geo.ogc.postproc_features(input_file, bboxes_srs, orig_geoms_path, output_file, final_srs=None, write_shapefile_copy=False)[source]

Post-processes bounding box detections produced during an evaluation session into a GeoJSON file.

thelper.data.geo.parsers module

Geospatial data parser & utilities module.

class thelper.data.geo.parsers.ImageFolderGDataset(root, transforms=None, image_key='image', label_key='label', path_key='path', idx_key='idx', channels=None)[source]

Bases: thelper.data.parsers.ImageFolderDataset

Image folder dataset specialization interface for classification tasks on geospatial images.

This specialization is used to parse simple image subfolders, and it essentially replaces the very basic torchvision.datasets.ImageFolder interface with similar functionality. It is used to provide a proper task interface as well as path metadata in each loaded packet for metrics/logging output.

The difference from the parent class ImageFolderDataset is the use of GDAL to handle the multi-channel images common in the remote sensing domain. The user can specify which channels to load; by default, the first three channels ([1, 2, 3]) are loaded.
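The default of [1, 2, 3] suggests that channels are numbered from 1, as GDAL numbers raster bands. A minimal sketch of that selection logic, assuming 1-based channel indices; `select_channels` is a hypothetical helper and the band contents are placeholders:

```python
from typing import List, Optional, Sequence


def select_channels(bands: Sequence[object],
                    channels: Optional[List[int]] = None) -> List[object]:
    """Pick bands by 1-based index, defaulting to the first three."""
    if channels is None:
        channels = [1, 2, 3]
    for ch in channels:
        if not 1 <= ch <= len(bands):
            raise ValueError(f"channel {ch} is out of range for {len(bands)} bands")
    return [bands[ch - 1] for ch in channels]  # convert 1-based to 0-based


bands = ["blue", "green", "red", "nir"]  # placeholder band contents
print(select_channels(bands))
print(select_channels(bands, channels=[4, 3, 2]))
```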

__getitem__(idx)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(root, transforms=None, image_key='image', label_key='label', path_key='path', idx_key='idx', channels=None)[source]

Image folder dataset parser constructor.

class thelper.data.geo.parsers.SlidingWindowDataset(raster_path, raster_bands, patch_size, transforms=None, image_key='image')[source]

Bases: thelper.data.parsers.Dataset

Sliding window dataset specialization interface for classification tasks over a geospatial image.

The dataset runs a sliding window over the whole geospatial image in order to return tile patches. The operation can be accomplished over multiple raster bands if they can be found in the provided raster container.
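The sliding window traversal can be pictured as enumerating the top-left corner of each patch. This is a sketch only: the `stride` parameter is hypothetical, and the way the real dataset handles raster borders (padding, partial patches) is not described here, so the sketch keeps only patches that fit entirely inside the raster.

```python
from typing import Iterator, Tuple


def window_origins(rows: int, cols: int, patch_size: int,
                   stride: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield the (row, col) top-left corner of each full patch inside the raster."""
    for row in range(0, rows - patch_size + 1, stride):
        for col in range(0, cols - patch_size + 1, stride):
            yield row, col


origins = list(window_origins(rows=4, cols=5, patch_size=3))
print(len(origins), origins[0], origins[-1])
```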

__getitem__(idx)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(raster_path, raster_bands, patch_size, transforms=None, image_key='image')[source]

Dataset parser constructor.

In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.

Parameters:
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.

class thelper.data.geo.parsers.TileDataset(raster_path, vector_path, tile_size, tile_overlap=0, skip_empty_tiles=False, skip_nodata_tiles=True, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)[source]

Bases: thelper.data.geo.parsers.VectorCropDataset

Abstract dataset used to systematically tile vector data and rasters.
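How a tile size and a tile overlap interact can be sketched along one axis. This is an illustration, not the class's tiling code: `tile_offsets` is a hypothetical helper, the unit (pixels or map units) is left open, and the last tile is allowed to extend past the extent.

```python
from typing import List


def tile_offsets(extent: int, tile_size: int, tile_overlap: int = 0) -> List[int]:
    """Return the start offset of each tile needed to cover [0, extent)."""
    if not 0 <= tile_overlap < tile_size:
        raise ValueError("tile_overlap must be in [0, tile_size)")
    step = tile_size - tile_overlap  # distance between consecutive tile starts
    return list(range(0, max(extent - tile_overlap, 1), step))


print(tile_offsets(10, 4))     # no overlap
print(tile_offsets(10, 4, 1))  # neighbouring tiles share one unit
```

Applying the same function to both axes and taking the product of the two offset lists gives the full tile grid.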

__init__(raster_path, vector_path, tile_size, tile_overlap=0, skip_empty_tiles=False, skip_nodata_tiles=True, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)[source]

Dataset parser constructor.

In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.

Parameters:
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.

class thelper.data.geo.parsers.VectorCropDataset(raster_path, vector_path, px_size=None, skew=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, feature_buffer=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, cropper=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)[source]

Bases: thelper.data.parsers.Dataset

Abstract dataset used to combine geojson vector data and rasters.

__getitem__(idx)[source]

Returns the data sample (a dictionary) for a specific (0-based) index.

__init__(raster_path, vector_path, px_size=None, skew=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, feature_buffer=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, cropper=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)[source]

Dataset parser constructor.

In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.

Parameters:
  • transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
  • deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.

thelper.data.geo.utils module

thelper.data.geo.utils.export_geojson_with_crs(features, srs_target)[source]

Exports a list of features along with their SRS into a GeoJSON-compatible string.

thelper.data.geo.utils.export_geotiff(filepath, crop, srs, geotransform)[source]
thelper.data.geo.utils.get_feature_bbox(geom, offsets=None)[source]
thelper.data.geo.utils.get_feature_roi(geom, px_size, skew, roi_buffer=None, crop_img_size=None, crop_real_size=None)[source]
thelper.data.geo.utils.get_geocoord(geotransform, x, y)[source]
thelper.data.geo.utils.get_geoextent(geotransform, x, y, cols, rows)[source]
thelper.data.geo.utils.get_pxcoord(geotransform, x, y)[source]
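The geotransform argument taken by get_geocoord, get_geoextent and get_pxcoord is presumably GDAL's six-coefficient affine transform. The sketch below shows the standard GDAL pixel-to-map relation and its inverse; `pixel_to_geo` and `geo_to_pixel` are hypothetical names, and whether the thelper helpers follow the same argument order or rounding is an assumption.

```python
from typing import Sequence, Tuple


def pixel_to_geo(gt: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Map a pixel (col, row) position to map coordinates with a GDAL geotransform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def geo_to_pixel(gt: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    """Invert the affine transform to recover the (col, row) position."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row


# North-up raster with 10-unit pixels; the origin values are made up.
gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
x, y = pixel_to_geo(gt, col=3, row=2)
print(x, y)
print(geo_to_pixel(gt, x, y))
```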
thelper.data.geo.utils.open_rasterfile(raster_data, keep_rasters_open=False)[source]
thelper.data.geo.utils.parse_geojson(geojson, srs_target=None, roi=None, allow_outlying=False, clip_outlying=False)[source]
thelper.data.geo.utils.parse_geojson_crs(body)[source]

Imports a coordinate reference system (CRS) from a GeoJSON tree.
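For context, older GeoJSON files carry their CRS as a named `crs` member at the top level of the tree. The sketch below reads that member from a parsed tree; it assumes the legacy named-CRS layout and returns the raw name string, whereas the return type of parse_geojson_crs is not documented here. `read_named_crs` is a hypothetical helper.

```python
import json
from typing import Optional


def read_named_crs(body: dict) -> Optional[str]:
    """Return the name of a 'name'-type CRS member, or None if there is none."""
    crs = body.get("crs")
    if not crs or crs.get("type") != "name":
        return None
    return crs.get("properties", {}).get("name")


collection = {
    "type": "FeatureCollection",
    "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::3857"}},
    "features": [],
}
text = json.dumps(collection)  # the serialized form a GeoJSON file would hold
print(read_named_crs(json.loads(text)))
```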

thelper.data.geo.utils.parse_raster_metadata(raster_metadata, raster_dataset=None)[source]

Parses the provided raster metadata and updates it by adding extra details required for later use.

The provided raster metadata is updated in place. Metadata is validated against the matching data storage. If any required or requested metadata (such as bands) is missing, the function raises an error immediately.

Parameters:
  • raster_metadata (dict) – raster metadata dictionary containing at minimum a file ‘path’ and a list of ‘bands’ indices to process.
  • raster_dataset (gdal.Dataset) – (optional) preloaded dataset object corresponding to the raster metadata.
Raises:
  • ValueError – at least one input raster was missing a required metadata parameter or a parameter is erroneous.
  • IOError – the raster path could not be found or reading it did not generate a valid raster using GDAL.
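The minimal shape of the metadata dictionary can be sketched with an up-front check. This covers only the ‘path’ and ‘bands’ requirements named above and does not open the raster; `check_raster_metadata` is a hypothetical helper, and the assumption that band indices are 1-based integers (as in GDAL) is not confirmed by this page.

```python
def check_raster_metadata(raster_metadata: dict) -> None:
    """Raise ValueError if the minimal 'path' and 'bands' entries are unusable."""
    path = raster_metadata.get("path")
    if not isinstance(path, str) or not path:
        raise ValueError("raster metadata requires a non-empty 'path' string")
    bands = raster_metadata.get("bands")
    if not isinstance(bands, (list, tuple)) or not bands:
        raise ValueError("raster metadata requires a non-empty list of 'bands'")
    # Assumption: band indices are 1-based integers.
    if not all(isinstance(b, int) and b >= 1 for b in bands):
        raise ValueError("'bands' must contain integer indices starting at 1")


check_raster_metadata({"path": "scene.tif", "bands": [1, 2, 3]})  # passes silently
try:
    check_raster_metadata({"path": "scene.tif"})
except ValueError as err:
    print(f"rejected: {err}")
```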
thelper.data.geo.utils.parse_rasters(raster_paths, srs_target=None, reproj=False)[source]
thelper.data.geo.utils.parse_roi(roi_path, srs_target=None)[source]
thelper.data.geo.utils.parse_shapefile(shapefile_path, srs_target=None, roi=None, allow_outlying=False, clip_outlying=False, layer_id=0)[source]
thelper.data.geo.utils.reproject_coords(coords, src_srs, tgt_srs)[source]
thelper.data.geo.utils.reproject_crop(raster, crop_raster, crop_size, crop_datatype, crop_nodataval=None, reproj_opt=None, fill_nodata=False)[source]