thelper.data.geo package
Geospatial dataset parsing/loading package.
This package contains classes and functions whose role is to fetch the data required to train, validate, and test a model on geospatial data. Importing the modules inside this package requires GDAL.
Submodules
thelper.data.geo.agrivis module
Agricultural Semantic Segmentation Challenge Dataset Interface
Original author: David Landry (david.landry@crim.ca). Updated by Pierre-Luc St-Charles (April 2020).
class thelper.data.geo.agrivis.Hdf5AgricultureDataset(hdf5_path: AnyStr, group_name: AnyStr, transforms: Any = None, use_global_normalization: bool = True, keep_file_open: bool = False, load_meta_keys: bool = False, copy_to_slurm_tmpdir: bool = False)
Bases: thelper.data.parsers.Dataset
__init__(hdf5_path: AnyStr, group_name: AnyStr, transforms: Any = None, use_global_normalization: bool = True, keep_file_open: bool = False, load_meta_keys: bool = False, copy_to_slurm_tmpdir: bool = False)
Dataset parser constructor.
In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.
Parameters:
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
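The ‘transforms’ requirement described above can be illustrated with a small, framework-independent sketch. The ToyDataset class, the build_dataset helper, and the "type"/"params" configuration layout are hypothetical names chosen for illustration only; they are not thelper's actual loading code.

```python
from typing import Any, Callable, Dict, Optional


class ToyDataset:
    """Hypothetical stand-in for a dataset parser (not a thelper class)."""

    def __init__(self, values, transforms: Optional[Callable[[Any], Any]] = None):
        self.values = list(values)
        self.transforms = transforms  # applied to every loaded sample

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        sample = {"value": self.values[idx], "idx": idx}
        if self.transforms is not None:
            sample = self.transforms(sample)
        return sample


def build_dataset(config: Dict[str, Any], transforms=None):
    """Instantiates a dataset from a config dict, always forwarding 'transforms'."""
    dataset_type = config["type"]  # assumed here to be a class object
    params = dict(config.get("params", {}))
    return dataset_type(transforms=transforms, **params)


def double(sample):
    # Example transform: returns a modified copy of the sample.
    return {**sample, "value": sample["value"] * 2}


dataset = build_dataset({"type": ToyDataset, "params": {"values": [1, 2, 3]}}, transforms=double)
print(dataset[1])
```

Because the builder always passes transforms as a keyword, any class that omits that argument from its constructor would fail at instantiation, which is presumably why the docstring asks for it.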
thelper.data.geo.bigearthnet module
thelper.data.geo.gdl module
Data parsers & utilities for cross-framework compatibility with Geo Deep Learning (GDL).
Geo Deep Learning (GDL) is a machine learning framework initiative for geospatial projects led by the wonderful folks at NRCan’s CCMEO. See https://github.com/NRCan/geo-deep-learning for more information.
The classes and functions defined here were used for the exploration of research topics and for the validation and testing of new software components.
class thelper.data.geo.gdl.MetaSegmentationDataset(class_names, work_folder, dataset_type, meta_map, max_sample_count=None, dontcare=None, transforms=None)
Bases: thelper.data.geo.gdl.SegmentationDataset
Semantic segmentation dataset interface that appends metadata under new tensor layers.
__init__(class_names, work_folder, dataset_type, meta_map, max_sample_count=None, dontcare=None, transforms=None)
Segmentation dataset parser constructor.
This constructor receives all extra arguments necessary to build a segmentation task object.
Parameters:
- class_names – list of all class names (or labels) that must be predicted in the image.
- input_key – key used to index the input image in the loaded samples.
- label_map_key – key used to index the label map in the loaded samples.
- meta_keys – list of extra keys that will be available in the loaded samples.
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
metadata_handling_modes = ['const_channel', 'scaled_channel']
class thelper.data.geo.gdl.SegmentationDataset(class_names, work_folder, dataset_type, max_sample_count=None, dontcare=None, transforms=None)
Bases: thelper.data.parsers.SegmentationDataset
Semantic segmentation dataset interface for GDL-based HDF5 parsing.
__init__(class_names, work_folder, dataset_type, max_sample_count=None, dontcare=None, transforms=None)
Segmentation dataset parser constructor.
This constructor receives all extra arguments necessary to build a segmentation task object.
Parameters:
- class_names – list of all class names (or labels) that must be predicted in the image.
- input_key – key used to index the input image in the loaded samples.
- label_map_key – key used to index the label map in the loaded samples.
- meta_keys – list of extra keys that will be available in the loaded samples.
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
thelper.data.geo.infer module
class thelper.data.geo.infer.SlidingWindowTester(session_name, session_dir, model, task, loaders, config, ckptdata=None)
Bases: thelper.infer.base.Tester
Tester that satisfies the requirements of the Tester base class in order to run classification inference.
__init__(session_name, session_dir, model, task, loaders, config, ckptdata=None)
Receives the trainer configuration dictionary, parses it, and sets up the session.
eval_epoch(model, epoch, dev, loader, metrics, output_path)
Computes the pixelwise prediction on an image.
The prediction is computed in batches of N pixels, and yields the predicted class and its probability for each pixel. The results are saved into two images created with the same size and projection info as the input rasters.
The class image gives the class id of the corresponding pixels, a number between 1 and the number of classes. Class id 0 is reserved for nodata.
The probs image contains one channel per class, holding the probability values of the pixels for that class. The probabilities are normalised by default.
Also, a config-classes.json file is created listing the name-to-class-id mapping that was used to generate the values in the class image (i.e.: class names defined by the pre-trained model).
Parameters:
- model – the model with which to run inference, already uploaded to the target device(s).
- epoch – the epoch index we are training for (0-based, and should normally only be 0 for single test pass).
- dev – the target device that tensors should be uploaded to (corresponding to model’s device(s)).
- loader – the data loader used to get transformed test samples.
- metrics – the dictionary of metrics/consumers to report inference results (mostly loggers and basic report generator in this case since there shouldn’t be ground truth labels to validate against).
- output_path – directory where output files should be written, if necessary.
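The relationship between the class and probs outputs described above can be sketched in pure Python. This is only an illustration under stated assumptions: the predict_pixels helper, the use of None to mark a nodata pixel, and the all-zero probabilities written for such pixels are hypothetical choices, not the behaviour of the actual tester.

```python
import math
from typing import List, Optional, Tuple

NODATA_ID = 0  # class id 0 is reserved for nodata, as described above


def softmax(scores: List[float]) -> List[float]:
    """Normalises raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pixels(
    pixel_scores: List[Optional[List[float]]], num_classes: int
) -> Tuple[List[int], List[List[float]]]:
    """Returns per-pixel class ids (1-based) and per-class probabilities."""
    class_ids, probs_out = [], []
    for scores in pixel_scores:
        if scores is None:  # assumption: None marks a pixel without valid data
            class_ids.append(NODATA_ID)
            probs_out.append([0.0] * num_classes)
            continue
        probs = softmax(scores)
        best = max(range(num_classes), key=probs.__getitem__)
        class_ids.append(best + 1)  # shift by one so that ids start at 1
        probs_out.append(probs)
    return class_ids, probs_out


class_ids, probs = predict_pixels([[2.0, 0.5, 0.1], None, [0.0, 0.0, 3.0]], num_classes=3)
print(class_ids)
```

Shifting the winning index by one keeps 0 free for nodata, so a reader of the class image can distinguish "no prediction" from "first class".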
supports_classification = True
thelper.data.geo.ogc module
Data parsers & utilities module for OGC-related projects.
class thelper.data.geo.ogc.TB15D104
Bases: object
Wrapper class for OGC Testbed-15 (D104) identifiers.
BACKGROUND_ID = 0
LAKE_ID = 1
TYPECE_LAKE = '21'
TYPECE_RIVER = '10'
class thelper.data.geo.ogc.TB15D104Dataset(raster_path, vector_path, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, lake_river_max_dist=inf, feature_buffer=1000, master_roi=None, focus_lakes=True, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)
Bases: thelper.data.geo.parsers.VectorCropDataset
OGC Testbed-15 dataset parser for D104 (lake/river) segmentation task.
__init__(raster_path, vector_path, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, lake_river_max_dist=inf, feature_buffer=1000, master_roi=None, focus_lakes=True, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)
Dataset parser constructor.
In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.
Parameters:
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
class thelper.data.geo.ogc.TB15D104DetectLogger(conf_threshold=0.5)
class thelper.data.geo.ogc.TB15D104TileDataset(raster_path, vector_path, tile_size, tile_overlap, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, master_roi=None, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)
Bases: thelper.data.geo.parsers.TileDataset
OGC Testbed-15 dataset parser for D104 (lake/river) segmentation task.
__init__(raster_path, vector_path, tile_size, tile_overlap, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, lake_area_min=0.0, lake_area_max=inf, master_roi=None, srs_target='3857', force_parse=False, reproj_rasters=False, reproj_all_cpus=True, display_debug=False, keep_rasters_open=True, parallel=False, transforms=None)
Dataset parser constructor.
In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.
Parameters:
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
thelper.data.geo.parsers module
Geospatial data parser & utilities module.
class thelper.data.geo.parsers.ImageFolderGDataset(root, transforms=None, image_key='image', label_key='label', path_key='path', idx_key='idx', channels=None)
Bases: thelper.data.parsers.ImageFolderDataset
Image folder dataset specialization interface for classification tasks on geospatial images.
This specialization is used to parse simple image subfolders, and it essentially replaces the very basic torchvision.datasets.ImageFolder interface with similar functionalities. It is used to provide a proper task interface as well as path metadata in each loaded packet for metrics/logging output.
The difference with the parent class ImageFolderDataset is the use of GDAL to manage the multi-channel images found in the remote sensing domain. The user can specify the channels to load. By default, the first three channels are loaded ([1, 2, 3]).
class thelper.data.geo.parsers.SlidingWindowDataset(raster_path, raster_bands, patch_size, transforms=None, image_key='image')
Bases: thelper.data.parsers.Dataset
Sliding window dataset specialization interface for classification tasks over a geospatial image.
The dataset runs a sliding window over the whole geospatial image in order to return tile patches. The operation can be accomplished over multiple raster bands if they can be found in the provided raster container.
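The sliding-window idea described above can be sketched as a generator of patch windows. This is a minimal illustration, not the class's implementation: the iter_windows helper and its stride parameter are hypothetical, and how the real dataset treats raster borders is not documented here, so the sketch simply keeps the windows that fit entirely inside the raster.

```python
from typing import Iterator, Tuple


def iter_windows(
    width: int, height: int, patch_size: int, stride: int = 1
) -> Iterator[Tuple[int, int, int, int]]:
    """Yields (x, y, w, h) windows covering a raster of the given size.

    Only windows that fit entirely inside the raster are produced.
    """
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            yield x, y, patch_size, patch_size


windows = list(iter_windows(width=5, height=4, patch_size=3, stride=1))
print(len(windows), windows[0], windows[-1])
```

Each yielded tuple could then be used as a read offset and size for the requested raster bands.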
__init__(raster_path, raster_bands, patch_size, transforms=None, image_key='image')
Dataset parser constructor.
In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.
Parameters:
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
class thelper.data.geo.parsers.TileDataset(raster_path, vector_path, tile_size, tile_overlap=0, skip_empty_tiles=False, skip_nodata_tiles=True, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)
Bases: thelper.data.geo.parsers.VectorCropDataset
Abstract dataset used to systematically tile vector data and rasters.
__init__(raster_path, vector_path, tile_size, tile_overlap=0, skip_empty_tiles=False, skip_nodata_tiles=True, px_size=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)
Dataset parser constructor.
In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.
Parameters:
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
class thelper.data.geo.parsers.VectorCropDataset(raster_path, vector_path, px_size=None, skew=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, feature_buffer=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, cropper=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)
Bases: thelper.data.parsers.Dataset
Abstract dataset used to combine GeoJSON vector data and rasters.
__init__(raster_path, vector_path, px_size=None, skew=None, allow_outlying_vectors=True, clip_outlying_vectors=True, vector_area_min=0.0, vector_area_max=inf, vector_target_prop=None, feature_buffer=None, master_roi=None, srs_target='3857', raster_key='raster', mask_key='mask', cleaner=None, cropper=None, force_parse=False, reproj_rasters=False, reproj_all_cpus=True, keep_rasters_open=True, transforms=None)
Dataset parser constructor.
In order for derived datasets to be instantiated automatically by the framework from a configuration file, they must minimally accept a ‘transforms’ argument like the one shown here.
Parameters:
- transforms – function or object that should be applied to all loaded samples in order to return the data in the requested transformed/augmented state.
- deepcopy – specifies whether this dataset interface should be deep-copied inside thelper.data.loaders.LoaderFactory so that it may be shared between different threads. This is false by default, as we assume datasets do not contain a state or buffer that might cause problems in multi-threaded data loaders.
thelper.data.geo.utils module
thelper.data.geo.utils.export_geojson_with_crs(features, srs_target)
Exports a list of features along with their SRS into a GeoJSON-compatible string.
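As a rough illustration of what exporting features together with their SRS might look like, the sketch below serialises a FeatureCollection with a named "crs" member, following the legacy (2008) GeoJSON convention; RFC 7946 later removed that member. The export_feature_collection helper and its epsg_code argument are hypothetical and do not reproduce the function documented above.

```python
import json
from typing import Any, Dict, List


def export_feature_collection(features: List[Dict[str, Any]], epsg_code: str) -> str:
    """Serialises features into a FeatureCollection string with a named CRS member."""
    collection = {
        "type": "FeatureCollection",
        "crs": {
            "type": "name",
            "properties": {"name": f"urn:ogc:def:crs:EPSG::{epsg_code}"},
        },
        "features": features,
    }
    return json.dumps(collection)


# A single point feature used purely as example data.
lake = {
    "type": "Feature",
    "properties": {"name": "lake"},
    "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
}
geojson_str = export_feature_collection([lake], "3857")
print(geojson_str)
```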
thelper.data.geo.utils.get_feature_roi(geom, px_size, skew, roi_buffer=None, crop_img_size=None, crop_real_size=None)
thelper.data.geo.utils.parse_geojson(geojson, srs_target=None, roi=None, allow_outlying=False, clip_outlying=False)
thelper.data.geo.utils.parse_geojson_crs(body)
Imports a coordinate reference system (CRS) from a GeoJSON tree.
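A minimal sketch of reading a CRS from a GeoJSON tree follows, assuming the tree uses the legacy named "crs" member. The read_crs_name and epsg_code_from_name helpers are hypothetical; the documented function may return a different kind of object (for instance a spatial reference instance) and may support other CRS encodings.

```python
import json
from typing import Any, Dict, Optional


def read_crs_name(body: Dict[str, Any]) -> Optional[str]:
    """Returns the CRS name stored in a GeoJSON tree, or None when absent."""
    crs = body.get("crs")
    if crs is None:
        return None
    if crs.get("type") != "name":
        raise ValueError(f"unsupported CRS type: {crs.get('type')!r}")
    return crs["properties"]["name"]


def epsg_code_from_name(name: str) -> Optional[int]:
    """Extracts the EPSG code from 'urn:ogc:def:crs:EPSG::3857' or 'EPSG:3857'."""
    parts = name.split(":")
    if "EPSG" not in [p.upper() for p in parts]:
        return None
    return int(parts[-1]) if parts[-1].isdigit() else None


tree = json.loads(
    '{"type": "FeatureCollection", "features": [], '
    '"crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::3857"}}}'
)
name = read_crs_name(tree)
print(name, epsg_code_from_name(name))
```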
thelper.data.geo.utils.parse_raster_metadata(raster_metadata, raster_dataset=None)
Parses the provided raster metadata and updates it by adding extra details required for later use.
The provided raster metadata is updated directly. Metadata is validated against the matching data storage. If any required or requested metadata (such as bands) is missing, the function raises an error immediately.
Parameters:
- raster_metadata (dict) – raster metadata dictionary with minimally a file ‘path’ and list of ‘bands’ indices to process.
- raster_dataset (gdal.Dataset) – (optional) preloaded dataset object corresponding to the raster metadata.
Raises:
- ValueError – at least one input raster was missing a required metadata parameter or a parameter is erroneous.
- IOError – the raster path could not be found or reading it did not generate a valid raster using GDAL.
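The validation contract described above (in-place update, ValueError for bad metadata, IOError for an unreadable path) can be sketched without GDAL. The check_raster_metadata helper, the 1-based band index check, and the added 'band_count' entry are assumptions made for illustration; the real function validates against the opened raster and adds its own details.

```python
import os
import tempfile
from typing import Any, Dict


def check_raster_metadata(raster_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Validates the minimal 'path' and 'bands' entries and updates the dict in place."""
    for key in ("path", "bands"):
        if key not in raster_metadata:
            raise ValueError(f"raster metadata is missing the required '{key}' entry")
    bands = raster_metadata["bands"]
    # Assumption: band indices are 1-based, as in GDAL.
    if not bands or any(not isinstance(b, int) or b < 1 for b in bands):
        raise ValueError("'bands' must be a non-empty list of 1-based band indices")
    if not os.path.isfile(raster_metadata["path"]):
        raise IOError(f"raster path not found: {raster_metadata['path']}")
    raster_metadata["band_count"] = len(bands)  # hypothetical extra detail
    return raster_metadata


# An empty temporary file stands in for a raster, since only the path is checked.
with tempfile.NamedTemporaryFile(suffix=".tif") as tmp:
    meta = check_raster_metadata({"path": tmp.name, "bands": [1, 2, 3]})
    print(meta["band_count"])
```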