thelper.transforms package

Transformation operations package.

This package contains data transformation classes and wrappers for preprocessing, augmentation, and normalization of data samples.

Submodules

thelper.transforms.composers module

Transformation composers module.

All transforms should aim to be compatible with both numpy arrays and PyTorch tensors. By default, images are processed using __call__, meaning that for a given transformation t, we apply it via:

image_transformed = t(image)

All important parameters for an operation should also be passed in the constructor and exposed in the operation’s __repr__ function so that external parsers can discover exactly how to reproduce its behavior. For now, these representations are used for debugging more than anything else.

class thelper.transforms.composers.Compose(transforms)[source]

Bases: torchvision.transforms.Compose

Composes several transforms together (with support for invert ops).

This interface is fully compatible with torchvision.transforms.Compose.

__getitem__(idx)[source]

Returns the idx-th operation wrapped by the composer.

__init__(transforms)[source]

Forwards the list of transformations to the base class.

invert(sample)[source]

Tries to invert the transformations applied to a sample.

Will throw if one of the transformations cannot be inverted.

set_epoch(epoch=0)[source]

Sets the current epoch number in order to change the behavior of some suboperations.

set_seed(seed)[source]

Sets the internal seed to use for stochastic ops.
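
Usage example in Python (a minimal sketch; the choice and parameterization of the wrapped operations below are illustrative, not library defaults):

import numpy as np
import thelper.transforms.operations as ops
from thelper.transforms.composers import Compose

composer = Compose([
    # normalize 8-bit values to [0, 1], then switch from HxWxC to CxHxW
    ops.NormalizeMinMax(min=np.array([0.0]), max=np.array([255.0]), out_type=np.float32),
    ops.Transpose(axes=(2, 0, 1)),
])
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
out = composer(image)
back = composer.invert(out)  # both ops above are invertible, so this should not throw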

class thelper.transforms.composers.CustomStepCompose(milestones, last_epoch=-1)[source]

Bases: torchvision.transforms.Compose

Composes several transforms together based on an epoch schedule.

This interface is fully compatible with torchvision.transforms.Compose. It can be useful if some operations should change their behavior over the course of a training session. Note that all epoch indices are assumed to be 0-based.

Usage example in Python:

# We will scale the resolution of input patches based on an arbitrary schedule
# dsize = (16, 16)   if epoch < 2         (16x16 patches before epoch 2)
# dsize = (32, 32)   if 2 <= epoch < 4    (32x32 patches before epoch 4)
# dsize = (64, 64)   if 4 <= epoch < 8    (64x64 patches before epoch 8)
# dsize = (112, 112) if 8 <= epoch < 12   (112x112 patches before epoch 12)
# dsize = (160, 160) if 12 <= epoch < 15  (160x160 patches before epoch 15)
# dsize = (196, 196) if 15 <= epoch < 18  (196x196 patches before epoch 18)
# dsize = (224, 224) if epoch >= 18       (224x224 patches from epoch 18 onward)
transforms = CustomStepCompose(milestones={
    0: thelper.transforms.Resize(dsize=(16, 16)),
    2: thelper.transforms.Resize(dsize=(32, 32)),
    4: thelper.transforms.Resize(dsize=(64, 64)),
    8: thelper.transforms.Resize(dsize=(112, 112)),
    12: thelper.transforms.Resize(dsize=(160, 160)),
    15: thelper.transforms.Resize(dsize=(196, 196)),
    18: thelper.transforms.Resize(dsize=(224, 224)),
})
for epoch in range(100):
    transforms.set_epoch(epoch)
    for sample in loader:
        sample = transforms(sample)
        train(...)
Variables:
  • stages – list of epochs at which a new transformation stage is to be applied.
  • transforms – list of transformations to apply at each stage.
  • milestones – original milestones map provided in the constructor.
  • epoch – index of the current epoch.
__call__(img)[source]

Applies the current stage of transformation operations to a sample.

__getitem__(idx)[source]

Returns the idx-th operation wrapped by the composer.

__init__(milestones, last_epoch=-1)[source]

Receives the milestone stages (or stage lists), and the initialization state.

If the milestones do not include the first epoch (idx = 0), then no transform will be applied until the next specified epoch index. When last_epoch is -1, the training is assumed to start from scratch.

Parameters:
  • milestones – Map of epoch indices tied to transformation stages. Keys must be increasing.
  • last_epoch – The index of last epoch. Default: -1.
invert(sample)[source]

Tries to invert the transformations applied to a sample.

Will throw if one of the transformations cannot be inverted.

set_epoch(epoch=0)[source]

Sets the current epoch number in order to change the behavior of some suboperations.

set_seed(seed)[source]

Sets the internal seed to use for stochastic ops.

step(epoch=None)[source]

Advances the epoch tracker in order to change the behavior of some suboperations.

thelper.transforms.operations module

Transformation operations module.

All transforms should aim to be compatible with both numpy arrays and PyTorch tensors. By default, images are processed using __call__, meaning that for a given transformation t, we apply it via:

image_transformed = t(image)

All important parameters for an operation should also be passed in the constructor and exposed in the operation’s __repr__ function so that external parsers can discover exactly how to reproduce its behavior. For now, these representations are used for debugging more than anything else.

class thelper.transforms.operations.Affine(transf, out_size=None, flags=None, border_mode=None, border_val=None)[source]

Bases: object

Warps a given image using an affine matrix via OpenCV and numpy.

This operation is deterministic. The code relies on OpenCV, meaning the border arguments must be compatible with cv2.warpAffine.

Variables:
  • transf – the 2x3 transformation matrix passed to cv2.warpAffine.
  • out_size – target image size (tuple of width, height). If None, same as original.
  • flags – extra warp flags forwarded to cv2.warpAffine.
  • border_mode – border extrapolation mode forwarded to cv2.warpAffine.
  • border_val – border constant extrapolation value forwarded to cv2.warpAffine.
__call__(sample)[source]

Warps a given image using an affine matrix.

Parameters:sample – the image to warp; should be a 2d or 3d numpy array.
Returns:The warped image.
__init__(transf, out_size=None, flags=None, border_mode=None, border_val=None)[source]

Validates and initializes affine warp parameters.

Parameters:
  • transf – the 2x3 transformation matrix passed to cv2.warpAffine.
  • out_size – target image size (tuple of width, height). If None, same as original.
  • flags – extra warp flags forwarded to cv2.warpAffine.
  • border_mode – border extrapolation mode forwarded to cv2.warpAffine.
  • border_val – border constant extrapolation value forwarded to cv2.warpAffine.
invert(sample)[source]

Inverts the warp transformation, but only if the output image has not been cropped before.
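
Usage example in Python (a minimal sketch; building the 2x3 matrix with cv2.getRotationMatrix2D is an assumption of this example, not a requirement of the class):

import cv2
import numpy as np
from thelper.transforms.operations import Affine

image = np.zeros((100, 100, 3), dtype=np.uint8)
# 2x3 matrix rotating the image by 30 degrees around its center
matrix = cv2.getRotationMatrix2D((50, 50), 30, 1.0)
warp = Affine(transf=matrix, out_size=(100, 100),
              border_mode=cv2.BORDER_CONSTANT, border_val=0)
warped = warp(image)
restored = warp.invert(warped)  # valid here since the warped output was not cropped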

class thelper.transforms.operations.CenterCrop(size, bordertype=<sphinx.ext.autodoc.importer._MockObject object>, borderval=0)[source]

Bases: object

Returns a center crop from a given image via OpenCV and numpy.

This operation is deterministic. The code relies on OpenCV, meaning the border arguments must be compatible with cv2.copyMakeBorder.

Variables:
  • size – the size of the target crop (tuple of width, height).
  • relative – specifies whether the target crop size is relative to the image size or not.
  • bordertype – argument forwarded to cv2.copyMakeBorder.
  • borderval – argument forwarded to cv2.copyMakeBorder.
__call__(sample)[source]

Extracts and returns a central crop from the provided image.

Parameters:sample – the image to generate the crop from; should be a 2d or 3d numpy array.
Returns:The center crop.
__init__(size, bordertype=<sphinx.ext.autodoc.importer._MockObject object>, borderval=0)[source]

Validates and initializes center crop parameters.

Parameters:
  • size – size of the target crop, provided as tuple or list. If integer values are used, the size is assumed to be absolute. If floating point values are used, the size is assumed to be relative, and will be determined dynamically for each sample. If a tuple is used, it is assumed to be (width, height).
  • bordertype – border copy type to use when the image is too small for the required crop size. See cv2.copyMakeBorder for more information.
  • borderval – border value to use when the image is too small for the required crop size. See cv2.copyMakeBorder for more information.
invert(sample)[source]

Specifies that this operation cannot be inverted, as data loss is incurred during image transformation.
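
Usage example in Python (a minimal sketch of the absolute vs. relative size convention; the expected output shapes are assumptions based on the (width, height) convention documented above):

import numpy as np
from thelper.transforms.operations import CenterCrop

image = np.random.rand(240, 320, 3).astype(np.float32)  # HxWxC

crop_abs = CenterCrop(size=(100, 80))   # absolute crop of width=100, height=80
crop_rel = CenterCrop(size=(0.5, 0.5))  # relative crop of half the width/height

print(crop_abs(image).shape)  # expected: (80, 100, 3)
print(crop_rel(image).shape)  # expected: (120, 160, 3)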

class thelper.transforms.operations.Duplicator(count, deepcopy=False)[source]

Bases: thelper.transforms.operations.NoTransform

Duplicates and returns a list of copies of the input sample.

This operation is used in data augmentation pipelines that rely on probabilistic or preset transformations. It can produce a fixed number of simple copies or deep copies of the input samples as required.

Warning

Since the duplicates will be given directly to the data loader as part of the same minibatch, using too many copies can adversely affect gradient descent for that minibatch. To simply increase the total size of the training set while still allowing a proper shuffling of samples and/or to keep the minibatch size intact, we instead recommend setting the train_scale configuration value in the data loader. See thelper.data.utils.create_loaders() for more information.

Variables:
  • count – number of copies to generate.
  • deepcopy – specifies whether to deep-copy samples or not.
__call__(sample)[source]

Generates and returns duplicates of the sample/object.

If a dictionary is provided, its values will be expanded into lists that will contain all duplicates. Otherwise, the duplicates will be returned directly as a list.

Parameters:sample – the sample/object to duplicate.
Returns:A list of duplicated samples, or a dictionary of duplicate lists.
__init__(count, deepcopy=False)[source]

Validates and initializes duplication parameters.

Parameters:
  • count – number of copies to generate.
  • deepcopy – specifies whether to deep-copy samples or not.
invert(sample)[source]

Returns the first instance of the list of duplicates.
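
Usage example in Python (a minimal sketch of the dictionary expansion behavior described above):

import numpy as np
from thelper.transforms.operations import Duplicator

dup = Duplicator(count=3, deepcopy=True)
copies = dup(np.ones((4, 4)))  # plain inputs are returned as a list of 3 copies
sample = {"image": np.zeros((8, 8), dtype=np.uint8), "label": 1}
out = dup(sample)  # dict values are expanded into lists holding the 3 duplicates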

class thelper.transforms.operations.NoTransform[source]

Bases: object

Used to flag some ops that should not be externally wrapped for sample/key handling.

__call__(sample)[source]

Identity transform.

invert(sample)[source]

Identity transform.

class thelper.transforms.operations.NormalizeMinMax(min, max, out_type=<sphinx.ext.autodoc.importer._MockObject object>)[source]

Bases: object

Normalizes a given image using a set of minimum and maximum values.

The samples will be transformed such that s = (s - min) / (max - min).

Note that this operation is also not restricted to images.

Variables:
  • min – an array of minimum values to subtract with.
  • max – an array of maximum values to divide with.
  • out_type – the output data type to cast the normalization result to.
__call__(sample)[source]

Normalizes a given sample.

Parameters:sample – the sample to normalize; if given as a PIL image, it will be converted to a numpy array first.
Returns:The normalized sample, as a numpy array of type self.out_type.

__init__(min, max, out_type=<sphinx.ext.autodoc.importer._MockObject object>)[source]

Validates and initializes normalization parameters.

Parameters:
  • min – an array of minimum values to subtract with.
  • max – an array of maximum values to divide with.
  • out_type – the output data type to cast the normalization result to.
invert(sample)[source]

Inverts the normalization.
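
Usage example in Python (a minimal sketch; the per-channel bounds below are illustrative):

import numpy as np
from thelper.transforms.operations import NormalizeMinMax

norm = NormalizeMinMax(min=np.array([0.0, 0.0, 0.0]),
                       max=np.array([255.0, 255.0, 255.0]),
                       out_type=np.float32)
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
scaled = norm(image)            # values mapped to [0, 1] as float32
restored = norm.invert(scaled)  # undoes the (s - min) / (max - min) mapping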

class thelper.transforms.operations.NormalizeZeroMeanUnitVar(mean, std, out_type=<sphinx.ext.autodoc.importer._MockObject object>)[source]

Bases: object

Normalizes a given image using a set of mean and standard deviation parameters.

The samples will be transformed such that s = (s - mean) / std.

This can be used for whitening; see https://en.wikipedia.org/wiki/Whitening_transformation for more information. Note that this operation is also not restricted to images.

Variables:
  • mean – an array of mean values to subtract from data samples.
  • std – an array of standard deviation values to divide with.
  • out_type – the output data type to cast the normalization result to.
__call__(sample)[source]

Normalizes a given sample.

Parameters:sample – the sample to normalize; if given as a PIL image, it will be converted to a numpy array first.
Returns:The normalized sample, as a numpy array of type self.out_type.

__init__(mean, std, out_type=<sphinx.ext.autodoc.importer._MockObject object>)[source]

Validates and initializes normalization parameters.

Parameters:
  • mean – an array of mean values to subtract from data samples.
  • std – an array of standard deviation values to divide with.
  • out_type – the output data type to cast the normalization result to.
invert(sample)[source]

Inverts the normalization.
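
Usage example in Python (a minimal sketch; the per-channel statistics below are illustrative, not library defaults):

import numpy as np
from thelper.transforms.operations import NormalizeZeroMeanUnitVar

whiten = NormalizeZeroMeanUnitVar(mean=np.array([0.485, 0.456, 0.406]),
                                  std=np.array([0.229, 0.224, 0.225]),
                                  out_type=np.float32)
image = np.random.rand(32, 32, 3).astype(np.float32)
out = whiten(image)       # s = (s - mean) / std
back = whiten.invert(out) # inverts the normalization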

class thelper.transforms.operations.RandomResizedCrop(output_size, input_size=(0.08, 1.0), ratio=(0.75, 1.33), probability=1.0, random_attempts=10, min_roi_iou=1.0, flags=<sphinx.ext.autodoc.importer._MockObject object>)[source]

Bases: object

Returns a resized crop of a randomly selected image region.

This operation is stochastic, and thus cannot be inverted. Each time the operation is called, a random check will determine whether a transformation is applied or not. The code relies on OpenCV, meaning the interpolation arguments must be compatible with cv2.resize.

Variables:
  • output_size – size of the output crop, provided as a single element (edge_size) or as a two-element tuple or list ([width, height]). If integer values are used, the size is assumed to be absolute. If floating point values are used (i.e. in [0,1]), the output size is assumed to be relative to the original image size, and will be determined at execution time for each sample. If set to None, the crop will not be resized.
  • input_size – range of the input region sizes, provided as a pair of elements ([min_edge_size, max_edge_size]) or as a pair of tuples or lists ([[min_width, min_height], [max_width, max_height]]). If the pair-of-pairs format is used, the ratio argument cannot be used. If integer values are used, the ranges are assumed to be absolute. If floating point values are used (i.e. in [0,1]), the ranges are assumed to be relative to the original image size, and will be determined at execution time for each sample.
  • ratio – range of minimum/maximum input region aspect ratios to use. This argument cannot be used if the pair-of-pairs format is used for the input_size argument.
  • probability – the probability that the transformation will be applied when called; if not applied, the returned image will be the original.
  • random_attempts – the number of random sampling attempts to try before reverting to center or most-probably-valid crop generation.
  • min_roi_iou – minimum roi intersection over union (IoU) required for accepting a tile (in [0,1]).
  • flags – interpolation flag forwarded to cv2.resize.
__call__(image, roi=None, mask=None, bboxes=None)[source]

Extracts and returns a random (resized) crop from the provided image.

Parameters:
  • image – the image to generate the crop from. If given as a 2-element list, it is assumed to contain both the image and the roi (passed through a composer).
  • roi – the roi to check tile intersections with (may be None).
  • mask – a mask to crop simultaneously with the input image (may be None).
  • bboxes – a list or array of bounding boxes to crop with the input image (may be None).
Returns:

The randomly selected and resized crop. If mask and/or bboxes is given, the output will be a dictionary containing the results under the image, mask, and bboxes keys.

__init__(output_size, input_size=(0.08, 1.0), ratio=(0.75, 1.33), probability=1.0, random_attempts=10, min_roi_iou=1.0, flags=<sphinx.ext.autodoc.importer._MockObject object>)[source]

Validates and initializes center crop parameters.

Parameters:
  • output_size – size of the output crop, provided as a single element (edge_size) or as a two-element tuple or list ([width, height]). If integer values are used, the size is assumed to be absolute. If floating point values are used (i.e. in [0,1]), the output size is assumed to be relative to the original image size, and will be determined at execution time for each sample. If set to None, the crop will not be resized.
  • input_size – range of the input region sizes, provided as a pair of elements ([min_edge_size, max_edge_size]) or as a pair of tuples or lists ([[min_width, min_height], [max_width, max_height]]). If the pair-of-pairs format is used, the ratio argument cannot be used. If integer values are used, the ranges are assumed to be absolute. If floating point values are used (i.e. in [0,1]), the ranges are assumed to be relative to the original image size, and will be determined at execution time for each sample.
  • ratio – range of minimum/maximum input region aspect ratios to use. This argument cannot be used if the pair-of-pairs format is used for the input_size argument.
  • probability – the probability that the transformation will be applied when called; if not applied, the returned image will be the original.
  • random_attempts – the number of random sampling attempts to try before reverting to center or most-probably-valid crop generation.
  • min_roi_iou – minimum roi intersection over union (IoU) required for producing a tile.
  • flags – interpolation flag forwarded to cv2.resize.
invert(image)[source]

Specifies that this operation cannot be inverted, as data loss is incurred during image transformation.

set_seed(seed)[source]

Sets the internal seed to use for stochastic ops.
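
Usage example in Python (a minimal sketch; parameter values are illustrative):

import numpy as np
from thelper.transforms.operations import RandomResizedCrop

crop = RandomResizedCrop(output_size=(64, 64),
                         input_size=(0.5, 1.0),  # relative range of input region edge sizes
                         ratio=(0.75, 1.33),
                         probability=0.9)
crop.set_seed(42)  # makes the stochastic sampling reproducible
image = np.random.rand(256, 256, 3).astype(np.float32)
patch = crop(image)  # 64x64 crop, or the original image if the probability check fails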

class thelper.transforms.operations.RandomShift(min, max, probability=1.0, flags=None, border_mode=None, border_val=None)[source]

Bases: object

Randomly translates an image in a provided range via OpenCV and numpy.

This operation is stochastic, and thus cannot be inverted. Each time the operation is called, a random check will determine whether a transformation is applied or not. The code relies on OpenCV, meaning the border arguments must be compatible with cv2.warpAffine.

Variables:
  • min – the minimum pixel shift that can be applied stochastically.
  • max – the maximum pixel shift that can be applied stochastically.
  • probability – the probability that the transformation will be applied when called.
  • flags – extra warp flags forwarded to cv2.warpAffine.
  • border_mode – border extrapolation mode forwarded to cv2.warpAffine.
  • border_val – border constant extrapolation value forwarded to cv2.warpAffine.
__call__(sample)[source]

Translates a given image using a predetermined min/max range.

Parameters:sample – the image to translate; should be a 2d or 3d numpy array.
Returns:The translated image.
__init__(min, max, probability=1.0, flags=None, border_mode=None, border_val=None)[source]

Validates and initializes shift parameters.

Parameters:
  • min – the minimum pixel shift that can be applied stochastically.
  • max – the maximum pixel shift that can be applied stochastically.
  • probability – the probability that the transformation will be applied when called.
  • flags – extra warp flags forwarded to cv2.warpAffine.
  • border_mode – border extrapolation mode forwarded to cv2.warpAffine.
  • border_val – border constant extrapolation value forwarded to cv2.warpAffine.
invert(sample)[source]

Specifies that this operation cannot be inverted, as it is stochastic, and data loss occurs during transformation.

set_seed(seed)[source]

Sets the internal seed to use for stochastic ops.
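
Usage example in Python (a minimal sketch; passing the shift range as (x, y) pairs is an assumption of this example):

import numpy as np
from thelper.transforms.operations import RandomShift

shift = RandomShift(min=(-10, -10), max=(10, 10), probability=0.5)
shift.set_seed(0)
image = np.random.rand(64, 64, 3).astype(np.float32)
shifted = shift(image)  # translated by a random offset within the range, half of the time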

class thelper.transforms.operations.Resize(dsize, fx=0, fy=0, interp=<sphinx.ext.autodoc.importer._MockObject object>, buffer=False)[source]

Bases: object

Resizes a given image using OpenCV and numpy.

This operation is deterministic. The code relies on OpenCV, meaning the interpolation arguments must be compatible with cv2.resize.

Variables:
  • interp – interpolation type to use (forwarded to cv2.resize).
  • buffer – specifies whether a destination buffer should be used to avoid allocations.
  • dsize – target image size (tuple of width, height).
  • dst – buffer used to avoid reallocations if self.buffer == True.
  • fx – argument forwarded to cv2.resize.
  • fy – argument forwarded to cv2.resize.
__call__(sample)[source]

Returns a resized copy of the provided image.

Parameters:sample – the image to resize; should be a 2d or 3d numpy array.
Returns:The resized image. May be allocated on the spot, or be a pointer to a local buffer.
__init__(dsize, fx=0, fy=0, interp=<sphinx.ext.autodoc.importer._MockObject object>, buffer=False)[source]

Validates and initializes resize parameters.

Parameters:
  • dsize – size of the target image, forwarded to cv2.resize.
  • fx – x-scaling factor, forwarded to cv2.resize.
  • fy – y-scaling factor, forwarded to cv2.resize.
  • interp – resize interpolation type, forwarded to cv2.resize.
  • buffer – specifies whether a destination buffer should be used to avoid allocations.
invert(sample)[source]

Specifies that this operation cannot be inverted, as data loss is incurred during image transformation.

class thelper.transforms.operations.SelectChannels(channels)[source]

Bases: object

Returns selected channels by indices from an array in numpy, torch.Tensor or PIL.Image format.

This operation does not attempt to interpret the meaning of the channels’ content. It only moves them around by index. It is up to the user to make sure the indices make sense for the desired result. Input is expected to be encoded as HxWxC and will be returned as a numpy array in the same format.

Behavior according to provided channels:

  • single index (int): only that channel is extracted to form a single channel array as result
  • multiple indices (list, tuple, set): all specified unique channel indices are extracted and placed in the specified order as result
  • indices map (dict with int values): channels at key index are moved to value index values
Examples:

out = SelectChannels(0)(img)                     # only the first channel is kept, out is HxWx1
out = SelectChannels([2, 1, 0])(img3C)           # output image will have its channels in reversed order
out = SelectChannels([0, 1, 3])(img4C)           # output image will drop the channel at index #2
out = SelectChannels({3: 0, 0: 1, 1: 2})(img4C)  # output image is HxWx3 with channels remapped using
                                                 # <from:to> definitions, and drops channel #2

# all of the following are equivalent, some implicit and others explicit with None
out = SelectChannels([0, 1, 3])(img4C)
out = SelectChannels({0: 0, 1: 1, 3: 3})(img4C)
out = SelectChannels({0: 0, 1: 1, 2: None, 3: 3})(img4C)

Variables:channels – indices or map of the channels to select from the original array.
Returns:numpy array of selected channels
__call__(sample)[source]

Converts and returns an array in numpy format with selected channels.

Parameters:sample – the array to convert; should be a tensor, numpy array, or PIL image.
Returns:The numpy-converted array with selected channels.
__init__(channels)[source]

Initializes transformation parameters.

invert(sample)[source]

Specifies that this operation cannot be inverted. Original data type is unknown and channels can be dropped.

class thelper.transforms.operations.Tile(tile_size, tile_overlap=0.0, min_mask_iou=1.0, offset_overlap=False, bordertype=<sphinx.ext.autodoc.importer._MockObject object>, borderval=0)[source]

Bases: object

Returns a list of tiles cut out from a given image.

This operation can perform tiling given an optional mask with a target intersection over union (IoU) score, and with an optional overlap between tiles. The tiling is deterministic and can thus be inverted, but only if a mask is not used, as some image regions may be lost otherwise.

If a mask is used, the first tile position is tested exhaustively by iterating over all input coordinates starting from the top-left corner of the image. Otherwise, the first tile position is set as (0,0). Then, all other tiles are found by offsetting from these coordinates, and testing for IoU with the mask (if needed).

Variables:
  • tile_size – size of the output tiles, provided as a single element (edge_size) or as a two-element tuple or list ([width, height]). If integer values are used, the size is assumed to be absolute. If floating point values are used (i.e. in [0,1]), the output size is assumed to be relative to the original image size, and will be determined at execution time for each image.
  • tile_overlap – overlap allowed between two neighboring tiles; should be a ratio in [0,1].
  • min_mask_iou – minimum mask intersection over union (IoU) required for accepting a tile (in [0,1]).
  • offset_overlap – specifies whether the overlap tiling should be offset outside the image or not.
  • bordertype – border copy type to use when the image is too small for the required crop size. See cv2.copyMakeBorder for more information.
  • borderval – border value to use when the image is too small for the required crop size. See cv2.copyMakeBorder for more information.
__call__(image, mask=None)[source]

Extracts and returns a list of tiles cut out from the given image.

Parameters:
  • image – the image to cut into tiles. If given as a 2-element list, it is assumed to contain both the image and the mask (passed through a composer).
  • mask – the mask to check tile intersections with (may be None).
Returns:

A list of tiles (numpy-compatible images).

__init__(tile_size, tile_overlap=0.0, min_mask_iou=1.0, offset_overlap=False, bordertype=<sphinx.ext.autodoc.importer._MockObject object>, borderval=0)[source]

Validates and initializes tiling parameters.

Parameters:
  • tile_size – size of the output tiles, provided as a single element (edge_size) or as a two-element tuple or list ([width, height]). If integer values are used, the size is assumed to be absolute. If floating point values are used (i.e. in [0,1]), the output size is assumed to be relative to the original image size, and will be determined at execution time for each image.
  • tile_overlap – overlap ratio between two consecutive (neighboring) tiles; should be in [0,1].
  • min_mask_iou – minimum mask intersection over union (IoU) required for producing a tile.
  • offset_overlap – specifies whether the overlap tiling should be offset outside the image or not.
  • bordertype – border copy type to use when the image is too small for the required crop size. See cv2.copyMakeBorder for more information.
  • borderval – border value to use when the image is too small for the required crop size. See cv2.copyMakeBorder for more information.
count_tiles(image, mask=None)[source]

Returns the number of tiles that would be cut out from the given image.

Parameters:
  • image – the image to cut into tiles. If given as a 2-element list, it is assumed to contain both the image and the mask (passed through a composer).
  • mask – the mask to check tile intersections with (may be None).
Returns:

The number of tiles that would be cut with thelper.transforms.operations.Tile.__call__().

invert(image, mask=None)[source]

Returns the reconstituted image from a list of tiles, or throws if a mask was used.
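
Usage example in Python (a minimal sketch of mask-free tiling, where the operation remains invertible; the 2x2 tile count is an assumption for a 128x128 image cut into 64x64 tiles):

import numpy as np
from thelper.transforms.operations import Tile

tiler = Tile(tile_size=(64, 64), tile_overlap=0.0)
image = np.random.rand(128, 128, 3).astype(np.float32)
print(tiler.count_tiles(image))  # expected: 4
tiles = tiler(image)             # list of numpy-compatible tiles
rebuilt = tiler.invert(tiles)    # valid here because no mask was used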

class thelper.transforms.operations.ToColor[source]

Bases: object

Converts a single-channel image to color (RGB).

This operation is deterministic and reversible. It CANNOT be applied to images with more than one channel (HxWx1). The byte ordering (BGR or RGB) does not matter.

__call__(sample)[source]

Call self as a function.

__init__()[source]

Does nothing, there’s no attribute to store for this operation.

invert(sample)[source]

Inverts the operation by calling the ‘ToGray’ operation.

Note that this operation is probably lossy due to OpenCV’s grayscale conversion code, which uses Y = 0.299 R + 0.587 G + 0.114 B to compute the luminosity of a pixel.

class thelper.transforms.operations.ToGray[source]

Bases: object

Converts a multi-channel image to grayscale.

This operation is deterministic, but not reversible. It can also be applied to images with more than three (RGB) channels; in that case, it will compute their per-pixel mean value. Note that in any case, the last dimension (which corresponds to the channels) will remain, with a size of 1.

__call__(sample)[source]

Call self as a function.

__init__()[source]

Does nothing, there’s no attribute to store for this operation.

invert(sample)[source]

Specifies that this operation cannot be inverted, as data loss occurs during transformation.

class thelper.transforms.operations.ToNumpy(reorder_bgr=False)[source]

Bases: object

Converts and returns an image in numpy format from a torch.Tensor or PIL.Image format.

This operation is deterministic. The returned image will always be encoded as HxWxC; if the input has three channels, their ordering can optionally be changed (see reorder_bgr).

Variables:reorder_bgr – specifies whether the channels should be reordered in OpenCV format.
__call__(sample)[source]

Converts and returns an image in numpy format.

Parameters:sample – the image to convert; should be a tensor, numpy array, or PIL image.
Returns:The numpy-converted image.
__init__(reorder_bgr=False)[source]

Initializes transformation parameters.

invert(sample)[source]

Specifies that this operation cannot be inverted, as the original data type is unknown.

class thelper.transforms.operations.Transpose(axes)[source]

Bases: object

Transposes an image via numpy.

This operation is deterministic.

Variables:
  • axes – the axes on which to apply the transpose; forwarded to numpy.transpose.
  • axes_inv – used to invert the transpose; also forwarded to numpy.transpose.
__call__(sample)[source]

Transposes a given image.

Parameters:sample – the image to transpose; should be a numpy array.
Returns:The transposed image.
__init__(axes)[source]

Validates and initializes transpose parameters.

Parameters:axes – the axes on which to apply the transpose; forwarded to numpy.transpose.
invert(sample)[source]

Invert-transposes a given image.

Parameters:sample – the image to invert-transpose; should be a numpy array.
Returns:The invert-transposed image.
class thelper.transforms.operations.Unsqueeze(axis)[source]

Bases: object

Expands a dimension in the input array via numpy/PyTorch.

This operation is deterministic.

Variables:axis – the axis on which to apply the expansion.
__call__(sample)[source]

Expands a dimension in the input array via numpy/PyTorch.

Parameters:sample – the array to expand.
Returns:The array with an extra dimension.
__init__(axis)[source]

Validates and initializes the axis expansion parameters.

Parameters:axis – the axis on which to apply the expansion.
invert(sample)[source]

Squeezes a dimension in the input array via numpy/PyTorch.

Parameters:sample – the array to squeeze.
Returns:The array with one less dimension.
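
Usage example in Python (a minimal sketch combining the two operations above to go from a HxWxC numpy image to a 1xCxHxW array; the axis choices follow the usual PyTorch layout convention and are not library defaults):

import numpy as np
from thelper.transforms.operations import Transpose, Unsqueeze

image = np.random.rand(32, 32, 3).astype(np.float32)  # HxWxC
to_chw = Transpose(axes=(2, 0, 1))                    # -> CxHxW
add_batch = Unsqueeze(axis=0)                         # -> 1xCxHxW

batched = add_batch(to_chw(image))
print(batched.shape)  # (1, 3, 32, 32)
original = to_chw.invert(add_batch.invert(batched))  # undo both steps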

thelper.transforms.utils module

Transformations utilities module.

This module contains utility functions used to instantiate transformation/augmentation ops.

thelper.transforms.utils.load_augments(config)[source]

Loads a data augmentation pipeline.

An augmentation pipeline is essentially a specialized transformation pipeline that can be appended or prefixed to the base transforms defined for all samples. Augmentations are typically used to diversify the samples within the training set in order to help model generalization. They can also be applied to validation and test samples in order to get multiple responses for the same input so that they can be averaged/concatenated into a single output.

Usage examples inside a session configuration file:

# ...
# the 'loaders' field can contain several augmentation pipelines
# (see 'thelper.data.utils.create_loaders' for more information on these pipelines)
"loaders": {
    # ...
    # the 'train_augments' operations are applied to training samples only
    "train_augments": {
        # specifies whether to apply the augmentations before or after the base transforms
        "append": false,
        "transforms": [
            {
                # here, we use a single stage, which is actually an augmentor sub-pipeline
                # that is purely probabilistic (i.e. it does not increase input sample count)
                "operation": "Augmentor.Pipeline",
                "params": {
                    # the augmentor pipeline defines two operations: rotations and flips
                    "rotate_random_90": {"probability": 0.75},
                    "flip_random": {"probability": 0.75}
                }
            }
        ]
    },
    # ...
}
# ...
Parameters:config – the configuration dictionary defining the meta parameters as well as the list of transformation operations of the augmentation pipeline.
Returns:A tuple that consists of a pipeline compatible with the torchvision.transforms interfaces, and a bool specifying whether this pipeline should be appended or prefixed to the base transforms.
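
Usage example in Python (a minimal sketch mirroring the configuration snippet above; it assumes the Augmentor package is installed so that the "Augmentor.Pipeline" stage can be instantiated):

import thelper.transforms.utils

augments_config = {
    "append": False,
    "transforms": [
        {
            "operation": "Augmentor.Pipeline",
            "params": {
                "rotate_random_90": {"probability": 0.75},
                "flip_random": {"probability": 0.75},
            },
        },
    ],
}
augments, append = thelper.transforms.utils.load_augments(augments_config)
# 'append' specifies whether the pipeline should be appended (True)
# or prefixed (False) to the base transforms
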
thelper.transforms.utils.load_transforms(stages, avoid_transform_wrapper=False)[source]

Loads a transformation pipeline from a list of stages.

Each entry in the provided list will be considered a stage in the pipeline. The ordering of the stages is important, as some transformations might not be compatible if taken out of order. The entries must each be dictionaries that define an operation, its parameters, and some meta-parameters (detailed below).

The operation field of each stage will be used to dynamically import a specific type of operation to apply. The params field of each stage will then be used to pass parameters to the constructor of this operation.

If an operation is identified as "Augmentor.Pipeline" or "albumentations.Compose", it will be specially handled. In both cases, the params field becomes mandatory in the stage dictionary, and it must specify the Augmentor or albumentations pipeline operation names and parameters (as a dictionary). Two additional optional config fields can then be set for Augmentor pipelines: input_tensor (bool), which specifies whether the previous stage provides a torch.Tensor to the pipeline (default=False); and output_tensor (bool), which specifies whether the output of the pipeline should be converted into a tensor (default=False). For albumentations pipelines, two additional fields are also available, namely bbox_params (dict) and keypoint_params (dict). For more information on these, refer to the documentation of albumentations.core.composition.Compose. Finally, when unpacking dictionaries for albumentations pipelines, the keys associated with bounding boxes/masks/keypoints that must be forwarded to the composer can be specified via the bboxes_key, mask_key, and keypoints_key fields.

All operations can also specify which sample components they should be applied to via the target_key field. This field can contain a single key (typically a string), or a list of keys. The operation will be applied at runtime to all values which are found in the samples with one of those keys. If no key is provided for an operation, it will be applied to all array-like components of the sample. Finally, all operations can specify a linked_fate field (bool) to specify whether the samples provided in lists should all have the same fate or not (default=True).

Usage examples inside a session configuration file:

# ...
# the 'loaders' field may contain several transformation pipelines
# (see 'thelper.data.utils.create_loaders' for more information on these pipelines)
"loaders": {
    # ...
    # the 'base_transforms' operations are applied to all loaded samples
    "base_transforms": [
        {
            "operation": "...",
            "params": {
                ...
            },
            "target_key": [ ... ],
            "linked_fate": ...
        },
        {
            "operation": "...",
            "params": {
                ...
            },
            "target_key": [ ... ],
            "linked_fate": ...
        },
        ...
    ],
# ...
Parameters:stages – a list defining a series of transformations to apply as a single pipeline.
Returns:A transformation pipeline object compatible with the torchvision.transforms interface.
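
Usage example in Python (a minimal sketch of an equivalent stage list built directly in code; the chosen operations and parameter values are illustrative):

import thelper.transforms.utils

stages = [
    {
        "operation": "thelper.transforms.operations.Resize",
        "params": {"dsize": [224, 224]},
    },
    {
        "operation": "thelper.transforms.operations.NormalizeMinMax",
        "params": {"min": [0.0], "max": [255.0]},
        "target_key": "image",  # only applied to the 'image' component of loaded samples
    },
]
pipeline = thelper.transforms.utils.load_transforms(stages)
# the returned pipeline is compatible with the torchvision.transforms interface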

thelper.transforms.wrappers module

Transformations wrappers module.

The wrapper classes herein are used either to support inline operations on odd sample types (e.g. lists of images) or to wrap external libraries (e.g. Augmentor).

class thelper.transforms.wrappers.AlbumentationsWrapper(transforms, bbox_params=None, add_targets=None, image_key='image', bboxes_key='bboxes', mask_key='mask', keypoints_key='keypoints', probability=1.0, cvt_kpts_to_bboxes=False, linked_fate=False)[source]

Bases: object

Albumentations pipeline wrapper that allows dictionary unpacking.

See https://github.com/albu/albumentations for more information.

Variables:
  • pipeline – the albumentations pipeline instance to apply to images.
  • image_key – the key to fetch images from (when dictionaries are passed in).
  • bboxes_key – the key to fetch bounding boxes from (when dictionaries are passed in).
  • mask_key – the key to fetch masks from (when dictionaries are passed in).
  • keypoints_key – the key to fetch keypoints from (when dictionaries are passed in).
  • cvt_kpts_to_bboxes – specifies whether keypoints should be converted to bboxes for compatibility.
  • linked_fate – specifies whether input list samples should all have the same fate or not.
__call__(sample, force_linked_fate=False, op_seed=None)[source]

Transforms a (dict) sample, a single image, or a list of images using the augmentor pipeline.

Parameters:
  • sample – the sample or image(s) to transform (can also contain embedded lists/tuples of images).
  • force_linked_fate – override flag for recursive use allowing forced linking of arrays.
  • op_seed – seed to set before calling the wrapped operation.
Returns:

The transformed image(s), with the same list/tuple formatting as the input.

__init__(transforms, bbox_params=None, add_targets=None, image_key='image', bboxes_key='bboxes', mask_key='mask', keypoints_key='keypoints', probability=1.0, cvt_kpts_to_bboxes=False, linked_fate=False)[source]

Receives and stores an albumentations pipeline for later use.

The pipeline itself is instantiated in thelper.transforms.utils.load_transforms().

set_epoch(epoch=0)[source]

Sets the current epoch number in order to change the behavior of some suboperations.

set_seed(seed)[source]

Sets the internal seed to use for stochastic ops.

class thelper.transforms.wrappers.AugmentorWrapper(pipeline, target_keys=None, linked_fate=True)[source]

Bases: object

Augmentor pipeline wrapper that allows pickling and multi-threading.

See https://github.com/mdbloice/Augmentor for more information. This wrapper was last updated to work with version 0.2.2 — more recent versions introduced yet unfixed (as of 2018/08) issues on some platforms.

All original transforms are supported here. This wrapper also fixes the list output bug for single-image samples when using operations individually.

Variables:
  • pipeline – the augmentor pipeline instance to apply to images.
  • target_keys – the sample keys to apply the pipeline to (when dictionaries are passed in).
  • linked_fate – specifies whether input list samples should all have the same fate or not.
__call__(sample, force_linked_fate=False, op_seed=None, in_cvts=None)[source]

Transforms a (dict) sample, a single image, or a list of images using the augmentor pipeline.

Parameters:
  • sample – the sample or image(s) to transform (can also contain embedded lists/tuples of images).
  • force_linked_fate – override flag for recursive use allowing forced linking of arrays.
  • op_seed – seed to set before calling the wrapped operation.
  • in_cvts – holds the input conversion flag array (for recursive usage).
Returns:

The transformed image(s), with the same list/tuple formatting as the input.

__init__(pipeline, target_keys=None, linked_fate=True)[source]

Receives and stores an augmentor pipeline for later use.

The pipeline itself is instantiated in thelper.transforms.utils.load_transforms().

set_epoch(epoch=0)[source]

Sets the current epoch number in order to change the behavior of some suboperations.

set_seed(seed)[source]

Sets the internal seed to use for stochastic ops.

class thelper.transforms.wrappers.TransformWrapper(operation, params=None, probability=1, convert_pil=False, target_keys=None, linked_fate=True)[source]

Bases: object

Transform wrapper that allows operations on samples, lists, tuples, and single elements.

Can be used to wrap the operations in thelper.transforms or in torchvision.transforms that only accept array-like objects as input. Will optionally force-convert content to PIL images.

Can also be used to transform a list/tuple of images uniformly based on a shared dice roll, or to ensure that each image is transformed independently.

Warning

Stochastic transforms (e.g. torchvision.transforms.RandomCrop) will always treat each image in a list differently. If the same operations are to be applied to all images, you should consider using a series of non-stochastic operations wrapped inside an instance of torchvision.transforms.RandomApply, or simply provide the probability of applying the transforms to this wrapper’s constructor.

Variables:
  • operation – the wrapped operation (callable object or class name string to import).
  • params – the parameters that are passed to the operation when init’d or called.
  • probability – the probability that the wrapped operation will be applied.
  • convert_pil – specifies whether images should be converted into PIL format or not.
  • target_keys – the sample keys to apply the transform to (when dictionaries are passed in).
  • linked_fate – specifies whether images given in a list/tuple should have the same fate or not.
__call__(sample, force_linked_fate=False, op_seed=None, in_cvts=None)[source]

Transforms a (dict) sample, a single image, or a list of images using a wrapped operation.

Parameters:
  • sample – the sample or image(s) to transform (can also contain embedded lists/tuples of images).
  • force_linked_fate – override flag for recursive use allowing forced linking of arrays.
  • op_seed – seed to set before calling the wrapped operation.
  • in_cvts – holds the input conversion flag array (for recursive usage).
Returns:

The transformed image(s), with the same list/tuple formatting as the input.

__init__(operation, params=None, probability=1, convert_pil=False, target_keys=None, linked_fate=True)[source]

Receives and stores a torchvision transform operation for later use.

If the operation is given as a string, it is assumed to be a class name and it will be imported. The parameters (if any) will then be given to the constructor of that class. Otherwise, the operation is assumed to be a callable object, and its parameters (if any) will be provided at call-time.

Parameters:
  • operation – the wrapped operation (callable object or class name string to import).
  • params – the parameters that are passed to the operation when init’d or called.
  • probability – the probability that the wrapped operation will be applied.
  • convert_pil – specifies whether images should be forced into PIL format or not.
  • target_keys – the sample keys to apply the pipeline to (when dictionaries are passed in).
  • linked_fate – specifies whether images given in a list/tuple should have the same fate or not.
set_epoch(epoch=0)[source]

Sets the current epoch number in order to change the behavior of some suboperations.

set_seed(seed)[source]

Sets the internal seed to use for stochastic ops.
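
Usage example in Python (a minimal sketch; the wrapped torchvision operation and the sample key names are illustrative):

import numpy as np
import thelper.transforms.wrappers

wrapper = thelper.transforms.wrappers.TransformWrapper(
    operation="torchvision.transforms.RandomCrop",
    params={"size": 24},
    probability=1.0,
    convert_pil=True,       # force-convert numpy inputs to PIL images for torchvision
    target_keys=["image"],  # only transform the 'image' entry of dictionary samples
    linked_fate=True,       # images given in a list/tuple would share the same fate
)
sample = {"image": np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8), "label": 1}
out = wrapper(sample)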