thelper.tasks package

Task definition package.

This package contains classes and functions whose role is to define the input/output formats and operations expected from a trained model. This essentially defines the ‘goal’ of the model, and is used to specialize and automate its training.

Submodules

thelper.tasks.classif module

Classification task interface module.

This module contains a class that defines the objectives of models/trainers for classification tasks.

class thelper.tasks.classif.Classification(class_names: Iterable[AnyStr], input_key: collections.abc.Hashable, label_key: collections.abc.Hashable, meta_keys: Optional[Iterable[collections.abc.Hashable]] = None, multi_label: bool = False)[source]

Bases: thelper.tasks.utils.Task, thelper.ifaces.ClassNamesHandler

Interface for input labeling/classification tasks.

This specialization requests that, when given an input tensor, the trained model should provide prediction scores for each predefined label (or class). The label names are used here to help categorize samples, and to ensure that two tasks are only considered identical when their label counts and orderings match.

Variables:
  • class_names – list of label (class) names to predict (each name should be a string).
  • input_key – the key used to fetch input tensors from a sample dictionary.
  • label_key – the key used to fetch label (class) names/indices from a sample dictionary.
  • meta_keys – the list of extra keys provided by the data parser inside each sample.
__init__(class_names: Iterable[AnyStr], input_key: collections.abc.Hashable, label_key: collections.abc.Hashable, meta_keys: Optional[Iterable[collections.abc.Hashable]] = None, multi_label: bool = False)[source]

Receives and stores the class (or label) names to predict, the input tensor key, the groundtruth label (class) key, and the extra (meta) keys produced by the dataset parser(s).

The class names can be provided as a list of strings, or as a path to a json file that contains such a list. The list must contain at least two items. All other arguments are used as-is to index dictionaries, and must therefore be key-compatible types.

If multi_label is activated, samples with non-scalar class labels will be allowed in the get_class_sizes and get_class_sample_map functions.
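
For illustration, here is a minimal construction sketch based on the signature above; the sample dictionary keys ("image", "label", "path") are hypothetical placeholders:

    from thelper.tasks.classif import Classification

    task = Classification(
        class_names=["cat", "dog"],  # at least two labels are required
        input_key="image",           # key of the input tensor in each sample dict
        label_key="label",           # key of the groundtruth label
        meta_keys=["path"],          # extra keys kept for reference only
    )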

check_compat(task: thelper.tasks.utils.Task, exact: bool = False) → bool[source]

Returns whether the current task is compatible with the provided one or not.

This is useful for sanity-checking, and to see if the inputs/outputs of two models are compatible. If exact = True, all fields will be checked for exact (perfect) compatibility, including matching meta keys and class name ordering.
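
As a behavioral sketch of the description above (not a verified test; the tasks reuse the hypothetical keys from the earlier example):

    task_a = Classification(["cat", "dog"], "image", "label")
    task_b = Classification(["cat", "dog"], "image", "label", meta_keys=["path"])
    task_a.check_compat(task_b)              # expected: True (keys/labels match)
    task_a.check_compat(task_b, exact=True)  # expected: False (meta keys differ)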

get_class_sample_map(samples: Iterable, unset_key: collections.abc.Hashable = None) → Dict[AnyStr, List[int]][source]

Splits a list of samples based on their labels into a map of sample lists.

This function is useful if we need to split a dataset based on its label categories in order to sort it, augment it, or re-balance it. The samples do not need to be fully loaded for this to work, as only their label (gt) value will be queried. If a sample is missing its label, it will be ignored and left out of the generated dictionary unless a value is given for unset_key.

Parameters:
  • samples – the samples to split, where each sample is provided as a dictionary.
  • unset_key – a key under which all unlabeled samples should be kept (None = ignore).
Returns:

A dictionary that maps each class label to the list of indices of its matching samples.
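
A short usage sketch, reusing the task built earlier (the sample dictionaries are hypothetical; per the signature, the returned lists contain sample indices):

    samples = [
        {"image": ..., "label": "cat"},
        {"image": ..., "label": "dog"},
        {"image": ...},  # missing its label
    ]
    groups = task.get_class_sample_map(samples, unset_key="unlabeled")
    # expected form: {"cat": [0], "dog": [1], "unlabeled": [2]}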

get_class_sizes(samples: Iterable) → Dict[AnyStr, int][source]

Given a list of samples, returns a map of sample counts for each class label.

get_compat(task: thelper.tasks.utils.Task) → thelper.tasks.utils.Task[source]

Returns a task instance compatible with the current task and the given one.

supports_classification = True

thelper.tasks.detect module

Detection task interface module.

This module contains classes that define object detection utilities and task interfaces.

class thelper.tasks.detect.BoundingBox(class_id, bbox, include_margin=True, difficult=False, occluded=False, truncated=False, iscrowd=False, confidence=None, image_id=None, task=None)[source]

Bases: object

Interface used to hold instance metadata for object detection tasks.

Object detection trainers and display utilities in the framework will expect this interface to be used when parsing a predicted detection or an annotation. The base contents are based on the PASCALVOC metadata structure, and this class can be derived if necessary to contain more metadata.

Variables:
  • class_id – type identifier for the underlying object instance.
  • bbox – four-element tuple holding the (xmin, ymin, xmax, ymax) bounding box parameters.
  • include_margin – defines whether xmax/ymax is included in the bounding box area or not.
  • difficult – defines whether this instance is considered “difficult” (false by default).
  • occluded – defines whether this instance is considered “occluded” (false by default).
  • truncated – defines whether this instance is considered “truncated” (false by default).
  • iscrowd – defines whether this instance covers a “crowd” of objects or not (false by default).
  • confidence – scalar or array of prediction confidence values tied to class types (None by default).
  • image_id – string used to identify the image containing this bounding box (i.e. file path or uuid).
  • task – reference to the task object that holds extra metadata regarding the content of the bbox (None by default).
__init__(class_id, bbox, include_margin=True, difficult=False, occluded=False, truncated=False, iscrowd=False, confidence=None, image_id=None, task=None)[source]

Receives and stores low-level input detection metadata for later access.
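
A minimal construction sketch based on the signature above (all values are arbitrary):

    from thelper.tasks.detect import BoundingBox

    # a box for class id 3 spanning (xmin=10, ymin=20, xmax=50, ymax=80)
    box = BoundingBox(class_id=3, bbox=(10, 20, 50, 80), confidence=0.87)
    print(box.width, box.height)  # derived from the bbox corners (see below)
    print(box.centroid)           # (x, y) center coordinates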

area

Returns a scalar indicating the total surface area of the annotation (may be None if unknown/unspecified).

bbox

Returns the bounding box tuple (xmin, ymin, xmax, ymax).

bottom

Returns the bottom bounding box edge origin offset value.

bottom_right

Returns the bottom-right bounding box corner coordinates (x, y).

centroid

Returns the bounding box centroid coordinates (x, y).

class_id

Returns the object class type identifier.

confidence

Returns the confidence value (or array of confidence values) associated with the predicted class types.

static decode(vec, format=None)[source]

Returns a BoundingBox object from a vectorized representation in a specified format.

Note

The input bbox is expected to be a four-element array (xmin, ymin, xmax, ymax).

difficult

Returns whether this bounding box is considered difficult by the dataset (false by default).

encode(format=None)[source]

Returns a vectorizable representation of this bounding box in a specified format.

WARNING: Encoding might cause information loss (e.g. task reference is discarded).

height

Returns the height of the bounding box.

image_id

Returns the image string identifier.

include_margin

Returns whether xmax and ymax are included in the bounding box area or not.

intersects(geom)[source]

Returns whether the bounding box intersects a geometry (i.e. a 2D point or another bbox).
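
For example, reusing the box built in the construction sketch above (the accepted point representation is an assumption here):

    other = BoundingBox(class_id=1, bbox=(40, 60, 90, 120))
    box.intersects((30, 40))  # a 2D point lying inside the box
    box.intersects(other)     # an overlapping bounding box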

iscrowd

Returns whether this instance covers a crowd of objects or not (false by default).

json()[source]

Gets a JSON-serializable representation of the bounding box parameters.

left

Returns the left bounding box edge origin offset value.

occluded

Returns whether this bounding box is considered occluded by the dataset (false by default).

right

Returns the right bounding box edge origin offset value.

supports_detection = True

task

Returns the reference to the task object that holds extra metadata regarding the content of the bbox.

tolist()[source]

Gets a list representation of the underlying bounding box tuple (xmin, ymin, xmax, ymax).

This ensures that Tensor objects are converted to native Python types.

top

Returns the top bounding box edge origin offset value.

top_left

Returns the top-left bounding box corner coordinates (x, y).

totuple()[source]

Gets a tuple representation of the underlying bounding box (xmin, ymin, xmax, ymax).

This ensures that Tensor objects are converted to native Python types.

truncated

Returns whether this bounding box is considered truncated by the dataset (false by default).

width

Returns the width of the bounding box.

class thelper.tasks.detect.Detection(class_names, input_key, bboxes_key, meta_keys=None, input_shape=None, target_shape=None, target_min=None, target_max=None, background=None, color_map=None)[source]

Bases: thelper.tasks.regr.Regression, thelper.ifaces.ClassNamesHandler, thelper.ifaces.ColorMapHandler

Interface for object detection tasks.

This specialization requests that when given an input image, the trained model should provide a list of bounding box (bbox) proposals that correspond to probable objects detected in the image.

This specialized regression interface is currently used to help display functions.

Variables:
  • class_names – map of class name-value pairs for object types to detect.
  • input_key – the key used to fetch input tensors from a sample dictionary.
  • bboxes_key – the key used to fetch target (groundtruth) bboxes from a sample dictionary.
  • meta_keys – the list of extra keys provided by the data parser inside each sample.
  • input_shape – a numpy-compatible shape to expect input images to possess.
  • target_shape – a numpy-compatible shape to expect the predictions to be in.
  • target_min – a 2-dim tensor containing minimum (x,y) bounding box corner values.
  • target_max – a 2-dim tensor containing maximum (x,y) bounding box corner values.
  • background – value of the ‘background’ label (if any) used in the class map.
  • color_map – map of class name-color pairs to use when displaying results.
__init__(class_names, input_key, bboxes_key, meta_keys=None, input_shape=None, target_shape=None, target_min=None, target_max=None, background=None, color_map=None)[source]

Receives and stores the bbox types to detect, the input tensor key, the groundtruth bboxes list key, the extra (meta) keys produced by the dataset parser(s), and the color map used to color bboxes when displaying results.

The class names can be provided as a list of strings, as a path to a json file that contains such a list, or as a map of predefined name-value pairs to use in gt maps. This list/map must contain at least two elements (the background and at least one object class). All other arguments are used as-is to index dictionaries, and must therefore be key-compatible types.
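
A construction sketch following the description above (keys, class values, and RGB colors are hypothetical):

    from thelper.tasks.detect import Detection

    task = Detection(
        class_names={"background": 0, "car": 1, "person": 2},
        input_key="image",
        bboxes_key="bboxes",
        background=0,  # value of the 'background' label in the class map
        color_map={"car": (255, 0, 0), "person": (0, 255, 0)},
    )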

check_compat(task, exact=False)[source]

Returns whether the current task is compatible with the provided one or not.

This is useful for sanity-checking, and to see if the inputs/outputs of two models are compatible. If exact = True, all fields will be checked for exact (perfect) compatibility, including matching meta keys and class maps.

get_class_sizes(samples, bbox_format=None)[source]

Given a list of samples, returns a map of element counts for each object type.

get_compat(task)[source]

Returns a task instance compatible with the current task and the given one.

supports_detection = True

thelper.tasks.regr module

Regression task interface module.

This module contains a class that defines the objectives of models/trainers for regression tasks.

class thelper.tasks.regr.Regression(input_key, target_key, meta_keys=None, input_shape=None, target_shape=None, target_type=None, target_min=None, target_max=None)[source]

Bases: thelper.tasks.utils.Task

Interface for n-dimension regression tasks.

This specialization requests that when given an input tensor, the trained model should provide an n-dimensional target prediction. This is a fairly generic task that (unlike image classification and semantic segmentation) is not linked to a pre-existing set of possible solutions. The task interface is used to carry useful metadata for this task, e.g. input/output shapes, types, and min/max values for rounding/saturation.

Variables:
  • input_shape – a numpy-compatible shape to expect model inputs to be in.
  • target_shape – a numpy-compatible shape to expect the predictions to be in.
  • target_type – a numpy-compatible type to cast the predictions to (if needed).
  • target_min – an n-dim tensor containing minimum target values (if applicable).
  • target_max – an n-dim tensor containing maximum target values (if applicable).
  • input_key – the key used to fetch input tensors from a sample dictionary.
  • target_key – the key used to fetch target (groundtruth) values from a sample dictionary.
  • meta_keys – the list of extra keys provided by the data parser inside each sample.
__init__(input_key, target_key, meta_keys=None, input_shape=None, target_shape=None, target_type=None, target_min=None, target_max=None)[source]

Receives and stores the keys produced by the dataset parser(s).
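
A minimal sketch based on the signature above (keys, shapes, types, and bounds are illustrative):

    import numpy as np

    from thelper.tasks.regr import Regression

    task = Regression(
        input_key="signal",
        target_key="target",
        target_shape=(3,),       # expect 3-element prediction vectors
        target_type=np.float32,  # cast predictions to this type if needed
        target_min=np.zeros(3),  # per-element lower bound for saturation
        target_max=np.ones(3),   # per-element upper bound for saturation
    )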

check_compat(task, exact=False)[source]

Returns whether the current task is compatible with the provided one or not.

This is useful for sanity-checking, and to see if the inputs/outputs of two models are compatible. If exact = True, all fields will be checked for exact (perfect) compatibility, including matching meta keys.

get_compat(task)[source]

Returns a task instance compatible with the current task and the given one.

input_shape

Returns the shape of the input tensors to be processed by the model.

supports_regression = True

target_max

Returns the maximum target value(s) to be generated by the model.

target_min

Returns the minimum target value(s) to be generated by the model.

target_shape

Returns the shape of the output tensors to be generated by the model.

target_type

Returns the type of the output tensors to be generated by the model.

class thelper.tasks.regr.SuperResolution(input_key, target_key, meta_keys=None, input_shape=None, target_type=None, target_min=None, target_max=None)[source]

Bases: thelper.tasks.regr.Regression

Interface for super-resolution tasks.

This specialization requests that, when given an input tensor, the trained model should provide an identically-shaped target prediction that essentially contains more (or more adequate) high-frequency spatial components.

This specialized regression interface is currently used to help display functions.

Variables:
  • input_shape – a numpy-compatible shape to expect model inputs/outputs to be in.
  • target_type – a numpy-compatible type to cast the predictions to (if needed).
  • target_min – an n-dim tensor containing minimum target values (if applicable).
  • target_max – an n-dim tensor containing maximum target values (if applicable).
  • input_key – the key used to fetch input tensors from a sample dictionary.
  • target_key – the key used to fetch target (groundtruth) values from a sample dictionary.
  • meta_keys – the list of extra keys provided by the data parser inside each sample.
__init__(input_key, target_key, meta_keys=None, input_shape=None, target_type=None, target_min=None, target_max=None)[source]

Receives and stores the keys produced by the dataset parser(s).

supports_regression = True

thelper.tasks.segm module

Segmentation task interface module.

This module contains a class that defines the objectives of models/trainers for segmentation tasks.

class thelper.tasks.segm.Segmentation(class_names, input_key, label_map_key, meta_keys=None, dontcare=None, color_map=None)[source]

Bases: thelper.tasks.utils.Task, thelper.ifaces.ClassNamesHandler, thelper.ifaces.ColorMapHandler

Interface for pixel-level labeling/classification (segmentation) tasks.

This specialization requests that, when given an input tensor, the trained model should provide prediction scores for each predefined label (or class), for each element of the input tensor. The label names are used here to help categorize samples, and to ensure that two tasks are only considered identical when their label counts and orderings match.

Variables:
  • class_names – map of class name-value pairs to predict for each pixel.
  • input_key – the key used to fetch input tensors from a sample dictionary.
  • label_map_key – the key used to fetch label (class) maps from a sample dictionary.
  • meta_keys – the list of extra keys provided by the data parser inside each sample.
  • dontcare – value of the ‘dontcare’ label (if any) used in the class map.
  • color_map – map of class name-color pairs to use when displaying results.
__init__(class_names, input_key, label_map_key, meta_keys=None, dontcare=None, color_map=None)[source]

Receives and stores the class (or label) names to predict, the input tensor key, the groundtruth label (class) map key, the extra (meta) keys produced by the dataset parser(s), the ‘dontcare’ label value that might be present in gt maps (if any), and the color map used to swap label indices for colors when displaying results.

The class names can be provided as a list of strings, as a path to a json file that contains such a list, or as a map of predefined name-value pairs to use in gt maps. This list/map must contain at least two elements. All other arguments are used as-is to index dictionaries, and must therefore be key-compatible types.
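
A construction sketch (the class name-value map and the dontcare value of 255 are hypothetical):

    from thelper.tasks.segm import Segmentation

    task = Segmentation(
        class_names={"background": 0, "road": 1, "building": 2},
        input_key="image",
        label_map_key="label_map",
        dontcare=255,  # label value to ignore in groundtruth maps
    )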

check_compat(task, exact=False)[source]

Returns whether the current task is compatible with the provided one or not.

This is useful for sanity-checking, and to see if the inputs/outputs of two models are compatible. If exact = True, all fields will be checked for exact (perfect) compatibility, including matching meta keys and class maps.

get_class_sizes(samples)[source]

Given a list of samples, returns a map of element counts for each class label.

get_compat(task)[source]

Returns a task instance compatible with the current task and the given one.

supports_segmentation = True

thelper.tasks.utils module

Task utility functions & base interface module.

This module contains utility functions used to instantiate tasks and check their compatibility, and the base interface used to define new tasks.

class thelper.tasks.utils.Task(input_key: collections.abc.Hashable, gt_key: collections.abc.Hashable = None, meta_keys: Optional[Iterable[collections.abc.Hashable]] = None)[source]

Bases: object

Basic task interface that defines a training objective and that holds sample i/o keys.

Since the framework’s data loaders expect samples to be passed in as dictionaries, keys are required to obtain the input that should be forwarded to a model, and to obtain the groundtruth required for the evaluation of model predictions. Other keys might also be kept by this interface for reference (these are considered meta keys).

Note that while this interface can be instantiated directly, trainers and models might not be provided enough information about their goal to be correctly instantiated. Thus, specialized task objects derived from this base class should be used if possible.

Variables:
  • input_key – the key used to fetch input tensors from a sample dictionary.
  • gt_key – the key used to fetch gt tensors from a sample dictionary.
  • meta_keys – the list of extra keys provided by the data parser inside each sample.
__init__(input_key: collections.abc.Hashable, gt_key: collections.abc.Hashable = None, meta_keys: Optional[Iterable[collections.abc.Hashable]] = None)[source]

Receives and stores the keys used to index dataset sample contents.
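
For example, with hypothetical keys:

    from thelper.tasks.utils import Task

    task = Task(input_key="input", gt_key="gt", meta_keys=["idx", "path"])
    print(task.keys)  # all input/gt/meta keys carried in samples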

check_compat(task: thelper.tasks.utils.Task, exact: bool = False) → bool[source]

Returns whether the current task is compatible with the provided one or not.

This is useful for sanity-checking, and to see if the inputs/outputs of two models are compatible. It should be overridden in derived classes to specialize the compatibility verification. If exact = True, all fields will be checked for exact compatibility.

get_compat(task: thelper.tasks.utils.Task) → thelper.tasks.utils.Task[source]

Returns a task instance compatible with the current task and the given one.

gt_key

Returns the key used to fetch groundtruth data tensors from a sample dictionary.

input_key

Returns the key used to fetch input data tensors from a sample dictionary.

keys

Returns a list of all keys used to carry tensors and metadata in samples.

meta_keys

Returns the list of keys used to carry meta/auxiliary data in samples.

thelper.tasks.utils.create_global_task(tasks: Optional[Iterable[Task]]) → Optional[thelper.tasks.utils.Task][source]

Returns a new task object that is compatible with a list of subtasks.

When different datasets must be combined in a session, the tasks they define must also be merged. This function allows us to do so as long as the tasks all share a common objective. If creating a globally-compatible task is impossible, this function will raise an exception. Otherwise, the returned task object can be used to replace the subtasks of all used datasets.
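
A short sketch, assuming two classification subtasks that share the same objective and keys:

    from thelper.tasks.classif import Classification
    from thelper.tasks.utils import create_global_task

    task_a = Classification(["cat", "dog"], "image", "label")
    task_b = Classification(["cat", "dog"], "image", "label", meta_keys=["path"])
    global_task = create_global_task([task_a, task_b])
    # global_task can now replace the subtasks of the combined datasets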

thelper.tasks.utils.create_task(config: Union[Dict[AnyStr, Union[AnyStr, float, int, List[Any], Dict[Any, Any], ConfigDict]], AnyStr]) → Task[source]

Parses a configuration dictionary or repr string and instantiates a task from it.

If a string is provided, it will first be parsed to get the task type, and the object will then be instantiated by forwarding the parameters contained in the string to the constructor of that type. Note that for this function to work, the constructor argument names must match the names of the parameters printed by the task’s __repr__ function.

If a dict is provided, it should contain a ‘type’ and a ‘params’ field with the values required for direct instantiation.
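
For instance, the dict form might look like the following sketch (the exact type string format is an assumption; the params mirror the Classification constructor shown earlier):

    from thelper.tasks.utils import create_task

    config = {
        "type": "thelper.tasks.classif.Classification",
        "params": {
            "class_names": ["cat", "dog"],
            "input_key": "image",
            "label_key": "label",
        },
    }
    task = create_task(config)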

If a Task instance was specified, it is directly returned.