lit_ecology_classifier.data package

Submodules

lit_ecology_classifier.data.datamodule module

class lit_ecology_classifier.data.datamodule.DataModule(datapath: str, batch_size: int, dataset: str, TTA: bool = False, class_map: dict = {}, priority_classes: list = [], rest_classes: list = [], splits: Iterable = [0.7, 0.15], **kwargs)

Bases: LightningDataModule

A LightningDataModule for handling image datasets stored in a tar file. It prepares and loads the data for PyTorch training routines run through the PyTorch Lightning framework.

tarpath

Path to the tar file containing the dataset.

Type:

str

batch_size

Number of images to load per batch.

Type:

int

dataset

Identifier for the dataset being used.

Type:

str

testing

Flag to enable testing mode, which includes TTA (Test Time Augmentation).

Type:

bool

priority_classes

List of the priority classes (declared as a list in the constructor signature).

Type:

list

splits

Proportions to split the dataset into training, validation, and testing.

Type:

Iterable
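
The default splits=[0.7, 0.15] lists only two proportions. The sketch below shows one plausible reading, assuming the two values are the training and validation fractions and the remainder becomes the test set. split_sizes is a hypothetical helper written for illustration, not part of the package:

```python
from typing import Iterable, Tuple


def split_sizes(n_items: int, splits: Iterable[float] = (0.7, 0.15)) -> Tuple[int, int, int]:
    """Return (train, val, test) sizes; the test set takes the remainder (assumption)."""
    train_frac, val_frac = splits
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("splits must be non-negative and sum to at most 1")
    n_train = int(n_items * train_frac)
    n_val = int(n_items * val_frac)
    n_test = n_items - n_train - n_val  # whatever is left over
    return n_train, n_val, n_test
```

Giving the remainder to the test set guarantees that the three sizes always add up to the dataset size, even when the fractions do not divide it evenly.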

predict_dataloader()

Constructs the DataLoader for inference data.

Returns:

DataLoader object for the inference dataset.

Return type:

DataLoader

setup(stage=None)

Prepares the datasets for training, validation, and testing by applying appropriate splits. This method also handles the TTA mode adjustments.

Parameters:

stage (Optional[str]) – Current stage of the model training/testing. Not used explicitly in the method.
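
Test Time Augmentation generally means predicting on several transformed views of one input and combining the results. The sketch below is a generic, pure-Python illustration of that idea using 90-degree rotations and score averaging. The rotation set, rotate90, tta_predict, and the toy model are all illustrative assumptions, not the transformations this package applies:

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def rotate90(image: Image) -> Image:
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def tta_predict(model: Callable[[Image], Sequence[float]], image: Image) -> List[float]:
    """Average the model's class scores over the four 90-degree rotations."""
    views = [image]
    for _ in range(3):
        views.append(rotate90(views[-1]))
    scores = [model(view) for view in views]
    n_classes = len(scores[0])
    return [sum(s[c] for s in scores) / len(scores) for c in range(n_classes)]


def toy_model(image: Image) -> List[float]:
    """Toy stand-in for a classifier: scores depend on the top-left pixel."""
    p = image[0][0]
    return [p, 1.0 - p]
```

Averaging over views makes the final score less sensitive to the orientation of any single input, which is the usual motivation for enabling TTA at evaluation time.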

test_dataloader()

Constructs the DataLoader for testing data.

Returns:

DataLoader object for the testing dataset.

Return type:

DataLoader

train_dataloader()

Constructs the DataLoader for training data.

Returns:

DataLoader object for the training dataset.

Return type:

DataLoader

val_dataloader()

Constructs the DataLoader for validation data.

Returns:

DataLoader object for the validation dataset.

Return type:

DataLoader

lit_ecology_classifier.data.tardataset module

class lit_ecology_classifier.data.tardataset.TarImageDataset(tar_path: str, class_map: dict, priority_classes: list, rest_classes: list, TTA: bool = False, train: bool = False)

Bases: Dataset

A Dataset subclass for managing and accessing image data stored in tar files. This class supports optional image transformations and Test Time Augmentation (TTA) for enhancing model evaluation during testing.

tar_path

Path to the tar file containing image data.

Type:

str

class_map_path

Path to the JSON file mapping class names to labels.

Type:

str

priority_classes

List of priority classes for targeted training or evaluation (declared as a list in the constructor signature).

Type:

list

train

Specifies whether the dataset will be used for training. Determines the type of transformations applied.

Type:

bool

TTA

Indicates if Test Time Augmentation should be applied during testing.

Type:

bool
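
To illustrate how image entries can be read from a tar archive, here is a minimal standard-library sketch. It builds a tiny in-memory archive and lists its image members. list_image_members, make_demo_archive, the extension filter, the class names, and the class_name/file layout are assumptions made for illustration, not the package's actual implementation:

```python
import io
import tarfile
from typing import List

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed filter, not from the package


def list_image_members(fileobj) -> List[str]:
    """Return the names of regular files in the archive that look like images."""
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(IMAGE_EXTENSIONS)
        ]


def make_demo_archive() -> io.BytesIO:
    """Build a small in-memory tar with two fake images and one text file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # Hypothetical class folders; real archives may be laid out differently.
        for name in ("daphnia/img_001.jpg", "rotifer/img_002.png", "README.txt"):
            payload = b"placeholder bytes"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)  # rewind so the archive can be read back
    return buffer
```

Calling list_image_members(make_demo_archive()) returns only the two image paths, since the text file fails the extension filter.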

get_label_from_filename(filename)

Extracts the label index from a given filename.

Parameters:

filename (str) – The filename from which to extract the label.

Returns:

The label index corresponding to the class.

Return type:

int
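
The documentation does not state how the label is encoded in the filename. As a sketch only, the function below assumes the class name is the parent folder of the image inside the archive and looks it up in a class map. label_from_filename, the path layout, and the example class names are hypothetical:

```python
import posixpath
from typing import Dict


def label_from_filename(filename: str, class_map: Dict[str, int]) -> int:
    """Look up the label index for a path shaped like 'class_name/image.jpg'."""
    # Tar member names use forward slashes, so posixpath is used on every platform.
    class_name = posixpath.basename(posixpath.dirname(filename))
    return class_map[class_name]


# Hypothetical class map for illustration.
example_class_map = {"daphnia": 0, "rotifer": 1}
example_label = label_from_filename("data/rotifer/img_002.png", example_class_map)
```

A class name missing from the map raises KeyError here, which surfaces mislabeled files early instead of silently assigning a default label.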

shuffle()

Shuffles the list of image information to randomize data access, useful during training.
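
An in-place shuffle of this kind can be sketched with the standard library. shuffle_image_infos and its seed parameter are hypothetical, and the real method may store richer records than the plain strings used here:

```python
import random
from typing import List, Optional


def shuffle_image_infos(image_infos: List[str], seed: Optional[int] = None) -> None:
    """Shuffle the list in place; pass a seed for a repeatable order."""
    random.Random(seed).shuffle(image_infos)
```

Using a dedicated random.Random instance keeps the shuffle from disturbing the global random state, and a fixed seed makes a training run's data order reproducible.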

Module contents