BigEarthNetv2 (reBEN) DataSets and DataModules

Dataset for the BigEarthNet v2 dataset. Files can be requested by contacting the author. Original paper of the image data: TO BE PUBLISHED

https://bigearth.net/

class configilm.extra.DataSets.BENv2_DataSet.BENv2DataSet

Dataset for the BigEarthNet v2 dataset. The LMDB files can be requested by contacting the author, or created by downloading the dataset from the official website and encoding it with the BigEarthNet Encoder.

The dataset can be loaded with different channel configurations. The channel configuration is defined by the first element of the img_size tuple (c, h, w). The available configurations are:

  • 2 -> Sentinel-1

  • 3 -> RGB

  • 4 -> 10m Sentinel-2

  • 10 -> 10m + 20m Sentinel-2

  • 12 -> 10m + 20m Sentinel-2 + 10m Sentinel-1
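As a minimal sketch of how the channel count selects one of these configurations, the lookup can be written as a dictionary keyed by the first element of img_size. The table `CHANNEL_CONFIGS` and the helper `describe_channel_config` are hypothetical illustrations, not part of configilm; the library's own list is printed by get_available_channel_configurations().

```python
from typing import Tuple

# Hypothetical lookup table mirroring the documented channel configurations;
# it is not the table configilm uses internally.
CHANNEL_CONFIGS = {
    2: "Sentinel-1",
    3: "RGB",
    4: "10m Sentinel-2",
    10: "10m + 20m Sentinel-2",
    12: "10m + 20m Sentinel-2 + 10m Sentinel-1",
}


def describe_channel_config(img_size: Tuple[int, int, int]) -> str:
    """Return the configuration selected by the channel count c of (c, h, w)."""
    channels = img_size[0]
    if channels not in CHANNEL_CONFIGS:
        raise ValueError(
            f"{channels} channels is not a documented configuration; "
            f"choose one of {sorted(CHANNEL_CONFIGS)}"
        )
    return CHANNEL_CONFIGS[channels]


print(describe_channel_config((3, 120, 120)))   # RGB
print(describe_channel_config((12, 120, 120)))  # 10m + 20m Sentinel-2 + 10m Sentinel-1
```

Only the channel count matters for the selection; h and w set the spatial size independently.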

__init__(data_dirs, split=None, transform=None, max_len=None, img_size=(3, 120, 120), return_extras=False, patch_prefilter=None, include_cloudy=False, include_snowy=False)

Dataset for the BigEarthNet v2 dataset. Files can be requested by contacting the author or by visiting the official website.

Original paper of the image data: TO BE PUBLISHED

Parameters:
  • data_dirs (Mapping[str, Union[str, Path]]) – A mapping from file key to file path. The key identifies the role of each file. The required keys are: “images_lmdb”, “metadata_parquet”, “metadata_snow_cloud_parquet”.

  • split (Optional[str]) –

    The name of the split to use: “train”, “val” or “test”. If None is provided, all splits are used.

    default:

    None

  • transform (Optional[Callable]) –

    A callable that is used to transform the images after loading them. If None is provided, no transformation is applied.

    default:

    None

  • max_len (Optional[int]) –

    The maximum number of images to use. If None or -1 is provided, all images are used.

    default:

    None

  • img_size (tuple) –

    The size of the images. Note that this includes the number of channels. For example, if the images are RGB images, the size should be (3, h, w).

    default:

    (3, 120, 120)

  • return_extras (bool) –

    If True, the dataset will return the patch name as a third return value.

    default:

    False

  • patch_prefilter (Optional[Callable[[str], bool]]) –

    A callable that is used to filter the patches before they are loaded. If None is provided, no filtering is applied. The callable must take a patch name as input and return True if the patch should be included and False if it should be excluded from the dataset.

    default:

    None

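  • include_cloudy (bool) –

    If True, patches flagged as cloudy in the snow/cloud metadata are included in the dataset; otherwise they are excluded.

    default:

    False

  • include_snowy (bool) –

    If True, patches flagged as snowy in the snow/cloud metadata are included in the dataset; otherwise they are excluded.

    default:

    False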
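A minimal sketch of preparing the data_dirs and patch_prefilter arguments, assuming the three files already exist locally. The paths are placeholders, and `check_data_dirs`, `keep_selected` and the patch names in `allowed` are hypothetical; only the required key names and the prefilter contract (patch name in, bool out) come from the parameter descriptions above.

```python
from pathlib import Path
from typing import Mapping, Union

REQUIRED_KEYS = ("images_lmdb", "metadata_parquet", "metadata_snow_cloud_parquet")


def check_data_dirs(data_dirs: Mapping[str, Union[str, Path]]) -> None:
    """Hypothetical helper: fail early if a required key is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in data_dirs]
    if missing:
        raise KeyError(f"data_dirs is missing required keys: {missing}")


# Placeholder paths; point these at your own copies of the files.
data_dirs = {
    "images_lmdb": Path("/data/BigEarthNet-V2/images.lmdb"),
    "metadata_parquet": Path("/data/BigEarthNet-V2/metadata.parquet"),
    "metadata_snow_cloud_parquet": Path("/data/BigEarthNet-V2/metadata_snow_cloud.parquet"),
}
check_data_dirs(data_dirs)

# Placeholder names, not real BigEarthNet patch names.
allowed = {"patch_a", "patch_b"}


def keep_selected(patch_name: str) -> bool:
    # Prefilter contract: return True to keep the patch, False to drop it.
    return patch_name in allowed


print(keep_selected("patch_a"))  # True

# With configilm installed and the files in place, the arguments would be passed as:
# from configilm.extra.DataSets.BENv2_DataSet import BENv2DataSet
# ds = BENv2DataSet(data_dirs, split="train", img_size=(3, 120, 120),
#                   patch_prefilter=keep_selected)
```

Because the prefilter runs on names only, it can select patches before any image data is read.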

classmethod get_available_channel_configurations()

Prints all available preconfigured channel combinations.

get_index_from_patchname(patchname)

Returns the index of the image with the given patch name. Does not distinguish between invalid names (names not in the original BigEarthNet) and names that are not in the loaded list.

Parameters:

patchname (str) – name of an image

Returns:

index of the image or None, if the name is not loaded

Return type:

Optional[int]

get_patchname_from_index(idx)

Returns the patch name of the image at the specified index. May return invalid names (names that cannot actually be loaded because they are not part of the LMDB file) if the name is included in the metadata file(s).

Parameters:

idx (int) – index of an image

Returns:

patch name of the image or None, if the index is invalid

Return type:

Optional[str]
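The two lookups behave like a list of names and its inverse dictionary. The sketch below uses a hypothetical `PatchIndex` class and made-up patch names to show the documented contract: an unknown name and an out-of-range index both yield None. It is an illustration, not the configilm implementation; in particular, treating negative indices as invalid is an assumption of this sketch.

```python
from typing import Dict, List, Optional


class PatchIndex:
    """Hypothetical two-way mapping between dataset indices and patch names."""

    def __init__(self, patch_names: List[str]) -> None:
        self._names = list(patch_names)
        self._index: Dict[str, int] = {name: i for i, name in enumerate(self._names)}

    def get_index_from_patchname(self, patchname: str) -> Optional[int]:
        # Invalid names and valid-but-not-loaded names are both simply absent.
        return self._index.get(patchname)

    def get_patchname_from_index(self, idx: int) -> Optional[str]:
        # Reject out-of-range (and negative) indices instead of wrapping around.
        if 0 <= idx < len(self._names):
            return self._names[idx]
        return None


index = PatchIndex(["patch_a", "patch_b", "patch_c"])
print(index.get_index_from_patchname("patch_b"))  # 1
print(index.get_index_from_patchname("missing"))  # None
print(index.get_patchname_from_index(2))          # patch_c
print(index.get_patchname_from_index(3))          # None
```

The dictionary makes the name lookup O(1) instead of a linear scan over all loaded names.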

Datamodule for the BigEarthNet v2 dataset. Files can be requested by contacting the author. Original paper of the image data: TO BE PUBLISHED

https://bigearth.net/

class configilm.extra.DataModules.BENv2_DataModule.BENv2DataModule
__init__(data_dirs, batch_size=16, img_size=(3, 120, 120), num_workers_dataloader=4, shuffle=None, max_len=None, pin_memory=None, patch_prefilter=None, train_transforms=None, eval_transforms=None)

This datamodule is designed to work with the BigEarthNet v2 dataset, a multi-label classification dataset that is split into train, validation and test sets. The datamodule provides a dataloader for each of these sets.

Parameters:
  • data_dirs (Mapping[str, Union[str, Path]]) – A mapping from file key to file path. Required keys are “images_lmdb”, “metadata_parquet” and “metadata_snow_cloud_parquet”. The “images_lmdb” key identifies the LMDB file that contains the images. The “metadata_” keys identify the parquet files that contain the metadata, such as the labels, the split, and cloud and snow information. Note that the LMDB file is encoded using the RICO-HDL Encoder and stores the images in the safetensors format.

  • batch_size (int) –

    The batch size to use for the dataloaders.

    default:

    16

  • img_size (tuple) –

    The size of the images. Note that this includes the number of channels. For example, if the images are RGB images, the size should be (3, h, w). See BEN_DataSet.avail_chan_configs for available channel configurations.

    default:

    (3, 120, 120)

  • num_workers_dataloader (int) –

    The number of workers to use for the dataloaders.

    default:

    4

  • shuffle (Optional[bool]) –

    Whether to shuffle the data. If None is provided, the data is shuffled for training and not shuffled for validation and test.

    default:

    None

  • max_len (Optional[int]) –

    The maximum number of images to use. If None or -1 is provided, all images are used. Applies per split.

    default:

    None

  • pin_memory (Optional[bool]) –

    Whether to use pinned memory for the dataloaders. If None is provided, it is set to True if a GPU is available and False otherwise.

    default:

    None

  • patch_prefilter (Optional[Callable[[str], bool]]) –

    A callable that is used to filter out images. If None is provided, no filtering is applied. The callable should take the patch name (a string) as input and return a boolean. If the callable returns True, the image is included in the dataset; otherwise it is excluded.

    default:

    None

  • train_transforms (Optional[Callable]) –

    A callable that is used to transform the training images. If None is provided, the default train transform is used, which consists of resizing, horizontal and vertical flipping, and normalization.

    default:

    None

  • eval_transforms (Optional[Callable]) –

    A callable that is used to transform the evaluation images. If None is provided, the default eval transform is used, which consists of resizing and normalization.

    default:

    None
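The None defaults of shuffle, pin_memory and max_len, and the documented composition of the default transforms, can be sketched as follows. Everything here is hypothetical: the function names, the use of NumPy arrays of shape (c, h, w), nearest-neighbour resizing, the 0.5 flip probability and the caller-supplied mean/std values are assumptions. configilm's actual default transforms may use different operations and normalization statistics.

```python
from typing import Callable, Optional, Sequence, Tuple

import numpy as np

Transform = Callable[[np.ndarray], np.ndarray]


def resolve_shuffle(shuffle: Optional[bool], split: str) -> bool:
    # None: shuffle the training split, keep validation and test in order.
    return split == "train" if shuffle is None else shuffle


def resolve_pin_memory(pin_memory: Optional[bool], gpu_available: bool) -> bool:
    # None: pin memory only if a GPU is available
    # (in a PyTorch setup, gpu_available would come from torch.cuda.is_available()).
    return gpu_available if pin_memory is None else pin_memory


def resolve_max_len(max_len: Optional[int], split_size: int) -> int:
    # None or -1: use every image; otherwise cap the size of each split.
    if max_len is None or max_len == -1:
        return split_size
    return min(max_len, split_size)


def resize_nearest(img: np.ndarray, h: int, w: int) -> np.ndarray:
    # Nearest-neighbour resize of a (c, H, W) array to (c, h, w).
    _, src_h, src_w = img.shape
    rows = np.arange(h) * src_h // h
    cols = np.arange(w) * src_w // w
    return img[:, rows[:, None], cols[None, :]]


def make_default_transforms(
    size: Tuple[int, int],
    mean: Sequence[float],
    std: Sequence[float],
    rng: np.random.Generator,
) -> Tuple[Transform, Transform]:
    # Per-channel statistics, broadcast over height and width.
    mean_arr = np.asarray(mean, dtype=np.float32)[:, None, None]
    std_arr = np.asarray(std, dtype=np.float32)[:, None, None]

    def normalize(img: np.ndarray) -> np.ndarray:
        return (img.astype(np.float32) - mean_arr) / std_arr

    def eval_transform(img: np.ndarray) -> np.ndarray:
        # Eval: resize, then normalize.
        return normalize(resize_nearest(img, *size))

    def train_transform(img: np.ndarray) -> np.ndarray:
        # Train: resize, random horizontal/vertical flips, then normalize.
        img = resize_nearest(img, *size)
        if rng.random() < 0.5:
            img = img[:, :, ::-1]  # horizontal flip (width axis)
        if rng.random() < 0.5:
            img = img[:, ::-1, :]  # vertical flip (height axis)
        return normalize(img)

    return train_transform, eval_transform


train_tf, eval_tf = make_default_transforms(
    (120, 120), mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0], rng=np.random.default_rng(0)
)
print(eval_tf(np.ones((3, 60, 60))).shape)  # (3, 120, 120)
print(resolve_shuffle(None, "val"))          # False
```

Keeping flips out of the eval transform makes validation and test results deterministic, which is the reason the two defaults differ.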

setup(stage=None)

Prepares the data sets for the given stage:

  • “fit”: train and validation data set

  • “test”: test data set

  • None: all data sets

Prints the time needed for this operation and other statistics if print_infos is set.

Parameters:

stage (Optional[str]) – None, “fit” or “test”

Return type:

None

test_dataloader()

Creates a DataLoader according to the specification in the __init__ call. Uses the test set and expects it to be set up (e.g. via a setup() call).

Returns:

torch DataLoader for the test set

train_dataloader()

Creates a DataLoader according to the specification in the __init__ call. Uses the train set and expects it to be set up (e.g. via a setup() call).

Returns:

torch DataLoader for the train set

val_dataloader()

Creates a DataLoader according to the specification in the __init__ call. Uses the validation set and expects it to be set up (e.g. via a setup() call).

Returns:

torch DataLoader for the validation set
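The stage handling of setup() and the precondition of the three *_dataloader() methods can be sketched without PyTorch. `STAGE_SPLITS` and `StageSketch` are hypothetical stand-ins: plain lists play the role of the datasets, whereas the real methods return torch DataLoaders. Raising an error when a set is missing is an assumption of this sketch; the documentation above only says the set is expected to be set up.

```python
from typing import Dict, List, Optional, Tuple

# Splits prepared for each stage, as documented for setup().
STAGE_SPLITS: Dict[Optional[str], Tuple[str, ...]] = {
    "fit": ("train", "val"),
    "test": ("test",),
    None: ("train", "val", "test"),
}


class StageSketch:
    """Hypothetical stand-in showing why setup() must run before a *_dataloader() call."""

    def __init__(self, all_splits: Dict[str, List[str]]) -> None:
        self._all_splits = all_splits
        self.datasets: Dict[str, List[str]] = {}

    def setup(self, stage: Optional[str] = None) -> None:
        if stage not in STAGE_SPLITS:
            raise ValueError(f"stage must be None, 'fit' or 'test', got {stage!r}")
        for split in STAGE_SPLITS[stage]:
            self.datasets[split] = self._all_splits[split]

    def dataloader(self, split: str) -> List[str]:
        if split not in self.datasets:
            raise RuntimeError(f"the {split} set is not set up; call setup() first")
        return self.datasets[split]


dm = StageSketch({"train": ["a", "b"], "val": ["c"], "test": ["d"]})
dm.setup("fit")
print(dm.dataloader("train"))  # ['a', 'b']
# dm.dataloader("test") would raise RuntimeError until setup("test") or setup() runs.

# With configilm installed, the equivalent calls would be:
# from configilm.extra.DataModules.BENv2_DataModule import BENv2DataModule
# dm = BENv2DataModule(data_dirs, batch_size=16, img_size=(3, 120, 120))
# dm.setup("fit")
# train_loader = dm.train_dataloader()
```

Calling setup("fit") before training and setup("test") before evaluation avoids building sets that a stage does not use.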