BigEarthNet v1.0#

This page describes the usage of the Dataloader and Datamodule for BigEarthNet v1.0, a multi-spectral, multi-label remote sensing land-use/land-cover classification dataset.

The official paper of the BigEarthNet v1.0 (BigEarthNet-S2) dataset was initially published in Sumbul et al. [6] and later updated to the multi-modal BigEarthNet v1.0 in Sumbul et al. [7].

For detailed information on the dataset itself, please refer to the publications and the BigEarthNet Guide.

The dataset is provided in two modules, each containing one class: a standard torch.utils.data.Dataset and a pytorch_lightning.LightningDataModule that encapsulates the Dataset for easy use in pytorch_lightning applications. The Dataset uses a BENv1LMDBReader to read images and labels from an LMDB file. Labels are returned in their 19-label version as a one-hot vector.

BENv1DataSet#

In its most basic form, the Dataset only needs the base path of the LMDB file and the csv files. Note that, from an OS point of view, LMDB files are folders. This Dataset will load 12 channels (10m + 20m Sentinel-2 + 10m Sentinel-1).

The full data path structure expected is

datapath = {
    "images_lmdb": "/path/to/BigEarthNetEncoded.lmdb",
    "train_data": "/path/to/train.csv",
    "val_data": "/path/to/val.csv",
    "test_data": "/path/to/test.csv"
}

Note that the keys have to match exactly, while the paths can be chosen freely.
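
The examples on this page assume that such a dict is available as my_data_path, pointing to a local (reduced) copy of the dataset. The paths below are purely illustrative:

my_data_path = {
    "images_lmdb": "/data/BigEarthNetEncoded.lmdb",  # hypothetical local path
    "train_data": "/data/train.csv",
    "val_data": "/data/val.csv",
    "test_data": "/data/test.csv",
}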

from configilm import util
util.MESSAGE_LEVEL = util.MessageLevel.INFO  # use INFO to see all messages

from configilm.extra.DataSets import BENv1_DataSet
from configilm.extra.DataModules import BENv1_DataModule

ds = BENv1_DataSet.BENv1DataSet(
    data_dirs=my_data_path  # path to the dataset
)

img, lbl = ds[26]
img = img[:3]  # only choose RGB channels
print(f"Size: {img.shape}")
print(f"Labels:\n{lbl}")
[INFO]    Loading BEN data for None...
[INFO]        30 patches indexed
[INFO]        30 pre-filtered patches indexed
[INFO]        30 filtered patches indexed
Size: torch.Size([3, 120, 120])
Labels:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        1.])
(Figure: RGB visualization of the selected patch)
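
The RGB channels can be visualized, for example with matplotlib. This is a minimal sketch; the raw band values are rescaled to [0, 1] purely for display:

import matplotlib.pyplot as plt

rgb = img.permute(1, 2, 0).numpy()                        # CHW -> HWC for plotting
rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-8)  # rescale to [0, 1] for display
plt.imshow(rgb)
plt.axis("off")
plt.show()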

Selecting Bands#

The Dataset also supports different channel configurations; however, the selected channels can only be set via the image size, and only a limited number of combinations is available. To see the available combinations, call BENv1_DataSet.BENv1DataSet.get_available_channel_configurations(). Alternatively, a faulty configuration will print the available options while raising an AssertionError.

These configurations work like setting the respective number as the bands parameter of the LMDBReader.

BENv1_DataSet.BENv1DataSet.get_available_channel_configurations()
[HINT]    Available channel configurations are:
[HINT]          2 -> Sentinel-1
[HINT]          3 -> RGB
[HINT]          4 -> 10m Sentinel-2
[HINT]         10 -> 10m + 20m Sentinel-2
[HINT]         12 -> 10m + 20m Sentinel-2 + 10m Sentinel-1
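
A channel configuration is therefore selected implicitly through the image size. The following sketch selects the 10-channel configuration (10m + 20m Sentinel-2); it assumes the image size is passed as a (channels, height, width) tuple via the img_size parameter:

ds_s2 = BENv1_DataSet.BENv1DataSet(
    data_dirs=my_data_path,
    img_size=(10, 120, 120)  # assumed (channels, height, width) tuple, see configurations above
)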

Splits#

It is possible to load only a specific split ('train', 'val' or 'test') of the dataset. Which images are loaded is determined by the csv files given via the data_dirs parameter. By default (None), all three splits are loaded into the same Dataset.

_ = BENv1_DataSet.BENv1DataSet(
    data_dirs=my_data_path,  # path to the dataset
    split="train"
)
[INFO]    Loading BEN data for train...
[INFO]        10 patches indexed
[INFO]        10 pre-filtered patches indexed
[INFO]        10 filtered patches indexed

Restricting the number of loaded images#

It is also possible to restrict the number of images indexed. By setting max_len = n, only the first n images (in alphabetical order based on their S2 name) will be loaded. A max_len of None, -1, or a value larger than the number of images in the csv file(s) (in this case 10) results in the load-all-images behaviour.

_ = BENv1_DataSet.BENv1DataSet(
    data_dirs=my_data_path,  # path to the dataset
    split="train",
    max_len=5
)
[INFO]    Loading BEN data for train...
[INFO]        10 patches indexed
[INFO]        10 pre-filtered patches indexed
[INFO]        10 filtered patches indexed
_ = BENv1_DataSet.BENv1DataSet(
    data_dirs=my_data_path,  # path to the dataset
    split="train",
    max_len=100
)
[INFO]    Loading BEN data for train...
[INFO]        10 patches indexed
[INFO]        10 pre-filtered patches indexed
[INFO]        10 filtered patches indexed

BENv1DataModule#

This class is a Lightning DataModule that wraps the BENv1DataSet. It automatically generates a DataLoader per split with augmentations, shuffling, etc., depending on the split. All images are resized and normalized; images in the train set are additionally augmented with noise and flipping/rotation. The train split is also shuffled, however this behaviour can be overridden (see below).

To use a DataModule, the setup() function has to be called. This populates the Dataset splits inside the DataModule. Depending on the stage ('fit', 'test' or None), setup() will prepare only the train & validation Datasets, only the test Dataset, or all three.

dm = BENv1_DataModule.BENv1DataModule(
    data_dirs=my_data_path  # path to the dataset
)
print("Before:")
print(dm.train_ds)
print(dm.val_ds)
print(dm.test_ds)

print("\n=== SETUP ===")
dm.setup(stage="fit")
print("=== END SETUP ===\n")
print("After:")
print(dm.train_ds)
print(dm.val_ds)
print(dm.test_ds)
[WARNING] Using default train transform.
[WARNING] Using default eval transform.
Before:
None
None
None

=== SETUP ===
[INFO]    Loading BEN data for train...
[INFO]        10 patches indexed
[INFO]        10 pre-filtered patches indexed
[INFO]        10 filtered patches indexed
[INFO]    Loading BEN data for val...
[INFO]        10 patches indexed
[INFO]        10 pre-filtered patches indexed
[INFO]        10 filtered patches indexed
[INFO]      Total training samples:       10  Total validation samples:       10
=== END SETUP ===

After:
<configilm.extra.DataSets.BENv1_DataSet.BENv1DataSet object at 0x75c137134e50>
<configilm.extra.DataSets.BENv1_DataSet.BENv1DataSet object at 0x75c13192d3f0>
None
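
Analogously, calling setup() with stage="test" populates only the test Dataset (a sketch, not executed in this walkthrough):

dm.setup(stage="test")
print(dm.test_ds)  # now a BENv1DataSet instance instead of None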

Afterwards, the pytorch DataLoader can be easily accessed. Note that \(len(DL) = \lceil \frac{len(DS)}{batch\_size} \rceil\); therefore, with the default batch_size of 16, this gives \(\lceil 10/16 \rceil = 1\).

train_loader = dm.train_dataloader()
print(len(train_loader))
1
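
A quick sanity check is to draw a single batch from the loader. The shapes below are a sketch assuming the default 12-channel 120x120 configuration and the 19-label one-hot vectors; with only 10 training samples, the single batch contains all of them:

imgs, lbls = next(iter(train_loader))
print(imgs.shape)  # expected: torch.Size([10, 12, 120, 120])
print(lbls.shape)  # expected: torch.Size([10, 19])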

In addition to the DataLoader settings, the DataModule also accepts the data_dirs, image size and max_len parameters introduced above and passes them through to the DataSet.
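
For example, a DataModule restricted to RGB images and a handful of samples per split could look like the following sketch; img_size and max_len are assumed to be forwarded unchanged to the underlying DataSets:

dm_small = BENv1_DataModule.BENv1DataModule(
    data_dirs=my_data_path,
    img_size=(3, 120, 120),  # assumed (channels, height, width) tuple
    max_len=5                # assumed to limit the number of indexed images per split
)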

DataLoader settings#

The DataLoaders have three settable parameters: batch_size, num_workers_dataloader and shuffle, with 16, os.cpu_count() / 2 and None as their default values. A shuffle of None means that the train set is shuffled but the validation and test sets are not. Changing this setting will be accompanied by a printed message hint.

The use of pinned memory is not changeable; it is set to True if a CUDA-enabled device is found and False otherwise.

dm = BENv1_DataModule.BENv1DataModule(
    data_dirs=my_data_path,  # path to the dataset
    batch_size=4
)
print("\n=== SETUP ===")
dm.setup(stage="fit")
print("=== END SETUP ===\n")
print(len(dm.train_dataloader()))
[WARNING] Using default train transform.
[WARNING] Using default eval transform.

=== SETUP ===
[INFO]    Loading BEN data for train...
[INFO]        10 patches indexed
[INFO]        10 pre-filtered patches indexed
[INFO]        10 filtered patches indexed
[INFO]    Loading BEN data for val...
[INFO]        10 patches indexed
[INFO]        10 pre-filtered patches indexed
[INFO]        10 filtered patches indexed
[INFO]      Total training samples:       10  Total validation samples:       10
=== END SETUP ===

3
_ = BENv1_DataModule.BENv1DataModule(
    data_dirs=my_data_path,  # path to the dataset
    shuffle=False
)
[WARNING] Shuffle was set to False. This is not recommended for most configuration. Use shuffle=None (default) for recommended configuration.
[WARNING] Using default train transform.
[WARNING] Using default eval transform.
_ = BENv1_DataModule.BENv1DataModule(
    data_dirs=my_data_path,  # path to the dataset
    num_workers_dataloader=2
)
[WARNING] Using default train transform.
[WARNING] Using default eval transform.