HRVQA DataSets and DataModules

class configilm.extra.DataSets.HRVQA_DataSet.HRVQADataSet
__init__(data_dirs, split=None, transform=None, max_len=None, img_size=(3, 1024, 1024), selected_answers=None, num_classes=1000, tokenizer=None, seq_length=64, return_extras=False, div_seed=42, split_size=0.5)

This class implements loading of the HRVQA dataset. It is a subclass of ClassificationVQADataset.

Parameters:
  • data_dirs (Mapping[str, Path]) – A mapping of strings to Path objects that contains the paths to the data directories. It should contain the following keys: “images”, “train_data”, “val_data”, “test_data”. The “_data” keys should point to the directory that contains the question and answer json files. Each directory should contain the following files: “{split}_question.json” and “{split}_answer.json”.

  • split (Optional[str]) –

    The split of the dataset to load. It can be one of the following: “train”, “val”, “val-div”, “test-div”, “test”. If None, the train and val splits are loaded. The “val-div” and “test-div” splits are the val split divided into two parts. This allows a standard train/val/test setup even though the original dataset has no test set with public answers, which is also why “test” is not included in the return value of the split_names method.

    default:

    None

  • transform (Optional[Callable]) –

    A callable that is used to transform the images after loading them. If None, no transformation will be applied.

    default:

    None

  • max_len (Optional[int]) –

    The maximum number of qa-pairs to use. If None or -1 is provided, all qa-pairs are used.

    default:

    None

  • img_size (tuple) –

    The size of the images.

    default:

    (3, 1024, 1024)

  • selected_answers (Optional[list]) –

    A list of answers that should be used. If None is provided, the num_classes most common answers are used. If selected_answers is not None, num_classes is ignored.

    default:

    None

  • num_classes (Optional[int]) –

    The number of classes to use. Only used if selected_answers is None. If set to None, all answers are used.

    default:

    1_000

  • tokenizer (Optional[Callable]) –

    A callable that is used to tokenize the questions. If set to None, the default tokenizer (from configilm.util) is used.

    default:

    None

  • seq_length (int) –

    The maximum length of the tokenized questions. If the tokenized question is longer than seq_length, it will be truncated.

    default:

    64

  • return_extras (bool) –

    If True, the dataset will return the type of the question in addition to the image, question and answer.

    default:

    False

  • div_seed (Union[int, str]) –

    The seed to use for dividing the val split into val-div and test-div. If set to “repeat”, both val-div and test-div are the full val split. If set to an integer, the seed is used to initialize the random number generator for the division, so the same seed yields the same val-div and test-div each time the dataset is loaded. The state of the random number generator is saved before the split and restored afterwards, which makes the split reproducible independent of the global random state and leaves the global random state unaffected.

    default:

    42

  • split_size (Union[float, int]) –

    The size of the val-div and test-div splits. If set to a float, it should be a value between 0 and 1 and will be interpreted as the fraction of the val split to use for the val-div. The rest of the val split will be used for the test-div. If set to an integer, it will be interpreted as the number of samples to use for the val-div. The rest of the val split will be used for the test-div. If div_seed is set to “repeat”, the split will be the same (full val split) for both val-div and test-div.

    default:

    0.5
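
The interaction of div_seed and split_size described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name divide_val_split is hypothetical, and shuffling the samples before cutting them is an assumption about how the two parts are drawn.

```python
import random
from typing import List, Sequence, Tuple, Union


def divide_val_split(
    samples: Sequence,
    div_seed: Union[int, str] = 42,
    split_size: Union[float, int] = 0.5,
) -> Tuple[List, List]:
    """Return (val_div, test_div) for the given val samples (hypothetical helper)."""
    samples = list(samples)
    if div_seed == "repeat":
        # Both parts are the full val split.
        return list(samples), list(samples)

    # A float is read as a fraction of the val split, an int as a sample count.
    if isinstance(split_size, float):
        if not 0.0 <= split_size <= 1.0:
            raise ValueError("a float split_size must lie between 0 and 1")
        n_val_div = int(len(samples) * split_size)
    else:
        n_val_div = min(split_size, len(samples))

    # Save the global RNG state, shuffle with the given seed, then restore
    # the state so the caller's random sequence is not affected.
    state = random.getstate()
    random.seed(div_seed)
    random.shuffle(samples)  # assumption: the parts are drawn by shuffling
    random.setstate(state)

    return samples[:n_val_div], samples[n_val_div:]


val_div, test_div = divide_val_split(range(10), div_seed=42, split_size=0.3)
```

With an integer seed the two parts are disjoint and together cover the whole val split, while "repeat" returns the full val split twice.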

load_image(key)

This method should load the image with the given name and return it as a tensor.

Parameters:

key (str) – The name of the image to load

Returns:

The image as a tensor

Return type:

Tensor

prepare_split(split)

This method should return a list of tuples, where each tuple contains the following elements:

  • The key of the image at index 0

  • The question at index 1

  • The answer at index 2

  • additional information at index 3 and higher

Parameters:

split (str) – The name of the split to prepare

Returns:

A list of tuples, each tuple containing the elements described

Return type:

list
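
The tuple layout described above can be illustrated with a few made-up entries. The image keys, questions, answers, and question types below are hypothetical values, and most_common_answers is a hypothetical helper; it only shows how the answer at index 2 could be counted to pick the num_classes most common answers, not how the library does it.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical entries in the documented layout:
# (image key, question, answer, *additional information)
qa_pairs: List[Tuple] = [
    ("img_001", "Is there a road?", "yes", "presence"),
    ("img_001", "How many buildings are there?", "3", "counting"),
    ("img_002", "Is there a road?", "no", "presence"),
    ("img_003", "Is there water?", "yes", "presence"),
    ("img_004", "Is there a road?", "yes", "presence"),
    ("img_005", "Is there a road?", "no", "presence"),
]


def most_common_answers(pairs: List[Tuple], num_classes: Optional[int] = None) -> List[str]:
    """Return the num_classes most frequent answers (all answers if None)."""
    counts = Counter(pair[2] for pair in pairs)  # the answer sits at index 2
    return [answer for answer, _ in counts.most_common(num_classes)]


# Indices 0 to 2 are fixed; everything from index 3 on is extra information.
key, question, answer, *extras = qa_pairs[0]
top_two = most_common_answers(qa_pairs, num_classes=2)
```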

split_names()

Returns the names of the splits that are available for this dataset.

Note:

This dataset actually has five splits: train, val, val-div, test-div, test. However, val-div and test-div are the val split divided into two parts. This allows a standard train/val/test setup even though the original dataset has no test set with public answers. For the same reason, test is not included in the return value of this method: the answers for the test set are not public and are therefore set to an empty string.

Return type:

set[str]

configilm.extra.DataSets.HRVQA_DataSet.resolve_data_dir(data_dir, allow_mock=False, force_mock=False)

Helper function that tries to resolve the correct directory for the HRVQA dataset.

Parameters:
  • data_dir (Optional[Mapping[str, Path]]) – Optional path to the data directory. If None, the default data directory will be used.

  • allow_mock (bool) –

    Allows the mock data path to be returned.

    Default:

    False

  • force_mock (bool) –

    Only the mock data path will be returned. Useful for debugging with small data or if the data is not downloaded yet.

    Default:

    False

Return type:

Mapping[str, Path]
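
A mapping of this type is what the data_dirs parameter of the dataset expects, so it can be useful to check one before passing it on. The sketch below is a hypothetical helper (missing_entries is not part of configilm) that verifies the keys and files listed in the data_dirs description. It assumes that {split} in the file names is the key without its "_data" suffix, i.e. "train", "val", or "test".

```python
import tempfile
from pathlib import Path
from typing import List, Mapping

REQUIRED_KEYS = ("images", "train_data", "val_data", "test_data")


def missing_entries(data_dirs: Mapping[str, Path]) -> List[str]:
    """List missing keys, directories, and json files (hypothetical helper)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in data_dirs:
            problems.append(f"missing key: {key}")
            continue
        directory = Path(data_dirs[key])
        if not directory.is_dir():
            problems.append(f"not a directory: {directory}")
            continue
        if key.endswith("_data"):
            # Assumption: "train_data" -> "train_question.json", "train_answer.json"
            split = key[: -len("_data")]
            for kind in ("question", "answer"):
                file = directory / f"{split}_{kind}.json"
                if not file.is_file():
                    problems.append(f"missing file: {file}")
    return problems


# Usage with a throwaway directory that is deliberately incomplete.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "jsons").mkdir()
    (root / "jsons" / "train_question.json").write_text("{}")
    example_dirs = {
        "images": root / "images",
        "train_data": root / "jsons",
        "val_data": root / "jsons",
    }
    report = missing_entries(example_dirs)
```

Here report lists the absent answer and val files as well as the missing "test_data" key.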

Dataloader and Datamodule for the HRVQA dataset.

class configilm.extra.DataModules.HRVQA_DataModule.HRVQADataModule
__init__(data_dirs, batch_size=16, img_size=(3, 1024, 1024), num_workers_dataloader=4, shuffle=None, max_len=None, tokenizer=None, seq_length=64, pin_memory=None, test_splitting_seed=None, test_splitting_size=0.5)

This class implements the DataModule for the HRVQA dataset.

Parameters:
  • data_dirs (Mapping[str, Path]) – A mapping of strings to Path objects that contains the paths to the data directories. It should contain the following keys: “images”, “train_data”, “val_data”, “test_data”. The “_data” keys should point to the directory that contains the question and answer json files. Each directory should contain the following files: “{split}_question.json” and “{split}_answer.json”.

  • batch_size (int) –

    The batch size to use for the dataloaders.

    default:

    16

  • img_size (tuple) –

    The size of the images.

    default:

    (3, 1024, 1024)

  • num_workers_dataloader (int) –

    The number of workers to use for the dataloaders.

    default:

    4

  • shuffle (Optional[bool]) –

    Whether to shuffle the data in the dataloaders. If None is provided, the data is shuffled for training and not shuffled for validation and test.

    default:

    None

  • max_len (Optional[int]) –

    The maximum number of qa-pairs to use. If None or -1 is provided, all qa-pairs are used.

    default:

    None

  • tokenizer (Optional[Callable]) –

    A callable that is used to tokenize the questions. If set to None, the default tokenizer (from configilm.util) is used.

    default:

    None

  • seq_length (int) –

    The maximum length of the tokenized questions. If the tokenized question is longer than this, it will be truncated. If it is shorter, it will be padded.

    default:

    64

  • pin_memory (Optional[bool]) –

    Whether to use pinned memory for the dataloaders. If None is provided, it is set to True if a GPU is available and False otherwise.

    default:

    None

  • test_splitting_seed (Optional[Union[str, int]]) –

    The seed to use for dividing the val split into val-div and test-div. If set to “repeat”, both val-div and test-div are the full val split. If set to an integer, the seed is used to initialize the random number generator for the division, so the same seed yields the same val-div and test-div each time the dataset is loaded. The state of the random number generator is saved before the split and restored afterwards, which makes the split reproducible independent of the global random state and leaves the global random state unaffected.

    default:

    None

  • test_splitting_size (Union[float, int]) –

    The size of the val-div and test-div splits. If set to a float, it should be a value between 0 and 1 and will be interpreted as the fraction of the val split to use for the val-div. If set to an integer, it will be interpreted as the number of samples to use for the val-div. In both cases, the rest of the val split will be used for the test-div. If test_splitting_seed is set to “repeat”, the split will be the same (full val split) for both val-div and test-div.

    default:

    0.5
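
The None defaults of shuffle and pin_memory and the truncate-or-pad behavior of seq_length can be sketched as small pure functions. All three helpers are hypothetical and only mirror the parameter descriptions above. Applying an explicit shuffle value to every split, padding with id 0, and ignoring special tokens are simplifying assumptions. In PyTorch code, gpu_available would typically come from torch.cuda.is_available().

```python
from typing import List, Optional, Sequence


def resolve_shuffle(shuffle: Optional[bool], split: str) -> bool:
    """None means: shuffle the training data only."""
    if shuffle is None:
        return split == "train"
    return shuffle  # assumption: an explicit value applies to every split


def resolve_pin_memory(pin_memory: Optional[bool], gpu_available: bool) -> bool:
    """None means: pin memory exactly when a GPU is available."""
    if pin_memory is None:
        return gpu_available
    return pin_memory


def fit_to_length(token_ids: Sequence[int], seq_length: int = 64, pad_id: int = 0) -> List[int]:
    """Truncate to seq_length, or pad with pad_id up to seq_length."""
    clipped = list(token_ids[:seq_length])
    return clipped + [pad_id] * (seq_length - len(clipped))


train_shuffle = resolve_shuffle(None, "train")
val_shuffle = resolve_shuffle(None, "val")
padded = fit_to_length([5, 6, 7], seq_length=5)
truncated = fit_to_length(range(10), seq_length=4)
```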

setup(stage=None)

Prepares the data sets for the specific stage.

  • “fit”: train and validation data set

  • “test”: test data set

  • None: all data sets

Parameters:

stage (Optional[str]) –

None, “fit”, or “test”

default:

None

test_dataloader()

Returns the dataloader for the test data.

Raises:

AssertionError if the test dataset is not set up. This can happen if the setup() method is not called before this method or the dataset has no test data.

train_dataloader()

Returns the dataloader for the training data.

Raises:

AssertionError if the training dataset is not set up. This can happen if the setup() method is not called before this method or the dataset has no training data.

val_dataloader()

Returns the dataloader for the validation data.

Raises:

AssertionError if the validation dataset is not set up. This can happen if the setup() method is not called before this method or the dataset has no validation data.
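
The contract between setup() and the three dataloader methods can be sketched with a small stand-in class. SketchDataModule is hypothetical and uses plain lists in place of datasets and dataloaders; it only mirrors the documented behavior that each stage prepares certain data sets and that requesting a dataloader before its data set exists raises an AssertionError.

```python
from typing import List, Optional


class SketchDataModule:
    """Stand-in that mimics the documented setup()/dataloader contract."""

    def __init__(self) -> None:
        self.train_ds: Optional[List[int]] = None
        self.val_ds: Optional[List[int]] = None
        self.test_ds: Optional[List[int]] = None

    def setup(self, stage: Optional[str] = None) -> None:
        # "fit" prepares train and validation data, "test" the test data,
        # and None prepares all of them.
        if stage in (None, "fit"):
            self.train_ds = [0, 1, 2, 3]
            self.val_ds = [4, 5]
        if stage in (None, "test"):
            self.test_ds = [6, 7]

    def train_dataloader(self) -> List[int]:
        assert self.train_ds is not None, "call setup() before train_dataloader()"
        return self.train_ds

    def val_dataloader(self) -> List[int]:
        assert self.val_ds is not None, "call setup() before val_dataloader()"
        return self.val_ds

    def test_dataloader(self) -> List[int]:
        assert self.test_ds is not None, "call setup() before test_dataloader()"
        return self.test_ds


dm = SketchDataModule()
dm.setup("fit")
train_batches = dm.train_dataloader()

# The test data set has not been prepared by the "fit" stage.
try:
    dm.test_dataloader()
    test_ready = True
except AssertionError:
    test_ready = False
```

Calling setup() with stage None, or with "test" in addition to "fit", makes all three dataloaders available.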