Throughput
- class configilm.extra.DataSets.ThroughputTest_DataSet.VQAThroughputTestDataset
- __init__(data_dirs, split=None, transform=None, max_len=None, img_size=(3, 256, 256), selected_answers=None, num_classes=1000, tokenizer=None, seq_length=64, return_extras=False, num_samples=1000)
This class implements the ThroughputTest dataset. It is a subclass of ClassificationVQADataset and provides some dataset-specific functionality.
- Parameters:
data_dirs (Mapping[str, Path]) – A mapping from file key to file path. Ignored, because the dataset does not use any real files; included only for compatibility with other datasets.
split (Optional[str]) –
The name of the split to use. Ignored, because the dataset does not use any real splits; included only for compatibility with other datasets.
- default:
None
transform (Optional[Callable]) –
A callable that is used to transform the images after loading them. Ignored, because the dataset does not use any real images; included only for compatibility with other datasets.
- default:
None
max_len (Optional[int]) –
The maximum number of qa-pairs to use. If None or -1 is provided, all qa-pairs are used. For this dataset, “all” means the number of samples set by num_samples.
- default:
None
img_size (tuple) –
The size of the images.
- default:
(3, 256, 256)
selected_answers (Optional[list]) –
A list of answers that should be used. If None is provided, the num_classes most common answers are used. If selected_answers is not None, num_classes is ignored. Only used to infer the number of classes.
- default:
None
num_classes (Optional[int]) –
The number of classes to use. Only used if selected_answers is None. If set to None, all answers are used.
- default:
1000
tokenizer (Optional[Callable]) –
A callable that is used to tokenize the questions. Ignored, because the dataset does not use any real questions; included only for compatibility with other datasets.
- default:
None
seq_length (int) –
The exact length of the tokenized questions.
- default:
64
return_extras (bool) –
Ignored, because the dataset does not use any real data; included only for compatibility with other datasets.
- default:
False
num_samples (int) –
The number of samples to simulate.
- default:
1000
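The interplay of max_len, num_samples, selected_answers, and num_classes described above can be illustrated with a small sketch. The helper names effective_length and effective_num_classes are hypothetical and not part of configilm, and the capping behaviour for a max_len larger than num_samples is an assumption rather than something the documentation states.

```python
from typing import Optional, Sequence


def effective_length(max_len: Optional[int], num_samples: int) -> int:
    """Number of simulated qa-pairs after applying max_len (assumed semantics)."""
    if max_len is None or max_len == -1:
        # "all" qa-pairs means every simulated sample.
        return num_samples
    # Assumption: max_len caps the dataset but cannot exceed num_samples.
    return min(max_len, num_samples)


def effective_num_classes(selected_answers: Optional[Sequence[str]],
                          num_classes: int) -> int:
    """Class count: selected_answers, when given, takes precedence over num_classes."""
    if selected_answers is not None:
        return len(selected_answers)
    return num_classes
```

Under these assumptions, max_len=10 with num_samples=1000 would yield 10 qa-pairs, and passing two selected answers would yield two classes regardless of num_classes.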
- load_image(key)
This method should load the image with the given name and return it as a tensor.
- Parameters:
key (str) – The name of the image to load
- Returns:
The image as a tensor
- Return type:
Tensor
- prepare_split(split)
This method should return a list of tuples, where each tuple contains the following elements:
The key of the image at index 0
The question at index 1
The answer at index 2
Additional information at index 3 and higher
- Parameters:
split (str) – The name of the split to prepare
- Returns:
A list of tuples, each tuple containing the elements described
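To make the tuple layout concrete, here is a minimal sketch of what a synthetic prepare_split could return. The function fake_prepare_split, its parameters, and the placeholder key, question, and answer strings are all hypothetical; the actual values generated by the library are not documented here.

```python
import random
from typing import List, Tuple


def fake_prepare_split(split: str, num_samples: int = 4, num_classes: int = 3,
                       seed: int = 0) -> List[Tuple[str, str, str]]:
    """Build (image key, question, answer) tuples without touching any files."""
    rng = random.Random(seed)  # seeded so repeated calls give the same tuples
    qa_pairs = []
    for i in range(num_samples):
        key = f"{split}_img_{i}"                          # index 0: image key
        question = f"placeholder question {i}"            # index 1: question
        answer = f"answer_{rng.randrange(num_classes)}"   # index 2: answer
        qa_pairs.append((key, question, answer))
    return qa_pairs


pairs = fake_prepare_split("train")
```

Each entry can then be unpacked as `key, question, answer = pairs[0]`; a real implementation may append further fields from index 3 onward.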
- split_names()
Returns the names of the splits that are available for this dataset. The default implementation returns {“train”, “val”, “test”}. If you want to use different names, you should override this method.
- Returns:
A set of strings, each string being the name of a split
- Return type:
set[str]
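Overriding split_names might look like the sketch below. BaseDataset is a hypothetical stand-in for the library's base class so that the example runs on its own, and the split names "train" and "dev" are chosen only for illustration.

```python
from typing import Set


class BaseDataset:
    """Hypothetical stand-in for the base class with the default split names."""

    def split_names(self) -> Set[str]:
        return {"train", "val", "test"}


class CustomSplitDataset(BaseDataset):
    """Subclass that advertises a different set of splits."""

    def split_names(self) -> Set[str]:
        return {"train", "dev"}
```

Because the method returns a set, callers should not rely on any ordering of the split names.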
- configilm.extra.DataSets.ThroughputTest_DataSet.resolve_data_dir(data_dir, allow_mock=False, force_mock=False)
Helper function that tries to resolve the correct data directory.
- Parameters:
data_dir (Optional[Mapping[str, Path]]) – The suggested mapping from file key to file path
allow_mock (bool) –
If True, mock data is used if no real data is found
- default:
False
force_mock (bool) –
If True, only mock data is used
- default:
False
- Returns:
a dict with all paths to the data
- Return type:
Mapping[str, Union[str, Path]]
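The precedence between the two flags can be sketched as follows. This is not the library's implementation: resolve_sketch and MOCK_DIRS are hypothetical names, the existence check on every path is an assumption about what "real data is found" means, and raising FileNotFoundError when nothing can be resolved is likewise assumed.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional, Union

# Hypothetical placeholder for whatever mock data locations the library uses.
MOCK_DIRS: Dict[str, Union[str, Path]] = {"images": Path("mock/images")}


def resolve_sketch(data_dir: Optional[Mapping[str, Union[str, Path]]],
                   allow_mock: bool = False,
                   force_mock: bool = False) -> Dict[str, Union[str, Path]]:
    """Return the paths to use, preferring real data unless mock data is forced."""
    if force_mock:
        # force_mock wins over everything else.
        return dict(MOCK_DIRS)
    if data_dir is not None and all(Path(p).exists() for p in data_dir.values()):
        # Assumption: "real data found" means every suggested path exists.
        return dict(data_dir)
    if allow_mock:
        # Fall back to mock data only when explicitly allowed.
        return dict(MOCK_DIRS)
    # Assumption: with no real data and no mock fallback, fail loudly.
    raise FileNotFoundError("no data found and mock data not allowed")
```

With this ordering, allow_mock only matters when the suggested paths are missing, while force_mock bypasses the lookup entirely.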
Dataloader and Datamodule for ThroughputTest dataset.
- class configilm.extra.DataModules.ThroughputTest_DataModule.VQAThroughputTestDataModule
- __init__(data_dirs, batch_size=16, img_size=(3, 256, 256), num_workers_dataloader=4, shuffle=None, max_len=None, tokenizer=None, seq_length=64, pin_memory=None, num_samples=1000, num_classes=1000)
This class implements the DataModule for the ThroughputTest dataset.
- Parameters:
data_dirs (Mapping[str, Path]) – A mapping from file key to file path. Is ignored for this dataset.
batch_size (int) –
The batch size to use for the dataloaders.
- default:
16
img_size (tuple) –
The size of the images.
- default:
(3, 256, 256)
num_workers_dataloader (int) –
The number of workers to use for the dataloaders.
- default:
4
shuffle (Optional[bool]) –
Whether to shuffle the data in the dataloaders. If None is provided, the data is shuffled for training and not shuffled for validation and test. However, for this dataset, all the data is the same, so this parameter has no effect.
- default:
None
max_len (Optional[int]) –
The maximum number of qa-pairs to use. If None or -1 is provided, all qa-pairs are used. For this dataset, “all” means the number of samples set by num_samples.
- default:
None
tokenizer (Optional[Callable]) –
A callable that is used to tokenize the questions. Ignored, because the dataset does not use any real questions; included only for compatibility with other datasets.
- default:
None
seq_length (int) –
The maximum length of the tokenized questions. If the tokenized question is longer than this, it will be truncated. If it is shorter, it will be padded.
- default:
64
pin_memory (Optional[bool]) –
Whether to use pinned memory for the dataloaders. If None is provided, it is set to True if a GPU is available and False otherwise.
- default:
None
num_samples (int) –
The number of samples to simulate per split.
- default:
1000
num_classes (int) –
The number of classes in the dataset.
- default:
1000
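The None defaults for shuffle and pin_memory are resolved at runtime. A minimal sketch of that resolution, using the hypothetical helpers resolve_shuffle and resolve_pin_memory, is shown below; GPU availability is passed in as a plain boolean so the example does not depend on any deep learning framework.

```python
from typing import Optional


def resolve_shuffle(shuffle: Optional[bool], split: str) -> bool:
    """None means: shuffle the training split only."""
    if shuffle is None:
        return split == "train"
    return shuffle


def resolve_pin_memory(pin_memory: Optional[bool], gpu_available: bool) -> bool:
    """None means: pin memory exactly when a GPU is available."""
    if pin_memory is None:
        return gpu_available
    return pin_memory
```

An explicit True or False always overrides the automatic choice in this sketch.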
- setup(stage=None)
Prepares the data sets for the specific stage.
“fit”: train and validation data set
“test”: test data set
None: all data sets
- Parameters:
stage (Optional[str]) –
None, “fit”, or “test”
- default:
None
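The stage-to-split mapping listed above can be written as a small lookup. The function splits_for_stage is a hypothetical illustration, and raising ValueError for any other stage is an assumption; the documentation only names None, “fit”, and “test”.

```python
from typing import List, Optional


def splits_for_stage(stage: Optional[str]) -> List[str]:
    """Map a setup stage to the splits that would be prepared."""
    if stage is None:
        return ["train", "val", "test"]  # all data sets
    if stage == "fit":
        return ["train", "val"]          # train and validation data sets
    if stage == "test":
        return ["test"]                  # test data set only
    # Assumption: any other stage is rejected.
    raise ValueError(f"unsupported stage: {stage!r}")
```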
- test_dataloader()
Returns the dataloader for the test data.
- Raises:
AssertionError if the test dataset is not set up. This can happen if the setup() method is not called before this method or the dataset has no test data.
- train_dataloader()
Returns the dataloader for the training data.
- Raises:
AssertionError if the training dataset is not set up. This can happen if the setup() method is not called before this method or the dataset has no training data.
- val_dataloader()
Returns the dataloader for the validation data.
- Raises:
AssertionError if the validation dataset is not set up. This can happen if the setup() method is not called before this method or the dataset has no validation data.
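The requirement to call setup() before requesting a dataloader can be demonstrated with a toy stand-in. TinyDataModule is hypothetical and only mimics the documented assertion behaviour; it returns plain lists of batches instead of real dataloaders.

```python
from typing import List, Optional


class TinyDataModule:
    """Hypothetical stand-in that mirrors the setup-before-use contract."""

    def __init__(self) -> None:
        self.test_ds: Optional[List[int]] = None

    def setup(self, stage: Optional[str] = None) -> None:
        # The test data set is prepared for stage "test" or None.
        if stage in (None, "test"):
            self.test_ds = list(range(8))

    def test_dataloader(self, batch_size: int = 4) -> List[List[int]]:
        # Mirrors the documented behaviour: fail if setup() was skipped.
        assert self.test_ds is not None, "setup() must be called first"
        return [self.test_ds[i:i + batch_size]
                for i in range(0, len(self.test_ds), batch_size)]
```

Calling `test_dataloader()` on a fresh instance raises AssertionError; after `setup("test")` the same call returns the batches. Note that calling `setup("fit")` alone would still leave the test data set unprepared.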