COCO-QA
This page describes how to use the Dataloader and Datamodule for COCO-QA, a visual question answering (VQA) dataset based on the COCO dataset. It was first published by Ren et al. [5].
COCOQA DataSet
In its most basic form, the Dataset only needs the path to the data, and only if that path is not “./”.
The full folder structure expected at this path is:
.
├── images
│ ├── COCO_train2014_<id a>.jpg
│ ├── COCO_train2014_<id b>.jpg
│ ├── ...
│ ├── COCO_train2014_<id i>.jpg
│ ├── COCO_val2014_<id j>.jpg
│ ├── ...
│ └── COCO_val2014_<id z>.jpg
├── COCO-QA_QA_test.json
└── COCO-QA_QA_train.json
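Before constructing the dataset, it can help to verify that a directory matches this layout. The following is a minimal sketch using only the standard library; `check_cocoqa_layout` is a hypothetical helper, not part of ConfigILM, and it only checks for the entries shown in the tree above.

```python
from pathlib import Path


def check_cocoqa_layout(root):
    """Return a list of problems found in the expected COCO-QA folder layout.

    Hypothetical helper, not part of ConfigILM: it only checks for the
    entries shown in the folder tree (an images directory and two JSON files).
    """
    root = Path(root)
    problems = []

    images = root / "images"
    if not images.is_dir():
        problems.append("missing directory: images")
    elif not any(images.glob("COCO_*2014_*.jpg")):
        problems.append("no COCO_<split>2014_<id>.jpg files in images")

    for name in ("COCO-QA_QA_train.json", "COCO-QA_QA_test.json"):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")

    return problems
```

An empty return value means every expected entry was found, so a call such as `check_cocoqa_layout(my_data_path)` can serve as a quick sanity check before the dataset is created.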
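from configilm.extra.DataSets import COCOQA_DataSet

# my_data_path is the dataset root laid out as shown above (defined elsewhere)
ds = COCOQA_DataSet.COCOQADataSet(
    data_dirs=my_data_path,  # path to dataset
)
# each item is an (image, tokenized question, answer) triple
img, question, answer = ds[0]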
Loading COCOQA data for None...
20 QA-pairs indexed
20 QA-pairs used
/home/runner/work/ConfigILM/ConfigILM/configilm/extra/DataSets/ClassificationVQADataset.py:144: UserWarning: No tokenizer was provided, using BertTokenizer (uncased). This may result in very bad performance if the used network expected other tokens
warn(
Size: torch.Size([3, 120, 120])
Question (start): [101, 1996, 2158, 4755, 2054, 3216, 2046, 1996, 4153, 102, 0, 0, 0, 0, 0]
Answer (start): tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
As the output shows, this Dataset uses a tokenizer to convert the natural-language question into token IDs. If no tokenizer is provided, a default one (BertTokenizer, uncased) is used, which may lead to poor performance if the network expects different tokens. The tokenizer can be passed as an input parameter.
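The shape of the printed question can be illustrated without any tokenizer library. The sketch below assumes BERT-style special tokens, where `101` is `[CLS]`, `102` is `[SEP]` and `0` is padding in the uncased BERT vocabulary. `pad_question` is a hypothetical helper for illustration only and does not reflect how ConfigILM implements this step.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token IDs of the uncased BERT vocabulary


def pad_question(word_ids, seq_length):
    """Wrap word IDs in [CLS] ... [SEP] and pad or truncate to seq_length.

    Hypothetical helper for illustration only; real tokenizers handle this
    internally.
    """
    if seq_length < 2:
        raise ValueError("seq_length must leave room for [CLS] and [SEP]")
    body = list(word_ids)[: seq_length - 2]  # keep room for the two special tokens
    ids = [CLS_ID] + body + [SEP_ID]
    return ids + [PAD_ID] * (seq_length - len(ids))


# Word IDs taken from the printed question above (without special tokens).
print(pad_question([1996, 2158, 4755, 2054, 3216, 2046, 1996, 4153], 15))
# -> [101, 1996, 2158, 4755, 2054, 3216, 2046, 1996, 4153, 102, 0, 0, 0, 0, 0]
```

The trailing zeros in the printed question are therefore padding up to the sequence length, not part of the question itself.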
from configilm.ConfigILM import _get_hf_model

tokenizer, _ = _get_hf_model("prajjwal1/bert-tiny")
ds = COCOQA_DataSet.COCOQADataSet(
    data_dirs=my_data_path,  # path to dataset
    tokenizer=tokenizer,
)
img, question, answer = ds[0]
Loading COCOQA data for None...
20 QA-pairs indexed
20 QA-pairs used
Other parameters are split (“train” or “test”), transform for image transformations, max_len to limit the number of QA pairs used, img_size (channels, height, width; channels should be 3) and seq_length for the length of the tokenized question.
tokenizer, _ = _get_hf_model("prajjwal1/bert-tiny")
ds = COCOQA_DataSet.COCOQADataSet(
    data_dirs=my_data_path,  # path to dataset
    tokenizer=tokenizer,
    img_size=(3, 200, 100),
    max_len=5,
    seq_length=32,
    transform=None,
    split="train",
)
img, question, answer = ds[0]
Loading COCOQA data for train...
10 QA-pairs indexed
5 QA-pairs used
Size: torch.Size([3, 200, 100])
Question (start): [101, 2054, 2003, 1996, 3609, 1997, 1996, 2839, 102, 0, 0, 0, 0, 0, 0]
Answer (start): tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
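To read such a sample back, one can strip the padding from the question and look up which entries of the answer vector are set. This is a sketch under two assumptions suggested by the printed output: that `0` is the padding ID and that the answer vector marks the answer class with non-zero entries. `summarize_sample` is a hypothetical helper, not part of ConfigILM; tensors can be passed after calling `.tolist()` on them.

```python
def summarize_sample(question_ids, answer_vector, pad_id=0):
    """Return (unpadded question IDs, indices of non-zero answer entries).

    Hypothetical helper: assumes trailing pad_id entries are padding and
    that non-zero answer entries mark the selected answer class(es).
    """
    ids = list(question_ids)
    while ids and ids[-1] == pad_id:  # drop trailing padding only
        ids.pop()
    answer_idx = [i for i, v in enumerate(answer_vector) if v != 0]
    return ids, answer_idx


# Values copied from the start of the printed sample above.
question = [101, 2054, 2003, 1996, 3609, 1997, 1996, 2839, 102, 0, 0, 0, 0, 0, 0]
answer = [1.0] + [0.0] * 14
print(summarize_sample(question, answer))
# -> ([101, 2054, 2003, 1996, 3609, 1997, 1996, 2839, 102], [0])
```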