COCO-QA

This page describes how to use the Dataloader and Datamodule for COCO-QA, a visual question answering (VQA) dataset based on the COCO dataset. It was first published by Ren et al. [5].

COCOQA DataSet

In its most basic form, the Dataset only needs the path to the data, and only if that path is not “./”. The full folder structure expected at this path is:

.
├── images
│   ├── COCO_train2014_<id a>.jpg
│   ├── COCO_train2014_<id b>.jpg
│   ├── ...
│   ├── COCO_train2014_<id i>.jpg
│   ├── COCO_val2014_<id j>.jpg
│   ├── ...
│   └── COCO_val2014_<id z>.jpg
├── COCO-QA_QA_test.json
└── COCO-QA_QA_train.json
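
Before constructing the dataset, it can help to confirm that a folder matches the layout above. The following is a minimal sketch using only the standard library; the helper name check_cocoqa_layout is hypothetical and not part of ConfigILM, and it checks only the file and folder names shown in the tree, not their contents.

```python
from pathlib import Path


def check_cocoqa_layout(root):
    """Return a list of layout problems found under `root` (empty if none).

    Hypothetical helper, not part of ConfigILM: it only checks the names
    shown in the folder tree above.
    """
    root = Path(root)
    problems = []
    # Both QA files are expected directly under the dataset root.
    for name in ("COCO-QA_QA_train.json", "COCO-QA_QA_test.json"):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    # Images are expected in an "images" subfolder with COCO 2014 names.
    images = root / "images"
    if not images.is_dir():
        problems.append("missing directory: images")
    elif not any(images.glob("COCO_*2014_*.jpg")):
        problems.append("no COCO_<split>2014_<id>.jpg files in images/")
    return problems


for problem in check_cocoqa_layout("./"):
    print(problem)
```

An empty result means the names match the expected structure; it does not guarantee that every image referenced by the QA files is present.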

from configilm.extra.DataSets import COCOQA_DataSet

# my_data_path must be set beforehand to the dataset root shown above
ds = COCOQA_DataSet.COCOQADataSet(
    data_dirs=my_data_path,  # path to dataset
)

img, question, answer = ds[0]

Loading COCOQA data for None...
          20 QA-pairs indexed
          20 QA-pairs used
/home/runner/work/ConfigILM/ConfigILM/configilm/extra/DataSets/ClassificationVQADataset.py:144: UserWarning: No tokenizer was provided, using BertTokenizer (uncased). This may result in very bad performance if the used network expected other tokens
  warn(
Size: torch.Size([3, 120, 120])
Question (start): [101, 1996, 2158, 4755, 2054, 3216, 2046, 1996, 4153, 102, 0, 0, 0, 0, 0]
Answer (start): tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

As the output shows, this Dataset uses a tokenizer to convert the natural-language question into token IDs. If no tokenizer is provided, a default one (BertTokenizer, uncased) is used; this may lead to poor performance if the network expects different tokens. The tokenizer can be set via the tokenizer parameter.

from configilm.ConfigILM import _get_hf_model

tokenizer, _ = _get_hf_model("prajjwal1/bert-tiny")

ds = COCOQA_DataSet.COCOQADataSet(
    data_dirs=my_data_path,  # path to dataset
    tokenizer=tokenizer
)
img, question, answer = ds[0]

Loading COCOQA data for None...
          20 QA-pairs indexed
          20 QA-pairs used
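
The question IDs printed in the first example follow the usual BERT layout: a [CLS] token (ID 101), the word IDs, a [SEP] token (ID 102), then zero padding up to a fixed length. The sketch below imitates that layout in plain Python. It is an illustration only: pad_question is a hypothetical helper, and the truncation rule (keep the leading word IDs, always keep [SEP]) is an assumption about the tokenizer settings rather than something the page states.

```python
# Special-token IDs of BERT's uncased vocabulary.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_question(word_ids, seq_length):
    """Wrap word IDs in [CLS]/[SEP], then truncate or zero-pad to seq_length.

    Hypothetical helper; assumes seq_length >= 2.
    """
    body = list(word_ids)[: seq_length - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID, *body, SEP_ID]
    return ids + [PAD_ID] * (seq_length - len(ids))


# Word IDs taken from the first example's printed question.
words = [1996, 2158, 4755, 2054, 3216, 2046, 1996, 4153]
print(pad_question(words, 15))
# [101, 1996, 2158, 4755, 2054, 3216, 2046, 1996, 4153, 102, 0, 0, 0, 0, 0]
```

A network trained with a different tokenizer would map the same IDs to different tokens, which is why the default-tokenizer warning matters.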

Other parameters are split (“train” or “test”), transform for image transformations, max_len to limit the number of QA pairs used, img_size (channels, height, width; channels should be 3) and seq_length, the length of the tokenized question.

tokenizer, _ = _get_hf_model("prajjwal1/bert-tiny")

ds = COCOQA_DataSet.COCOQADataSet(
    data_dirs=my_data_path,  # path to dataset
    tokenizer=tokenizer,
    img_size=(3, 200, 100),
    max_len=5,
    seq_length=32,
    transform=None,
    split="train"
)
img, question, answer = ds[0]

Loading COCOQA data for train...
          10 QA-pairs indexed
           5 QA-pairs used
Size: torch.Size([3, 200, 100])
Question (start): [101, 2054, 2003, 1996, 3609, 1997, 1996, 2839, 102, 0, 0, 0, 0, 0, 0]
Answer (start): tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
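
The log above reports 10 QA pairs indexed but only 5 used, which matches max_len=5. The stand-in class below sketches that behaviour in plain Python. It is not ConfigILM's implementation: ToyVQADataset is a hypothetical name, and keeping the first max_len pairs (rather than, say, a random subset) is an assumption.

```python
class ToyVQADataset:
    """Stand-in for a VQA dataset; NOT ConfigILM's implementation."""

    def __init__(self, qa_pairs, max_len=None):
        pairs = list(qa_pairs)
        self.indexed = len(pairs)
        # Assumption: max_len keeps only the first max_len pairs.
        self.qa_pairs = pairs if max_len is None else pairs[:max_len]

    def __len__(self):
        return len(self.qa_pairs)

    def __getitem__(self, idx):
        # Each item is an (image, question, answer) triple, as in ds[0] above.
        image, question, answer = self.qa_pairs[idx]
        return image, question, answer


pairs = [(f"img_{i}", f"question {i}", f"answer {i}") for i in range(10)]
toy = ToyVQADataset(pairs, max_len=5)
print(f"{toy.indexed} QA-pairs indexed, {len(toy)} QA-pairs used")
```

Limiting the number of pairs this way is mainly useful for quick smoke tests before running on the full split.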