Fusion Methods#

class configilm.Fusion.BlockFusion.Block#
__init__(input_dims, output_dim, mm_dim=1600, chunks=20, rank=15, shared=False, dropout_input=0.0, dropout_pre_lin=0.0, dropout_output=0.0, pos_norm='before_cat')#

Initializes the internal Module state of the Block fusion from “BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection”. The complexity of the intermediate states is limited by dividing the intermediate dimension into chunks (blocks). The inverse-bottleneck expansion of the multi-modal fusion is defined by rank.

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • chunks (int) – number of chunks the intermediate dimension will be divided into. Has to be smaller than or equal to mm_dim. If mm_dim is not divisible by chunks, the last chunk will be smaller

  • rank (int) – Rank of input merging matrix, factor to the size calculated by mm_dim/chunks

  • shared (bool) – flag indicating whether the input mappings are shared between both inputs. Only works if all input_dims are equal

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_lin (float) – Dropout rate before linear mapping

  • dropout_output (float) – Dropout rate before the output

  • pos_norm (str) – position of normalization, has to be “before_cat” or “after_cat”

Returns:

Block-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to Block-Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
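
Example (a minimal usage sketch, assuming each input is a flat feature tensor of shape (batch, input_dims[i]) and the output has shape (batch, output_dim); the batch size of 4 and the dimensions 256/768/512 are illustrative assumptions, not library defaults):

>>> import torch
>>> from configilm.Fusion.BlockFusion import Block
>>> fusion = Block(input_dims=[256, 768], output_dim=512, mm_dim=1600, chunks=20, rank=15)
>>> x_0 = torch.rand(4, 256)  # first modality features, shape (batch, input_dims[0])
>>> x_1 = torch.rand(4, 768)  # second modality features, shape (batch, input_dims[1])
>>> fusion(x_0, x_1).shape
torch.Size([4, 512])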

class configilm.Fusion.BlockTuckerFusion.BlockTucker#
__init__(input_dims, output_dim, mm_dim=1600, chunks=20, shared=False, dropout_input=0.0, dropout_pre_lin=0.0, dropout_output=0.0, pos_norm='before_cat')#

Initializes the internal Module state of the Block-Tucker fusion from “BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection”. Uses a Tucker decomposition instead of a rank limitation to restrict tensor complexity, while limiting the complexity of the intermediate states by dividing them into chunks (blocks).

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • chunks (int) – number of chunks the intermediate dimension will be divided into. Has to be smaller than or equal to mm_dim. If mm_dim is not divisible by chunks, the last chunk will be smaller

  • shared (bool) – flag indicating whether the input mappings are shared between both inputs. Only works if all input_dims are equal

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_lin (float) – Dropout rate before linear mapping

  • dropout_output (float) – Dropout rate before the output

  • pos_norm (str) – position of normalization, has to be “before_cat” or “after_cat”

Returns:

Block-Tucker-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to Block-Tucker-Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
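
Example (a minimal sketch; shared=True requires equal input dimensions, so both inputs use 512 here, and all concrete sizes as well as the batch size are illustrative assumptions):

>>> import torch
>>> from configilm.Fusion.BlockTuckerFusion import BlockTucker
>>> fusion = BlockTucker(input_dims=[512, 512], output_dim=256, mm_dim=1600, chunks=20, shared=True)
>>> x_0, x_1 = torch.rand(4, 512), torch.rand(4, 512)  # two modalities with equal feature size
>>> fusion(x_0, x_1).shape
torch.Size([4, 256])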

class configilm.Fusion.ConcatMLPFusion.ConcatMLP#
__init__(input_dims, output_dim, dimensions=None, activation='relu', dropout=0.0)#

Initializes the internal Module state of the ConcatMLP fusion. Concatenates all inputs and passes them through linear layers defined by dimensions and output_dim. Activation and dropout are applied after every linear layer.

Parameters:
  • input_dims (Iterable[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • dimensions (Optional[List[int]]) – list of intermediate dimensions, set to [500, 500] if not set manually

  • activation (str) – name of activation function from torch.nn.functional

  • dropout (float) – Dropout rate after every activation

Returns:

ConcatMLP-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to ConcatMLP-Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
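
Example (a minimal sketch; the hidden dimensions [500, 500] mirror the documented default, while the input/output sizes and batch size are illustrative assumptions):

>>> import torch
>>> from configilm.Fusion.ConcatMLPFusion import ConcatMLP
>>> fusion = ConcatMLP(input_dims=[256, 768], output_dim=512, dimensions=[500, 500], activation='relu')
>>> fusion(torch.rand(4, 256), torch.rand(4, 768)).shape  # inputs are concatenated, then passed through the MLP
torch.Size([4, 512])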

class configilm.Fusion.LinearSumFusion.LinearSum#
__init__(input_dims, output_dim, mm_dim=1200, activ_input='relu', activ_output='relu', normalize=False, dropout_input=0.0, dropout_pre_lin=0.0, dropout_output=0.0)#

Initializes the internal Module state of the LinearSum fusion. Passes both modalities independently of each other through a linear layer to map them into a common dimension. The results are added point-wise and mapped to the output dimension. Normalization, dropout and activations are applied if set.

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • activ_input (str) – activation function after the first linear layer

  • activ_output (str) – activation function after the second linear layer

  • normalize (bool) – flag indicating whether normalization should be applied

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_lin (float) – Dropout rate before linear mapping

  • dropout_output (float) – Dropout rate before the output

Returns:

LinearSum-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to LinearSum-Fusion (point-wise addition of linearly mapped inputs)

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
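
Example (a minimal sketch; normalize=True is set only for illustration, and all concrete sizes are illustrative assumptions):

>>> import torch
>>> from configilm.Fusion.LinearSumFusion import LinearSum
>>> fusion = LinearSum(input_dims=[256, 768], output_dim=512, mm_dim=1200, normalize=True)
>>> fusion(torch.rand(4, 256), torch.rand(4, 768)).shape  # point-wise addition in the common mm_dim space
torch.Size([4, 512])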

class configilm.Fusion.MFBFusion.MFB#
__init__(input_dims, output_dim, mm_dim=1200, factor=2, activ_input='relu', activ_output='relu', normalize=False, dropout_input=0.0, dropout_pre_norm=0.0, dropout_output=0.0)#

Initializes the internal Module state of the Multi-Modal Factorized Bilinear Pooling fusion from “Multi-Modal Factorized Bilinear Pooling With Co-Attention Learning for Visual Question Answering”. Inputs are linearly mapped to a higher dimension, combined by point-wise multiplication, pooled and mapped to the output dimension.

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • factor (int) – rank of linear input mappings / pooling rate on output

  • activ_input (str) – activation function after the first linear layer

  • activ_output (str) – activation function after the second linear layer

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_norm (float) – Dropout rate before normalization

  • dropout_output (float) – Dropout rate before the output

  • normalize (bool) – flag indicating whether normalization should be applied

Returns:

MFB-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to Multi-Modal Factorized Bilinear Pooling Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
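
Example (a minimal sketch; factor=2 and mm_dim=1200 match the documented defaults, the remaining sizes and the batch size are illustrative assumptions):

>>> import torch
>>> from configilm.Fusion.MFBFusion import MFB
>>> fusion = MFB(input_dims=[256, 768], output_dim=512, mm_dim=1200, factor=2)
>>> fusion(torch.rand(4, 256), torch.rand(4, 768)).shape
torch.Size([4, 512])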

class configilm.Fusion.MFHFusion.MFH#
__init__(input_dims, output_dim, mm_dim=1200, factor=2, activ_input='relu', activ_output='relu', normalize=False, dropout_input=0.0, dropout_pre_norm=0.0, dropout_output=0.0)#

Initializes the internal Module state of the Generalized Multimodal Factorized High-order Pooling fusion from “Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering”. Inputs are linearly mapped to a higher dimension and combined via point-wise multiplications, poolings, additional linear mappings and an output mapping.

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • factor (int) – rank of linear input mappings / pooling rate on output

  • activ_input (str) – activation function after the first linear layer

  • activ_output (str) – activation function after the second linear layer

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_norm (float) – Dropout rate before normalization

  • dropout_output (float) – Dropout rate before the output

  • normalize (bool) – flag indicating whether normalization should be applied

Returns:

MFH-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to Generalized Multimodal Factorized High-order Pooling Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
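
Example (a minimal sketch with illustrative sizes; the call signature is the same as for the other fusion modules):

>>> import torch
>>> from configilm.Fusion.MFHFusion import MFH
>>> fusion = MFH(input_dims=[256, 768], output_dim=512, mm_dim=1200, factor=2)
>>> fusion(torch.rand(4, 256), torch.rand(4, 768)).shape
torch.Size([4, 512])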

class configilm.Fusion.MLBFusion.MLB#
__init__(input_dims, output_dim, mm_dim=1200, activ_input='relu', activ_output='relu', normalize=False, dropout_input=0.0, dropout_pre_lin=0.0, dropout_output=0.0)#

Initializes the internal Module state of the Multi-Modal Hadamard Product for Low-rank Bilinear Pooling fusion from “Hadamard Product for Low-rank Bilinear Pooling”. Inputs are linearly mapped to a higher dimension, combined by a point-wise (Hadamard) multiplication and mapped to the output dimension.

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • activ_input (str) – activation function after the first linear layer

  • activ_output (str) – activation function after the second linear layer

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_lin (float) – Dropout rate before linear mapping

  • dropout_output (float) – Dropout rate before the output

  • normalize (bool) – flag indicating whether normalization should be applied

Returns:

MLB-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to Multi-Modal Hadamard Product for Low-rank Bilinear Pooling Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
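
Example (a minimal sketch; the activation names match the documented defaults, all concrete sizes and the batch size are illustrative assumptions):

>>> import torch
>>> from configilm.Fusion.MLBFusion import MLB
>>> fusion = MLB(input_dims=[256, 768], output_dim=512, mm_dim=1200, activ_input='relu', activ_output='relu')
>>> fusion(torch.rand(4, 256), torch.rand(4, 768)).shape
torch.Size([4, 512])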

class configilm.Fusion.MutanFusion.Mutan#
__init__(input_dims, output_dim, mm_dim=1600, rank=15, shared=False, normalize=False, dropout_input=0.0, dropout_pre_lin=0.0, dropout_output=0.0)#

Initializes the internal Module state of the Multimodal-Tucker fusion from “MUTAN: Multimodal Tucker Fusion for Visual Question Answering”. Uses a Tucker decomposition to restrict tensor complexity. The inverse-bottleneck expansion of the multi-modal fusion is defined by rank.

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • rank (int) – Rank of input merging matrix, factor to the size calculated by mm_dim

  • shared (bool) – flag indicating whether the input mappings are shared between both inputs. Only works if all input_dims are equal

  • normalize (bool) – flag indicating whether normalization should be applied

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_lin (float) – Dropout rate before linear mapping

  • dropout_output (float) – Dropout rate before the output

Returns:

Multimodal-Tucker-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to Multimodal-Tucker-Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
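
Example (a minimal sketch; rank=15 and mm_dim=1600 match the documented defaults, the remaining sizes are illustrative assumptions):

>>> import torch
>>> from configilm.Fusion.MutanFusion import Mutan
>>> fusion = Mutan(input_dims=[256, 768], output_dim=512, mm_dim=1600, rank=15)
>>> fusion(torch.rand(4, 256), torch.rand(4, 768)).shape
torch.Size([4, 512])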

class configilm.Fusion.TuckerFusion.Tucker#
__init__(input_dims, output_dim, mm_dim=1600, shared=False, normalize=False, dropout_input=0.0, dropout_pre_lin=0.0, dropout_output=0.0)#

Initializes the internal Module state of the Tucker fusion from “MUTAN: Multimodal Tucker Fusion for Visual Question Answering”. Uses a Tucker decomposition to restrict tensor complexity.

Parameters:
  • input_dims (Sequence[int]) – Sequence of dimensions of different inputs. Only the first two dimensions are used

  • output_dim (int) – Dimension of output tensor

  • mm_dim (int) – intermediate multi-modal dimension

  • shared (bool) – flag indicating whether the input mappings are shared between both inputs. Only works if all input_dims are equal

  • dropout_input (float) – Dropout rate of the inputs

  • dropout_pre_lin (float) – Dropout rate before linear mapping

  • dropout_output (float) – Dropout rate before the output

  • normalize (bool) – flag indicating whether normalization should be applied

Returns:

Tucker-Fusion torch.nn module

forward(input_0, input_1)#

Forward call to Tucker-Fusion

Parameters:
  • input_0 (Tensor) – first modality input

  • input_1 (Tensor) – second modality input

Returns:

multi modality output

Return type:

Tensor
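
Example (a minimal sketch; shared is left at its default of False, so the two inputs may have different dimensions, and all concrete sizes are illustrative assumptions):

>>> import torch
>>> from configilm.Fusion.TuckerFusion import Tucker
>>> fusion = Tucker(input_dims=[256, 768], output_dim=512, mm_dim=1600)
>>> fusion(torch.rand(4, 256), torch.rand(4, 768)).shape
torch.Size([4, 512])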