The intent_recognition module contains the IntentRecognitionLearner class, which can be used to recognize 20 intents of a person based on text. It is recommended to use the IntentRecognitionModule together with the SpeechTranscriptionModule to enable intent recognition from transcribed speech. The module supports multimodal training on face (vision), speech (audio), and text data to facilitate improved unimodal inference on the text modality.
We provide data processing scripts and a pre-trained model for the MIntRec dataset. The class labels correspond to the following intent categories: 0 - Complain, 1 - Praise, 2 - Apologise, 3 - Thank, 4 - Criticize, 5 - Agree, 6 - Taunt, 7 - Flaunt, 8 - Joke, 9 - Oppose, 10 - Comfort, 11 - Care, 12 - Inform, 13 - Advise, 14 - Arrange, 15 - Introduce, 16 - Leave, 17 - Prevent, 18 - Greet, 19 - Ask for help.
The learner has the following public methods:
IntentRecognitionLearner(self, text_backbone, mode, log_path, cache_path, results_path, output_path, device, benchmark)
Constructor parameters:
- text_backbone: {"bert-base-uncased", "albert-base-v2", "prajjwal1/bert-small", "prajjwal1/bert-mini", "prajjwal1/bert-tiny"}, default="bert-base-uncased"
Specifies the text backbone to be used. The name matches the corresponding Hugging Face Hub model, e.g., prajjwal1/bert-small.
- mode: {'language', 'joint'}, default="joint"
Specifies the modality of the model. 'language' corresponds to a text-only model, 'joint' corresponds to a multimodal model with vision, audio, and language modalities trained jointly.
- log_path: str, default="logs"
Specifies the path where to store the logs.
- cache_path: str, default="cache"
Specifies the path for the cache, mainly used for tokenizer files.
- results_path: str, default="results"
Specifies where to store the results (performance metrics).
- output_path: str, default="outputs"
Specifies where to store the outputs: trained models, predictions, etc.
- device: str, default="cuda"
Specifies the device to be used for training.
- benchmark: {"MIntRec"}, default="MIntRec"
Specifies the benchmark (dataset) to be used for training. The benchmark defines the class labels, feature dimensionalities, etc.
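For instance, a text-only learner with the default backbone could be constructed as follows (the paths below are placeholders):

```python
from opendr.perception.multimodal_human_centric import IntentRecognitionLearner

# Text-only learner with the default BERT backbone; all paths are placeholders
learner = IntentRecognitionLearner(text_backbone='bert-base-uncased', mode='language',
                                   log_path='logs', cache_path='cache', results_path='results',
                                   output_path='outputs', device='cuda', benchmark='MIntRec')
```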
IntentRecognitionLearner.fit(self, dataset, val_dataset, verbose, silent)
This method is used for training the algorithm on a training dataset and validating on a validation dataset.
Parameters:
- dataset: object
Object that holds the training dataset.
- val_dataset: object, default=None
Object that holds the validation dataset.
- verbose: bool, default=False
Enables verbosity.
- silent: bool, default=False
Enables training in silent mode, i.e., only critical output is produced.
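As a minimal sketch, assuming a learner constructed as in the example above and the processed MIntRec data located under the placeholder paths below, training with validation could look like:

```python
from opendr.perception.multimodal_human_centric.intent_recognition_learner.algorithm.data.mm_pre import MIntRecDataset

# Placeholder paths; point these to the processed MIntRec data
train_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video',
                               audio_data_path='/path/to/audio', text_backbone='bert-base-uncased',
                               split='train')
val_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video',
                             audio_data_path='/path/to/audio', text_backbone='bert-base-uncased',
                             split='dev')

# Train with verbose output, validating on val_dataset
learner.fit(train_dataset, val_dataset, verbose=True, silent=False)
```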
IntentRecognitionLearner.eval(self, dataset, modality, verbose, silent, restore_best_model)
This method is used to evaluate a trained model on an evaluation dataset.
Parameters:
- dataset: object
Object that holds the evaluation dataset.
- modality: str, {'audio', 'video', 'language', 'joint'}
Specifies the modality to be used for inference. The modality should match the current training mode of the learner; for a learner trained in 'joint' (multimodal) mode, any modality can be used for inference, although we do not recommend using only video or only audio.
- verbose: bool, default=False
If True, provides detailed logs.
- silent: bool, default=False
If True, runs in silent mode, i.e., with only critical output.
- restore_best_model: bool, default=False
If True, the best model according to performance on the validation set will be loaded from self.output_path. If False, the current model state will be evaluated.
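For example, to evaluate a learner trained in 'joint' mode on text-only input (assuming a trained learner and a test_dataset built as in the full example at the end of this section; the results variable name is illustrative):

```python
# Restore the best model found during training and evaluate it using only the language modality
results = learner.eval(test_dataset, modality='language', restore_best_model=True)
```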
IntentRecognitionLearner.infer(self, batch, modality)
This method is used to perform inference from a given language sequence (text).
It returns a list of engine.target.Category objects, which contain class predictions and confidence scores for each sentence in the input sequence.
Parameters:
- batch: dict
Dictionary with input data with keys corresponding to modalities, e.g. {'text': 'Hello'}.
- modality: str, default='language'
Modality to be used for inference. Currently, inference from raw data is only supported for the language modality (text).
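A minimal sketch of text-only inference, assuming a trained learner (the example sentence is arbitrary):

```python
# Run inference on a single sentence; returns a list of engine.target.Category objects
# holding the predicted intent class and confidence for each input sentence
predictions = learner.infer({'text': 'Thank you so much for your help!'}, modality='language')
print(predictions)
```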
IntentRecognitionLearner.save(self, path)
This method is used to save a trained model.
Parameters:
- path: str
Path to save the model.
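For example (the path is a placeholder):

```python
# Save the current model state to the given folder
learner.save('./intent_model')
```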
IntentRecognitionLearner.load(self, path)
This method is used to load a previously saved model.
Parameters:
- path: str
Path of the model to be loaded.
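For example, a model previously stored with save() can be restored from the same (placeholder) path:

```python
# Load a previously saved model
learner.load('./intent_model')
```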
IntentRecognitionLearner.download(self, path)
Downloads the provided pretrained model into 'path'.
Parameters:
- path: str
Specifies the folder where data will be downloaded.
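For example (the target folder is a placeholder):

```python
# Download the provided pre-trained MIntRec model into the given folder
learner.download('./pretrained_models')
```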
IntentRecognitionLearner.trim(self, modality)
This method is used to convert a model trained in a multimodal manner ('joint' mode) for unimodal inference. This will drop unnecessary layers corresponding to other modalities for computational efficiency.
Parameters:
- modality: str, default='language'
Modality to which to convert the model.
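For example, to keep only the language branch of a learner trained in 'joint' mode:

```python
# Drop the audio- and vision-specific layers, keeping only the language branch
learner.trim('language')
```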
Additional configuration parameters/hyperparameters can be specified in intent_recognition_learner/algorithm/configs/mult_bert.py.
Training, evaluation and inference example
```python
from opendr.perception.multimodal_human_centric import IntentRecognitionLearner
from opendr.perception.multimodal_human_centric.intent_recognition_learner.algorithm.data.mm_pre import MIntRecDataset

if __name__ == '__main__':
    # Initialize the multimodal learner
    learner = IntentRecognitionLearner(text_backbone='bert-base-uncased', mode='joint', log_path='logs',
                                       cache_path='cache', results_path='results', output_path='outputs')

    # Initialize datasets
    train_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video',
                                   audio_data_path='/path/to/audio', text_backbone='bert-base-uncased',
                                   split='train')
    val_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video',
                                 audio_data_path='/path/to/audio', text_backbone='bert-base-uncased',
                                 split='dev')
    test_dataset = MIntRecDataset(data_path='/path/to/data/', video_data_path='/path/to/video',
                                  audio_data_path='/path/to/audio', text_backbone='bert-base-uncased',
                                  split='test')

    # Train the model
    learner.fit(train_dataset, val_dataset, silent=False, verbose=True)

    # Evaluate the best (according to the validation set) model on multimodal input
    out = learner.eval(test_dataset, 'joint', restore_best_model=True)

    # Evaluate the best (according to the validation set) model on text-only input
    out_l = learner.eval(test_dataset, 'language', restore_best_model=True)

    # Keep only the text-specific layers of the model and drop the rest
    learner.trim('language')

    # Evaluate the trimmed model. Should produce the same result as out_l.
    out_l_2 = learner.eval(test_dataset, 'language', restore_best_model=False)
```