Your tone speaks louder than your face! Modality Order Infused Multi-modal Sarcasm Detection: MO-Sarcation
This repository contains the code for our ACM Multimedia 2023 paper "Your tone speaks louder than your face! Modality Order Infused Multi-modal Sarcasm Detection", published in the Proceedings of the 31st ACM International Conference on Multimedia (MM ’23), October 29-November 3, 2023, Ottawa, ON, Canada.
Figurative language is an essential component of human communication, and detecting sarcasm in text has become a challenging yet highly popular task in natural language processing. As humans, we rely on a combination of visual and auditory cues, such as facial expressions and tone of voice, to comprehend a message. Our brains are implicitly trained to integrate information from multiple senses to form a complete understanding of the message being conveyed, a process known as multi-sensory integration. The combination of different modalities not only provides additional information but also amplifies the information conveyed by each modality relative to the others. Thus, the order in which modalities are infused also plays a significant role in multimodal processing. In this paper, we investigate the impact of different modality infusion orders on identifying sarcasm in dialogues. We propose MO-Sarcation, a modality-order-driven module integrated into a transformer network that fuses modalities in an ordered manner. Our model outperforms several state-of-the-art models by 1-3% across various metrics, demonstrating the crucial role of modality order in sarcasm detection. The obtained improvements and detailed analysis show that audio tone should first be infused with textual content, followed by visual information, to identify sarcasm efficiently.
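To make the modality-order idea concrete, here is a minimal, illustrative sketch of ordered fusion via cross-attention: textual features are first enriched with audio cues, and the resulting representation is then fused with visual features. This is not the exact MO-Sarcation architecture from the paper; the module names, feature dimensions, residual/normalization layout, and the pooling-plus-classifier head are all assumptions made for illustration.

```python
# Illustrative sketch only (assumed names and dimensions, not the paper's exact model).
import torch
import torch.nn as nn

class OrderedFusion(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        # Cross-attention blocks applied in a fixed order: text -> audio, then -> video.
        self.text_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fused_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, 2)  # sarcastic vs. non-sarcastic

    def forward(self, text, audio, video):
        # text/audio/video: (batch, seq_len, dim) utterance-level features.
        # Step 1: infuse audio tone into the textual representation.
        ta, _ = self.text_audio(query=text, key=audio, value=audio)
        fused = self.norm1(text + ta)  # residual connection
        # Step 2: infuse visual information into the text+audio representation.
        tav, _ = self.fused_video(query=fused, key=video, value=video)
        fused = self.norm2(fused + tav)
        # Pool over the sequence and classify.
        return self.classifier(fused.mean(dim=1))

# Example usage with random features:
model = OrderedFusion()
logits = model(torch.randn(4, 20, 768), torch.randn(4, 20, 768), torch.randn(4, 20, 768))
print(logits.shape)  # torch.Size([4, 2])
```

Swapping the two cross-attention blocks changes the infusion order, which is precisely the design axis the paper studies.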
-
Authors: Mohit Tomar, Abhisek Tiwari, Tulika Saha, and Sriparna Saha
Equal contribution: Mohit and Abhisek contributed equally.
-
Please find the link to download the paper - https://drive.google.com/file/d/1_hU8QB2UkvPt8qFCeGBCBfwLg6vE2ix2/view?usp=sharing
-
To run the model, please refer to the how_to_run_the_model.txt file.
-
To obtain the raw data (the MUStARD dataset), please refer to the following link - https://github.com/soujanyaporia/MUStARD/tree/f45a9e542f9b220c9594264ebb7e87eb8faf0f7d
-
Please find the link to the data used in this work - https://drive.google.com/drive/folders/1S7Tc-8D1JO-oixsgVGiSue8LBy4EpWK4?usp=sharing
If you find this work useful, please cite it as:
@inproceedings{tomar2023your,
  title={Your tone speaks louder than your face! Modality Order Infused Multi-modal Sarcasm Detection},
  author={Tomar, Mohit and Tiwari, Abhisek and Saha, Tulika and Saha, Sriparna},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={3926--3933},
  year={2023}
}
This code is adapted from the following GitHub repository: https://github.com/LCS2-IIITD/MAF
For any queries, feel free to contact Mohit Tomar ([email protected])