09/12/2025 at 13:00
Towards leveraging multi-modal data to enhance single-modality classification models: An application to image analysis
Assoc. Prof. Dino Ienco
17 December 2025 at 14:00, E-lecture room (Technological Park, Teslova 30, 1st Floor, Room 38/39)
Assoc. Prof. Dino Ienco (Member, IEEE)
received the M.Sc. and Ph.D. degrees in computer science from the University of Torino, Turin, Italy, in 2006 and 2010, respectively. In 2011, he joined TETIS, National Research Institute for Agriculture, Food and the Environment (INRAE), University of Montpellier, Montpellier, France, as a Junior Researcher. Since 2024, he has headed the EVERGREEN INRIA team at the Antenne INRIA de l’Université de Montpellier, Montpellier, France. His main research interests include machine learning, computer vision, and data science, with a particular emphasis on remote sensing data and Earth observation data fusion. He has served on the program committees of many international conferences on data mining, machine learning, and databases, including the IEEE International Conference on Data Mining (ICDM), the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), the International Joint Conference on Artificial Intelligence (IJCAI), and the Association for the Advancement of Artificial Intelligence (AAAI) conference. He has also served as a reviewer for many international journals in the general field of data science and remote sensing.

Cross-modal knowledge distillation (CMKD) refers to the scenario in which a learning framework must handle training and test data that exhibit a modality mismatch; that is, the training and test data do not cover the same set of modalities.

Traditional approaches to CMKD are based on a teacher/student paradigm, in which a teacher is trained on multi-modal data and knowledge is subsequently distilled from the multi-modal teacher to a single-modal student. Despite the widespread adoption of this paradigm, recent research has highlighted its inherent limitations in the context of cross-modal knowledge transfer.
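
As a point of reference, the following is a minimal PyTorch sketch of the conventional distillation objective that such teacher/student approaches typically minimize; the temperature T, the weight alpha, and the function name are illustrative assumptions in the style of Hinton-style knowledge distillation, not code from the talk.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Illustrative teacher/student KD objective: a soft-label KL term plus a
    hard-label cross-entropy term (T and alpha are assumed hyperparameters)."""
    # Soften both distributions with temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence pulls the student towards the multi-modal teacher;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd_term = F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * (T * T)
    # Standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In the cross-modal setting, the teacher sees all modalities at training time while the student receives only the single modality available at test time, which is precisely where the limitations mentioned above arise.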

Taking a step beyond the teacher/student paradigm, in this talk I will introduce a new collaborative framework for cross-modal knowledge distillation, named DisCoM-KD (Disentanglement-learning based Cross-Modal Knowledge Distillation), which explicitly models different types of per-modality information with the aim of transferring knowledge from multi-modal data to a single-modal classifier. To this end, DisCoM-KD combines disentangled representation learning with adversarial domain adaptation to simultaneously extract, for each modality, domain-invariant, domain-informative, and domain-irrelevant features with respect to a specific downstream task.
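
To make this design concrete, here is a minimal sketch, assuming a PyTorch implementation: each modality gets an encoder that splits its embedding into three parts, and a modality discriminator trained through gradient reversal pushes the invariant part to be indistinguishable across modalities. All module names, dimensions, and the three-way chunking are hypothetical illustrations, not the actual DisCoM-KD code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient-reversal layer, the standard trick in adversarial domain adaptation."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class PerModalityEncoder(nn.Module):
    """Encodes one modality and splits the result into invariant /
    informative / irrelevant feature blocks (a hypothetical design)."""
    def __init__(self, in_dim, feat_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 3 * feat_dim), nn.ReLU())

    def forward(self, x):
        # Three equal-size chunks: modality-invariant, modality-informative,
        # and modality-irrelevant features for the downstream task.
        return torch.chunk(self.backbone(x), 3, dim=-1)

feat_dim, n_modalities, n_classes = 64, 2, 10    # assumed sizes
encoder_a = PerModalityEncoder(in_dim=128, feat_dim=feat_dim)
discriminator = nn.Linear(feat_dim, n_modalities)
classifier_a = nn.Linear(2 * feat_dim, n_classes)

x_a = torch.randn(8, 128)                        # a batch from modality A
inv, info, irr = encoder_a(x_a)

# Adversarial term: the discriminator tries to recognise the source modality
# from the invariant features, while gradient reversal trains the encoder
# to make them modality-agnostic.
modality_labels = torch.zeros(8, dtype=torch.long)   # modality A = class 0
adv_loss = F.cross_entropy(discriminator(GradReverse.apply(inv)), modality_labels)

# The single-modal classifier for modality A consumes the invariant and
# informative parts only; the irrelevant block is discarded downstream.
task_logits = classifier_a(torch.cat([inv, info], dim=-1))
```

At test time, only the encoder and classifier of the available modality are needed, which is what allows a single-modal model to benefit from multi-modal training data.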

Unlike the traditional teacher/student paradigm, our framework learns all single-modal classifiers simultaneously, eliminating the need to train each student model separately as well as the need for a teacher classifier. We evaluated DisCoM-KD on standard multi-modal benchmarks and compared its behaviour with that of recent state-of-the-art knowledge distillation frameworks. The findings clearly demonstrate the effectiveness of DisCoM-KD over its competitors in mismatch scenarios involving both overlapping and non-overlapping modalities. These results offer grounds to reconsider the traditional paradigm for distilling information from multi-modal data into single-modal neural networks.

Zoom link: https://zoom.us/j/98326976345?pwd=TlV9MGPKNNIe4qnBmV83KtCJqA9sCC.1
