Nikola Marić obtained a Master’s degree. He successfully defended his thesis, titled “Morphing attack detection using multimodal large language models”, conducted under the supervision of Prof. Dr. Vitomir Štruc and Assist. Marija Ivanovska Preskar.
Congratulations!
Abstract:
Face morphing attacks pose a significant threat to biometric security systems by enabling multiple individuals to authenticate with a single compromised credential, i.e., a morphed face image. This thesis investigates the use of multimodal large language models (MLLMs) for morphing attack detection, demonstrating that foundation models trained on large-scale, heterogeneous data possess latent forensic capabilities that can be adapted for specialized security tasks.
We evaluate four open-source models in a zero-shot setting (Gemma-3 27B, Qwen2.5-VL 32B, Llama-4 Scout 17B, and Mistral Small 3.1 24B) across diverse datasets covering landmark-based, GAN-based, and diffusion-based morphing attacks. Even without task-specific training, these models achieve measurable detection performance, confirming that multimodal language models inherently encode useful representations. To improve zero-shot detection reliability, we developed a structured forensic prompt that guides the models through a systematic six-step procedure for detecting visual artifacts created during the blending of facial images. This structured prompting approach enhances both the detection accuracy and the interpretability of the outputs.
The primary contribution of the thesis lies in parameter-efficient fine-tuning through Low-Rank Adaptation (LoRA). We fine-tuned Gemma-3 12B while training only 0.61% of its parameters. The fine-tuned model substantially outperformed its zero-shot counterpart, reducing the average Equal Error Rate by more than half. It achieved near-perfect detection on landmark-based morphs and competitive results on the more challenging GAN-based and diffusion-based morphs. Overall, this research establishes multimodal large language models as a viable and promising direction for morphing attack detection, combining generalization and interpretability with competitive performance against state-of-the-art approaches.
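For readers unfamiliar with the Equal Error Rate mentioned above, the following is a minimal sketch of how such a metric can be approximated from detector scores. It is not taken from the thesis: the function name `equal_error_rate`, the score convention (higher means "more likely a morph"), and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(bona_fide_scores, morph_scores):
    """Approximate the EER from detector scores.

    Assumed convention (not from the thesis): a higher score means the
    image is more likely a morph, and an image is flagged as a morph
    when its score is at or above the threshold.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bona_fide_scores) | set(morph_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Morphs scored below the threshold are missed attacks.
        missed = sum(s < t for s in morph_scores) / len(morph_scores)
        # Bona fide images at or above the threshold are false alarms.
        false_alarms = sum(s >= t for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(missed - false_alarms)
        # The EER is the operating point where both error rates coincide;
        # with finitely many scores, take the closest point and average.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (missed + false_alarms) / 2
    return eer


# Hypothetical scores, for illustration only.
bona_fide = [0.1, 0.2, 0.3, 0.6]
morphs = [0.4, 0.7, 0.8, 0.9]
print(equal_error_rate(bona_fide, morphs))
```

A lower value means the detector separates bona fide and morphed images more cleanly, so halving the average EER corresponds to halving the error at this balanced operating point.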