Aidan Bell and James Gore have had their paper, “Augmenting Cross-Modal Art Retrieval: The Role of MLLM-Synthesized Captions,” accepted at ACM SIGIR 2025. SIGIR is ranked A* by the CORE Conference Portal. This marks Aidan’s first publication; James has previously published at ACM ICTIR and ECIR.
The paper grew out of Aidan and James’s course project in Information Retrieval at the Department of Computer Science, University of Southern Maine. They explored Multimodal Large Language Models (MLLMs) for the challenging task of cross-modal art retrieval, where the goal is to match textual descriptions to relevant artworks. By fine-tuning two retrieval models, Long-CLIP and BLIP, on both human-annotated and MLLM-generated captions, the study finds that MLLM-generated captions yield retrieval performance comparable to that of human annotations.
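For readers curious what this kind of fine-tuning looks like in practice, here is a minimal sketch (not the authors’ code) of contrastive fine-tuning of a CLIP-style retrieval model on artwork-caption pairs, using the Hugging Face `transformers` library. The base checkpoint, learning rate, and function names are illustrative assumptions; the same loop applies whether the captions come from human annotators or an MLLM.

```python
# A minimal sketch (not the authors' code) of contrastive fine-tuning for
# cross-modal retrieval on (artwork image, caption) pairs. The checkpoint,
# optimizer settings, and helper names below are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def fine_tune_step(images: list[Image.Image], captions: list[str]) -> float:
    """One contrastive step: CLIP's symmetric InfoNCE loss pulls matching
    image-caption pairs together and pushes mismatched pairs apart.
    The captions may be human-annotated or MLLM-generated."""
    batch = processor(text=captions, images=images,
                      return_tensors="pt", padding=True, truncation=True)
    loss = model(**batch, return_loss=True).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

@torch.no_grad()
def rank_artworks(query: str, images: list[Image.Image]) -> torch.Tensor:
    """Retrieval: rank candidate artworks by similarity to a text query."""
    batch = processor(text=[query], images=images,
                      return_tensors="pt", padding=True, truncation=True)
    scores = model(**batch).logits_per_text.squeeze(0)  # one score per image
    return scores.argsort(descending=True)              # best match first
```

At retrieval time, the fine-tuned encoders embed the query text and the candidate artworks into a shared space, and artworks are ranked by similarity to the query, as in `rank_artworks` above.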

