
Wish Suharitdamrong
Academic and research departments
Surrey Institute for People-Centred Artificial Intelligence (PAI), Centre for Vision, Speech and Signal Processing (CVSSP)
My research project
Multi-Modal Foundation Models
This research project aims to address the limitations of current multimodal learning models, which often overlook fine-grained information in favour of global representations. In domains such as multimedia (e.g., videos with images, audio, and transcripts) and healthcare (e.g., medical images and clinical data), multimodal data carry complex, overlapping semantic concepts. The research will develop novel self-supervised learning algorithms focused on extracting and aligning fine-grained, multi-concept representations across modalities. By designing specialised neural architectures and loss functions, it will enhance the integration of multimodal data, enabling a deeper understanding of complex cross-modal relationships. This approach has significant implications for fields such as multimedia analysis and healthcare informatics, where detailed multimodal interpretation is essential.
Supervisors