LGMsGWAS: Large Genome Models for Genome Wide Association Studies

A unique opportunity to work on a very interesting AI challenge aimed at deciphering the human genome, giving us a better understanding of diseases and their causes. You will be guided by a multidisciplinary team that includes world-leading experts in statistical multi-omics and artificial intelligence.

Start date

1 October 2025

Duration

3.5 years

Application deadline

Funding source

EPSRC Doctoral Landscape Award

Funding information

Full fees covered, a research training support grant (£3,000 in total) and the UKRI standard stipend (£20,780 per year).

About

Understanding the human genome is critical to understanding diseases and their causes, and it greatly helps in curing them, including through personalised treatments and drug design. Artificial intelligence (AI) can improve our understanding of the genome and thereby significantly enhance our comprehension of diseases and their cures. In particular, large genomic models (LGMs) pretrained using self-supervised learning (SSL), the same pretraining principle behind large language models (LLMs), can significantly enhance genomic understanding. SSL learns from data without human-supplied labels and can therefore be used for large-scale pretraining of LLMs and LGMs. However, LLMs typically lack explainability and reasoning and can hallucinate, which hinders their adoption in genome-wide association studies (GWAS), where high explainability and the identification of the particular SNPs (single nucleotide polymorphisms) associated with a given disease are required.

The PhD will focus on developing explainable and interpretable LGMs for GWAS and other applications of genomics. The first step is to build LGMs using large amounts of multimodal genomic data from Prof. Prokopenko's group. Dr. Awais’ and Dr. Atito’s teams have core expertise in multimodal representation learning and SSL of large transformer models in healthcare, vision, speech and LLMs. The student will leverage the multimodal representation learning expertise in Dr. Awais’ group and the SSL expertise in Dr. Atito’s group to train a baseline LGM. In the second stage, reasoning, explainability and interpretability will be incorporated into LGMs. The latter part of the PhD will focus on other genomic tasks, such as predicting transcription factor binding sites and splice sites and identifying promoter regions, as well as multimodal tasks that combine genomic and imaging data, such as cancer survival prediction and cancer subtype prediction.

Eligibility criteria

Open to any UK or international candidates. Up to 30% of our UKRI funded studentships can be awarded to candidates paying international rate fees.

You will need to meet the minimum entry requirements for our PhD programme.

The candidate should have an MSc degree in artificial intelligence/computer science from a reputable university. We may consider a computer/electronics engineering or mathematics background for exceptional students. To be shortlisted for interview you need to be among the top students of your class and hold at least a distinction. For exceptionally bright students (e.g., position holders) we may consider a BSc or equivalent in artificial intelligence/computer science from a top-ranking university. You need to have strong knowledge of the latest developments in AI, strong problem-solving ability, very strong coding skills in Python and other languages, and the ability to understand the latest AI research and publications.

International applicants are welcome to apply; however, the competition for international students is significantly higher than for UK home students. As an international applicant you need to have received your education from one of the top 3 universities in your country and you need to be one of the position holders in your batch. Exceptional students who are expected to finish their studies before or by the start date can also apply. This is an interdisciplinary project, so the candidate must be willing to learn about genomics and related areas. Shortlisted candidates will go through multiple rounds of knockout-based interviews. These cover knowledge of mathematics and machine/deep learning, problem solving, live coding, and research comprehension and capacity (we will ask you to present a critique of 3 of 5 recent papers given to you, within a limited time).

How to apply

Applications should be submitted via the Vision, Speech and Signal Processing PhD programme page. In place of a research proposal you should upload a document stating the title of the project that you wish to apply for and the name of the relevant supervisor.

Studentship FAQs

Read our studentship FAQs to find out more about applying and funding.

Contact details

Muhammad Awais
09 BA 00
Telephone: +44 (0)1483 684344
E-mail: muhammad.awais@surrey.ac.uk
