Dr Marika Kaakinen
Academic and research departments
Faculty of Health and Medical Sciences, School of Biosciences, Centre for Mathematical and Computational Biology.
About
Biography
I hold an MSc degree in statistics and a PhD in genetic and life-course epidemiology from the University of Oulu, Finland. Before joining the University of Surrey in April 2019, I worked as a Marie Curie Fellow, followed by a post as a Research Associate at Imperial College London, UK.
I develop and apply statistical analysis methods for genomic and other omics research on complex human traits, including type 2 diabetes and psychiatric traits. I work with various types of omics data, including metabolomics, proteomics, gut microbiome and whole-genome sequencing data. I have developed or contributed to the software tools MARV and SCOPA. I have also contributed to numerous genome-wide association studies (GWAS) within several consortia, including DIAGRAM (DIAbetes Genetics Replication And Meta-analysis), MAGIC (Meta-Analyses of Glucose and Insulin related traits), ENGAGE (European Network of Genomic and Genetic Epidemiology), EGG (Early Growth Genetics) and SSGAC (Social Science Genetic Association Consortium).
Areas of specialism
University roles and responsibilities
- Academic Integrity Officer
- Ethics Committee member
Affiliations and memberships
Research
Research interests
My overall research interest is to develop and apply statistical methodology to better understand complex human traits in order to improve the prevention and treatment of disease. I have contributed to numerous genome-wide association studies (GWAS) of several complex traits, leading to the discovery of hundreds of genetic variants associated with these traits. More recently, I have developed software for multi-phenotype GWAS, both to increase statistical power and to detect potential pleiotropic and other multi-phenotype effects. I am keen to find new ways to use the vast amounts of data that are constantly being generated, for example by applying machine learning or methods based on published summary statistics.
Research collaborations
Northern Finland Birth Cohorts, University of Oulu, Finland.
Estonian Genome Center, University of Tartu, Estonia.
University of Lausanne, Switzerland.
Pondicherry University, Puducherry, India.
Stremble Ventures, AVVA Pharmaceuticals and European University Cyprus, Cyprus.
Imperial College London, UK.
Supervision
Postgraduate research supervision
PhD students:
2023-present, Tingyu Guo, University of Surrey
2019-present, Igors Pupko, University of Surrey
2019-present, Liudmila Zudina, University of Surrey
Completed postgraduate research projects I have supervised
PhD students:
2015-2021, Mila Desi Anasanti, Imperial College London
MSc students:
2022, Tingyu Guo, Imperial College London
2021, Yuwei Jiao, Imperial College London
2021, Wenjie Li, Imperial College London
2020, Suruthi Shasheetharan, Imperial College London
2019, Jared Maina, Imperial College London
2018, Laurie Prelot, Imperial College London
2018, Edita Pileckyte, Imperial College London
2017, Kelsey Gibbs, Imperial College London
2016, Longda Jiang, Imperial College London
2015, Annique Claringbould, Imperial College London
Teaching
Courses I teach on
UG:
BMS1050 - Biochemistry: the many molecules of life
BMS2036 - Molecular Biology and Genetics: From Genes to Biological Function
BMS2043 - Analytical and Clinical Biochemistry
BMS3048 - BSc in Biomedical Sciences dissertation project
PG:
BMSM028
BMSM020
CPD:
Introduction to the statistical analysis of genome-wide association studies
External teaching
2016-present, Omics module for the MSc in Genomic Medicine, Imperial College London, London, UK
Publications
Introduction A polygenic score (PGS) is a valuable method for assessing the estimated genetic liability to a given outcome or the genetic variability contributing to a quantitative trait. While PGSs are widely used for complex traits, their application in uncovering shared genetic predisposition between phenotypes, i.e. when genetic variants influence more than one phenotype, remains limited. Methods We developed an R package, comorbidPGS, which facilitates a systematic evaluation of shared genetic effects among (cor)related phenotypes using PGSs. The comorbidPGS package takes as input a set of single nucleotide polymorphisms (SNPs) along with their established effects on the original phenotype (Po); the resulting score is referred to as Po-PGS. It generates a comprehensive summary of the effect(s) of the Po-PGS on target phenotype(s) (Pt) with customisable graphical features. Results We applied comorbidPGS to investigate the shared genetic predisposition between phenotypes defining elevated blood pressure (systolic blood pressure, SBP; diastolic blood pressure, DBP; pulse pressure, PP) and several cancers (breast cancer, BrC; pancreatic cancer, PanC; kidney cancer, KidC; prostate cancer, PrC; colorectal cancer, CrC) using UK Biobank individuals of European ancestry and GWAS meta-analysis summary statistics from an independent set of European-ancestry individuals. We report a significant association between elevated DBP and the genetic risk of PrC (β (SE) = 0.066 (0.017), P-value = 9.64×10^(-5)), as well as between CrC PGS and both lower SBP (β (SE) = -0.10 (0.029), P-value = 3.83×10^(-4)) and lower DBP (β (SE) = -0.055 (0.017), P-value = 1.05×10^(-3)). Our analysis also highlights two nominally significant relationships in which genetic predisposition to elevated SBP leads to a higher risk of KidC (OR [95%CI] = 1.04 [1.0039-1.087], P-value = 2.82×10^(-2)) and PrC (OR [95%CI] = 1.02 [1.003-1.041], P-value = 2.22×10^(-2)).
Conclusion Using comorbidPGS, we underscore mechanistic relationships between blood pressure regulation and susceptibility to three comorbid malignancies. This package offers a valuable means to evaluate shared genetic susceptibility between (cor)related phenotypes through polygenic scores.
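The core computation described above can be sketched in a few lines. This is a minimal Python illustration and not the comorbidPGS API (the package itself is written in R): it assumes that a Po-PGS is the sum of effect-allele dosages weighted by the SNP effects on Po, and it tests the score against a quantitative target phenotype Pt by simple linear regression. The function names and toy data are hypothetical; a real analysis would also adjust for covariates such as age, sex and principal components, and use logistic regression for binary outcomes such as cancer status.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2), one weight per SNP.

    Hypothetical helper: weights are assumed to be the SNP effect sizes on the
    original phenotype (Po) taken from GWAS summary statistics.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


def regress_on_pgs(pgs, phenotype):
    """Simple linear regression of a target phenotype (Pt) on the Po-PGS.

    Returns the slope (beta), its standard error and a two-sided P-value from
    a normal approximation, which is reasonable for biobank-scale samples.
    """
    n = len(pgs)
    mean_x = sum(pgs) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in pgs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pgs, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(pgs, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value


# Toy example: three SNPs, four individuals (illustrative numbers only).
po_weights = [0.12, -0.05, 0.30]
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
scores = [polygenic_score(g, po_weights) for g in genotypes]
target = [1.4, 0.3, 0.9, 1.1]
beta, se, p = regress_on_pgs(scores, target)
```

Repeating the regression step for each target phenotype gives one effect estimate per Po-PGS/Pt pair, which is the kind of summary the abstract describes.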
Polycystic ovary syndrome (PCOS) is a very common endocrine condition in women in India. Gut microbiome alterations have been shown to be involved in PCOS, yet they remain remarkably understudied in Indian women, who have a higher incidence of PCOS than other ethnic populations. During a regional PCOS screening program among young women, we recruited 19 drug-naive women with PCOS and 20 control women at the Sher-i-Kashmir Institute of Medical Sciences, Kashmir, North India. We profiled the gut microbiome in faecal samples by 16S rRNA sequencing and included 40/58 operational taxonomic units (OTUs) detected in at least 1/3 of the subjects with relative abundance (RA) ≥ 0.1%. We compared the RAs at the family/genus level between PCOS and non-PCOS groups and assessed their correlation with 33 metabolic and hormonal factors, correcting for multiple testing and taking into account variation in the day of the menstrual cycle at sample collection, age and BMI. Five genera were significantly enriched in PCOS cases: , , and previously reported for PCOS , and confirmed by different statistical models. At the family level, the relative abundance of was enriched, whereas was decreased among cases. We observed increased relative abundance of and with higher fasting blood glucose levels, and and with larger hip, waist circumference, weight, and with lower prolactin levels. We also detected a novel association between and follicle-stimulating hormone levels and between and alkaline phosphatase, independently of the BMI of the participants. Our report supports a relationship between gut microbiome composition and PCOS, with links to specific reproductive health, metabolic and hormonal predictors in Indian women.
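The OTU inclusion rule above can be written as a small filter. This Python sketch rests on one interpretation of the rule, which is an assumption: an OTU is kept if its relative abundance is at least 0.1% in at least one third of the samples. The function names and toy read counts are hypothetical.

```python
def relative_abundances(counts):
    """Convert one sample's OTU read counts into relative abundances (0-1)."""
    total = sum(counts.values())
    return {otu: c / total for otu, c in counts.items()}


def filter_otus(samples, min_ra=0.001, min_fraction=1 / 3):
    """Return OTUs whose relative abundance reaches min_ra (0.001 = 0.1%)
    in at least min_fraction of the samples, sorted by name.

    Assumed interpretation of the inclusion rule; other definitions (e.g. a
    prevalence filter combined with a mean-abundance filter) are possible.
    """
    abundances = [relative_abundances(s) for s in samples]
    all_otus = set().union(*abundances)  # union of OTU names over all samples
    kept = []
    for otu in sorted(all_otus):
        n_present = sum(1 for ra in abundances if ra.get(otu, 0.0) >= min_ra)
        if n_present / len(abundances) >= min_fraction:
            kept.append(otu)
    return kept


# Toy read counts for three samples (illustrative only).
samples = [
    {"OTU_A": 1000, "OTU_B": 0, "OTU_C": 0},
    {"OTU_A": 1000, "OTU_B": 0, "OTU_C": 0},
    {"OTU_A": 900, "OTU_B": 100, "OTU_C": 0},
]
print(filter_otus(samples))  # ['OTU_A', 'OTU_B']
```

OTU_B passes because it reaches the abundance threshold in one of three samples, exactly one third; OTU_C never does.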
Obesity and type 2 diabetes (T2D) are associated with an increased risk of pancreatic cancer. Here we assessed the relationship between pancreatic cancer and two distinct measures of obesity, namely total adiposity, using BMI, versus abdominal adiposity, using BMI-adjusted waist-to-hip ratio (WHRadjBMI), by utilising polygenic scores (PGS) and Mendelian randomisation (MR) analyses. We constructed z-score weighted PGS for BMI and WHRadjBMI using publicly available data and tested their association with pancreatic cancer defined in UK Biobank (UKBB). Using publicly available summary statistics, we then performed bi-directional MR analyses between the two obesity traits and pancreatic cancer. PGS(BMI) was significantly associated with pancreatic cancer after correction for multiple testing (OR[95%CI] = 1.0804[1.025-1.14], P = 0.0037). The significance of the association declined after T2D adjustment (OR[95%CI] = 1.073[1.018-1.13], P = 0.00904). The association of PGS(WHRadjBMI) with pancreatic cancer was at the margin of statistical significance (OR[95%CI] = 1.047[0.99-1.104], P = 0.086). After T2D adjustment, any suggestive association of PGS(WHRadjBMI) with pancreatic cancer was lost (OR[95%CI] = 1.039[0.99-1.097], P = 0.14). MR analyses showed a nominally significant causal effect of WHRadjBMI on pancreatic cancer (OR[95%CI] = 1.00095[1.00011-1.0018], P = 0.027) but not of BMI on pancreatic cancer. Overall, we show that abdominal adiposity, measured using WHRadjBMI, may be a more important causal risk factor for pancreatic cancer than total adiposity, with T2D being a potential driver of this relationship.
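The two-sample MR step can be illustrated with the standard fixed-effect inverse-variance weighted (IVW) estimator, which combines per-SNP ratio estimates using only summary statistics. This Python sketch is not necessarily the estimator or software used in the study; the function name and toy numbers are hypothetical, and the outcome effects are assumed to be on the log-odds scale, so the exponentiated estimate is an odds ratio.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the exposure (e.g. WHRadjBMI)
    beta_outcome:  SNP effects on the outcome (log-odds of pancreatic cancer)
    se_outcome:    standard errors of the outcome effects
    Returns the causal estimate, its standard error and a two-sided P-value.
    """
    num = sum(bx * by / s ** 2
              for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    beta = num / den
    se = 1 / math.sqrt(den)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value


# Hypothetical summary statistics for three instrument SNPs.
beta, se, p = ivw_estimate([0.05, 0.08, 0.03], [0.004, 0.006, 0.001], [0.002, 0.003, 0.002])
odds_ratio = math.exp(beta)
ci_95 = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```

For the reverse direction of a bi-directional analysis, the same function would be applied with instruments for pancreatic cancer and their effects on the obesity trait.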
Conventional measurements of fasting and postprandial blood glucose levels investigated in genome-wide association studies (GWAS) cannot capture the effects of DNA variability on 'around the clock' glucoregulatory processes. Here we show that GWAS meta-analysis of glucose measurements under nonstandardized conditions (random glucose (RG)) in 476,326 individuals of diverse ancestries and without diabetes enables locus discovery and new pathophysiological observations. We discovered 120 RG loci represented by 150 distinct signals, including 13 with sex-dimorphic effects, two cross-ancestry signals and seven rare-frequency signals. Of these, 44 loci are new for glycemic traits. Regulatory, glycosylation and metagenomic annotations highlight ileum and colon tissues, indicating an underappreciated role of the gastrointestinal tract in controlling blood glucose. Functional follow-up and molecular dynamics simulations of lower-frequency coding variants in glucagon-like peptide-1 receptor (GLP1R), a type 2 diabetes treatment target, reveal that optimal selection of GLP-1R agonist therapy will benefit from tailored genetic stratification. We also provide evidence from Mendelian randomization that lung function is modulated by blood glucose and that pulmonary dysfunction is a diabetes complication. Our investigation yields new insights into the biology of glucose regulation, diabetes complications and pathways for treatment stratification.
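A GWAS meta-analysis combines per-study effect estimates for each variant. A common baseline is the fixed-effect inverse-variance weighted meta-analysis sketched below in Python; the trans-ancestry analysis described above may have used different or additional methods (for example, approaches that model heterogeneity between ancestries), so this illustrates only the general idea. The function name and numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of one variant.

    Each study's estimate is weighted by 1 / SE^2; returns the pooled beta,
    its standard error and a two-sided normal P-value.
    """
    weights = [1 / s ** 2 for s in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = 1 / math.sqrt(total_weight)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value


# Three hypothetical cohorts reporting the same variant's effect on random glucose.
pooled = fixed_effect_meta([0.021, 0.015, 0.030], [0.006, 0.008, 0.010])
```

More precise studies (smaller SE) pull the pooled estimate towards their own value, and the pooled SE is always smaller than any single study's SE.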
Introduction The role of TOMM40-APOE 19q13.3 region variants is well documented in Alzheimer's disease (AD) but remains contentious in dementia with Lewy bodies (DLB) and Parkinson's disease dementia (PDD). Methods We dissected genetic profiles within the TOMM40-APOE region in 451 individuals from four European brain banks, including DLB and PDD cases with/without neuropathological evidence of AD-related pathology and healthy controls. Results TOMM40-L/APOE-ε4 alleles were associated with DLB (OR(TOMM40-L) = 3.61, P value = 3.23 × 10−9; OR(APOE-ε4) = 3.75, P value = 4.90 × 10−10) and earlier age at onset of DLB (HR(TOMM40-L) = 1.33, P value = .031; HR(APOE-ε4) = 1.46, P value = .004), but not with PDD. The TOMM40-L/APOE-ε4 effect was most pronounced in DLB individuals with concomitant AD pathology (OR(TOMM40-L) = 4.40, P value = 1.15 × 10−6; OR(APOE-ε4) = 5.65, P value = 2.97 × 10−8) but was not significant in DLB without AD. Meta-analyses combining all APOE-ε4 data in DLB confirmed our findings (OR(DLB) = 2.93, P value = 3.78 × 10−99; OR(DLB+AD) = 5.36, P value = 1.56 × 10−47). Discussion APOE-ε4/TOMM40-L alleles increase susceptibility to DLB and the risk of earlier DLB onset, an effect explained by concomitant AD-related pathology. These findings have important implications for future drug discovery and development efforts in DLB.
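Per-allele odds ratios such as those above are typically estimated with logistic regression adjusted for covariates. As a simpler illustration of what an allelic odds ratio measures, this Python sketch computes it from a 2×2 table of allele counts in cases and controls, with a Woolf (log-scale) 95% confidence interval. It is not the study's analysis; the function name and counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio for the risk allele (e.g. APOE-ε4) in cases vs controls,
    with a Woolf 95% confidence interval. Arguments are allele counts."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log = math.sqrt(1 / case_risk + 1 / case_other
                       + 1 / control_risk + 1 / control_other)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)


# Hypothetical allele counts: 40/100 case alleles and 20/100 control alleles carry ε4.
or_, ci = allelic_odds_ratio(40, 60, 20, 80)
print(round(or_, 2))  # 2.67
```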
We assessed the predictive ability of a combined genetic variant panel for the risk of recurrent pregnancy loss (RPL) through a case-control study. Our study sample was from Ukraine and included 114 cases with idiopathic RPL and 106 controls without any pregnancy losses/complications and with at least one healthy child. We genotyped variants within 12 genetic loci reflecting the main biological pathways involved in pregnancy maintenance: blood coagulation (F2, F5, F7, GP1A), hormonal regulation (ESR1, ADRB2), endometrium and placental function (ENOS, ACE), folate metabolism (MTHFR) and inflammatory response (IL6, IL8, IL10). We showed that a genetic risk score (GRS) calculated from the 12 variants was associated with an increased risk of RPL (odds ratio 1.56, 95% CI: 1.21, 2.04, p = 8.7 × 10−4). The receiver operating characteristic (ROC) analysis resulted in an area under the curve (AUC) of 0.64 (95% CI: 0.57, 0.72), indicating that the GRS classifies women with and without RPL better than chance. Implementation of the GRS approach can help identify women at higher risk of complex multifactorial conditions such as RPL. Future well-powered genome-wide association studies will help dissect biological pathways previously unknown for RPL and further improve the identification of women with RPL susceptibility.
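A genetic risk score and its AUC can be sketched as follows. This Python example assumes an unweighted GRS (the count of risk alleles across the genotyped variants), which may differ from the scoring used in the study, and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann-Whitney formulation, with ties counted as one half). Function names and data are hypothetical.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum of risk-allele counts (0, 1 or 2 per variant), optionally weighted."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    return sum(c * w for c, w in zip(risk_allele_counts, weights))


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise case-control comparisons."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical genotypes at four variants for two cases and two controls.
cases = [genetic_risk_score(g) for g in ([1, 2, 0, 1], [2, 1, 1, 0])]
controls = [genetic_risk_score(g) for g in ([0, 1, 0, 1], [1, 1, 1, 0])]
print(auc(cases, controls))  # 1.0
```

An AUC of 0.5 corresponds to random classification and 1.0 to perfect separation, which is why a confidence interval lying entirely above 0.5 indicates discrimination better than chance.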
Pubertal growth patterns correlate with future health outcomes. However, the genetic mechanisms mediating growth trajectories remain largely unknown. Here, we modeled longitudinal height growth with Super-Imposition by Translation And Rotation (SITAR) growth curve analysis on ~56,000 trans-ancestry samples with repeated height measurements from age 5 years to adulthood. We performed genetic analysis on six phenotypes representing the magnitude, timing, and intensity of the pubertal growth spurt. To investigate the lifelong impact of genetic variants associated with pubertal growth trajectories, we performed genetic correlation analyses and phenome-wide association studies in the Penn Medicine BioBank and the UK Biobank. Large-scale growth modeling enables an unprecedented view of adolescent growth across contemporary and 20th-century pediatric cohorts. We identify 26 genome-wide significant loci and leverage trans-ancestry data to perform fine-mapping. Our data reveal genetic relationships between pediatric height growth and health across the life course, with different growth trajectories correlated with different outcomes. For instance, a faster tempo of pubertal growth correlates with higher bone mineral density, HOMA-IR, fasting insulin, type 2 diabetes, and lung cancer, whereas being taller at early puberty, taller across puberty, and having quicker pubertal growth were associated with higher risk for atrial fibrillation. We report novel genetic associations with the tempo of pubertal growth and find that genetic determinants of growth are correlated with reproductive, glycemic, respiratory, and cardiac traits in adulthood. These results aid in identifying specific growth trajectories impacting lifelong health and show that there may not be a single "optimal" pubertal growth pattern.
Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P
Depression is a common comorbidity of type 2 diabetes. We assessed the causal relationships and shared genetics between them. We applied two-sample, bidirectional Mendelian randomization (MR) to assess causality between type 2 diabetes and depression, and investigated potential mediation using two-step MR. To identify shared genetics, we performed 1) separate genome-wide association studies (GWAS) and 2) multiphenotype GWAS (MP-GWAS) of type 2 diabetes (19,344 case subjects, 463,641 control subjects) and depression, using major depressive disorder (MDD) (5,262 case subjects, 86,275 control subjects) and self-reported depressive symptoms (n = 153,079) in the UK Biobank. We analyzed expression quantitative trait loci (eQTL) data from public databases to identify target genes in relevant tissues. MR demonstrated a significant causal effect of depression on type 2 diabetes (odds ratio 1.26 [95% CI 1.11-1.44], P = 5.46 × 10−4) but not in the reverse direction. Mediation analysis indicated that 36.5% (12.4-57.6%, P = 0.0499) of the effect of depression on type 2 diabetes was mediated by BMI. GWAS of type 2 diabetes and depressive symptoms did not identify shared loci. MP-GWAS identified seven shared loci mapped to TCF7L2, CDKAL1, IGF2BP2, SPRY2, CCND2-AS1, IRS1 and CDKN2B-AS1. MDD showed no significant association in either the GWAS or the MP-GWAS. Most MP-GWAS loci had an eQTL, including single nucleotide polymorphisms implicating the cell cycle gene CCND2 in pancreatic islets and brain and the insulin signaling gene IRS1 in adipose tissue, suggesting a multitissue and pleiotropic underlying mechanism. Our results highlight the importance of preventing type 2 diabetes at the onset of depressive symptoms and the need to maintain a healthy weight in the context of its effect on depression and type 2 diabetes comorbidity.
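In two-step MR, the mediated proportion is often obtained with the product-of-coefficients method: the indirect effect is the MR effect of the exposure on the mediator multiplied by the MR effect of the mediator on the outcome, and it is divided by the total effect. This Python sketch assumes that approach, which may differ from the study's exact implementation; it omits the confidence interval, which is usually obtained with the delta method or bootstrapping. The function name and numbers are hypothetical, with effects on the log-odds scale for binary traits.

```python
def proportion_mediated(total_effect, exposure_to_mediator, mediator_to_outcome):
    """Product-of-coefficients mediation estimate for two-step MR.

    total_effect:          MR effect of the exposure on the outcome
                           (e.g. log OR of type 2 diabetes for depression)
    exposure_to_mediator:  step one, MR effect of the exposure on the mediator (e.g. BMI)
    mediator_to_outcome:   step two, MR effect of the mediator on the outcome
    """
    indirect = exposure_to_mediator * mediator_to_outcome
    return indirect / total_effect


# Hypothetical effects: total 0.2, step one 0.5, step two 0.1.
share = proportion_mediated(0.2, 0.5, 0.1)  # indirect 0.05, i.e. 25% mediated
```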
The current epidemics of cardiovascular and metabolic noncommunicable diseases have emerged alongside dramatic modifications in lifestyle and living environments. These correspond to changes in our “modern” postwar societies, globally characterized by rural-to-urban migration, modernization of agricultural practices and transportation, climate change, and aging. Evidence suggests that these changes are related to each other, although the social and biological mechanisms, as well as their interactions, have yet to be uncovered. LongITools, one of the 9 projects included in the European Human Exposome Network, will tackle this environmental health equation linking multidimensional environmental exposures to the occurrence of cardiovascular and metabolic noncommunicable diseases.
Epidemic obesity is the most important risk factor for prediabetes and type 2 diabetes (T2D) in youth, as it is in adults. Obesity shares pathophysiological mechanisms with T2D and is likely to share part of its genetic background. We aimed to test whether weighted genetic risk scores (GRSs) for T2D, fasting glucose (FG) and fasting insulin (FI) predict glycaemic traits, and whether there is a causal relationship between obesity and impaired glucose metabolism in children and adolescents. Genotyping of 42 SNPs established by genome-wide association studies for T2D, FG and FI was performed in 1660 Italian youths aged between 2 and 19 years. We defined GRSs for T2D, FG and FI and tested their effects on glycaemic traits, including FG, FI, indices of insulin resistance/beta cell function and body mass index (BMI). We evaluated causal relationships between obesity and FG/FI using one-sample Mendelian randomization analyses in both directions. GRS-FG was associated with FG (beta = 0.075 mmol/l, SE = 0.011, P = 1.58 × 10−11) and beta cell function (beta = −0.041, SE = 0.0090, P = 5.13 × 10−6). GRS-T2D also demonstrated an association with beta cell function (beta = −0.020, SE = 0.021, P = 0.030). We detected a causal effect of increased BMI on levels of FI in Italian youths (beta = 0.31 ln (pmol/l), 95%CI [0.078, 0.54], P = 0.0085), while there was no effect of FG/FI levels on BMI. Our results demonstrate that glycaemic and T2D risk genetic variants contribute to higher FG and FI levels and decreased beta cell function in children and adolescents. The causal effects of adiposity on increased insulin resistance are detectable from childhood.
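With a single genetic instrument such as a GRS, one-sample MR reduces to the ratio (Wald) estimator: the instrument-outcome covariance divided by the instrument-exposure covariance, which equals the two-stage least squares estimate when there are no covariates. This Python sketch assumes that simple setting; the study may have adjusted for covariates such as age and sex. The function names and data are hypothetical.

```python
def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def ratio_estimate(instrument, exposure, outcome):
    """One-sample MR ratio estimate: cov(Z, Y) / cov(Z, X).

    instrument: genetic risk score per individual (Z)
    exposure:   e.g. BMI (X)
    outcome:    e.g. ln fasting insulin (Y)
    """
    return covariance(instrument, outcome) / covariance(instrument, exposure)


# Toy numbers for four individuals (illustrative only, not real units).
grs = [0.0, 1.0, 2.0, 3.0]
bmi = [1.0, 2.0, 4.0, 5.0]
log_fi = [2.0, 4.0, 8.0, 10.0]
print(ratio_estimate(grs, bmi, log_fi))  # 2.0
```

Running the same estimator with the exposure and outcome swapped (and a GRS for the other trait as instrument) corresponds to testing the reverse direction.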
Early childhood growth patterns are associated with adult health, yet the genetic factors and the developmental stages involved are not fully understood. Here, we combine genome-wide association studies with modeling of longitudinal growth traits to study the genetics of infant and child growth, followed by functional, pathway, genetic correlation, risk score, and colocalization analyses to determine how developmental timings, molecular pathways, and genetic determinants of these traits overlap with those of adult health. We found a robust overlap between the genetics of child and adult body mass index (BMI), with variants associated with adult BMI acting as early as 4 to 6 years old. However, we demonstrated a completely distinct genetic makeup for peak BMI during infancy, influenced by variation at the LEPR/LEPROT locus. These findings suggest that different genetic factors control infant and child BMI. In light of the obesity epidemic, these findings are important to inform the timing and targets of prevention strategies.
Differences between sexes contribute to variation in the levels of fasting glucose and insulin. Epidemiological studies have established a higher prevalence of impaired fasting glucose in men and impaired glucose tolerance in women; however, the genetic component underlying this phenomenon is not established. We assess sex-dimorphic (73,089/50,404 women and 67,506/47,806 men) and sex-combined (151,188/105,056 individuals) fasting glucose/fasting insulin genetic effects via genome-wide association study meta-analyses in individuals of European descent without diabetes. Here we report sex dimorphism in allelic effects on fasting insulin at the IRS1 and ZNF12 loci, the latter showing higher RNA expression in whole blood in women than in men. We also observe sex-homogeneous effects on fasting glucose at seven novel loci. Fasting insulin in women shows stronger genetic correlations than in men with waist-to-hip ratio and anorexia nervosa. Furthermore, waist-to-hip ratio is causally related to insulin resistance in women, but not in men. These results position the dissection of sex dimorphism in metabolic and glycemic health as a stepping stone for understanding differences in genetic effects between women and men in related phenotypes.
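A simple way to test whether a variant's effect differs between women and men, assuming the sex-stratified estimates come from independent samples, is a z-test on the difference in effects. This Python sketch illustrates that test; it is not necessarily the method used in the study (sex-differentiated meta-analysis tools use related but more elaborate statistics), and the function name and numbers are hypothetical.

```python
import math


def sex_difference_test(beta_women, se_women, beta_men, se_men):
    """Z-test for a difference between sex-specific effect estimates.

    Assumes the two estimates are independent (non-overlapping samples).
    Returns the z statistic and a two-sided P-value.
    """
    z = (beta_women - beta_men) / math.sqrt(se_women ** 2 + se_men ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical per-allele effects on fasting insulin in women and in men.
z, p = sex_difference_test(0.030, 0.005, 0.010, 0.005)
```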
Additional publications
A full list of my publications can be found under my Google Scholar profile https://scholar.google.com/citations?user=MBCp0McAAAAJ&hl=fi