Hongxin Gao
About
My research project
Developing Machine Learning Tools for Predicting Cognitive Impairment from Survey Response Behaviours
Underdiagnosis of cognitive impairment (CI), including mild CI (MCI) and dementia, has become a public health challenge. With the emergence of new treatments aimed at alleviating early cognitive decline, the demand for assessments may increase sharply, putting pressure on our healthcare system.
His PhD project focuses on combining psychometric methods with data science techniques to facilitate early identification of CI in non-clinical settings, such as communities. The project aims to develop behavioural markers of early cognitive deficits by identifying latent inconsistent responses in large-scale surveys, and employing them to establish a deep learning-based prediction tool.
Supervisors
Research
Research interests
Health Informatics · Deep Learning · Large Language Model · Agile Web Development
Research projects
Testing early markers of cognitive decline and dementia derived from survey response behaviors
Researcher - Subaward from US NIH/NIA
University of Surrey PI: Jin
Project PI: Schneider
Publications
Background: The underdiagnosis of cognitive impairment hinders timely intervention for dementia. Health professionals working in the community play a critical role in the early detection of cognitive impairment, yet still face several challenges such as a lack of suitable tools, necessary training, and potential stigmatization.
Objective: This study explored a novel application integrating psychometric methods with data science techniques to model subtle inconsistencies in questionnaire response data for early identification of cognitive impairment in community environments.
Methods: This study analyzed questionnaire response data from participants aged 50 years and older in the Health and Retirement Study (waves 8-9, n=12,942). Predictors included low-quality response indices generated using the graded response model from four brief questionnaires (optimism, hopelessness, purpose in life, and life satisfaction) assessing aspects of overall well-being, a focus of health professionals in communities. The primary and supplemental predicted outcomes were current cognitive impairment derived from a validated criterion and dementia or mortality in the next ten years. Seven predictive models were trained, and their performance was evaluated and compared.
Results: The multilayer perceptron exhibited the best performance in predicting current cognitive impairment. Across the four selected questionnaires, the area under the curve (AUC) values for identifying current cognitive impairment ranged from 0.63 to 0.66 and improved to 0.71 to 0.74 when the low-quality response indices were combined with age and gender for prediction. We set the threshold for assessing cognitive impairment risk in the tool based on the ratio of underdiagnosis costs to overdiagnosis costs, and a ratio of 4 was used as the default choice. Furthermore, the tool outperformed the efficiency of age- or health-based screening strategies for identifying individuals at high risk for cognitive impairment, particularly in the 50- to 59-year and 60- to 69-year age groups. The tool is available on a portal website for the public to access freely.
Conclusions: We developed a novel prediction tool that integrates psychometric methods with data science to facilitate “passive or backend” cognitive impairment assessments in community settings, aiming to promote early cognitive impairment detection. This tool simplifies the cognitive impairment assessment process, making it more adaptable and reducing burdens. Our approach also presents a new perspective for using questionnaire data: leveraging, rather than dismissing, low-quality data.
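The threshold choice described above follows a standard cost-sensitive decision rule: if a missed case (underdiagnosis) is treated as R times more costly than a false alarm (overdiagnosis), the probability cutoff that minimises expected cost is 1/(1 + R), so the default ratio of 4 corresponds to a cutoff of 0.2. The sketch below illustrates that calculation under this assumption; the function names and example probabilities are hypothetical and not taken from the published tool.

```python
# Cost-ratio-based decision threshold (illustrative sketch, not the published tool).
# r = cost(false negative / underdiagnosis) / cost(false positive / overdiagnosis).
def cost_ratio_threshold(r: float) -> float:
    """Probability cutoff that minimises expected misclassification cost."""
    return 1.0 / (1.0 + r)

def flag_high_risk(predicted_probs, r: float = 4.0):
    """Flag respondents whose predicted CI probability exceeds the cutoff."""
    t = cost_ratio_threshold(r)           # r = 4 gives t = 0.2
    return [p >= t for p in predicted_probs]

if __name__ == "__main__":
    probs = [0.05, 0.18, 0.22, 0.41]      # hypothetical model outputs
    print(cost_ratio_threshold(4.0))      # 0.2
    print(flag_high_risk(probs))          # [False, False, True, True]
```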
Questionnaires are ever-present in survey research. In this study, we examined whether an indirect indicator of general cognitive ability could be developed based on response patterns in questionnaires. We drew on two established phenomena characterizing connections between cognitive ability and people’s performance on basic cognitive tasks, and examined whether they apply to questionnaire responses. (1) The worst performance rule (WPR) states that people’s worst performance on multiple sequential tasks is more indicative of their cognitive ability than their average or best performance. (2) The task complexity hypothesis (TCH) suggests that relationships between cognitive ability and performance increase with task complexity. We conceptualized items of a questionnaire as a series of cognitively demanding tasks. A graded response model was used to estimate respondents’ performance for each item based on the difference between the observed and model-predicted response (“response error” scores). Analyzing data from 102 items (21 questionnaires) collected from a large-scale nationally representative sample of people aged 50+ years, we found robust associations of cognitive ability with a person’s largest but not with their smallest response error scores (supporting the WPR), and stronger associations of cognitive ability with response errors for more complex than for less complex questions (supporting the TCH). Results replicated across two independent samples and six assessment waves. A latent variable of response errors estimated for the most complex items correlated .50 with a latent cognitive ability factor, suggesting that response patterns can be utilized to extract a rough indicator of general cognitive ability in survey research.
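As a rough illustration of the kind of “response error” score described above, the sketch below computes the model-expected response for one item under a graded response model and takes the absolute difference from the observed category. The item parameters, the respondent's ability estimate, and the use of an absolute difference are illustrative assumptions; the study's actual scoring may differ in detail.

```python
import math

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under a graded response model.
    a: discrimination; b: ordered threshold parameters (len = n_categories - 1)."""
    # Cumulative probabilities P(X >= k | theta), with P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]

def response_error(observed, theta, a, b):
    """Absolute difference between the observed category and the model-expected score."""
    probs = grm_category_probs(theta, a, b)
    expected = sum(k * p for k, p in enumerate(probs))
    return abs(observed - expected)

# Hypothetical example: a 4-category item and a respondent with estimated theta = 0.3
print(response_error(observed=3, theta=0.3, a=1.4, b=[-1.2, 0.1, 1.0]))
```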
This paper examined the magnitude of differences in performance across domains of cognitive functioning between participants who attrited from studies and those who did not, using data from longitudinal ageing studies in which multiple cognitive tests were administered. The design was an individual participant data meta-analysis. Data are from 10 epidemiological longitudinal studies of ageing (total n=209 518) from several Western countries (UK, USA, Mexico, etc.). Each study had multiple waves of data (range 2-17 waves), with multiple cognitive tests administered at each wave (range 4-17 tests). Only waves with cognitive tests and information on participant dropout at the immediate next wave for adults aged 50 years or older were used in the meta-analysis. For each pair of consecutive study waves, we compared the difference in cognitive scores (Cohen's d) between participants who dropped out at the next study wave and those who remained; our operationalisation of dropout was inclusive of all causes (e.g., mortality). The proportion of participant dropout at each wave was also computed. The average proportion of dropouts between consecutive study waves was 0.26 (0.18 to 0.34). People who attrited had significantly lower levels of cognitive functioning in all domains (at the wave 2-3 years before attrition) compared with those who did not attrit, with small-to-medium effect sizes (overall d=0.37 (0.30 to 0.43)). Older adults who attrited from longitudinal ageing studies thus had lower cognitive functioning (assessed at the timepoint before attrition) across all domains compared with individuals who remained. Cognitive functioning differences may contribute to selection bias in longitudinal ageing studies, impeding accurate conclusions in developmental research. In addition, examining the functional capabilities of attriters may be valuable for determining whether attriters experience functional limitations requiring healthcare attention.
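For reference, the effect size reported above is a standard Cohen's d with a pooled standard deviation; the minimal sketch below shows that computation on made-up cognitive scores for people who remained versus those who dropped out at the next wave (all values hypothetical).

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (illustrative sketch)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical cognitive scores at wave t for participants who remained vs. dropped out at wave t+1
remained = [27, 25, 29, 26, 28, 24]
dropped = [23, 22, 26, 21, 24, 25]
print(cohens_d(remained, dropped))
```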
In a conventional power grid, energy theft is difficult to detect due to limited communication and data transmission. The smart meter, along with big data mining technology, leads to significant technological innovation in the field of energy theft detection (ETD). This article proposes a convolutional long short-term memory (ConvLSTM)-based ETD model to identify electricity theft users. In this work, electricity consumption data are reshaped quarterly into a 2-D matrix and used as the sequential input to the ConvLSTM. The convolutional neural network (CNN) embedded into the long short-term memory (LSTM) can better learn the features of the data on different quarters, months, weeks, and days. In addition, the proposed model incorporates batch normalization. This technique allows the proposed ETD model to support raw-format electricity consumption data input, reducing training time and increasing the efficiency of model deployment. The result of the case study shows that the proposed ConvLSTM model exhibits good robustness. It outperforms the multilayer perceptron (MLP) and CNN-LSTM in terms of performance metrics and model generalization capability. Moreover, the results also demonstrate that K-fold cross-validation can improve the ETD prediction accuracy.
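The following is a minimal Keras sketch of the kind of ConvLSTM-plus-batch-normalization classifier the abstract describes. The input layout (4 quarterly time steps, each a 13-week x 7-day matrix of daily consumption), layer sizes, and training settings are assumptions for illustration, not the published architecture.

```python
import tensorflow as tf

# Minimal ConvLSTM sketch for electricity-theft classification (illustrative only).
# Assumed input layout: one year of daily consumption reshaped into
# 4 quarterly time steps, each a 13-week x 7-day matrix with 1 channel.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4, 13, 7, 1)),
    tf.keras.layers.ConvLSTM2D(filters=32, kernel_size=(3, 3),
                               padding="same", return_sequences=False),
    tf.keras.layers.BatchNormalization(),   # allows raw consumption values as input
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # theft vs. normal user
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
model.summary()
```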