Hongxin Gao


Postgraduate Research Student
B.B.A. · B.Eng. · M.S.

Academic and research departments

School of Health Sciences, Digital Health Expert Group.

Publications

Hongxin Gao, Stefan Schneider, Raymond Hernandez, Jenny Harris, Danny Maupin, Doerte U Junghaenel, Haomiao Jin (2024) Machine Learning Modeling of Low Quality Survey Responses to Predict Cognitive Impairment Among Older Adults, In: Innovation in Aging 8(Suppl 1), pp. 309-310, Oxford University Press

Early identification of cognitive impairment (CI) is critical for managing Alzheimer’s disease and other dementia. Leveraging emerging evidence on the relationship between subtle errors in survey responses and CI, this study uses a novel informatics approach that combines machine learning with psychometric methods to develop a risk prediction model for identification of CI, including mild CI and dementia, in the general older adult population. The study is based on a sample of 12,942 participants aged 50 and above in the Health and Retirement Study, and psychometric indices of low-quality responses (LQR) in a range of different surveys are created as predictors. Our analysis shows an area under the curve (AUC) of 0.66 for identifying current CI and 0.70 for predicting dementia or mortality in the next 10 years. Also, the subgroup analysis shows the LQR indices have better predictive performance in the 50-59 (AUC=0.72), and 60-69 (AUC=0.71) age groups, suggesting the model may be more sensitive to early cognitive deficits. A unique feature of this tool is that it does not require the underlying surveys to be directly relevant to CI; thus, health professionals, especially those working in community settings like health and social workers, may use the tool to assist identifying older adults at risk of CI based on questionnaires of other aspects of their life, such as quality of life and personality. It may also be useful for aging researchers who intend to identify high-risk populations from survey data that do not include direct assessment of CI.
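The reported AUC has a direct probabilistic reading: it is the probability that a randomly chosen participant with the outcome receives a higher risk score than a randomly chosen participant without it. As a minimal pure-Python sketch of that equivalence (the scores below are made up for illustration, not study data):

```python
def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs in which the positive case has the higher
    score. Ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and risk scores (illustrative only)
y = [1, 1, 0, 0, 0]
s = [0.9, 0.4, 0.5, 0.3, 0.1]
print(auc(y, s))  # 5 of 6 pairs ranked correctly
```

An AUC of 0.66, as reported for current cognitive impairment, therefore means roughly two out of three such pairs are ranked correctly.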

Raymond Hernandez, Arthur A Stone, Elizabeth Zelinski, Erik Meijer, Titus Galama, Jessica Faul, Arie Kapteyn, Doerte U Junghaenel, Haomiao Jin, Margaret Gatz, Pey-Jiuan Lee, Daniel Maupin, Hongxin Gao, Bart Orriens, Stefan Schneider (2025) Evidence Supports the Validity and Reliability of Response Times from a Brief Survey as a Digital Biomarker for Processing Speed in a Large Panel Study, In: American Journal of Epidemiology

Survey response times (RTs) have hitherto untapped potential to allow researchers to gain more detailed insights into the cognitive performance of participants in online panel studies. We examined whether RTs recorded from a brief online survey could serve as a digital biomarker for processing speed. Data from 9,893 adults enrolled in the nationally representative Understanding America Study were used in the analyses. Hypotheses included that people's average survey RTs would have a large correlation with an established processing speed test, small to moderate correlations with other cognitive tests, and associations with functional impairment. We also hypothesized that survey RTs would have sensitivity to various participant characteristics comparable to the established processing speed test's sensitivity (e.g., similar standardized means by gender). Overall, results support the validity and reliability of people's average RTs to survey items as a digital biomarker for processing speed. The correlation between survey RTs (reverse scored) and the formal processing speed test was 0.61.

Hongxin Gao, Stefan Schneider, Raymond Hernandez, Jenny Harris, Danny Maupin, Doerte U Junghaenel, Arie Kapteyn, Arthur Stone, Elizabeth Zelinski, Erik Meijer, Pey-Jiuan Lee, Bart Orriens, Haomiao Jin (2024) Early Identification of Cognitive Impairment in Community Environments Through Modeling Subtle Inconsistencies in Questionnaire Responses: Machine Learning Model Development and Validation, In: JMIR Formative Research 8

Background: The underdiagnosis of cognitive impairment hinders timely intervention for dementia. Health professionals working in the community play a critical role in the early detection of cognitive impairment, yet face several challenges, such as a lack of suitable tools, necessary training, and potential stigmatization. Objective: This study explored a novel application integrating psychometric methods with data science techniques to model subtle inconsistencies in questionnaire response data for early identification of cognitive impairment in community environments. Methods: This study analyzed questionnaire response data from participants aged 50 years and older in the Health and Retirement Study (waves 8-9, n=12,942). Predictors included low-quality response indices generated using the graded response model from four brief questionnaires (optimism, hopelessness, purpose in life, and life satisfaction) assessing aspects of overall well-being, a focus of health professionals in communities. The primary and supplemental predicted outcomes were current cognitive impairment derived from a validated criterion and dementia or mortality in the next ten years. Seven predictive models were trained, and their performance was evaluated and compared. Results: The multilayer perceptron exhibited the best performance in predicting current cognitive impairment. For the four selected questionnaires, the area under the curve (AUC) values for identifying current cognitive impairment ranged from 0.63 to 0.66 and improved to 0.71 to 0.74 when the low-quality response indices were combined with age and gender for prediction. We set the threshold for assessing cognitive impairment risk in the tool based on the ratio of underdiagnosis costs to overdiagnosis costs, with a ratio of 4 as the default choice. Furthermore, the tool outperformed age- or health-based screening strategies in efficiency for identifying individuals at high risk of cognitive impairment, particularly in the 50- to 59-year and 60- to 69-year age groups. The tool is freely available on a public portal website. Conclusions: We developed a novel prediction tool that integrates psychometric methods with data science to facilitate "passive or backend" cognitive impairment assessments in community settings, aiming to promote early detection of cognitive impairment. This tool simplifies the cognitive impairment assessment process, making it more adaptable and reducing the assessment burden. Our approach also presents a new perspective on using questionnaire data: leveraging, rather than dismissing, low-quality responses.
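The cost-ratio threshold described above has a standard decision-theoretic motivation: if a missed case (underdiagnosis, a false negative) costs r times as much as a false alarm (overdiagnosis, a false positive), expected cost is minimized by flagging whenever the predicted probability exceeds 1/(1+r). A minimal sketch under that textbook assumption (the function names are mine, not the paper's):

```python
def threshold_from_cost_ratio(r):
    """Probability threshold that minimizes expected cost when a false
    negative costs r times as much as a false positive:
    flag when p > c_FP / (c_FP + c_FN) = 1 / (1 + r)."""
    return 1.0 / (1.0 + r)

def flag_high_risk(prob, r=4):
    """Flag a predicted probability as high risk at cost ratio r."""
    return prob > threshold_from_cost_ratio(r)

print(threshold_from_cost_ratio(4))  # default ratio of 4 -> threshold 0.2
```

At the default ratio of 4, any predicted probability above 0.2 would be flagged, deliberately tolerating more false alarms than missed cases.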

Stefan Schneider, Raymond Hernandez, Doerte U. Junghaenel, Haomiao Jin, Pey-Jiuan Lee, Hongxin Gao, Danny Maupin, Bart Orriens, Erik Meijer, Arthur A. Stone (2024) Can you tell people's cognitive ability level from their response patterns in questionnaires?, In: Behavior Research Methods, Springer

Questionnaires are ever present in survey research. In this study, we examined whether an indirect indicator of general cognitive ability could be developed based on response patterns in questionnaires. We drew on two established phenomena characterizing connections between cognitive ability and people’s performance on basic cognitive tasks, and examined whether they apply to questionnaire responses. (1) The worst performance rule (WPR) states that people’s worst performance on multiple sequential tasks is more indicative of their cognitive ability than their average or best performance. (2) The task complexity hypothesis (TCH) suggests that relationships between cognitive ability and performance increase with task complexity. We conceptualized items of a questionnaire as a series of cognitively demanding tasks. A graded response model was used to estimate respondents’ performance for each item based on the difference between the observed and model-predicted response (“response error” scores). Analyzing data from 102 items (21 questionnaires) collected from a large-scale nationally representative sample of people aged 50+ years, we found robust associations of cognitive ability with a person’s largest but not with their smallest response error scores (supporting the WPR), and stronger associations of cognitive ability with response errors for more complex than for less complex questions (supporting the TCH). Results replicated across two independent samples and six assessment waves. A latent variable of response errors estimated for the most complex items correlated .50 with a latent cognitive ability factor, suggesting that response patterns can be utilized to extract a rough indicator of general cognitive ability in survey research.
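The "response error" scores above come from a graded response model (GRM), in which the model-predicted response to an item can be taken as the expected item score given a person's latent trait level. A simplified sketch, assuming a single discrimination parameter per item and illustrative parameter values (not the paper's estimates):

```python
import math

def grm_expected_score(theta, a, thresholds):
    """Expected item score under a graded response model: with
    P(X >= k) = logistic(a * (theta - b_k)) for each category
    threshold b_k, the expected score (range 0..K) is the sum of
    these category-exceedance probabilities."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)

def response_error(observed, theta, a, thresholds):
    """Signed gap between the observed response and the model
    prediction; large absolute values mark atypical responses."""
    return observed - grm_expected_score(theta, a, thresholds)

# Illustrative item: discrimination a=1.0, one threshold at b=0.0.
# A person at theta=0 is predicted to score 0.5; responding "1"
# yields a response error of +0.5.
print(response_error(1, 0.0, 1.0, [0.0]))
```

Under the worst performance rule examined in the paper, it is a person's largest such errors across items, not their average, that carry the cognitive signal.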

Raymond Hernandez, Haomiao Jin, Pey-Jiuan Lee, Stefan Schneider, Doerte U Junghaenel, Arthur A Stone, Erik Meijer, Hongxin Gao, Daniel James Maupin, Elizabeth M Zelinski (2024) Attrition from longitudinal ageing studies and performance across domains of cognitive functioning: an individual participant data meta-analysis, In: BMJ Open 14(3), e079241

This paper examined the magnitude of differences in performance across domains of cognitive functioning between participants who attrited from studies and those who did not, using data from longitudinal ageing studies in which multiple cognitive tests were administered. The design was an individual participant data meta-analysis. Data are from 10 epidemiological longitudinal studies of ageing (total n=209,518) from several Western countries (UK, USA, Mexico, etc). Each study had multiple waves of data (range of 2-17 waves), with multiple cognitive tests administered at each wave (range of 4-17 tests). Only waves with cognitive tests and information on participant dropout at the immediate next wave for adults aged 50 years or older were used in the meta-analysis. For each pair of consecutive study waves, we compared the difference in cognitive scores (Cohen's d) between participants who dropped out at the next study wave and those who remained. Note that our operationalisation of dropout was inclusive of all causes (eg, mortality). The proportion of participant dropout at each wave was also computed. The average proportion of dropouts between consecutive study waves was 0.26 (0.18 to 0.34). People who attrited had significantly lower levels of cognitive functioning in all domains (at the wave 2-3 years before attrition) compared with those who did not, with small-to-medium effect sizes (overall d=0.37 (0.30 to 0.43)). Older adults who attrited from longitudinal ageing studies had lower cognitive functioning (assessed at the timepoint before attrition) across all domains compared with individuals who remained. Cognitive functioning differences may contribute to selection bias in longitudinal ageing studies, impeding accurate conclusions in developmental research. In addition, examining the functional capabilities of attriters may be valuable for determining whether they experience functional limitations requiring healthcare attention.
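Cohen's d, the effect size used above, standardizes the mean difference between dropouts and completers by a pooled standard deviation. A minimal sketch with the usual pooled-SD formula (the numbers below are illustrative, not study values):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference between two groups using the
    pooled standard deviation (e.g., dropouts vs. completers on a
    cognitive test at a given wave)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Two equally sized groups one standard deviation apart give d = 1.0
print(cohens_d(1.0, 1.0, 10, 0.0, 1.0, 10))
```

On this scale, the overall d=0.37 reported above corresponds to completers scoring about a third of a standard deviation higher than dropouts.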

Hongxin Gao, Stefanie Kuenzel, Xiao-Yu Zhang (2022) A Hybrid ConvLSTM-Based Anomaly Detection Approach for Combating Energy Theft, In: IEEE Transactions on Instrumentation and Measurement 71, 2517110, pp. 1-10, Institute of Electrical and Electronics Engineers (IEEE)

In a conventional power grid, energy theft is difficult to detect due to limited communication and data transmission. Smart meters, together with big data mining technology, have led to significant technological innovation in the field of energy theft detection (ETD). This article proposes a convolutional long short-term memory (ConvLSTM)-based ETD model to identify electricity theft users. In this work, electricity consumption data are reshaped quarterly into a 2-D matrix and used as the sequential input to the ConvLSTM. The convolutional neural network (CNN) embedded in the long short-term memory (LSTM) network can better learn the features of the data across quarters, months, weeks, and days. In addition, the proposed model incorporates batch normalization, which allows the proposed ETD model to accept electricity consumption data in raw format, reducing training time and increasing the efficiency of model deployment. The case study shows that the proposed ConvLSTM model exhibits good robustness: it outperforms the multilayer perceptron (MLP) and CNN-LSTM in terms of performance metrics and model generalization capability. Moreover, the results also demonstrate that K-fold cross validation can improve ETD prediction accuracy.
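The 2-D reshaping described above can be illustrated in miniature: a flat series of daily consumption readings becomes a (periods x days) grid, so that convolution can pick up patterns along both axes (e.g., weekday vs. weekend columns). A simplified weekly sketch (the paper's actual layout is quarterly; the function name and shape here are illustrative):

```python
def weekly_matrix(daily_kwh):
    """Reshape a flat list of daily consumption readings into a 2-D
    (weeks x 7 days) matrix, the kind of grid a ConvLSTM can scan for
    both within-week (row) and across-week (column) patterns.
    Trailing days that do not fill a complete week are dropped."""
    n_weeks = len(daily_kwh) // 7
    return [daily_kwh[w * 7:(w + 1) * 7] for w in range(n_weeks)]

# 15 daily readings -> two complete weeks; the 15th reading is dropped
m = weekly_matrix(list(range(15)))
print(len(m), m[0])
```

Stacking such matrices over time gives the sequential input the ConvLSTM consumes: the convolutional part learns spatial patterns within each grid, the LSTM part learns their evolution across grids.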