Xiaowei Gu

About

Areas of specialism

Machine Learning; Artificial Intelligence; Data Analytics; Signal Processing

My qualifications

2021
Fellow of HEA

Research

Research interests

Research projects

Supervision

Postgraduate research supervision

Publications

Highlights

My Google Scholar Page

My ResearchGate Page

My GitHub Page

Xiaowei Gu (2024) An Autonomous Centreless Approach to Chunk-Wise Data Partitioning, In: Evolving systems Springer

In this paper, a novel autonomous centreless algorithm is proposed for data partitioning. The proposed algorithm firstly constructs the nearest neighbour affinity graph and identifies the local peaks of data density to build micro-clusters. Unlike the vast majority of partitional clustering algorithms, the proposed algorithm does not rely on singleton prototypes, namely, centres or medoids of the micro-clusters to partition the data space. Instead, these micro-clusters are directly utilised to attract nearby data samples to form shape-free Voronoi tessellations, hence, being centreless and robust to noisy data. A fusion scheme is further implemented to fuse these data clouds with higher intra-cluster similarity together to attain a more compact partitioning of data. The proposed algorithm is able to perform data partitioning on a chunk-wise basis and is highly computationally efficient with the default distance measure. Therefore, it is suitable for both static data partitioning in offline scenarios and streaming data partitioning in online scenarios. Numerical examples on a variety of benchmark datasets demonstrate the efficacy of the proposed algorithm.
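The idea of partitioning data through a nearest neighbour graph and local density peaks, rather than through distances to cluster centres, can be illustrated with a minimal sketch. This is not the algorithm from the paper: the density measure, the parameter `k`, and the function names `knn_graph`, `local_density` and `partition` are assumptions chosen for illustration, and the fusion and chunk-wise steps are omitted.

```python
import math

def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def local_density(points, neighbours):
    """Illustrative density (an assumption): inverse of the mean k-NN distance."""
    dens = []
    for i, nbrs in enumerate(neighbours):
        mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
        dens.append(1.0 / (1.0 + mean_d))
    return dens

def partition(points, k=3):
    """Label each point with the index of the density peak it climbs to."""
    neighbours = knn_graph(points, k)
    dens = local_density(points, neighbours)
    # Each point links to the densest node among itself and its neighbours;
    # a local density peak links to itself. Ties are broken by lower index.
    parent = [max([i] + nbrs, key=lambda j: (dens[j], -j))
              for i, nbrs in enumerate(neighbours)]
    labels = []
    for i in range(len(points)):
        j = i
        while parent[j] != j:  # follow links uphill until a peak is reached
            j = parent[j]
        labels.append(j)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = partition(points, k=3)
```

In this sketch, membership is decided by following graph links towards denser neighbours, so no centre or medoid is ever computed and the resulting groups can take arbitrary shapes.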

Xiaowei Gu, Gareth Howells, Haiyue Yuan (2024) A Soft Prototype-based Autonomous Fuzzy Inference System for Network Intrusion Detection, In: Information sciences Elsevier

Nowadays, cyber-attacks have become a common and persistent issue affecting various human activities in modern societies. Due to the continuously evolving landscape of cyber-attacks and the growing concerns around "black box" models, there has been a strong demand for novel explainable and interpretable intrusion detection systems with online learning abilities. In this paper, a novel soft prototype-based autonomous fuzzy inference system (SPAFIS) is proposed for network intrusion detection. SPAFIS learns from network traffic data streams online on a chunk-by-chunk basis and autonomously identifies a set of meaningful, human-interpretable soft prototypes to build an IF-THEN fuzzy rule base for classification. Thanks to the utilization of soft prototypes, SPAFIS can precisely capture the underlying data structure and local patterns, and perform internal reasoning and decision-making in a human-interpretable manner based on the ensemble properties and mutual distances of data. To maintain a healthy and compact knowledge base, a pruning scheme is further introduced to SPAFIS, allowing it to periodically examine the learned solution and remove redundant soft prototypes from its knowledge base. Numerical examples on public network intrusion detection datasets demonstrated the efficacy of the proposed SPAFIS in both offline and online application scenarios, outperforming the state-of-the-art alternatives.
Thanks to the rapid development in electronic manufacturing and information technology, the Internet has become an essential part of everyday life for billions of individuals in modern societies. The Internet has greatly transformed the way people communicate, network and access information. However, the ongoing digitalization in the world has also led to a significant rise in cyber-attacks.
According to the Cyber Security Breaches Survey published by the UK government in April 2023 [1], 59% of medium businesses, 69% of large businesses and 56% of high-income charities have encountered cybersecurity breaches and/or cyber-attacks in the last 12 months. Nowadays, the escalating cyber-attacks have posed a major and persistent threat to individuals, businesses and organizations on the Internet. The need for effective techniques to protect information security is highly pronounced. Intrusion detection systems (IDSs) are one of the most effective security techniques to prevent cyber-attacks [2]. The function of an IDS is to monitor the network and identify malicious activities. Traditional IDSs are primarily based on signatures. Such IDSs utilize pattern matching methods to compare current activities against signatures of previous intrusions stored in the database [3]. Signature-based IDSs are highly effective in detecting known attacks, but they are unable to detect novel attacks because the database lacks matching signatures. As the technological evolution of cybercrime has made cyber-attacks more sophisticated and difficult to detect, traditional signature-based IDSs have become insufficient in real-world scenarios [4]. Machine learning techniques are capable of learning normal and malicious patterns from empirically observed network activities to construct accurate predictive models with less human involvement [4]. Conventional machine learning methods, such as decision tree (DT) [5], random forest (RF) [6], support vector machine (SVM) [7], k-nearest neighbour (KNN) [8], etc., have been extensively used for identifying cyber-attacks. IDSs based on conventional machine learning have achieved many successes, but they generally struggle with large-scale, complex intrusion detection problems [9].
Due to the evolving landscape of cyber-attacks, characterized by the increasing sophistication and complexity, there has been a rapidly growing demand for IDSs that leverage more advanced machine learning techniques.

Zhen Mei, Tao Zhao, Xiaowei Gu (2024) A Dynamic Evolving Fuzzy System for Streaming Data Prediction, In: IEEE Transactions on Fuzzy Systems (8) Institute of Electrical and Electronics Engineers (IEEE)

This paper proposes a dynamic evolving fuzzy system (DEFS) for streaming data prediction. DEFS utilises the enhanced data potential and prediction errors of individual local models as the main criteria for fuzzy rule generation. A vital feature of the proposed system is its novel rule merging scheme that can self-adjust its tolerance towards the degree of similarity between two similar fuzzy rules according to the size of the rule base. To better handle the shifts and drifts in the data patterns, a novel rule quality measure based on both the utility values and the prediction accuracy of individual fuzzy rules is further introduced to help DEFS identify the less activated fuzzy rules with poorer descriptive capabilities and thereby maintain a healthier fuzzy rule base by removing these stale rules. Importantly, the thresholds used by DEFS are self-adaptive towards the input data. The adaptive thresholds can help DEFS to precisely capture the underlying structure and dynamically changing patterns of streaming data, enabling the system to perform accurate approximate reasoning. Numerical examples based on several popular benchmark problems show the superior performance of DEFS over the state-of-the-art evolving fuzzy systems (EFSs). The prediction performance of the proposed method is at least 2.88% better than the best-performing comparative EFSs on each individual regression benchmark problem considered in this study, and the average performance improvement across all the numerical experiments is approximately 30%.

Muhammad Yunus Bin Iqbal Basheer, Azliza Mohd Ali, Nurzeatul Hamimah Abdul Hamid, Muhammad Azizi Mohd Ariffin, Rozianawaty Osman, Sharifalillah Nordin, Xiaowei Gu (2024) Autonomous Anomaly Detection for Streaming Data, In: Knowledge-Based Systems 284 111235 Elsevier

Anomaly detection from data streams is a widely studied topic in the machine learning domain. It is widely considered a challenging task because the underlying patterns exhibited by the streaming data may dynamically change at any time. In this paper, a new algorithm is proposed to detect anomalies autonomously for streaming data. The proposed algorithm is nonparametric and does not require any threshold to be preset by users. The algorithmic procedure is composed of the following three complementary stages. Firstly, the potentially anomalous samples that represent highly different patterns from others are identified from data streams based on data density. Then, these potentially anomalous samples are clustered online using the evolving autonomous data partitioning algorithm. Finally, true anomalies are identified from the minor clusters with the fewest samples associated with them. Numerical examples based on three benchmark datasets demonstrated the potential of the proposed algorithm as a highly effective approach for anomaly detection from data streams.
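The three-stage structure described in this abstract can be sketched, under strong simplifying assumptions, for a small batch of one-dimensional values. This is not the published algorithm: the Cauchy-type density, the mean-density cutoff, the greedy one-pass grouping (standing in for the evolving autonomous data partitioning algorithm) and the function names `density` and `detect_anomalies` are all illustrative choices made here.

```python
from statistics import fmean, pvariance

def density(values):
    """Assumed Cauchy-type density around the global mean of the batch."""
    mu = fmean(values)
    var = pvariance(values, mu)
    if var == 0:
        return [1.0] * len(values)
    return [1.0 / (1.0 + (v - mu) ** 2 / var) for v in values]

def detect_anomalies(values):
    # Stage 1: candidates are samples whose density is below the mean density
    # (the cutoff is derived from the data; it is an assumption of this sketch).
    dens = density(values)
    cutoff = fmean(dens)
    cand = [v for v, d in zip(values, dens) if d < cutoff]
    if len(cand) < 2:
        return cand

    # Stage 2: greedy one-pass grouping of candidates, with a radius set to
    # half the mean pairwise distance between candidates (also an assumption).
    pairs = [abs(a - b) for i, a in enumerate(cand) for b in cand[i + 1:]]
    radius = fmean(pairs) / 2
    clusters = []
    for v in cand:
        for c in clusters:
            if any(abs(v - m) <= radius for m in c):
                c.append(v)
                break
        else:
            clusters.append([v])

    # Stage 3: members of clusters smaller than the mean cluster size
    # are reported as anomalies.
    mean_size = fmean([len(c) for c in clusters])
    return [v for c in clusters if len(c) < mean_size for v in c]

values = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05, -0.05, 0.15, -0.15, 0.0,
          5.0, 5.1, 4.9, -9.0]
anomalies = detect_anomalies(values)
```

On this toy batch, the recurring values near 5 form a cluster of three candidates, while the isolated value -9.0 ends up alone in a minor cluster and is the only sample reported.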

Hongxing Cui, Danling Tang, Huizeng Liu, Hongbin Liu, Yi Sui, Yangchen Lai, Xiaowei Gu (2024) Modeling Ocean Cooling Induced by Tropical Cyclone Wind Pump Using Explainable Machine Learning Framework, In: IEEE Transactions on Geoscience and Remote Sensing 62 4202317 pp. 1-17 Institute of Electrical and Electronics Engineers (IEEE)

Tropical cyclones (TCs), with an intensive wind pump impact, induce sea surface temperature cooling (SSTC) on the upper ocean. SSTC is a pronounced indicator to reveal TC evolution and oceanic conditions. However, there are few effective methods for accurately approximating the amplitude of the spatial structure of TC-induced SSTC. This study proposes a novel explainable machine learning framework to model and interpret the amplitude of the spatial structure of SSTC over the northwest Pacific (NWP). In particular, 12 predictors related to TC characteristics and pre-storm ocean states are considered as inputs. A composite analysis technique is used to characterize the amplitude of the spatial structure of SSTC across the TC track. Extreme gradient boosting (XGBoost) is utilized to predict the amplitude of SSTC from the 12 predictors. To better interpret the ocean-atmosphere interaction, the SHapley Additive exPlanations (SHAP) method is further employed to identify the contributions of predictors in determining the amplitude of the TC-induced SSTC, bringing attribute-oriented explainability to the proposed method. The results showed that the proposed method could accurately predict the amplitude of the spatial structure of SSTC for different TC intensity groups and outperformed a numerical model. The proposed method also serves as an effective tool for reconstructing composite maps of both interannual and seasonal evolutions of SSTC spatial structure. The study offers insight into applying machine learning to model and interpret the responses of oceanic conditions triggered by extreme weather conditions (e.g., TCs).

Xiaowei Gu, Plamen P. Angelov, Qiang Shen (2024) Semi-Supervised Fuzzily Weighted Adaptive Boosting for Classification, In: IEEE Transactions on Fuzzy Systems 42(4) pp. 2318-2330 Institute of Electrical and Electronics Engineers (IEEE)

Fuzzy systems offer a formal and practically popular methodology for modelling nonlinear problems with inherent uncertainties, entailing strong performance and model interpretability. Particularly, semi-supervised boosting is widely recognised as a powerful approach for creating stronger ensemble classification models in the absence of sufficient labelled data without introducing any modification to the employed base classifiers. However, the potential of fuzzy systems in semi-supervised boosting has not been systematically explored yet. In this study, a novel semi-supervised boosting algorithm devised for zero-order evolving fuzzy systems is proposed. It ensures both the consistency amongst predictions made by individual base classifiers at successive boosting iterations and the respective levels of confidence towards their predictions throughout the process of sample weight updating and ensemble output generation. In so doing, the base classifiers are empowered to gradually focus more on challenging samples that are otherwise hard to generalise, enabling the development of more precise integrated classification boundaries. Numerical evaluations on a range of benchmark problems are carried out, demonstrating the efficacy of the proposed semi-supervised boosting algorithm for constructing ensemble fuzzy classifiers with high accuracy.