Dr Pedro Porto Buarque de Gusmão

Lecturer in Computer Science

PhD

p.gusmao@surrey.ac.uk

Academic and research departments

Computer Science Research Centre, School of Computer Science and Electronic Engineering.

Publications

Yasar Abbas Ur Rehman, Yan Gao, Jiajun Shen, Pedro Porto Buarque de Gusmao, Nicholas Lane (2022) Federated Self-supervised Learning for Video Understanding, In: COMPUTER VISION, ECCV 2022, PT XXXI 13691 pp. 506-522 Springer Nature

DOI: 10.1007/978-3-031-19821-2_29

The ubiquity of camera-enabled mobile devices has led to large amounts of unlabelled video data being produced at the edge. Although various self-supervised learning (SSL) methods have been proposed to harvest their latent spatio-temporal representations for task-specific training, practical challenges including privacy concerns and communication costs prevent SSL from being deployed at large scales. To mitigate these issues, we propose applying Federated Learning (FL) to the task of video SSL. In this work, we evaluate the performance of current state-of-the-art (SOTA) video-SSL techniques and identify their shortcomings when integrated into a large-scale FL setting simulated with the Kinetics-400 dataset. We then propose a novel federated SSL framework for video, dubbed FedVSSL, that integrates different aggregation strategies and partial weight updating. Extensive experiments demonstrate the effectiveness and significance of FedVSSL, as it outperforms the centralized SOTA for the downstream retrieval task by 6.66% on UCF-101 and 5.13% on HMDB-51.

Kin Wai Lau, Yasar Abbas Ur Rehman, Pedro Porto Buarque de Gusmão, Lai-Man Po, Lan Ma, Yuyang Xie (2025) FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning, In: Minsu Cho, Ivan Laptev, Du Tran, Angela Yao, Hongbin Zha (eds.), Computer Vision – ACCV 2024 pp. 74-90 Springer Nature Singapore

DOI: 10.1007/978-981-96-0966-6_5

Federated Learning (FL) has emerged as a privacy-preserving method for training machine learning models in a distributed manner on edge devices. However, on-device models face inherent computational power and memory limitations, potentially resulting in constrained gradient updates. As the model's size increases, the frequency of gradient updates on edge devices decreases, ultimately leading to suboptimal training outcomes during any particular FL round. This limits the feasibility of deploying advanced and large-scale models on edge devices, hindering the potential for performance enhancements. To address this issue, we propose FedRepOpt, a gradient re-parameterized optimizer for FL. The gradient re-parameterized method allows training a simple local model with performance similar to that of a complex model by modifying the optimizer's gradients according to a set of model-specific hyperparameters obtained from the complex models. In this work, we focus on VGG-style and Ghost-style models in the FL environment. Extensive experiments demonstrate that models using FedRepOpt obtain a significant boost in performance of 16.7% and 11.4% compared to the RepGhost-style and RepVGG-style networks, while also demonstrating a faster convergence time of 11.7% and 57.4% compared to their complex structure. Code is available at https://github.com/StevenLauHKHK/FedRepOpt.

Julia Lambret Frotte, Pedro P. Buarque de Gusmao, Georgia Smith, Shuen-Fang Lo, Su-May Yu, Ross W. Hendron, Steven Kelly, Jane A. Langdale (2024) Increased chloroplast occupancy in bundle sheath cells of rice hap3H mutants revealed by Chloro-Count: a new deep learning-based tool, In: The New phytologist Wiley

DOI: 10.1111/nph.20332

There is an increasing demand to boost photosynthesis in rice to increase yield potential. Chloroplasts are the site of photosynthesis, and increasing their number and size is a potential route to elevate photosynthetic activity. Notably, bundle sheath cells do not make a significant contribution to overall carbon fixation in rice, and thus, various attempts are being made to increase chloroplast content specifically in this cell type. In this study, we developed and applied a deep learning tool, Chloro-Count, and used it to quantify chloroplast dimensions in bundle sheath cells of OsHAP3H gain- and loss-of-function mutants in rice. Loss of OsHAP3H increased chloroplast occupancy in bundle sheath cells by 50%. When grown in the field, mutants exhibited increased numbers of tillers and panicles. The implementation of Chloro-Count enabled precise quantification of chloroplasts in loss- and gain-of-function OsHAP3H mutants and facilitated a comparison between 2D and 3D quantification methods. Collectively, our observations revealed that a mechanism operates in bundle sheath cells to restrict chloroplast occupancy as cell dimensions increase. That mechanism is unperturbed in Oshap3H mutants but loss of OsHAP3H function leads to an increase in chloroplast numbers. The use of Chloro-Count also revealed that 2D quantification is compromised by the positioning of chloroplasts within the cell.

Yasar Abbas Ur Rehman, Yan Gao, Pedro Porto Buarque De Gusmao, Mina Alibeigi, Jiajun Shen, Nicholas D. Lane (2024) L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning, In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV23) pp. 16418-16427 Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.1109/ICCV51070.2023.01509

The ubiquity of camera-enabled devices has led to large amounts of unlabeled image data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of the learned visual representations without needing to move data around. However, client bias and divergence during FL aggregation, caused by data heterogeneity, limit the performance of learned visual representations on downstream tasks. In this paper, we propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation. The proposed method aggregates weights at the layer level according to a measure of angular divergence between the clients' models and the global model. Extensive experiments with cross-silo and cross-device settings on the CIFAR-10/100 and Tiny ImageNet datasets demonstrate that our method is effective and obtains new SOTA performance on both contrastive and non-contrastive SSL approaches.
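The layer-wise aggregation idea described in this abstract can be illustrated with a minimal sketch. It assumes that the angular divergence is measured as the cosine similarity between a client's flattened layer and the corresponding global layer, and that this value scales the client's contribution to a plain mean. Both choices, along with the names `cosine` and `layerwise_aggregate`, are assumptions made for illustration and may differ from the rule used in the paper.

```python
import math
from typing import Dict, List

Layer = List[float]        # one layer, flattened to a vector
Model = Dict[str, Layer]   # layer name -> flattened weights


def cosine(a: Layer, b: Layer) -> float:
    """Cosine of the angle between two flattened weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def layerwise_aggregate(global_model: Model, clients: List[Model]) -> Model:
    """Average client layers, each scaled by its cosine similarity
    to the matching global layer (assumed weighting, see lead-in)."""
    new_model: Model = {}
    for name, global_layer in global_model.items():
        acc = [0.0] * len(global_layer)
        for client in clients:
            # Layers pointing away from the global layer contribute less.
            weight = cosine(client[name], global_layer)
            for i, value in enumerate(client[name]):
                acc[i] += weight * value
        new_model[name] = [v / len(clients) for v in acc]
    return new_model
```

In this sketch a client layer orthogonal to the global layer receives weight zero, so it does not influence the aggregated layer at all, while a perfectly aligned layer is averaged in at full strength.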

Johan Wahlstrom, Manon Kok, Pedro Porto Buarque de Gusmao, Traian E. Abrudan, Niki Trigoni, Andrew Markham (2020) Sensor Fusion for Magneto-Inductive Navigation, In: IEEE sensors journal 20(1) 8844709 pp. 386-396

DOI: 10.1109/JSEN.2019.2942451

Muhamad Risqi U. Saputra, Chris Xiaoxuan Lu, Pedro Porto Buarque de Gusmao, Bing Wang, Andrew Markham, Niki Trigoni (2022) Graph-Based Thermal-Inertial SLAM With Probabilistic Neural Networks, In: IEEE transactions on robotics 38(3) pp. 1875-1893 IEEE

DOI: 10.1109/TRO.2021.3120036

Simultaneous localization and mapping (SLAM) systems typically employ vision-based sensors to observe the surrounding environment. However, the performance of such systems highly depends on the ambient illumination conditions. In scenarios with adverse visibility or in the presence of airborne particulates (e.g., smoke, dust, etc.), alternative modalities such as those based on thermal imaging and inertial sensors are more promising. In this article, we propose the first complete thermal-inertial SLAM system that combines neural abstraction in the SLAM front end with robust pose-graph optimization in the SLAM back end. We model the sensor abstraction in the front end by employing probabilistic deep learning parameterized by mixture density networks (MDNs). Our key strategies to successfully model this encoding from thermal imagery are the usage of normalized 14-bit radiometric data, the incorporation of hallucinated visual (RGB) features, and the inclusion of feature selection to estimate the MDN parameters. To enable a full SLAM system, we also design an efficient global image descriptor that is able to detect loop closures from thermal embedding vectors. We performed extensive experiments and analysis using three datasets, namely self-collected ground robot and hand-held data taken in an indoor environment, and one public dataset (SubT-tunnel) collected in an underground tunnel. Finally, we demonstrate that an accurate thermal-inertial SLAM system can be realized in conditions of both benign and adverse visibility.

Johan Wahlstrom, Pedro Porto Buarque de Gusmao, Andrew Markham, Niki Trigoni (2019) Map-aided Navigation for Emergency Searches, In: 2019 15TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SENSOR SYSTEMS (DCOSS) 8804785 pp. 25-32 IEEE

DOI: 10.1109/DCOSS.2019.00027

Real-time positioning of emergency personnel has been an active research topic for many years. However, studies on how to improve navigation accuracy by using prior information on the idiosyncratic motion characteristics of firefighters are scarce. This paper presents an algorithm for generating pseudo observations of position and orientation based on standard search patterns used by firefighters. The iterative closest point algorithm is used to compare walking trajectories estimated from inertial odometry with search patterns generated from digital maps. The resulting fitting errors are then used to integrate the pseudo observations into a map-aided navigation filter. Specifically, we present a sequential Monte Carlo solution where the pattern comparison is used to both update particle weights and create new particle samples. Experimental results involving professional firefighters demonstrate that the proposed pseudo observations can achieve a stable localization error of about one meter, and offer increased robustness in the presence of map errors.
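The particle weight update mentioned in this abstract can be sketched as follows. The sketch assumes that each particle's fitting error is turned into a zero-mean Gaussian likelihood with a hypothetical scale `sigma`; the function name `update_weights` and the Gaussian form are illustrative assumptions, not the filter defined in the paper.

```python
import math
from typing import List


def update_weights(weights: List[float],
                   fit_errors: List[float],
                   sigma: float = 1.0) -> List[float]:
    """Reweight particles by an assumed Gaussian likelihood of their
    pattern-fitting error, then normalise so the weights sum to one."""
    likelihoods = [math.exp(-0.5 * (e / sigma) ** 2) for e in fit_errors]
    unnormalised = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalised)
    if total == 0.0:
        # Every particle was ruled out numerically; fall back to uniform.
        return [1.0 / len(weights)] * len(weights)
    return [u / total for u in unnormalised]
```

Particles whose estimated trajectory fits a search pattern well (small error) gain weight relative to poorly fitting ones; a resampling step would normally follow and is omitted here.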

Alex Iacob, Pedro Porto Buarque Gusmão, Nicholas Lane (2023) Can Fair Federated Learning Reduce the need for Personalisation?, In: Proceedings of the 3rd Workshop on Machine Learning and Systems pp. 131-139 ACM

DOI: 10.1145/3578356.3592592

Federated Learning (FL) enables training ML models on edge clients without sharing data. However, the federated model's performance on local data varies, disincentivising the participation of clients who benefit little from FL. Fair FL reduces accuracy disparity by focusing on clients with higher losses while personalisation locally fine-tunes the model. Personalisation provides a participation incentive when an FL model underperforms relative to one trained locally. For situations where the federated model provides a lower accuracy than a model trained entirely locally by a client, personalisation improves the accuracy of the pre-trained federated weights to be similar to or exceed those of the local client model. This paper evaluates two Fair FL (FFL) algorithms as starting points for personalisation. Our results show that FFL provides no benefit to relative performance in a language task and may double the number of underperforming clients for an image task. Instead, we propose Personalisation-aware Federated Learning (PaFL) as a paradigm that pre-emptively uses personalisation losses during training. Our technique shows a 50% reduction in the number of underperforming clients for the language task while lowering the number of underperforming clients in the image task instead of doubling it. Thus, evidence indicates that it may allow a broader set of devices to benefit from FL and represents a promising avenue for future experimentation and theoretical analysis.

Yasin Almalioglu, Mehmet Turan, Muhamad Risqi U. Saputra, Pedro P.B. de Gusmão, Andrew Markham, Niki Trigoni (2022) SelfVIO: Self-supervised deep monocular Visual–Inertial Odometry and depth estimation, In: Neural networks 150 119 pp. 119-136 Elsevier Ltd

DOI: 10.1016/j.neunet.2022.03.005

In the last decade, numerous supervised deep learning approaches have been proposed for visual–inertial odometry (VIO) and depth map estimation, which require large amounts of labelled data. To overcome the data limitation, self-supervised learning has emerged as a promising alternative that exploits constraints such as geometric and photometric consistency in the scene. In this study, we present a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual–inertial sensor fusion. SelfVIO learns the joint estimation of 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabelled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without requiring IMU intrinsic parameters and/or extrinsic calibration between IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework and compare its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC and Cityscapes datasets. Detailed comparisons prove that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature.