James Ross
Academic and research departments
Centre for Vision, Speech and Signal Processing (CVSSP), Faculty of Engineering and Physical Sciences
About
My research project
Computer vision/deep learning for autonomous vehicle navigation and control
Developing novel methods and systems for autonomous vehicle navigation and control, using deep learning with monocular vision to enhance spatial reasoning and scene understanding in challenging domains.
Supervisors
My qualifications
Research
Research interests
My research focuses on applying computer vision and deep learning to problems in autonomous robotics, and on transferring these techniques across multiple domains.
Research interests include:
- Monocular Bird's Eye View (BEV) prediction
- Simultaneous Localisation and Mapping (SLAM)
- Generative Adversarial Networks (GANs), Simulation and Domain Transfer
- Appearance transfer and conditional diffusion models
Publications
The ability to produce large-scale maps for navigation, path planning and other tasks is a crucial step for autonomous agents, but has always been challenging. In this work, we introduce BEV-SLAM, a novel type of graph-based SLAM that aligns semantically-segmented Bird's Eye View (BEV) predictions from monocular cameras. We introduce a novel form of occlusion reasoning into BEV estimation and demonstrate its importance to aid spatial aggregation of BEV predictions. The result is a versatile SLAM system that can operate across arbitrary multi-camera configurations and can be seamlessly integrated with other sensors. We show that the use of multiple cameras significantly increases performance, and achieves lower relative error than high-performance GPS. The resulting system is able to create large, dense, globally-consistent world maps from monocular cameras mounted around an ego vehicle. The maps are metric and correctly-scaled, making them suitable for downstream navigation tasks.
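For intuition, the sketch below shows the kind of BEV-to-BEV alignment a graph-based SLAM front end could use to generate relative-pose edge constraints between semantically-segmented BEV grids. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the grid encoding (0 = unknown, positive integers = semantic class labels) and the brute-force pose search are all illustrative choices, whereas BEV-SLAM itself uses learned BEV prediction with occlusion reasoning and full graph optimisation.

```python
# Minimal sketch: estimate the relative pose between two semantic BEV grids
# by exhaustively searching (dx, dy, dtheta) for maximum label agreement.
# The resulting pose could serve as an edge constraint in a pose graph.
import numpy as np
from scipy.ndimage import rotate, shift  # assumes SciPy is available


def bev_agreement(ref, qry):
    """Count cells where both grids are observed (label > 0) and agree."""
    observed = (ref > 0) & (qry > 0)
    return np.count_nonzero(observed & (ref == qry))


def align_bev(ref, qry, max_trans=10, angles=range(-10, 11, 2)):
    """Brute-force search for the (dx, dy, dtheta) maximising agreement.

    ref, qry: 2D integer arrays of semantic labels (0 = unknown).
    Returns (dx, dy, dtheta_degrees) as a relative pose estimate.
    """
    best_score, best_pose = -1, (0, 0, 0)
    for ang in angles:
        # order=0 (nearest neighbour) keeps integer semantic labels intact.
        rot = rotate(qry, ang, reshape=False, order=0)
        for dx in range(-max_trans, max_trans + 1):
            for dy in range(-max_trans, max_trans + 1):
                cand = shift(rot, (dy, dx), order=0)  # empty cells fill as 0
                score = bev_agreement(ref, cand)
                if score > best_score:
                    best_score, best_pose = score, (dx, dy, ang)
    return best_pose
```

In a full system, each such pairwise estimate would become one edge in the pose graph, and a graph optimiser would then solve for the globally-consistent set of vehicle poses from which the dense world map is assembled.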