Dr Jean-Yves Guillemaut


Senior Lecturer in 3D Computer Vision
MEng (hons), PhD, MIEEE, MBMVA, FHEA
+44 (0)1483 686042
32 BA 00

About

Areas of specialism

3D Computer Vision; 3D Reconstruction; Computational Photography; Virtual and Augmented Reality; Lightfield Imaging; 3D Video; Artificial Intelligence

University roles and responsibilities

  • Senior Lecturer in 3D Computer Vision
  • CVSSP Postgraduate Research Director
  • Department Prizes Officer
  • Professional Training Tutor
  • MSc Personal Tutor

    My qualifications

    2014
    Graduate Certificate in Learning and Teaching
    University of Surrey
    2005
    PhD degree in 3D Computer Vision
    University of Surrey
    2001
    MEng degree (first class honours) with specialisation in Automatic Control and Robotics
    Ecole Centrale de Nantes

    Previous roles

    2012 - 2018
    Lecturer in 3D Computer Vision
    University of Surrey
    2012 - 2017
    CVSSP External Seminar Organiser
    University of Surrey
    2005 - 2012
    Research Fellow
    University of Surrey

    Research

    Research interests

    Research projects

    Research collaborations

    Indicators of esteem

    • Best Poster Award at European Conference on Visual Media Production (CVMP 2016)

    • Best Student Paper Award at Int. Conference on Computer Vision Theory and Applications (VISAPP 2014)

    • University of Surrey Faculty of Engineering and Physical Sciences Researcher of the Year Award (2012)

    • Honorable Mention for the Best Paper Award at ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2012

    • Best Poster Prize at EPSRC/BMVA Summer School on Computer Vision 2002

    Supervision

    Postgraduate research supervision

    Completed postgraduate research projects I have supervised

    Teaching

    Publications

    Gianmarco Addari, Jean-Yves Guillemaut (2023) Full 3D Helmholtz Stereopsis Datasets, In: A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance, Centre for Vision, Speech and Signal Processing

    These are the datasets presented in the paper “A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance”. The datasets are intended for full 3D scene reconstruction using Helmholtz stereopsis. The datasets contain two synthetic scenes (“Bunny” and “Armadillo”) rendered using POV-Ray and six real scenes (“Bee”, “Fox”, “Corgi”, “Duck”, “Llama” and “Giraffe”) captured using a pair of Canon EOS 5DS cameras fitted with Canon Macro Ring Lite MR-14 EX II flashes. The models used to render the synthetic scenes are the Stanford Bunny and Armadillo available at: http://graphics.stanford.edu/data/3Dscanrep/. All scenes are imaged under reciprocal conditions to allow scene reconstruction using Helmholtz stereopsis. The datasets also include calibration, segmentation and, for some of the scenes, ground truth information to allow benchmarking of the reconstruction algorithms. See dataset landing page for further details.
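
    As background for readers, the core reciprocity constraint that Helmholtz stereopsis exploits can be sketched as follows (a minimal illustration with notation of our choosing, not code from the paper):

```python
import numpy as np

def helmholtz_residual(p, n, o_l, o_r, I_l, I_r):
    """Helmholtz reciprocity constraint for one reciprocal image pair.

    p        : 3D surface point hypothesis
    n        : unit surface normal hypothesis at p
    o_l, o_r : left/right camera (and swapped light) centres, shape (3,)
    I_l      : intensity of p seen by the left camera (light at o_r)
    I_r      : intensity of p seen by the right camera (light at o_l)

    By Helmholtz reciprocity the vector w lies in the tangent plane for
    the correct (p, n), so the residual w . n is zero regardless of the
    surface BRDF -- this is what makes the cue reflectance-agnostic.
    """
    v_l = o_l - p
    v_r = o_r - p
    w = I_l * v_l / np.linalg.norm(v_l) ** 3 - I_r * v_r / np.linalg.norm(v_r) ** 3
    return float(np.dot(w, n))
```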

    Farshad Einabadi, Jean-Yves Guillemaut, Adrian Hilton (2024) Learning Self-Shadowing for Clothed Human Bodies, In: Eurographics Symposium on Rendering, The Eurographics Association

    This paper proposes to learn self-shadowing on full-body, clothed human postures from monocular colour image input, by supervising a deep neural model. The proposed approach implicitly learns the articulated body shape in order to generate self-shadow maps without seeking to explicitly reconstruct or estimate parametric 3D body geometry. Furthermore, it is generalisable to different people without per-subject pre-training, and has fast inference times. The proposed neural model is trained on self-shadow maps rendered from 3D scans of real people for various light directions. Inference of shadow maps for a given illumination is performed from only 2D image input. Quantitative and qualitative experiments demonstrate comparable results to the state of the art whilst being monocular and achieving a considerably faster inference time. We provide ablations of our methodology and further show how the inferred self-shadow maps can benefit monocular full-body human relighting.

    Yue Zhang, Akin Caliskan, Adrian Douglas Mark Hilton, Jean-Yves Guillemaut, Multi-View Labelling (MVL) Dataset, University of Surrey

    To overcome the shortage of real-world multi-view multiple-people data, we introduce a new synthetic multi-view multiple-people labelling dataset named Multi-View 3D Humans (MV3DHumans). This is a large-scale synthetic image dataset generated for multi-view multiple-people detection, labelling and segmentation tasks. The MV3DHumans dataset contains 1200 scenes, with 4, 6, 8 or 10 people in each scene. Each scene is captured by 16 cameras with overlapping fields of view. The MV3DHumans dataset provides RGB images with a resolution of 640 × 480. Ground truth annotations, including bounding boxes, instance masks and multi-view correspondences, as well as camera calibrations, are provided in the dataset.

    M Klaudiny, M Tejera, C Malleson, J-Y Guillemaut, A Hilton (2020) SCENE Digital Cinema Datasets, University of Surrey
    Matthew James Bailey, Adrian Hilton, Jean-Yves Guillemaut, Finite Aperture Stereo Datasets, In: Finite Aperture Stereo: 3D Reconstruction of Macro-Scale Scenes, CVSSP

    This landing page contains the datasets presented in the paper "Finite Aperture Stereo". The datasets are intended for defocus-based 3D reconstruction and analysis. Each download link contains images of a static scene, captured from multiple viewpoints and with different focus settings. The captured objects exhibit a range of reflectance properties and are physically small in scale. Calibration images are also available. A CC BY-NC licence is in effect. Use of this data must be for non-commercial research purposes. Acknowledgement must be given to the original authors by referencing the dataset DOI, the dataset web address, and the aforementioned publication. Re-distribution of this data is prohibited. Before downloading, you must agree with these conditions as presented on the dataset webpage.

    Farshad Einabadi, Jean-Yves Guillemaut, Adrian Hilton, 3D Virtual Human Shadow (3DVHshadow), In: Learning Projective Shadow Textures for Neural Rendering of Human Cast Shadows from Silhouettes, Centre for Vision, Speech and Signal Processing (CVSSP)

    3DVHshadow contains images of diverse synthetic humans generated to evaluate the performance of cast hard shadow algorithms for humans. Each dataset entry includes (a) a rendering of the subject from the camera viewpoint, (b) its binary segmentation mask, and (c) its binary cast shadow mask on a planar surface -- three images in total. The respective rendering metadata, such as point light source position, camera pose, camera calibration, etc., is also provided alongside the images. Please refer to the corresponding publication for details of the dataset generation.

    Nadejda Roubtsova, Jean-Yves Guillemaut (2020) Helmholtz Stereopsis Synthetic Dataset, University of Surrey
    Jack Oliver Hilliard, Adrian Hilton, Jean-Yves Guillemaut (2023) HDR Illumination Outpainting with a Two-Stage GAN Model, In: Marco Volino, Armin Mustafa, Peter Vangorp (eds.), Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production, 1, pp. 1-9, ACM

    In this paper we present a method for single-view illumination estimation of indoor scenes, using image-based lighting, that incorporates state-of-the-art outpainting methods. Recent advances in illumination estimation have focused on improving the detail of the generated environment map so it can realistically light mirror-reflective surfaces. These generated maps often include artefacts at the borders of the image where the panorama wraps around. In this work we make the key observation that inferring the panoramic HDR illumination of a scene from a limited-field-of-view LDR input can be framed as an outpainting problem (whereby the original image must be expanded beyond its original borders). We incorporate two key techniques used in outpainting tasks: (i) separating the generation into multiple networks (a diffuse lighting network and a high-frequency detail network) to reduce the amount to be learnt by a single network; (ii) utilising an inside-out method of processing the input image to reduce the border artefacts. Further to incorporating these outpainting methods, we also introduce circular padding before the network to help remove the border artefacts. Results show the proposed approach is able to relight diffuse, specular and mirror surfaces more accurately than existing methods in terms of the position of the light sources and pixelwise accuracy, whilst also reducing the artefacts produced at the borders of the panorama.
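
    As an illustration of the circular padding idea mentioned above, a minimal sketch in PyTorch (the function name and shapes are illustrative; this is not the authors' network code):

```python
import torch
import torch.nn.functional as F

def pad_panorama(x: torch.Tensor, pad: int) -> torch.Tensor:
    """Wrap a panorama horizontally before convolution.

    x : (N, C, H, W) tensor holding equirectangular images.
    Horizontal 'circular' padding copies columns from the opposite
    edge, so convolutions see the panorama as seamless where it
    wraps around, reducing border artefacts.
    """
    return F.pad(x, (pad, pad, 0, 0), mode="circular")

x = torch.randn(1, 3, 64, 128)
print(pad_panorama(x, 4).shape)  # torch.Size([1, 3, 64, 136])
```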

    Daniel La'ah Ayuba, Jean-Yves Guillemaut, Belen Marti-Cardona, Oscar Mendez (2024) HyperKon: A Self-Supervised Contrastive Network for Hyperspectral Image Analysis, In: Remote Sensing 16(18), 3399, MDPI

    The use of a pretrained image classification model (trained on cats and dogs, for example) as a perceptual loss function for hyperspectral super-resolution and pansharpening tasks is surprisingly effective. However, RGB-based networks do not take full advantage of the spectral information in hyperspectral data. This inspired the creation of HyperKon, a dedicated hyperspectral Convolutional Neural Network backbone built with self-supervised contrastive representation learning. HyperKon uniquely leverages the high spectral continuity, range, and resolution of hyperspectral data through a spectral attention mechanism. We also perform a thorough ablation study of different kinds of layers, showing how each performs on hyperspectral data. Notably, HyperKon achieves a remarkable 98% Top-1 retrieval accuracy and surpasses traditional RGB-trained backbones in both pansharpening and image classification tasks. These results highlight the potential of hyperspectral-native backbones and herald a paradigm shift in hyperspectral image analysis.
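
    For context, self-supervised contrastive representation learning of this kind is commonly trained with an InfoNCE-style objective; a generic sketch follows (an assumed formulation, not HyperKon's exact loss):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Generic InfoNCE contrastive loss.

    z1, z2 : (N, D) embeddings of two augmented views of the same N
    hyperspectral patches; row i of z1 and row i of z2 form a positive
    pair, while all other rows act as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                            # (N, N) similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on diagonal
    return F.cross_entropy(logits, targets)
```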

    Marco Volino, Armin Mustafa, Jean-Yves Guillemaut, Adrian Hilton (2019) Light Field Compression using Eigen Textures, In: 2019 International Conference on 3D Vision (3DV 2019), pp. 482-490, IEEE

    Light fields are becoming an increasingly popular method of digital content production for visual effects and virtual/augmented reality as they capture a view-dependent representation enabling photo-realistic rendering over a range of viewpoints. Light field video is generally captured using arrays of cameras, resulting in tens to hundreds of images of a scene at each time instance. An open problem is how to efficiently represent the data, preserving the view-dependent detail of the surface in such a way that is compact to store and efficient to render. In this paper we show that constructing an Eigen texture basis representation from the light field, using an approximate 3D surface reconstruction as a geometric proxy, provides a compact representation that maintains view-dependent realism. We demonstrate that the proposed method is able to reduce storage requirements by > 95% while maintaining the visual quality of the captured data. An efficient view-dependent rendering technique is also proposed which is performed in eigen space, allowing smooth continuous viewpoint interpolation through the light field.
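
    The Eigen texture idea can be illustrated with a short sketch using a PCA basis computed by SVD (shapes and function names are illustrative, not the paper's implementation):

```python
import numpy as np

def eigen_texture_basis(textures: np.ndarray, k: int):
    """Build a rank-k Eigen texture basis from a stack of textures.

    textures : (V, P) array, one flattened texture per viewpoint.
    Storing (mean, basis, coeffs) instead of all V textures is what
    yields the large compression factors reported for this approach.
    """
    mean = textures.mean(axis=0)
    U, S, Vt = np.linalg.svd(textures - mean, full_matrices=False)
    basis = Vt[:k]                        # (k, P) eigen textures
    coeffs = (textures - mean) @ basis.T  # (V, k) per-view coefficients
    return mean, basis, coeffs

def render_view(mean, basis, coeff):
    """Reconstruct one view-dependent texture from its coefficients."""
    return mean + coeff @ basis
```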

    J-Y Guillemaut, J Illingworth (2008) The normalised image of the absolute conic and its application for zooming camera calibration, In: Pattern Recognition 41(12), pp. 3624-3635, Pergamon-Elsevier Science Ltd
    Ye Ling, David M. Frohlich, Tom H. Williamson, Jean-Yves Guillemaut (2023) A toolkit of approaches for digital mapping and correction of visual distortion, Association for Computing Machinery (ACM)

    Visual distortion, known as metamorphopsia, is a serious visual deficit with no effective clinical treatment and which cannot be corrected by traditional optical glasses. In this paper, we introduce a toolkit of approaches for digitally mapping and correcting visual distortion, that might eventually be incorporated in a low vision aid VR headset. We describe three different approaches spanning data-driven and generative designs and leveraging either uniocular or binocular cues. We present our proposed demonstrator, our evaluation roadmap, and challenges for the field. Initial tests with simulated data demonstrate the effectiveness of the approach. Once clinically validated, we hope these approaches will enable accurate mapping of visual distortion and eventually lead to the development of ‘digital glasses’ capable of correcting the effects of metamorphopsia and restoring healthy vision.

    Yue Zhang, Akin Caliskan, Adrian Hilton, Jean-Yves Guillemaut (2021) A Novel Multi-View Labelling Network Based on Pairwise Learning, In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 3682-3686, IEEE

    Correct labelling of multiple people from different viewpoints in complex scenes is a challenging task due to occlusions, visual ambiguities, as well as variations in appearance and illumination. In recent years, deep learning approaches have proved very successful at improving the performance of a wide range of recognition and labelling tasks such as person re-identification and video tracking. However, to date, applications to multi-view tasks have proved more challenging due to the lack of suitably labelled multi-view datasets, which are difficult to collect and annotate. The contributions of this paper are two-fold. First, a synthetic dataset is generated by combining 3D human models and panoramas along with human poses and appearance detail rendering to overcome the shortage of real dataset for multi-view labelling. Second, a novel framework named Multi-View Labelling network (MVL-net) is introduced to leverage the new dataset and unify the multi-view multiple people detection, segmentation and labelling tasks in complex scenes. To the best of our knowledge, this is the first work using deep learning to train a multi-view labelling network. Experiments conducted on both synthetic and real datasets demonstrate that the proposed method outperforms the existing state-of-the-art approaches.

    Michaela Spiteri, Jean-Yves Guillemaut, David Windridge, Shivaram Avula, Ram Kumar, Emma Lewis (2019) Fully-Automated Identification of Imaging Biomarkers for Post-Operative Cerebellar Mutism Syndrome Using Longitudinal Paediatric MRI, In: Neuroinformatics 18(1), pp. 151-162, Springer US

    Post-operative cerebellar mutism syndrome (POPCMS) in children is a post-surgical complication which occurs following the resection of tumors within the brain stem and cerebellum. High resolution brain magnetic resonance (MR) images acquired at multiple time points across a patient's treatment allow the quantification of localized changes caused by the progression of this syndrome. However, MR images are not necessarily acquired at regular intervals throughout treatment and are often not volumetric. This restricts the analysis to 2D space and causes difficulty in intra- and inter-subject comparison. To address these challenges, we have developed an automated image processing and analysis pipeline. Multi-slice 2D MR image slices are interpolated in space and time to produce a 4D volumetric MR image dataset providing a longitudinal representation of the cerebellum and brain stem at specific time points across treatment. The deformations within the brain over time are represented using a novel metric: the determinant of the Jacobian of the deformation. This metric, together with the changing grey-level intensity of areas within the brain over time, is analyzed using machine learning techniques in order to identify biomarkers that correspond with the development of POPCMS following tumor resection. This study makes use of a fully automated approach which is not hypothesis-driven. As a result, we were able to automatically detect six potential biomarkers that are related to the development of POPCMS following tumor resection in the posterior fossa.
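
    A minimal sketch of the Jacobian determinant metric described above, assuming a dense displacement field (function name and conventions are illustrative):

```python
import numpy as np

def jacobian_determinant(disp: np.ndarray) -> np.ndarray:
    """Determinant of the Jacobian of a 3D deformation field.

    disp : (3, X, Y, Z) displacement field; the deformation is
    phi(x) = x + disp(x).  Values of det J above 1 indicate local
    expansion and values below 1 local contraction of tissue
    between time points.
    """
    grads = [np.gradient(disp[i]) for i in range(3)]  # d d_i / d x_j
    J = np.empty(disp.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            # J_ij = delta_ij + d d_i / d x_j
            J[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    return np.linalg.det(J)
```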

    Timothy H M Fung, Neville C R A John, Jean-Yves Guillemaut, David Yorston, David Frohlich, David H W Steel, Tom H Williamson (2023) Artificial intelligence using deep learning to predict the anatomical outcome of rhegmatogenous retinal detachment surgery: a pilot study, In: Graefe's Archive for Clinical and Experimental Ophthalmology 261(3), pp. 715-721

    To develop and evaluate an automated deep learning model to predict the anatomical outcome of rhegmatogenous retinal detachment (RRD) surgery. Six thousand six hundred and sixty-one digital images of RRD treated by vitrectomy and internal tamponade were collected from the British and Eire Association of Vitreoretinal Surgeons database. Each image was classified as a primary surgical success or a primary surgical failure. The synthetic minority over-sampling technique was used to address class imbalance. We adopted the state-of-the-art deep convolutional neural network architecture Inception v3 to train, validate, and test deep learning models to predict the anatomical outcome of RRD surgery. The area under the curve (AUC), sensitivity, and specificity for predicting the outcome of RRD surgery were calculated for the best predictive deep learning model. The deep learning model was able to predict the anatomical outcome of RRD surgery with an AUC of 0.94, with a corresponding sensitivity of 73.3% and a specificity of 96%. A deep learning model is capable of accurately predicting the anatomical outcome of RRD surgery. This fully automated model has potential application in the surgical care of patients with RRD.
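
    The recipe described (class rebalancing with SMOTE, then an Inception v3 backbone) can be sketched as follows; the data shapes, hyperparameters and prediction head are illustrative stand-ins, not the study's exact configuration:

```python
import numpy as np
import tensorflow as tf
from imblearn.over_sampling import SMOTE

# Stand-in data: flattened images with binary success/failure labels
# (hypothetical shapes; a real pipeline would resize to 299 x 299 x 3).
X = np.random.rand(200, 64 * 64 * 3).astype("float32")
y = np.random.randint(0, 2, 200)

# SMOTE synthesises minority-class samples to address class imbalance.
X_bal, y_bal = SMOTE().fit_resample(X, y)

# Inception v3 backbone with an illustrative binary prediction head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(299, 299, 3))
model = tf.keras.Sequential([base, tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```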

    H Kim, Jean-Yves Guillemaut, T Takai, M Sarim, A Hilton (2012) Outdoor Dynamic 3D Scene Reconstruction, In: H Gharavi (ed.), IEEE Transactions on Circuits and Systems for Video Technology 22(11), pp. 1611-1622, IEEE

    Existing systems for 3D reconstruction from multiple view video use controlled indoor environments with uniform illumination and backgrounds to allow accurate segmentation of dynamic foreground objects. In this paper we present a portable system for 3D reconstruction of dynamic outdoor scenes which require relatively large capture volumes with complex backgrounds and non-uniform illumination. This is motivated by the demand for 3D reconstruction of natural outdoor scenes to support film and broadcast production. Limitations of existing multiple view 3D reconstruction techniques for use in outdoor scenes are identified. Outdoor 3D scene reconstruction is performed in three stages: (1) 3D background scene modelling using spherical stereo image capture; (2) multiple view segmentation of dynamic foreground objects by simultaneous video matting across multiple views; and (3) robust 3D foreground reconstruction and multiple view segmentation refinement in the presence of segmentation and calibration errors. Evaluation is performed on several outdoor productions with complex dynamic scenes including people and animals. Results demonstrate that the proposed approach overcomes limitations of previous indoor multiple view reconstruction approaches enabling high-quality free-viewpoint rendering and 3D reference models for production.

    Jean-Yves Guillemaut, J Kilner, J Starck, Adrian Hilton (2007) Dynamic feathering: Minimising blending artefacts in view-dependent rendering, In: IET Conference Publications 534 (534 CP)

    Conventional view-dependent texture mapping techniques produce composite images by blending subsets of input images, weighted according to their relative influence at the rendering viewpoint, over regions where the views overlap. Geometric or camera calibration errors often result in a loss of detail due to blurring or double exposure artefacts, which tends to be exacerbated by the number of blending views considered. We propose a novel view-dependent rendering technique which optimises the blend region dynamically at rendering time, and reduces the adverse effects of camera calibration or geometric errors otherwise observed. The technique has been successfully integrated in a rendering pipeline which operates at interactive frame rates. Improvements over state-of-the-art view-dependent texture mapping techniques are illustrated on a synthetic scene as well as real imagery of a large-scale outdoor scene where large camera calibration and geometric errors are present.
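
    For illustration, a generic view-dependent blending weight computation of the kind such techniques optimise (a simplified sketch, not the dynamic feathering algorithm itself):

```python
import numpy as np

def blend_weights(view_dir: np.ndarray, cam_dirs: np.ndarray, k: int = 3):
    """View-dependent blending weights for texture mapping.

    view_dir : (3,) unit direction of the rendering viewpoint.
    cam_dirs : (C, 3) unit viewing directions of the capture cameras.
    Weights favour cameras aligned with the rendering view; keeping
    only the k best-aligned cameras limits the blur and ghosting that
    grow with the number of blended views.
    """
    sims = cam_dirs @ view_dir        # cosine of the angular distance
    keep = np.argsort(sims)[-k:]      # indices of the k closest cameras
    w = np.zeros(len(cam_dirs))
    w[keep] = np.clip(sims[keep], 0.0, None)
    return w / w.sum() if w.sum() > 0 else w
```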

    Matthew James Bailey, Adrian Douglas Mark Hilton, Jean-Yves Guillemaut (2022) Finite Aperture Stereo, In: Finite Aperture Stereo Datasets, Springer Nature

    Multi-view stereo remains a popular choice when recovering 3D geometry, despite performance varying dramatically according to the scene content. Moreover, typical pinhole camera assumptions fail in the presence of the shallow depth of field inherent to macro-scale scenes, limiting application to larger scenes with diffuse reflectance. However, the presence of defocus blur can itself be considered a useful reconstruction cue, particularly in the presence of view-dependent materials. With this in mind, we explore the complementary nature of stereo and defocus cues in the context of multi-view 3D reconstruction, and propose a complete pipeline for scene modelling from a finite aperture camera that encompasses image formation, camera calibration and reconstruction stages. As part of our evaluation, an ablation study reveals how each cue contributes to the higher performance observed over a range of complex materials and geometries. Though of lesser concern with large apertures, the effects of image noise are also considered. By introducing pre-trained deep feature extraction into our cost function, we show a step improvement over per-pixel comparisons, as well as verify the cross-domain applicability of networks using largely in-focus training data applied to defocused images. Finally, we compare to a number of modern multi-view stereo methods, and demonstrate how the use of both cues leads to a significant increase in performance across several synthetic and real datasets.

    M Sarim, A Hilton, J Guillemaut (2009) Non-parametric patch based video matting

    In computer vision, matting is the process of accurate foreground estimation in images and videos. In this paper we present a novel patch-based approach to video matting relying on non-parametric statistics to represent image variations in appearance. This overcomes the limitation of parametric algorithms, which rely solely on strong colour correlation between nearby pixels. Initially we construct a clean background by utilising the foreground object's movement across the background. For a given frame, a trimap is constructed using the background and the previous frame's trimap. A patch-based approach is used to estimate the foreground colour for every unknown pixel, and finally the alpha matte is extracted. Quantitative evaluation shows that the technique performs better, in terms of accuracy and the required user interaction, than the current state-of-the-art parametric approaches.
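
    For background, once per-pixel foreground and background colours have been estimated, the alpha matte follows from the standard compositing model; a minimal sketch (notation of our choosing):

```python
import numpy as np

def alpha_from_colours(I, F, B, eps=1e-8):
    """Closed-form alpha given foreground/background colour estimates.

    The compositing model is I = alpha * F + (1 - alpha) * B.  Once a
    patch search has produced per-pixel estimates of F and B, alpha is
    the projection of (I - B) onto (F - B).  Inputs are (..., 3) RGB
    arrays; the result is clipped to the valid [0, 1] range.
    """
    num = np.sum((I - B) * (F - B), axis=-1)
    den = np.sum((F - B) ** 2, axis=-1) + eps
    return np.clip(num / den, 0.0, 1.0)
```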

    Evren Imre, Jean-Yves Guillemaut, Adrian Hilton (2012) Through-the-Lens Multi-Camera Synchronisation and Frame-Drop Detection for 3D Reconstruction, In: Second Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization & Transmission (3DIMPVT 2012), pp. 395-402, IEEE

    Synchronisation is an essential requirement for multi-view 3D reconstruction of dynamic scenes. However, the use of HD cameras and large set-ups puts considerable stress on hardware and causes frame drops, which are usually detected by manually verifying very large amounts of data. This paper improves [9] and extends it with frame-drop detection capability. In order to spot frame-drop events, the algorithm fits a broken line to the frame index correspondences for each camera pair, and then fuses the pairwise drop hypotheses into a consistent, absolute frame-drop estimate. The success and practical utility of the improved pipeline are demonstrated through a number of experiments, including 3D reconstruction and free-viewpoint video rendering tasks.
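
    The offset-based view of frame-drop detection can be sketched as follows (a simplified illustration of the 'broken line' idea, not the paper's estimator):

```python
import numpy as np

def detect_frame_drops(idx_a: np.ndarray, idx_b: np.ndarray):
    """Spot frame-drop events from pairwise frame-index correspondences.

    idx_a, idx_b : matched frame indices for one camera pair.  Without
    drops, idx_b - idx_a is a constant offset; a drop in one camera
    shifts the offset by the number of lost frames, producing the
    'broken line' that the paper fits.  Returns (frame, jump) pairs.
    """
    offset = idx_b - idx_a
    jumps = np.flatnonzero(np.diff(offset) != 0)
    return [(int(idx_a[j + 1]), int(offset[j + 1] - offset[j])) for j in jumps]
```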

    A Mustafa, H Kim, J-Y Guillemaut, ADM Hilton (2016) Temporally coherent 4D reconstruction of complex dynamic scenes, In: CVPR 2016 Proceedings

    This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star convexity constraint. Comparative evaluation is performed on a variety of unstructured indoor and outdoor dynamic scenes with hand-held cameras and multiple people. This demonstrates reconstruction of complete temporally coherent 4D scene models with improved nonrigid object segmentation and shape reconstruction.

    D Casas, M Tejera, Jean-Yves Guillemaut, A Hilton (2012) 4D parametric motion graphs for interactive animation, In: I3D '12 Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 103-110, ACM

    A 4D parametric motion graph representation is presented for interactive animation from actor performance capture in a multiple camera studio. The representation is based on a 4D model database of temporally aligned mesh sequence reconstructions for multiple motions. High-level movement controls such as speed and direction are achieved by blending multiple mesh sequences of related motions. A real-time mesh sequence blending approach is introduced which combines the realistic deformation of previous non-linear solutions with efficient online computation. Transitions between different parametric motion spaces are evaluated in real-time based on surface shape and motion similarity. 4D parametric motion graphs allow real-time interactive character animation while preserving the natural dynamics of the captured performance.

    Gianmarco Addari, Jean-Yves Guillemaut (2019) Towards Globally Optimal Full 3D Reconstruction of Scenes with Complex Reflectance using Helmholtz Stereopsis, In: CVMP '19: Proceedings of the 16th ACM SIGGRAPH European Conference on Visual Media Production, 8, pp. 1-10, Association for Computing Machinery (ACM)

    Many 3D reconstruction techniques are based on the assumption of prior knowledge of the object's surface reflectance, which severely restricts the scope of scenes that can be reconstructed. In contrast, Helmholtz Stereopsis (HS) employs Helmholtz Reciprocity to compute the scene geometry regardless of its Bidirectional Reflectance Distribution Function (BRDF). Despite this advantage, most HS implementations to date have been limited to 2.5D reconstruction, with the few extensions to full 3D being generally limited to a local refinement due to the nature of the optimisers they rely on. In this paper, we propose a novel approach to full 3D HS based on Markov Random Field (MRF) optimisation. After defining a solution space that contains the surface of the object, the energy function to be minimised is computed based on the HS quality measure and a normal consistency term computed across neighbouring surface points. This new method offers several key advantages with respect to previous work: the optimisation is performed globally instead of locally; a more discriminative energy function is used, allowing for better and faster convergence; a novel visibility handling approach to take advantage of Helmholtz reciprocity is proposed; and surface integration is performed implicitly as part of the optimisation process, thereby avoiding the need for an additional step. The approach is evaluated on both synthetic and real scenes, with an analysis of the sensitivity to input noise performed in the synthetic case. Accurate results are obtained on both types of scenes. Further, experimental results indicate that the proposed approach significantly outperforms previous work in terms of geometric and normal accuracy.
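
    The structure of such an MRF energy (a unary Helmholtz quality term plus a pairwise normal consistency term) can be sketched as follows; all names and weights are illustrative, not the paper's exact formulation:

```python
import numpy as np

def mrf_energy(labels, hs_cost, normals, neighbours, lam=1.0):
    """Toy MRF energy for surface estimation with Helmholtz stereopsis.

    labels     : (M,) chosen candidate label per surface site.
    hs_cost    : (M, L) data term from the Helmholtz quality measure.
    normals    : (M, L, 3) unit normal hypothesis per site and label.
    neighbours : list of (i, j) index pairs of adjacent sites.
    The pairwise term penalises normal inconsistency between
    neighbouring surface points; an optimiser such as ICM sweeps the
    sites, re-labelling each to reduce this energy until convergence.
    """
    data = hs_cost[np.arange(len(labels)), labels].sum()
    smooth = sum(1.0 - normals[i, labels[i]] @ normals[j, labels[j]]
                 for i, j in neighbours)
    return data + lam * smooth
```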

    Lewis Bridgeman, Jean-Yves Guillemaut, Adrian Hilton (2021) Dynamic Appearance Modelling from Minimal Cameras, In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2021), pp. 1760-1769, IEEE

    We present a novel method for modelling dynamic texture appearance from a minimal set of cameras. Previous methods to capture the dynamic appearance of a human from multi-view video have relied on large, expensive camera setups, and typically store texture on a frame-by-frame basis. We fit a parameterised human body model to multi-view video from minimal cameras (as few as 3), and combine the partial texture observations from multiple viewpoints and frames in a learned framework to generate full-body textures with dynamic details given an input pose. Key to our method are our multi-band loss functions, which apply separate blending functions to the high and low spatial frequencies to reduce texture artefacts. We evaluate our method on a range of multi-view datasets, and show that our model is able to accurately produce full-body dynamic textures, even with only partial camera coverage. We demonstrate that our method outperforms other texture generation methods on minimal camera setups.
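
    A multi-band loss of the kind described can be sketched by splitting predictions into low and high spatial frequencies (the filter and weights below are illustrative stand-ins, not the paper's exact functions):

```python
import torch
import torch.nn.functional as F

def _lowpass(x: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Depthwise box low-pass filter (a simple stand-in for a Gaussian)."""
    w = torch.ones(x.size(1), 1, k, k, device=x.device) / (k * k)
    return F.conv2d(x, w, padding=k // 2, groups=x.size(1))

def multi_band_loss(pred, target, w_low=1.0, w_high=1.0):
    """Apply separate losses to low and high spatial frequency bands.

    Splitting the texture into bands lets the low band control overall
    colour and shading while the high band preserves fine detail,
    reducing blur and seam artefacts in the blended result.
    """
    low_p, low_t = _lowpass(pred), _lowpass(target)
    high_p, high_t = pred - low_p, target - low_t
    return w_low * F.l1_loss(low_p, low_t) + w_high * F.l1_loss(high_p, high_t)
```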

    Lewis Bridgeman, Marco Volino, Jean-Yves Guillemaut, Adrian Hilton (2019) Multi-Person 3D Pose Estimation and Tracking in Sports, In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2487-2496, IEEE

    We present an approach to multi-person 3D pose estimation and tracking from multi-view video. Following independent 2D pose detection in each view, we: (1) correct errors in the output of the pose detector; (2) apply a fast greedy algorithm for associating 2D pose detections between camera views; and (3) use the associated poses to generate and track 3D skeletons. Previous methods for estimating skeletons of multiple people suffer long processing times or rely on appearance cues, reducing their applicability to sports. Our approach to associating poses between views works by seeking the best correspondences first in a greedy fashion, while reasoning about the cyclic nature of correspondences to constrain the search. The associated poses can be used to generate 3D skeletons, which we produce via robust triangulation. Our method can track 3D skeletons in the presence of missing detections, substantial occlusions, and large calibration error. We believe ours is the first method for full-body 3D pose estimation and tracking of multiple players in highly dynamic sports scenes. The proposed method achieves a significant improvement in speed over state-of-the-art methods.
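
    For background, the triangulation step can be illustrated with the standard linear (DLT) triangulation of one joint from its associated 2D detections (a minimal sketch omitting the paper's robustness machinery):

```python
import numpy as np

def triangulate(points_2d, projections):
    """Linear (DLT) triangulation of one joint from multiple views.

    points_2d   : (V, 2) detections of the same joint in V views.
    projections : (V, 3, 4) camera projection matrices.
    Each view contributes two rows, x*P3 - P1 and y*P3 - P2; the 3D
    point is the null vector of the stacked system.  Handling of
    outlier detections (e.g. over view subsets) is omitted here.
    """
    rows = []
    for (x, y), P in zip(points_2d, projections):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenise
```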

    Muhammad Sarim, Adrian Hilton, Jean-Yves Guillemaut (2011) Temporal Trimap Propagation for Video Matting using Inferential Statistics, In: 2011 18th IEEE International Conference on Image Processing (ICIP), pp. 1745-1748, IEEE

    This paper introduces a statistical inference framework to temporally propagate trimap labels from sparsely defined key frames to estimate trimaps for the entire video sequence. A trimap is a fundamental requirement for digital image and video matting approaches. Statistical inference is coupled with Bayesian statistics to allow robust trimap labelling in the presence of shadows, illumination variation and overlap between the foreground and background appearance. Results demonstrate that the trimaps are sufficiently accurate to allow high-quality video matting using existing natural image matting algorithms. Quantitative evaluation against ground truth demonstrates that the approach achieves accurate matte estimation with less user interaction than state-of-the-art techniques.

    JY Guillemaut, A Hilton, J Starck, JJ Kilner, O Grau (2007) A Bayesian Framework for Simultaneous Reconstruction and Matting, In: IEEE Int. Conf. on 3D Imaging and Modeling, pp. 167-176
    JJ Kilner, J-Y Guillemaut, A Hilton (2009) 3D Action Matching with Key-Pose Detection, In: IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1-8

    This paper addresses the problem of human action matching in outdoor sports broadcast environments, by analysing 3D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. Typically, pose recognition is carried out using images from a single camera; however, this approach is sensitive to occlusions and restricted fields of view, both of which are common in the outdoor sports environment. This paper presents a novel technique for the automatic matching of human activities which operates on the 3D data available in a multi-camera broadcast environment. Shape is retrieved using multi-camera techniques to generate a 3D representation of the scene. Use of 3D data renders the system camera-pose-invariant and allows it to work while cameras are moving and zooming. By comparing the reconstructions to an appropriate 3D library, action matching can be achieved in the presence of significant calibration and matting errors which cause traditional pose detection schemes to fail. An appropriate feature descriptor and distance metric are presented, as well as a technique to use these features for key-pose detection and action matching. The technique is then applied to real footage captured at an outdoor sporting event.

    M Sarim, A Hilton, JY Guillemaut (2009) Non-parametric Patch Based Video Matting, In: British Machine Vision Conference (BMVC)
    Farshad Einabadi, Jean-Yves Guillemaut, Adrian Hilton (2023) Learning Projective Shadow Textures for Neural Rendering of Human Cast Shadows from Silhouettes, In: 3D Virtual Human Shadow (3DVHshadow), The Eurographics Association

    This contribution introduces a two-step, novel neural rendering framework to learn the transformation from a 2D human silhouette mask to the corresponding cast shadows on background scene geometries. In the first step, the proposed neural renderer learns a binary shadow texture (canonical shadow) from the 2D foreground subject, for each point light source, independent of the background scene geometry. Next, the generated binary shadows are texture-mapped to transparent virtual shadow map planes which are seamlessly used in a traditional rendering pipeline to project hard or soft shadows for arbitrary scenes and light sources of different sizes. The neural renderer is trained with shadow images rendered from a fast, scalable, synthetic data generation framework. We introduce the 3D Virtual Human Shadow (3DVHshadow) dataset as a public benchmark for training and evaluation of human shadow generation. Evaluation on the 3DVHshadow test set and real 2D silhouette images of people demonstrates that the proposed framework achieves comparable performance to traditional geometry-based renderers without any requirement for knowledge of the 3D human shape or its computationally intensive, explicit estimation. We also show the benefit of learning intermediate canonical shadow textures, compared to learning to generate shadows directly in camera image space. Further experiments are provided to evaluate the effect of having multiple light sources in the scene, model performance with regard to the relative camera-light 2D angular distance, potential aliasing artefacts related to output image resolution, and the effect of light source dimensions on shadow softness.

    Mark Brown, David Windridge, Jean-Yves Guillemaut (2019) A family of globally optimal branch-and-bound algorithms for 2D–3D correspondence-free registration, In: Pattern Recognition 93, pp. 36-54, Elsevier

    We present a family of methods for 2D–3D registration spanning both deterministic and non-deterministic branch-and-bound approaches. Critically, the methods exhibit invariance to the underlying scene primitives, enabling e.g. points and lines to be treated on an equivalent basis, potentially enabling a broader range of problems to be tackled while maximising available scene information, all scene primitives being simultaneously considered. Being a branch-and-bound based approach, the method furthermore enjoys intrinsic guarantees of global optimality; while branch-and-bound approaches have been employed in a number of computer vision contexts, the proposed method represents the first time that this strategy has been applied to the 2D–3D correspondence-free registration problem from points and lines. Within the proposed procedure, deterministic and probabilistic procedures serve to speed up the nested branch-and-bound search while maintaining optimality. Experimental evaluation with synthetic and real data indicates that the proposed approach significantly increases both accuracy and robustness compared to the state of the art.
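
    The skeleton of a best-first branch-and-bound search of this kind, with the pruning rule that underpins the global optimality guarantee, can be sketched as follows (a generic template; the bounding functions are problem-specific and left abstract):

```python
import heapq

def branch_and_bound(root, lower_bound, upper_bound, split, eps=1e-6):
    """Generic best-first branch-and-bound over a parameter region.

    root        : initial search region (e.g. the space of poses).
    lower_bound : optimistic bound on the best cost within a region.
    upper_bound : cost of some feasible solution inside a region.
    split       : subdivides a region into sub-regions.
    Regions whose lower bound cannot beat the best known solution are
    pruned; since bounds are valid, the returned solution is globally
    optimal to within eps.
    """
    best_cost, best_region = upper_bound(root), root
    queue = [(lower_bound(root), 0, root)]
    tie = 1  # tie-breaker so regions never need to be comparable
    while queue:
        lb, _, region = heapq.heappop(queue)
        if lb > best_cost - eps:
            break  # no remaining region can improve on the optimum
        for sub in split(region):
            ub = upper_bound(sub)
            if ub < best_cost:
                best_cost, best_region = ub, sub
            if lower_bound(sub) < best_cost - eps:
                heapq.heappush(queue, (lower_bound(sub), tie, sub))
                tie += 1
    return best_cost, best_region
```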

    M Fastovets, JY Guillemaut, A Hilton (2014) Estimating athlete pose from monocular TV sports footage, 71, pp. 161-178

    Human pose estimation from monocular video streams is a challenging problem. Much of the work on this problem has focused on developing inference algorithms and probabilistic prior models based on learned measurements. Such algorithms face challenges in generalisation beyond the learned dataset. We propose an interactive model-based generative approach for estimating the human pose from uncalibrated monocular video in unconstrained sports TV footage. Belief propagation over a spatio-temporal graph of candidate body part hypotheses is used to estimate a temporally consistent pose between user-defined keyframe constraints. Experimental results show that the proposed generative pose estimation framework is capable of estimating pose even in very challenging unconstrained scenarios.

    M Fastovets, J-Y Guillemaut, A Hilton (2014) Athlete pose estimation by non-sequential key-frame propagation, In: P Hall, JP Collomosse, D Cosker (eds.), CVMP, pp. 3:1-3:1
    M Brown, D Windridge, JY Guillemaut (2015) A generalisable framework for saliency-based line segment detection, In: Pattern Recognition 48(12), pp. 3993-4011, Elsevier

    Here we present a novel, information-theoretic salient line segment detector. Existing line detectors typically only use the image gradient to search for potential lines. Consequently, many lines are found, particularly in repetitive scenes. In contrast, our approach detects lines that define regions of significant divergence between pixel intensity or colour statistics. This results in a novel detector that naturally avoids the repetitive parts of a scene while detecting the strong, discriminative lines present. We furthermore use our approach as a saliency filter on existing line detectors to more efficiently detect salient line segments. The approach is highly generalisable, depending only on image statistics rather than image gradient, and this is demonstrated by an extension to depth imagery. Our work is evaluated against a number of other line detectors, and a quantitative evaluation demonstrates a significant improvement over existing line detectors for a range of image transformations.
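
    A divergence measure in this spirit can be sketched by comparing the intensity histograms sampled on either side of a candidate segment (an illustrative measure, not the paper's exact detector):

```python
import numpy as np

def line_saliency(hist_left: np.ndarray, hist_right: np.ndarray, eps=1e-12):
    """Symmetrised KL divergence between the intensity histograms
    sampled on either side of a candidate line segment.

    A gradient edge inside a repetitive texture yields similar
    statistics on both sides (low divergence), whereas a salient
    boundary separates two distinct distributions (high divergence).
    """
    p = hist_left / hist_left.sum() + eps
    q = hist_right / hist_right.sum() + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
```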

    C Malleson, M Klaudiny, J-Y Guillemaut, A Hilton (2014) Structured Representation of Non-Rigid Surfaces from Single View 3D Point Tracks, In: 3DV, pp. 625-632
    J Imber, M Volino, J-Y Guillemaut, S Fenney, A Hilton (2013) Free-viewpoint video rendering for mobile devices, In: P Eisert, A Gagalowicz (eds.), MIRAGE, pp. 11:1-11:1
    JJ Kilner, J-Y Guillemaut, A Hilton (2009) Summarised Hierarchical Markov Models for Speed Invariant Action Matching, In: ICCV Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences, pp. 1065-1072

    Action matching, where a recorded sequence is matched against, and synchronised with, a suitable proxy from a library of animations, is a technique for generating a synthetic representation of a recorded human activity. This proxy can then be used to represent the action in a virtual environment or as a prior on further processing of the sequence. In this paper we present a novel technique for performing action matching in outdoor sports environments. Outdoor sports broadcasts are typically multi-camera environments, and as such reconstruction techniques can be applied to the footage to generate a 3D model of the scene. However, due to poor calibration and matting, this reconstruction is of very low quality. Our technique matches the 3D reconstruction sequence against a predefined library of actions to select an appropriate high-quality synthetic representation. A hierarchical Markov model combined with 3D summarisation of the data allows a large number of different actions to be matched successfully to the sequence in a rate-invariant manner without prior segmentation of the sequence into discrete units. The technique is applied to data captured at rugby and soccer games.

    Farshad Einabadi, Jean-Yves Guillemaut, Adrian Hilton (2021) Deep Neural Models for Illumination Estimation and Relighting: A Survey, In: Computer Graphics Forum 40(6), pp. 315-331, Wiley

    Scene relighting and estimating illumination of a real scene for insertion of virtual objects in a mixed-reality scenario are well-studied challenges in the computer vision and graphics fields. Classical inverse rendering approaches aim to decompose a scene into its orthogonal constituting elements, namely scene geometry, illumination and surface materials, which can later be used for augmented reality or to render new images under novel lighting or viewpoints. Recently, the application of deep neural computing to illumination estimation, relighting and inverse rendering has shown promising results. This contribution aims to bring together in a coherent manner current advances in this conjunction. We examine in detail the attributes of the proposed approaches, presented in three categories: scene illumination estimation, relighting with reflectance-aware scene-specific representations and finally relighting as image-to-image transformations. Each category is concluded with a discussion on the main characteristics of the current methods and possible future trends. We also provide an overview of current publicly available datasets for neural lighting applications.

    Boon Lin Teh, Steven Toh, Tom H. Williamson, Boguslaw Obara, Jean-Yves Guillemaut, David H. Steel (2023) Reducing the use of fluorinated gases in vitreoretinal surgery, In: Eye (London) 38(2), pp. 229-232
    M Fastovets, J-Y Guillemaut, A Hilton (2013) Athlete Pose Estimation from Monocular TV Sports Footage, In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1048-1054
    J-Y Guillemaut, A Hilton (2012) Space-Time Joint Multi-Layer Segmentation and Depth Estimation, In: Second Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization & Transmission (3DIMPVT 2012), pp. 440-447
    ADM Hilton, Jean-Yves Guillemaut, JJ Kilner, O Grau, G Thomas (2011) 3D-TV Production from Conventional Cameras for Sports Broadcast, In: IEEE Transactions on Broadcasting 57(2), pp. 462-476, IEEE

    3DTV production of live sports events presents a challenging problem involving the conflicting requirements of maintaining broadcast stereo picture quality with the practical problems of developing robust systems for cost-effective deployment. In this paper we propose an alternative approach to stereo production in sports events using the conventional monocular broadcast cameras for 3D reconstruction of the event and subsequent stereo rendering. This approach has the potential advantage over stereo camera rigs of recovering full scene depth, allowing inter-ocular distance and convergence to be adapted according to the requirements of the target display and enabling stereo coverage from both existing and 'virtual' camera positions without additional cameras. A prototype system is presented with results of sports TV production trials for rendering of stereo and free-viewpoint video sequences of soccer and rugby.

    M Brown, Jean-Yves Guillemaut, D Windridge (2015) A Saliency-based Framework for 2D-3D Registration, In: Proc. International Conference on Computer Vision Theory and Applications (VISAPP 2014)

    Here we propose a saliency-based filtering approach to the problem of registering an untextured 3D object to a single monocular image. The principle of saliency can be applied to a range of modalities and domains to find intrinsically descriptive entities from amongst detected entities, making it a rigorous approach to multi-modal registration. We build on the Kadir-Brady saliency framework due to its principled information-theoretic approach, which enables us to naturally extend it to the 3D domain. The salient points from each domain are initially aligned using the SoftPosit algorithm. This is subsequently refined by aligning the silhouette with contours extracted from the image. Whereas other point-based registration algorithms focus on corners or straight lines, our saliency-based approach is more general as it is more widely applicable, e.g. to curved surfaces where a corner detector would fail. We compare our salient point detector to the Harris corner and SIFT keypoint detectors and show it generally achieves superior registration accuracy.

    A Hilton, Jean-Yves Guillemaut, J Kilner, O Grau, G Thomas (2010) Free-Viewpoint Video for TV Sport Production, In: R Ronfard, G Taubin (eds.), Image and Geometry Processing for 3-D Cinematography, 5, Springer
    D Casas, M Tejera, J-Y Guillemaut, A Hilton (2011) Parametric control of captured mesh sequences for real-time animation, In: Lecture Notes in Computer Science: Motion in Games, 7060, pp. 242-253

    In this paper we introduce an approach to high-level parameterisation of captured mesh sequences of actor performance for real-time interactive animation control. High-level parametric control is achieved by non-linear blending between multiple mesh sequences exhibiting variation in a particular movement. For example, walking speed is parameterised by blending fast and slow walk sequences. A hybrid non-linear mesh sequence blending approach is introduced to approximate the natural deformation of non-linear interpolation techniques whilst maintaining the real-time performance of linear mesh blending. Quantitative results show that the hybrid approach gives an accurate real-time approximation of offline non-linear deformation. Results are presented for single and multi-dimensional parametric control of walking (speed/direction), jumping (height/distance) and reaching (height) from captured mesh sequences. This approach allows continuous real-time control of high-level parameters such as speed and direction whilst maintaining the natural surface dynamics of captured movement.
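
    The linear mesh sequence blend that the hybrid scheme builds on can be sketched in a few lines (illustrative; the paper's contribution is the non-linear correction applied on top of this):

```python
import numpy as np

def blend_mesh_sequences(verts_a, verts_b, w):
    """Linear blend of two temporally aligned mesh sequences.

    verts_a, verts_b : (T, N, 3) vertex positions of, e.g., a slow and
    a fast walk sharing the same topology and time alignment; w in
    [0, 1] selects the intermediate walking speed.
    """
    return (1.0 - w) * verts_a + w * verts_b
```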

    Stephanie Stoll, Armin Mustafa, Jean-Yves Guillemaut (2022) There and Back Again: 3D Sign Language Generation from Text Using Back-Translation, In: 2022 International Conference on 3D Vision (3DV), pp. 187-196, IEEE

    We introduce the first method to automatically generate 3D mesh sequences from text, inspired by the challenging problem of Sign Language Production (SLP). The approach only requires simple 2D annotations for training, which can be automatically extracted from video. Rather than incorporating high-definition or motion capture data, we propose back-translation as a powerful paradigm for supervision: By first addressing the arguably simpler problem of translating 2D pose sequences to text, we can leverage this to drive a transformer-based architecture to translate text to 2D poses. These are then used to drive a 3D mesh generator. Our mesh generator Pose2Mesh uses temporal information, to enforce temporal coherence and significantly reduce processing time. The approach is evaluated by generating 2D pose, and 3D mesh sequences in DGS (German Sign Language) from German language sentences. An extensive analysis of the approach and its sub-networks is conducted, reporting BLEU and ROUGE scores, as well as Mean 2D Joint Distance. Our proposed Text2Pose model outperforms the current state-of-the-art in SLP, and we establish the first benchmark for the complex task of text-to-3D-mesh-sequence generation with our Text2Mesh model.

    J-Y Guillemaut, J Kittler, MT Sadeghi, WJ Christmas (2006) General pose face recognition using frontal face model, In: Progress in Pattern Recognition, Image Analysis and Applications, Proceedings, 4225, pp. 79-88
    Gianmarco Addari, Jean-Yves Guillemaut (2019) An MRF Optimisation Framework for Full 3D Helmholtz Stereopsis, In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Institute for Systems and Technologies of Information, Control and Communication (INSTICC)

    Accurate 3D modelling of real world objects is essential in many applications such as digital film production and cultural heritage preservation. However, current modelling techniques rely on assumptions to constrain the problem, effectively limiting the categories of scenes that can be reconstructed. A common assumption is that the scene’s surface reflectance is Lambertian or known a priori. These constraints rarely hold true in practice and result in inaccurate reconstructions. Helmholtz Stereopsis (HS) addresses this limitation by introducing a reflectance agnostic modelling constraint, but prior work in this area has been predominantly limited to 2.5D reconstruction, providing only a partial model of the scene. In contrast, this paper introduces the first Markov Random Field (MRF) optimisation framework for full 3D HS. First, an initial reconstruction is obtained by performing 2.5D MRF optimisation with visibility constraints from multiple viewpoints and fusing the different outputs. Then, a refined 3D model is obtained through volumetric MRF optimisation using a tailored Iterative Conditional Modes (ICM) algorithm. The proposed approach is evaluated with both synthetic and real data. Results show that the proposed full 3D optimisation significantly increases both geometric and normal accuracy, being able to achieve sub-millimetre precision. Furthermore, the approach is shown to be robust to occlusions and noise.

    Muhammad Sarim, Adrian Hilton, Jean-Yves Guillemaut, Hansung Kim (2009) Non-parametric Natural Image Matting, In: 2009 16th IEEE International Conference on Image Processing (ICIP), Vols 1-6, pp. 3213-3216, IEEE

    Natural image matting is an extremely challenging image processing problem due to its ill-posed nature. It often requires skilled user interaction to aid definition of foreground and background regions. Current algorithms use these pre-defined regions to build local foreground and background colour models. In this paper we propose a novel approach which uses non-parametric statistics to model image appearance variations. This technique overcomes the limitations of previous parametric approaches which are purely colour-based and thereby unable to model natural image structure. The proposed technique consists of three successive stages: (i) background colour estimation, (ii) foreground colour estimation, (iii) alpha estimation. Colour estimation uses patch-based matching techniques to efficiently recover the optimum colour by comparison against patches from the known regions. Quantitative evaluation against ground truth demonstrates that the technique produces better results and successfully recovers fine details such as hair where many other algorithms fail.

    Marco Volino, Armin Mustafa, Jean-Yves Guillemaut, Adrian Hilton (2020) Light Field Video for Immersive Content Production, In: Real VR – Immersive Digital Reality, pp. 33-64, Springer International Publishing

    Light field video for content production is gaining both research and commercial interest as it has the potential to push the level of immersion for augmented and virtual reality to a close-to-reality experience. Light fields densely sample the viewing space of an object or scene using hundreds or even thousands of images with small displacements in between. However, a lack of standardised formats for compression, storage and transmission, along with the lack of tools to enable editing of light field data, currently makes it impractical for use in real-world content production. In this chapter we address two fundamental problems with light field data, namely representation and compression. Firstly we propose a method to obtain a 4D temporally coherent representation from the input light field video. This is an essential problem to solve that will enable efficient compression and editing. Secondly, we present a method for compression of light field data based on the eigen texture method that provides a compact representation and enables efficient view-dependent rendering at interactive frame rates. These approaches achieve an order of magnitude compression and a temporally consistent representation, important steps towards practical toolsets for light field video content production.

    HE Imre, J-Y Guillemaut, ADM Hilton (2012) Moving Camera Registration for Multiple Camera Setups in Dynamic Scenes, In: Proceedings of the 21st British Machine Vision Conference

    Many practical applications require an accurate knowledge of the extrinsic calibration (i.e., pose) of a moving camera. The existing SLAM and structure-from-motion solutions are not robust to scenes with large dynamic objects, and do not fully utilize the available information in the presence of static cameras, a common practical scenario. In this paper, we propose an algorithm that addresses both of these issues for a hybrid static-moving camera setup. The algorithm uses the static cameras to build a sparse 3D model of the scene, with respect to which the pose of the moving camera is estimated at each time instant. The performance of the algorithm is studied through extensive experiments that cover a wide range of applications, and is shown to be satisfactory.

    SK Hall, TH Williamson, Jean-Yves Guillemaut, T Goddard, AP Baumann, JC Hutter (2017)Modeling the Dynamics of Tamponade Multicomponent Gases During Retina Reattachment Surgery, In: AIChE Journal 63(9), pp. 3651-3662 Wiley, for American Institute of Chemical Engineers

    Vitrectomy and pneumatic retinopexy are common surgical procedures used to treat retinal detachment. To reattach the retina, gases are used to inflate the vitreous space, allowing the retina to attach through surface tension and buoyancy forces acting superior to the location of the bubble. These procedures require the injection of either a pure tamponade gas, such as C3F8 or SF6, or mixtures of these gases with air. The location of the retinal detachment, the anatomical spread of the retinal defect, and the length of time the defect has persisted determine the suggested volume and duration of the gas bubble needed to allow reattachment. After inflation, the gases are slowly absorbed into the blood, allowing the vitreous space to be refilled by aqueous. We have developed a model of the mass transfer dynamics of tamponade gases during pneumatic retinopexy or pars plana vitrectomy procedures. The model predicts the expansion and persistence of intraocular gases (C3F8, SF6), oxygen, nitrogen, and carbon dioxide, as well as the intraocular pressure. The model was validated using published literature in rabbits and humans. In addition to correlating the mass transfer dynamics with surface area, permeability, and partial pressure driving forces, the model shows that the dynamics are affected by the percentage of the tamponade gases. Rates were also correlated with the physical properties of the tamponade and blood gases. The model gave accurate predictions in humans.
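
    To make the idea of mass-transfer dynamics concrete, the toy sketch below integrates first-order rate equations for the moles of each gas in the bubble. All constants are illustrative placeholders, not the calibrated values of the published model, which additionally accounts for surface areas, permeabilities and intraocular pressure:

        import numpy as np
        from scipy.integrate import solve_ivp

        # Hypothetical first-order exchange: dn_i/dt = k_i * (p_blood_i - x_i * P).
        k = {"C3F8": 0.01, "N2": 0.10, "O2": 0.15, "CO2": 0.60}       # 1/h, made up
        p_blood = {"C3F8": 0.0, "N2": 0.73, "O2": 0.10, "CO2": 0.05}  # atm, approx.
        gases = list(k)

        def rates(t, n, p_total=1.0):
            n = np.maximum(n, 0.0)
            x = n / max(n.sum(), 1e-9)  # mole fractions inside the bubble
            return [k[g] * (p_blood[g] - x[i] * p_total) for i, g in enumerate(gases)]

        # Pure C3F8 injection: blood gases diffuse in faster than C3F8 leaves,
        # so the bubble first expands and then slowly resorbs.
        sol = solve_ivp(rates, (0.0, 500.0), [1.0, 0.0, 0.0, 0.0], dense_output=True)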

    Armin Mustafa, Marco Volino, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton (2020)Temporally coherent general dynamic scene reconstruction, In: International Journal of Computer Vision Springer

    Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints. This paper introduces a general approach to obtain a 4D representation of complex dynamic scenes from multi-view wide-baseline static or moving cameras without prior knowledge of the scene structure, appearance, or illumination. Contributions of the work are: an automatic method for initial coarse reconstruction to initialize joint estimation; sparse-to-dense temporal correspondence integrated with joint multi-view segmentation and reconstruction to introduce temporal coherence; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes by introducing a shape constraint. Comparison with state-of-the-art approaches on a variety of complex indoor and outdoor scenes demonstrates improved accuracy in both multi-view segmentation and dense reconstruction. This paper demonstrates unsupervised reconstruction of complete temporally coherent 4D scene models with improved non-rigid object segmentation and shape reconstruction, and their application to free-view rendering and virtual reality.

    Matthew Bailey, Jean-Yves Guillemaut (2020)A Novel Depth from Defocus Framework Based on a Thick Lens Camera Model, In: 2020 International Conference on 3D Vision (3DV), pp. 1206-1215 IEEE

    Reconstruction approaches based on monocular defocus analysis such as Depth from Defocus (DFD) often utilise the thin lens camera model. Despite this widespread adoption, there are inherent limitations associated with it. Coupled with invalid parameterisation commonplace in literature, the overly-simplified image formation it describes leads to inaccurate defocus modelling, especially in macro-scale scenes. As a result, DFD reconstructions based around this model are not geometrically consistent, and are typically restricted to single-view applications. Consequently, the handful of existing approaches which attempt to include additional viewpoints have had only limited success. In this work, we address these issues by instead utilising a thick lens camera model, and propose a novel calibration procedure to accurately parameterise it. The effectiveness of our model and calibration is demonstrated with a novel DFD reconstruction framework. We achieve highly detailed, geometrically accurate and complete 3D models of real-world scenes from multi-view focal stacks. To our knowledge, this is the first time DFD has been successfully applied to complete scene modelling in this way.
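
    To see why the lens model matters, compare the defocus blur each model predicts. The sketch below uses the standard thin-lens circle-of-confusion formula and a simplified thick-lens variant in which distances are offset by the separation between the principal planes (the hiatus). This is an illustration of the modelling difference only, not the paper's calibrated model:

        def coc_thin(s, s_focus, f, aperture):
            """Circle-of-confusion diameter under the thin lens model
            (object at distance s, lens of focal length f focused at s_focus)."""
            return aperture * f * abs(s - s_focus) / (s * (s_focus - f))

        def coc_thick(s, s_focus, f, aperture, hiatus):
            """Simplified thick lens variant: object distances are measured
            from the front principal plane, offset by the hiatus."""
            return coc_thin(s - hiatus, s_focus - hiatus, f, aperture)

        # At macro scale the hiatus is a large fraction of the object distance,
        # so the two predictions diverge noticeably:
        print(coc_thin(0.10, 0.12, 0.05, 0.02))
        print(coc_thick(0.10, 0.12, 0.05, 0.02, 0.01))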

    Tom H. Williamson, Jean-Yves Guillemaut, Sheldon K. Hall, Joseph C. Hutter, Tony Goddard (2018)Theoretical gas concentrations achieving 100% fill of the vitreous cavity in the postoperative period, a gas eye model study (GEMS), In: RETINA, The Journal of Retinal and Vitreous Diseases 38, pp. S60-S64 Lippincott, Williams & Wilkins

    Precis. A mathematical model is described of the physical properties of intraocular gases, providing a guide to the correct gas concentrations to achieve a 100% fill of the vitreous cavity postoperatively. A table for the instruction of surgeons is provided and the effects of different axial lengths examined. ABSTRACT Purpose – To determine the concentrations of different gas tamponades in air required to achieve a 100% fill of the vitreous cavity postoperatively, and to examine the influence of eye volume on these concentrations. Methods – A mathematical model of the mass transfer dynamics of tamponade and blood gases (O2, N2, CO2) when injected into the eye was used. Mass transfer surface areas were calculated from published anatomical data. The model has been calibrated from published volumetric decay and composition results for three gases: sulphur hexafluoride (SF6), hexafluoroethane (C2F6) and perfluoropropane (C3F8). The concentrations of these gases (in air) required to achieve a 100% fill of the vitreous cavity postoperatively without an intraocular pressure rise were determined. The concentrations were calculated for three volumes of the vitreous cavity to test whether ocular size influenced the results. Results – A table of gas concentrations was produced. In a simulation of pars plana vitrectomy operations in which an 80% to 85% fill of the vitreous cavity with gas was achieved at surgery, the concentrations of the three gases in air required to achieve a 100% fill postoperatively were 10-13% for C3F8, 12-15% for C2F6 and 19-25% for SF6. These were similar to the so-called "non-expansive" concentrations used in the clinical setting. The calculations were repeated for three different sizes of eye. Aiming for an 80% fill at surgery and 100% postoperatively, an eye with a 4 ml vitreous cavity required 24% SF6, 15% C2F6 or 13% C3F8; a 7.2 ml cavity required 25% SF6, 15% C2F6 or 13% C3F8; and a 10 ml cavity required 25% SF6, 16% C2F6 or 13% C3F8. When using 100% gas (as employed, for example, in pneumatic retinopexy), the minimum vitreous cavity fill at surgery needed to achieve a 100% fill postoperatively was 43% for SF6, 29% for C2F6 and 25% for C3F8, and was only minimally changed by variation in the size of the eye. Conclusions – A table has been produced which could be used for surgical innovation in gas usage in the vitreous cavity. It provides, for different percentage fills at surgery, the concentrations that will achieve a moment postoperatively at which the vitreous cavity is completely filled without a pressure rise. Variation in axial length and size of the eye does not appear to alter the values in the table significantly. Those using pneumatic retinopexy need to increase the volume of gas injected with increased size of the eye in order to match the percentage fill of the vitreous cavity recommended for a given tamponade agent.

    J-Y Guillemaut, J Kilner, A Hilton (2009)Robust Graph-Cut Scene Segmentation and Reconstruction for Free-Viewpoint Video of Complex Dynamic Scenes, In: 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), pp. 809-816
    T Wang, J Guillemaut, J Collomosse (2010)Multi-label Propagation for Coherent Video Segmentation and Artistic Stylization, In: Proceedings of Intl. Conf. on Image Proc. (ICIP), pp. 3005-3008

    We present a new algorithm for segmenting video frames into temporally stable colored regions, applying our technique to create artistic stylizations (e.g. cartoons and paintings) from real video sequences. Our approach is based on a multilabel graph cut applied to successive frames, in which the color data term and label priors are incrementally updated and propagated over time. We demonstrate coherent segmentation and stylization over a variety of home videos.
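
    For context, the per-frame labelling described above can be read as the minimiser of a standard multi-label MRF energy (generic notation of my choosing, not quoted from the paper):

        E(l) = \sum_{p} D_p(l_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(l_p, l_q)

    where D_p scores pixel p under the incrementally updated colour model and propagated label prior of region l_p, and the pairwise term V_{pq} penalises label changes between similar neighbouring pixels, which is what yields temporally stable regions.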

    Charles Malleson, Jean-Yves Guillemaut, Adrian Hilton (2019)3D Reconstruction from RGB-D Data, In: Paul L. Rosin, Yu-Kun Lai, Ling Shao, Yonghuai Liu (eds.), RGB-D Image Analysis and Processing, pp. 87-115 Springer Nature Switzerland AG

    A key task in computer vision is that of generating virtual 3D models of real-world scenes by reconstructing the shape, appearance and, in the case of dynamic scenes, motion of the scene from visual sensors. Recently, low-cost video plus depth (RGB-D) sensors have become widely available and have been applied to 3D reconstruction of both static and dynamic scenes. RGB-D sensors contain an active depth sensor, which provides a stream of depth maps alongside standard colour video. The low cost and ease of use of RGB-D devices as well as their video rate capture of images along with depth make them well suited to 3D reconstruction. Use of active depth capture overcomes some of the limitations of passive monocular or multiple-view video-based approaches since reliable, metrically accurate estimates of the scene depth at each pixel can be obtained from a single view, even in scenes that lack distinctive texture. There are two key components to 3D reconstruction from RGB-D data: (1) spatial alignment of the surface over time, and (2) fusion of noisy, partial surface measurements into a more complete, consistent 3D model. In the case of static scenes, the sensor is typically moved around the scene and its pose is estimated over time. For dynamic scenes, there may be multiple rigid, articulated, or non-rigidly deforming surfaces to be tracked over time. The fusion component consists of integration of the aligned surface measurements, typically using an intermediate representation, such as the volumetric truncated signed distance field (TSDF). In this chapter, we discuss key recent approaches to 3D reconstruction from depth or RGB-D input, with an emphasis on real-time reconstruction of static scenes.
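
    The fusion component mentioned above is commonly implemented as a weighted running average over the TSDF volume. A minimal sketch of that standard update rule follows (a generic formulation, not the implementation of any specific system discussed in the chapter):

        import numpy as np

        def fuse_tsdf(D, W, d_new, w_new=1.0):
            """Weighted running-average TSDF update: D <- (W*D + w*d) / (W + w).

            D, W: current signed distance values and accumulated weights.
            d_new: truncated signed distances from the newly aligned depth frame
            (NaN where the frame provides no measurement for a voxel).
            """
            valid = ~np.isnan(d_new)
            D, W = D.copy(), W.copy()
            D[valid] = (W[valid] * D[valid] + w_new * d_new[valid]) / (W[valid] + w_new)
            W[valid] += w_new
            return D, W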

    Chathura Galkandage, Janko Calic, S Dogan, Jean-Yves Guillemaut (2017)Stereoscopic Video Quality Assessment Using Binocular Energy, In: Journal of Selected Topics in Signal Processing 11(1), pp. 102-112 IEEE

    Stereoscopic imaging is becoming increasingly popular. However, to ensure the best quality of experience, there is a need to develop more robust and accurate objective metrics for stereoscopic content quality assessment. Existing stereoscopic image and video metrics are either extensions of conventional 2D metrics (with added depth or disparity information) or are based on relatively simple perceptual models. Consequently, they tend to lack the accuracy and robustness required for stereoscopic content quality assessment. This paper introduces full-reference stereoscopic image and video quality metrics based on a Human Visual System (HVS) model incorporating important physiological findings on binocular vision. The proposed approach is based on the following three contributions. First, it introduces a novel HVS model extending previous models to include the phenomena of binocular suppression and recurrent excitation. Second, an image quality metric based on the novel HVS model is proposed. Finally, an optimised temporal pooling strategy is introduced to extend the metric to the video domain. Both image and video quality metrics are obtained via a training procedure to establish a relationship between subjective scores and objective measures of the HVS model. The metrics are evaluated using publicly available stereoscopic image/video databases as well as a new stereoscopic video database. An extensive experimental evaluation demonstrates the robustness of the proposed quality metrics. This indicates a considerable improvement with respect to the state-of-the-art with average correlations with subjective scores of 0.86 for the proposed stereoscopic image metric and 0.89 and 0.91 for the proposed stereoscopic video metrics.
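
    As background, the classical binocular energy response at the core of such HVS models combines even- and odd-phase (quadrature) monocular filter outputs before squaring; in generic notation (mine, and omitting the paper's binocular suppression and recurrent excitation extensions):

        E(x) = (L_e(x) + R_e(x))^2 + (L_o(x) + R_o(x))^2

    where L_e, L_o and R_e, R_o are the even and odd simple-cell (e.g. Gabor) responses to the left and right views, so E(x) models the complex-cell output on which the quality measure is built.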

    M Sarim, A Hilton, J-Y Guillemaut, H Kim, T Takai (2010)Wide-Baseline Multi-View Video Segmentation For 3D Reconstruction, In: Proceedings of the 1st international workshop on 3D video processing, pp. 13-16

    Obtaining a foreground silhouette across multiple views is one of the fundamental steps in 3D reconstruction. In this paper we present a novel video segmentation approach, to obtain a foreground silhouette, for scenes captured by a wide-baseline camera rig given a sparse manual interaction in a single view. The algorithm is based on trimap propagation, a framework used in video matting. Bayesian inference coupled with camera calibration information is used to spatio-temporally propagate high confidence trimap labels across the multi-view video to obtain coarse silhouettes which are later refined using a matting algorithm. Recent techniques have been developed for foreground segmentation, based on image matting, in multiple views but they are limited to narrow baselines with low foreground variation. The proposed wide-baseline silhouette propagation is robust to inter-view foreground appearance changes, shadows and similarity in foreground/background appearance. The approach has demonstrated good performance in silhouette estimation for views separated by up to 180 degrees (opposing views). The segmentation technique has been fully integrated in a multi-view reconstruction pipeline. The results obtained demonstrate the suitability of the technique for multi-view reconstruction with wide-baseline camera set-ups and natural backgrounds.

    J-Y Guillemaut, A Hilton (2011)Joint Multi-Layer Segmentation and Reconstruction for Free-Viewpoint Video Applications, In: International Journal of Computer Vision 93(1), pp. 73-100 Springer
    Charles Malleson, Jean-Yves Guillemaut, Adrian Hilton (2018)Hybrid modelling of non-rigid scenes from RGBD cameras, In: IEEE Transactions on Circuits and Systems for Video Technology IEEE

    Recent advances in sensor technology have introduced low-cost RGB video plus depth sensors, such as the Kinect, which enable simultaneous acquisition of colour and depth images at video rates. This paper introduces a framework for representation of general dynamic scenes from video plus depth acquisition. A hybrid representation is proposed which combines the advantages of prior surfel graph surface segmentation and modelling work with the higher-resolution surface reconstruction capability of volumetric fusion techniques. The contributions are (1) extension of a prior piecewise surfel graph modelling approach for improved accuracy and completeness, (2) combination of this surfel graph modelling with TSDF surface fusion to generate dense geometry, and (3) proposal of means for validation of the reconstructed 4D scene model against the input data and efficient storage of any unmodelled regions via residual depth maps. The approach allows arbitrary dynamic scenes to be efficiently represented with temporally consistent structure and enhanced levels of detail and completeness where possible, but gracefully falls back to raw measurements where no structure can be inferred. The representation is shown to facilitate creative manipulation of real scene data which would previously require more complex capture setups or manual processing.

    M Sarim, A Hilton, J-Y Guillemaut (2009)Alpha Matte Estimation of Natural Images Using Local and Global Template Correspondence, In: ICET: 2009 INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES, PROCEEDINGS, pp. 229-234
    C Budd, J Guillemaut, M Klaudiny, A Hilton (2012)Scene Modelling for Richer Media Content
    Armin Mustafa, Marco Volino, Jean-Yves Guillemaut, Adrian Hilton (2018)4D Temporally Coherent Light-field Video, In: 3DV 2017 Proceedings IEEE

    Light-field video has recently been used in virtual and augmented reality applications to increase realism and immersion. However, existing light-field methods are generally limited to static scenes due to the requirement to acquire a dense scene representation. The large amount of data and the absence of methods to infer temporal coherence pose major challenges in storage, compression and editing compared to conventional video. In this paper, we propose the first method to extract a spatio-temporally coherent light-field video representation. A novel method to obtain Epipolar Plane Images (EPIs) from a sparse light-field camera array is proposed. EPIs are used to constrain scene flow estimation to obtain 4D temporally coherent representations of dynamic light-fields. Temporal coherence is achieved on a variety of light-field datasets. Evaluation of the proposed light-field scene flow against existing multi-view dense correspondence approaches demonstrates a significant improvement in accuracy of temporal coherence.
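
    For a densely sampled light field from a linear camera array, an EPI is simply one scanline stacked across views: scene points trace lines whose slopes are inversely related to depth, which is what constrains the scene flow. A minimal numpy sketch under that dense, ordered-array assumption (the paper's contribution is obtaining usable EPIs from a sparse array, which this sketch does not address):

        import numpy as np

        def epipolar_plane_image(lightfield, row):
            """Stack one scanline from each view of a horizontal camera array.

            lightfield: (n_views, height, width, 3) array with views ordered
            by camera position along the baseline (assumed here).
            """
            return lightfield[:, row, :, :]  # EPI of shape (n_views, width, 3)

        lf = np.zeros((8, 720, 1280, 3), dtype=np.float32)
        epi = epipolar_plane_image(lf, row=360)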

    We propose a multi-view framework for joint object detection and labelling based on pairs of images. The proposed framework extends the single-view Mask R-CNN approach to multiple views without need for additional training. Dedicated components are embedded into the framework to match objects across views by enforcing epipolar constraints, appearance feature similarity and class coherence. The multi-view extension enables the proposed framework to detect objects which would otherwise be mis-detected in a classical Mask R-CNN approach, and achieves coherent object labelling across views. By avoiding the need for additional training, the approach effectively overcomes the current shortage of multi-view datasets. The proposed framework achieves high quality results on a range of complex scenes, being able to output class, bounding box, mask and an additional label enforcing coherence across views. In the evaluation, we show qualitative and quantitative results on several challenging outdoor multi-view datasets and perform a comprehensive comparison to verify the advantages of the proposed method.

    Charles Malleson, Jean-Yves Guillemaut (2024)Wearable apparatus for correction of visual alignment under torsional strabismus, In: Optometry and Vision Science 101(4), pp. 204-210 Lippincott Williams & Wilkins

    SIGNIFICANCE A wearable optical apparatus that compensates for eye misalignment (strabismus) to correct for double vision (diplopia) is proposed. In contrast to prism lenses, commonly used to compensate for horizontal and/or vertical misalignment, the proposed approach is able to compensate for any combination of horizontal, vertical, and torsional misalignment. PURPOSE If the action of the extraocular muscles is compromised (e.g., by nerve damage), a patient may lose their ability to maintain visual alignment, negatively affecting their binocular fusion and stereo depth perception capability. Torsional misalignment cannot be mitigated by standard Fresnel prism lenses. Surgical procedures intended to correct torsional misalignment may be unpredictable. A wearable device able to rectify visual alignment and restore stereo depth perception without surgical intervention could potentially be of great value to people with strabismus. METHODS We propose a novel lightweight wearable optical device for visual alignment correction. The device comprises two mirrors and a Fresnel prism, arranged in such a way that together they rotationally shift the view seen by the affected eye horizontally, vertically, and torsionally. The extent of the alignment correction on each axis can be arbitrarily adjusted according to the patient's particular misalignment characteristics. RESULTS The proposed approach was tested by computer simulation, and a prototype device was manufactured. The prototype device was tested by a strabismus patient exhibiting horizontal and torsional misalignment. In these tests, the device was found to function as intended, allowing the patient to enjoy binocular fusion and stereo depth perception while wearing the device for daily activities over a period of several months. CONCLUSIONS The proposed device is effective in correcting arbitrary horizontal, vertical, and torsional misalignment of the eyes. The results of the initial testing performed are highly encouraging. Future study is warranted to formally assess the effectiveness of the device on multiple test patients.
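
    Geometrically, the correction the device applies can be viewed as a composition of three rotations of the affected eye's view. The sketch below composes them under illustrative axis conventions (rotation about x for vertical deviation, y for horizontal, z for torsion about the line of sight); it captures only the geometry, not the mirror/prism optics:

        import numpy as np

        def view_correction(horizontal_deg, vertical_deg, torsion_deg):
            """Single rotation matrix combining the three per-axis corrections."""
            rx, ry, rz = np.radians([vertical_deg, horizontal_deg, torsion_deg])
            Rx = np.array([[1, 0, 0],
                           [0, np.cos(rx), -np.sin(rx)],
                           [0, np.sin(rx), np.cos(rx)]])
            Ry = np.array([[np.cos(ry), 0, np.sin(ry)],
                           [0, 1, 0],
                           [-np.sin(ry), 0, np.cos(ry)]])
            Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                           [np.sin(rz), np.cos(rz), 0],
                           [0, 0, 1]])
            return Rz @ Ry @ Rx

        # e.g. a patient needing 4 degrees of horizontal and 7 degrees of
        # torsional correction, with no vertical component:
        R = view_correction(4.0, 0.0, 7.0)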

    JY Guillemaut, AS Aguado, J Illingworth (2005)Using points at infinity for parameter decoupling in camera calibration, In: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 27(2), pp. 265-270 IEEE COMPUTER SOC
    Gianmarco Addari, Jean-Yves Guillemaut (2020)An MRF Optimisation Framework for Full 3D Reconstruction of Scenes with Complex Reflectance, In: A P Claudio, K Bouatouch, M Chessa, A Paljic, A Kerren, C Hurter, A Tremeau, G M Farinella (eds.), COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISIGRAPP 2019) 1182, pp. 456-476 Springer Nature

    While many existing modelling techniques achieve impressive results, they are often reliant on assumptions such as prior knowledge of the scene's surface reflectance. This considerably restricts the range of scenes that can be reconstructed, as these assumptions are often violated in practice. One technique that allows surface reconstruction regardless of the scene's reflectance model is Helmholtz Stereopsis (HS). However, to date, research on HS has mostly been limited to 2.5D scene reconstruction. In this paper, a framework is introduced to perform full 3D HS using Markov Random Field (MRF) optimisation for the first time. The paper introduces two complementary techniques. The first approach computes multiple 2.5D reconstructions from a small number of viewpoints and fuses these together to obtain a complete model, while the second approach directly reasons in the 3D domain by performing a volumetric MRF optimisation. Both approaches are based on optimising an energy function combining an HS confidence measure and normal consistency across the reconstructed surface. The two approaches are evaluated on both synthetic and real scenes, measuring the accuracy and completeness obtained. Further, the effect of noise on modelling accuracy is experimentally evaluated on the synthetic dataset. Both techniques achieve sub-millimetre accuracy and exhibit robustness to noise. In particular, the method based on full 3D optimisation is shown to significantly outperform the other approach.

    V Brujic-Okretic, J Guillemaut, L Hitchin, M Michielen, G Parker (2003)Remote vehicle manoeuvring using augmented reality, pp. 186-189
    Nadejda Roubtsova, Jean-Yves Guillemaut (2017)Bayesian Helmholtz Stereopsis with Integrability Prior, In: IEEE Transactions on Pattern Analysis and Machine Intelligence 40(9), pp. 2265-2272 Institute of Electrical and Electronics Engineers (IEEE)

    Helmholtz Stereopsis is a 3D reconstruction method uniquely independent of surface reflectance. Yet, its sub-optimal maximum likelihood formulation with drift-prone normal integration limits performance. Via three contributions this paper presents a complete novel pipeline for Helmholtz Stereopsis. Firstly, we propose a Bayesian formulation replacing the maximum likelihood problem by a maximum a posteriori one. Secondly, a tailored prior enforcing consistency between depth and normal estimates via a novel metric related to optimal surface integrability is proposed. Thirdly, explicit surface integration is eliminated by taking advantage of the accuracy of prior and high resolution of the coarse-to-fine approach. The pipeline is validated quantitatively and qualitatively against alternative formulations, reaching sub-millimetre accuracy and coping with complex geometry and reflectance.
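
    For reference, the reciprocity constraint that Helmholtz Stereopsis builds on takes the following standard form (notation mine): for a surface point p with normal n imaged under a reciprocal pair with camera/light centres o_l and o_r,

        w(p)^{\top} n = 0, \qquad w(p) = i_l \frac{v_l}{\lVert o_l - p \rVert^2} - i_r \frac{v_r}{\lVert o_r - p \rVert^2}

    where i_l and i_r are the intensities measured in the two reciprocal images and v_l, v_r are unit vectors from p towards o_l and o_r. Stacking w vectors from several reciprocal pairs yields the maximum likelihood normal; the Bayesian formulation above replaces this with a maximum a posteriori estimate regularised by the integrability prior.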

    J Imber, J-Y Guillemaut, A Hilton (2014)Intrinsic textures for relightable free-viewpoint video, In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8690(2), pp. 392-407

    This paper presents an approach to estimate the intrinsic texture properties (albedo, shading, normal) of scenes from multiple view acquisition under unknown illumination conditions. We introduce the concept of intrinsic textures, which are pixel-resolution surface textures representing the intrinsic appearance parameters of a scene. Unlike previous video relighting methods, the approach does not assume regions of uniform albedo, which makes it applicable to richly textured scenes. We show that intrinsic image methods can be used to refine an initial, low-frequency shading estimate based on a global lighting reconstruction from an original texture and coarse scene geometry in order to resolve the inherent global ambiguity in shading. The method is applied to relighting of free-viewpoint rendering from multiple view video capture. This demonstrates relighting with reproduction of fine surface detail. Quantitative evaluation on synthetic models with textured appearance shows accurate estimation of intrinsic surface reflectance properties. © 2014 Springer International Publishing.
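
    The decomposition at the core of this approach is the classical intrinsic image model I = A ⊙ S (appearance as the product of albedo and shading). The minimal sketch below shows how a refined shading estimate yields the albedo texture, which can then be re-composed with shading under a new illumination (an illustration only, not the paper's pipeline):

        import numpy as np

        def albedo_from_shading(texture, shading, eps=1e-4):
            """Recover albedo from the intrinsic model I = A * S.

            texture: HxWx3 appearance sampled from the multi-view capture.
            shading: HxW or HxWx3 shading estimate (e.g. the refined
            low-frequency estimate from the global lighting reconstruction).
            """
            shading = np.atleast_3d(shading)  # broadcastable against HxWx3
            return texture / np.maximum(shading, eps)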

    James E. Neffendorf, Jean-Yves Guillemaut, Joseph C. Hutter, J Ho, Tom Williamson (2020)Effect of Aqueous Dynamics on Gas Behavior Following Retinal Reattachment Surgery, In: Ophthalmic Surgery, Lasers & Imaging Retina 51(9), pp. 522-528 SLACK Incorporated

    BACKGROUND AND OBJECTIVE: To determine how the gas concentration in air required to achieve full postoperative vitreous cavity fill varies in different aqueous outflow states. MATERIALS AND METHODS: A mathematical model was used to estimate gas dynamics. The change in gas bubble volume over time was calculated in an eye with normal aqueous outflow, ocular hypertension (OHT), and OHT with apraclonidine treatment. RESULTS: The concentration required was higher for all gases to achieve a full postoperative fill in OHT eyes versus normal eyes. Optimal gas concentrations were 22.6% for SF6, 13.9% for C2F6, and 11.6% for C3F8. Despite this, in OHT, the fill achieved was 95%, 95%, and 94% for SF6, C2F6, and C3F8, respectively. With apraclonidine, percentage fill improved for all gases. CONCLUSIONS: This is the first study to show aqueous outflow affects bubble size and indicates eyes with reduced outflow are at risk of underfill. This can ultimately affect surgical success. [Ophthalmic Surg Lasers Imaging Retina. 2020;51:522–528.]

    Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton (2015)General Dynamic Scene Reconstruction from Multiple View Video, In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 900-908 IEEE

    This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches for outdoor dynamic scene reconstruction assume prior knowledge of the static background appearance and structure. The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras. Evaluation is performed on a variety of indoor and outdoor scenes with cluttered backgrounds and multiple dynamic non-rigid objects such as people. Comparison with state-of-the-art approaches demonstrates improved accuracy in both multiple view segmentation and dense reconstruction. The proposed approach also eliminates the requirement for prior knowledge of scene structure and appearance.

    M Sarim, A Hilton, Jean-Yves Guillemaut, Hansung Kim, T Takai (2010)Multiple view wide-baseline trimap propagation for natural video matting, In: Proc. European Conference on Visual Media Production (CVMP 2010), pp. 82-91

    This paper presents a method to estimate alpha mattes for video sequences of the same foreground scene from wide-baseline views given sparse key-frame trimaps in a single view. A statistical inference framework is introduced for spatio-temporal propagation of high-confidence trimap labels between video sequences without a requirement for correspondence or camera calibration and motion estimation. Multiple view trimap propagation integrates appearance information between views and over time to achieve robust labelling in the presence of shadows, changes in appearance with view point and overlap between foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high-quality video matting using existing single view natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation for camera views separated by up to 180°, with the same amount of manual interaction required for conventional single view video matting.

    E Imre, J-Y Guillemaut, A Hilton (2011)Calibration of nodal and free-moving cameras in dynamic scenes for post-production, In: Proceedings - 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2011, pp. 260-267

    In film production, many post-production tasks require the availability of accurate camera calibration information. This paper presents an algorithm for through-the-lens calibration of a moving camera for a common scenario in film production and broadcasting: The camera views a dynamic scene, which is also viewed by a set of static cameras with known calibration. The proposed method involves the construction of a sparse scene model from the static cameras, with respect to which the moving camera is registered, by applying the appropriate perspective-n-point (PnP) solver. In addition to the general motion case, the algorithm can handle the nodal cameras with unknown focal length via a novel P2P algorithm. The approach can identify a subset of static cameras that are more likely to generate a high number of scene-image correspondences, and can robustly deal with dynamic scenes. Our target applications include dense 3D reconstruction, stereoscopic 3D rendering and 3D scene augmentation, through which the success of the algorithm is demonstrated experimentally.

    Chathura Vindana Perera Galkandage, Janko Calic, Safak Dogan, Jean-Yves Guillemaut (2020)Full-Reference Stereoscopic Video Quality Assessment Using a Motion Sensitive HVS Model, In: IEEE Transactions on Circuits and Systems for Video Technology Institute of Electrical and Electronics Engineers

    Stereoscopic video quality assessment has become a major research topic in recent years. Existing stereoscopic video quality metrics are predominantly based on stereoscopic image quality metrics extended to the time domain via, for example, temporal pooling. These approaches do not explicitly consider the motion sensitivity of the Human Visual System (HVS). To address this limitation, this paper introduces a novel HVS model inspired by physiological findings characterising the motion sensitive response of complex cells in the primary visual cortex (V1 area). The proposed HVS model generalises previous HVS models, which characterised the behaviour of simple and complex cells but ignored motion sensitivity, by estimating optical flow to measure scene velocity at different scales and orientations. The local motion characteristics (direction and amplitude) are used to modulate the output of complex cells. The model is applied to develop a new type of full-reference stereoscopic video quality metrics which uniquely combine non-motion sensitive and motion sensitive energy terms to mimic the response of the HVS. A tailored two-stage multi-variate stepwise regression algorithm is introduced to determine the optimal contribution of each energy term. The two proposed stereoscopic video quality metrics are evaluated on three stereoscopic video datasets. Results indicate that they achieve average correlations with subjective scores of 0.9257 (PLCC), 0.9338 and 0.9120 (SRCC), 0.8622 and 0.8306 (KRCC), and outperform previous stereoscopic video quality metrics including other recent HVS-based metrics.

    Gianmarco Addari, Jean-Yves Guillemaut (2023)A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance, In: International Journal of Computer Vision Springer Nature

    3D reconstruction of general scenes remains an open challenge with current techniques often reliant on assumptions on the scene's surface reflectance, which restrict the range of objects that can be modelled. Helmholtz Stereopsis offers an appealing framework to make the modelling process agnostic to surface reflectance. However, previous formulations have been almost exclusively limited to 2.5D modelling. To address this gap, this paper introduces a family of reconstruction approaches that exploit Helmholtz reciprocity to produce complete 3D models of objects with arbitrary unknown reflectance. This includes an approach based on the fusion of (orthographic or perspective) view-dependent reconstructions, a volumetric approach optimising surface location within a voxel grid, and a mesh-based formulation optimising vertices positions of a given mesh topology. The contributed approaches are evaluated on synthetic and real datasets, including novel full 3D datasets publicly released with this paper, with experimental comparison against a wide range of competing methods. Results demonstrate the benefits of the different approaches and their abilities to achieve high quality full 3D reconstructions of complex objects.

    J-Y Guillemaut, O Drbohlav, J Illingworth, R Sara (2008)A maximum likelihood surface normal estimation algorithm for Helmholtz stereopsis, In: VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, pp. 352-359
    Caroline Scarles, Suzanne van Evan, Naomi Klepacz, Jean-Yves Guillemaut, Michael Humbracht (2020)Bringing The Outdoors Indoors: Immersive Experiences of Recreation in Nature and Coastal Environments in Residential Care Homes, In: E-review of Tourism Research Texas A&M AgriLife

    This paper critiques the opportunities afforded by immersive experience technology to create stimulating, innovative living environments for long-term residents of care homes for the elderly. We identify the ways in which virtual mobility can facilitate reconnection with recreational environments. Specifically, the project examines the potential of two assistive and immersive experiences: virtual reality (VR) and multisensory stimulation environments (MSSE). Findings identify three main areas of knowledge contribution. First, the introduction of VR and MSSE facilitated participants' re-engagement with and sharing of past experiences as they recalled past family holidays, day trips or everyday practices. Secondly, the combination of the hardware of the VR and MSSE technology with the physical objects of the sensory trays created alternative, multisensual ways of engaging with the experiences presented to participants. Lastly, the clear preference for the MSSE experience over the VR experience highlighted the importance of social interaction and exchange for participants.

    N Roubtsova, Jean-Yves Guillemaut (2015)Colour Helmholtz Stereopsis for reconstruction of complex dynamic scenes, In: Proceedings - 2014 International Conference on 3D Vision, 3DV 2014, pp. 251-258

    Helmholtz Stereopsis (HS) is a powerful technique for reconstruction of scenes with arbitrary reflectance properties. However, previous formulations have been limited to static objects due to the requirement to sequentially capture reciprocal image pairs (i.e. two images with the camera and light source positions mutually interchanged). In this paper, we propose colour HS, a novel variant of the technique based on wavelength multiplexing. To address the new set of challenges introduced by multispectral data acquisition, the proposed novel pipeline for colour HS uniquely combines a tailored photometric calibration for multiple camera/light source pairs, a novel procedure for surface chromaticity calibration and the state-of-the-art Bayesian HS suitable for reconstruction from a minimal number of reciprocal pairs. Experimental results including quantitative and qualitative evaluation demonstrate that the method is suitable for flexible (single-shot) reconstruction of static scenes and reconstruction of dynamic scenes with complex surface reflectance properties.

    JJ Kilner, J Starck, A Hilton, JY Guillemaut, O Grau (2007)Dual Mode Deformable Models for Free-Viewpoint Video of Outdoor Sports Events, In: IEEE Int.Conf. on 3D Imaging and Modeling, pp. 177-184
    Nadejda Roubtsova, Jean-Yves Guillemaut (2016)Colour Helmholtz Stereopsis for Reconstruction of Dynamic Scenes with Arbitrary Unknown Reflectance, In: International Journal of Computer Vision 124, pp. 18-48 Springer

    Helmholtz Stereopsis is a powerful technique for reconstruction of scenes with arbitrary reflectance properties. However, previous formulations have been limited to static objects due to the requirement to sequentially capture reciprocal image pairs (i.e. two images with the camera and light source positions mutually interchanged). In this paper, we propose Colour Helmholtz Stereopsis - a novel framework for Helmholtz Stereopsis based on wavelength multiplexing. To address the new set of challenges introduced by multispectral data acquisition, the proposed Colour Helmholtz Stereopsis pipeline uniquely combines a tailored photometric calibration for multiple camera/light source pairs, a novel procedure for spatio-temporal surface chromaticity calibration and a state-of-the-art Bayesian formulation necessary for accurate reconstruction from a minimal number of reciprocal pairs. In this framework, reflectance is spatially unconstrained both in terms of its chromaticity and the directional component dependent on the illumination incidence and viewing angles. The proposed approach for the first time enables modelling of dynamic scenes with arbitrary unknown and spatially varying reflectance using a practical acquisition set-up consisting of a small number of cameras and light sources. Experimental results demonstrate the accuracy and flexibility of the technique on a variety of static and dynamic scenes with arbitrary unknown BRDF and chromaticity ranging from uniform to arbitrary and spatially varying.

    Conventional stereoscopic video content production requires use of dedicated stereo camera rigs which is both costly and lacking video editing flexibility. In this paper, we propose a novel approach which only requires a small number of standard cameras sparsely located around a scene to automatically convert the monocular inputs into stereoscopic streams. The approach combines a probabilistic spatio-temporal segmentation framework with a state-of-the-art multi-view graph-cut reconstruction algorithm, thus providing full control of the stereoscopic settings at render time. Results with studio sequences of complex human motion demonstrate the suitability of the method for high quality stereoscopic content generation with minimum user interaction.

    C Malleson, M Klaudiny, A Hilton, J-Y Guillemaut (2013)Single-view RGBD-based reconstruction of dynamic human geometry, In: Proceedings of the IEEE International Conference on Computer Vision - Workshop on Dynamic Shape Capture and Analysis (4DMOD 2013), pp. 307-314

    We present a method for reconstructing the geometry and appearance of indoor scenes containing dynamic human subjects using a single (optionally moving) RGBD sensor. We introduce a framework for building a representation of the articulated scene geometry as a set of piecewise rigid parts which are tracked and accumulated over time using moving voxel grids containing a signed distance representation. Data association of noisy depth measurements with body parts is achieved by online training of a prior shape model for the specific subject. A novel frame-to-frame model registration is introduced which combines iterative closest-point with additional correspondences from optical flow and prior pose constraints from noisy skeletal tracking data. We quantitatively evaluate the reconstruction and tracking performance of the approach using a synthetic animated scene. We demonstrate that the approach is capable of reconstructing mid-resolution surface models of people from low-resolution noisy data acquired from a consumer RGBD camera. © 2013 IEEE.

    V Brujic-Okretic, J Guillemaut, L Hitchin, M Michielen, G Parker (2004)Real-time scene reconstruction for remote vehicle navigation, In: Geometric Modeling and Computing: Seattle 2003, pp. 113-123
    M Sarim, A Hilton, Jean-Yves Guillemaut, T Takai, Hansung Kim (2010)Natural image matting for multiple wide-baseline views, In: Proceedings of 17th IEEE International Conference on Image Processing (ICIP), pp. 2233-2236

    In this paper we present a novel approach to estimate the alpha mattes of a foreground object captured by a wide-baseline circular camera rig provided a single key frame trimap. Bayesian inference coupled with camera calibration information is used to propagate high confidence trimap labels across the views. Recent techniques have been developed to estimate an alpha matte of an image using multiple views but they are limited to narrow baseline views with low foreground variation. The proposed wide-baseline trimap propagation is robust to inter-view foreground appearance changes, shadows and similarity in foreground/background appearance for cameras with opposing views, enabling high quality alpha matte extraction using any state-of-the-art image matting algorithm.

    Mark Brown, David Windridge, Jean-Yves Guillemaut (2016)A Generalised Framework for Saliency-Based Point Feature Detection, In: Computer Vision and Image Understanding 157, pp. 117-137 Elsevier

    Here we present a novel, histogram-based salient point feature detector that may naturally be applied to both images and 3D data. Existing point feature detectors are often modality specific, with 2D and 3D feature detectors typically constructed in separate ways. As such, their applicability in a 2D-3D context is very limited, particularly where the 3D data is obtained by a LiDAR scanner. By contrast, our histogram-based approach is highly generalisable and as such, may be meaningfully applied between 2D and 3D data. Using the generalised approach, we propose salient point detectors for images, and both untextured and textured 3D data. The approach naturally allows for the detection of salient 3D points based jointly on both the geometry and texture of the scene, allowing for broader applicability. The repeatability of the feature detectors is evaluated using a range of datasets including image and LiDAR input from indoor and outdoor scenes. Experimental results demonstrate a significant improvement in terms of 2D-2D and 2D-3D repeatability compared to existing multi-modal feature detectors.
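
    A minimal sketch of the underlying idea follows: a neighbourhood is scored by the entropy of a local attribute histogram, so the same measure applies to 2D image intensities and to 3D point attributes alike (a generic entropy-based saliency, not the paper's exact detector):

        import numpy as np

        def histogram_saliency(values, n_bins=16):
            """Entropy of the local histogram of 'values', where values are
            intensities around a 2D point or per-point attributes (e.g.
            reflectance) around a 3D point."""
            hist, _ = np.histogram(values, bins=n_bins)
            p = hist / max(hist.sum(), 1)
            p = p[p > 0]
            return float(-(p * np.log2(p)).sum())

        # Higher entropy -> locally more distinctive -> stronger feature candidate.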

    M Brown, D Windridge, J Guillemaut (2015)Globally Optimal 2D-3D Registration from Points or Lines Without Correspondences, In: Proceedings of International Conference on Computer Vision (ICCV 2015)

    We present a novel approach to 2D-3D registration from points or lines without correspondences. While there exist established solutions in the case where correspondences are known, there are many situations where it is not possible to reliably extract such correspondences across modalities, thus requiring the use of a correspondence-free registration algorithm. Existing correspondence-free methods rely on local search strategies and consequently have no guarantee of finding the optimal solution. In contrast, we present the first globally optimal approach to 2D-3D registration without correspondences, achieved by a Branch-and-Bound algorithm. Furthermore, a deterministic annealing procedure is proposed to speed up the nested branch-and-bound algorithm used. The theoretical and practical advantages this brings are demonstrated on a range of synthetic and real data where it is observed that the proposed approach is significantly more robust to high proportions of outliers compared to existing approaches.

    J Kilner, J-Y Guillemaut, A Hilton (2010)3D action matching with key-pose detection, In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, pp. 1-8

    This paper addresses the problem of human action matching in outdoor sports broadcast environments, by analysing 3D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. Typically pose recognition is carried out using images from a single camera, however this approach is sensitive to occlusions and restricted fields of view, both of which are common in the outdoor sports environment. This paper presents a novel technique for the automatic matching of human activities which operates on the 3D data available in a multi-camera broadcast environment. Shape is retrieved using multi-camera techniques to generate a 3D representation of the scene. Use of 3D data renders the system camera-pose-invariant and allows it to work while cameras are moving and zooming. By comparing the reconstructions to an appropriate 3D library, action matching can be achieved in the presence of significant calibration and matting errors which cause traditional pose detection schemes to fail. An appropriate feature descriptor and distance metric are presented as well as a technique to use these features for key-pose detection and action matching. The technique is then applied to real footage captured at an outdoor sporting event. ©2009 IEEE.

    J-Y Guillemaut, A Hilton, J Starck, J Kilner, O Grau (2007)A Bayesian framework for simultaneous matting and 3D reconstruction, In: 3DIM 2007: Sixth International Conference on 3-D Digital Imaging and Modeling, Proceedings, pp. 167-174
    JY Guillemaut, AS Aguado, J Illingworth (2003)Calibration of a zooming camera using the Normalized Image of the Absolute Conic, In: FOURTH INTERNATIONAL CONFERENCE ON 3-D DIGITAL IMAGING AND MODELING, PROCEEDINGS, pp. 225-232
    D Casas, M Tejera, J-Y Guillemaut, A Hilton (2012)Parametric animation of performance-captured mesh sequences, In: COMPUTER ANIMATION AND VIRTUAL WORLDS 23(2), pp. 101-111 WILEY-BLACKWELL
    H Kim, M Sarim, T Takai, J-Y Guillemaut, A Hilton (2010)Dynamic 3D Scene Reconstruction in Outdoor Environments, In: Proc. IEEE Symp. on 3D Data Processing and Visualization

    A number of systems have been developed for dynamic 3D reconstruction from multiple view videos over the past decade. In this paper we present a system for multiple view reconstruction of dynamic outdoor scenes, transferring studio technology to uncontrolled environments. A synchronised portable multiple camera system is composed of off-the-shelf HD cameras for dynamic scene capture. For foreground extraction, we propose a multi-view trimap propagation method which is robust against dynamic changes in appearance between views and over time. This allows us to apply state-of-the-art natural image matting algorithms for multi-view sequences with minimal interaction. Optimal 3D surfaces of the foreground models are reconstructed by integrating multi-view shape cues and features. For background modelling, we use a line scan camera with a fish eye lens to capture a full environment with high resolution. The environment model is reconstructed from a spherical stereo image pair with sub-pixel correspondence. Finally the foreground and background models are merged into a common 3D world coordinate frame and the composite model is rendered from arbitrary viewpoints. We show that the proposed system generates high quality scene images with dynamic virtual camera actions.

    Matthew James Bailey, Adrian Douglas Mark Hilton, Jean-Yves Guillemaut (2022)Finite Aperture Stereo: 3D Reconstruction of Macro-Scale Scenes, In: Finite Aperture Stereo Institute of Electrical and Electronics Engineers (IEEE)

    While the accuracy of multi-view stereo (MVS) has continued to advance, its performance reconstructing challenging scenes from images with a limited depth of field is generally poor. Typical implementations assume a pinhole camera model, and therefore treat defocused regions as a source of outliers. In this paper, we address these limitations by instead modelling the camera as a thick lens. Doing so allows us to exploit the complementary nature of stereo and defocus information, and overcome constraints imposed by traditional MVS methods. Using our novel reconstruction framework, we recover complete 3D models of complex macro-scale scenes. Our approach demonstrates robustness to view-dependent materials, and outperforms state-of-the-art MVS and depth from defocus across a range of real and synthetic datasets.

    J Kilner, J Starck, Jean-Yves Guillemaut, A Hilton (2009)Objective Quality Assessment in Free-viewpoint Video Production, In: Signal Processing: Image Communication 24(1-2), pp. 3-16 Elsevier
    J Kilner, J-Y Guillemaut, A Hilton (2010)Summarised hierarchical Markov models for speed-invariant action matching, In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, pp. 1065-1072

    Action matching, where a recorded sequence is matched against, and synchronised with, a suitable proxy from a library of animations, is a technique for generating a synthetic representation of a recorded human activity. This proxy can then be used to represent the action in a virtual environment or as a prior on further processing of the sequence. In this paper we present a novel technique for performing action matching in outdoor sports environments. Outdoor sports broadcasts are typically multi-camera environments and as such reconstruction techniques can be applied to the footage to generate a 3D model of the scene. However due to poor calibration and matting this reconstruction is of a very low quality. Our technique matches the 3D reconstruction sequence against a predefined library of actions to select an appropriate high quality synthetic representation. A hierarchical Markov model combined with 3D summarisation of the data allows a large number of different actions to be matched successfully to the sequence in a rate-invariant manner without prior segmentation of the sequence into discrete units. The technique is applied to data captured at rugby and soccer games. ©2009 IEEE.

    D Casas, M Tejera, Jean-Yves Guillemaut, A Hilton (2013)Interactive Animation of 4D Performance Capture, In: IEEE Trans. Vis. Comput. Graph. 19(5), pp. 762-773
    M Sarim, A Hilton, Jean-Yves Guillemaut (2009)WIDE-BASELINE MATTE PROPAGATION FOR INDOOR SCENES, In: 2009 CONFERENCE FOR VISUAL MEDIA PRODUCTION: CVMP 2009, pp. 195-204

    This paper presents a method to estimate alpha mattes for video sequences of the same foreground scene from wide-baseline views given sparse key-frame trimaps in a single view. A statistical inference framework is introduced for spatio-temporal propagation of high-confidence trimap labels between video sequences without a requirement for correspondence or camera calibration and motion estimation. Multiple view trimap propagation integrates appearance information between views and over time to achieve robust labelling in the presence of shadows, changes in appearance with view point and overlap between foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high-quality video matting using existing single view natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation for camera views separated by up to 180°, with the same amount of manual interaction required for conventional single view video matting.

    A Neophytou, J-Y Guillemaut, A Hilton (2015)A dense surface motion capture system for accurate acquisition of cloth deformation, In: CVMP 2015: PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON VISUAL MEDIA PRODUCTION
    JY Guillemaut, O Drbohlav, R Sara, J Illingworth (2004)Helmholtz stereopsis on rough and strongly textured surfaces, In: Y Aloimonos, G Taubin (eds.), 2ND INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, PROCEEDINGS, pp. 10-17

    Helmholtz Stereopsis (HS) has recently been explored as a promising technique for capturing shape of objects with unknown reflectance. So far, it has been widely applied to objects of smooth geometry and piecewise uniform Bidirectional Reflectance Distribution Function (BRDF). Moreover, for nonconvex surfaces the inter-reflection effects have been completely neglected. We extend the method to surfaces which exhibit strong texture, nontrivial geometry and are possibly nonconvex. The problem associated with these surface features is that Helmholtz reciprocity is apparently violated when point-based measurements are used independently to establish the matching constraint as in the standard HS implementation. We argue that the problem is avoided by computing radiance measurements on image regions corresponding exactly to projections of the same surface point neighbourhood with appropriate scale. The experimental results demonstrate the success of the novel method proposed on real objects.

    M Sarim, JY Guillemaut, H Kim, A Hilton (2009)Wide-baseline Image Matting, In: European Conference on Visual Media Production (CVMP)
    M Sarim, A Hilton, J-Y Guillemaut, H Kim (2017)Non-parametric Natural Image Matting, pp. 3213-3216
    J Guillemaut, A Aguado, J Illingworth (2002)Using points at infinity for parameter decoupling in camera calibration, 1, pp. 263-272