Audio and video based speech separation for multiple moving sources within a room environment
Overview
Human beings have developed a remarkable ability to communicate in noisy environments, such as at a cocktail party. This skill depends on combining the aural and visual senses with sophisticated processing in the brain. Replicating this ability in a machine is very challenging, particularly when the speakers are moving. This project addresses major challenges in audio-visual speaker localization, tracking and separation.
Funder
Team
Principal investigators
Wenwu Wang
Josef Kittler