Dr Arshdeep Singh


UKAN+ Early Career Acoustic Champion; Research Fellow A in the AI4S Project; Sustainability Fellow at the Institute for Sustainability
PhD

About

Areas of specialism

Audio signal processing; Audio classification; Compression of CNNs; Signal processing; Machine learning; Sustainable AI: AI model compression

University roles and responsibilities

  • Fire warden

    Publications

    Highlights

    For a full list of publications, please visit https://sites.google.com/view/arshdeep-singh/home/publications?authuser=0

    JA King, A Singh, Mark D Plumbley (2023) "Compressing audio CNNs with graph centrality based filter pruning", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023 (New Paltz, NY)

    Convolutional neural networks (CNNs) are commonplace in high-performing solutions to many real-world problems, such as audio classification. CNNs have many parameters and filters, some of which have a larger impact on performance than others. As a result, networks may contain many unnecessary filters that increase a CNN's computation and memory requirements while providing limited performance benefits. To make CNNs more efficient, we propose a pruning framework that eliminates the filters with the highest "commonality". We measure this commonality using the graph-theoretic concept of "centrality". We hypothesise that a filter with high centrality should be eliminated, as it represents commonality and can be replaced by other filters without much effect on network performance. We evaluate the proposed framework experimentally on acoustic scene classification and audio tagging. On the DCASE 2021 Task 1A baseline network, our method reduces computations per inference by 71% and parameters by 50%, with less than a two-percentage-point drop in accuracy compared with the original network. For large-scale CNNs such as PANNs, designed for audio tagging, our method reduces computations per inference by 24% and parameters by 41%, with a slight improvement in performance.
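    The centrality idea above can be illustrated with a minimal NumPy sketch. It assumes, for illustration only, that each filter in a layer is a graph node, that edges are weighted by the absolute cosine similarity between flattened filters, and that centrality is the weighted degree (the sum of a node's edge weights). The graph construction and centrality measure used in the paper may differ, and `centrality_prune` is a hypothetical helper name.

```python
import numpy as np


def centrality_prune(filters: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Return sorted indices of the filters to KEEP after pruning.

    filters: array of shape (n_filters, ...), e.g. (out_ch, in_ch, kh, kw).
    prune_ratio: fraction of filters to remove (those with highest centrality).
    Assumption: similarity graph + weighted degree centrality (illustrative only).
    """
    n = filters.shape[0]
    flat = filters.reshape(n, -1)
    # Normalise each filter so that a dot product equals cosine similarity.
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)
    # Fully connected graph: edge weight = |cosine similarity|, no self-loops.
    adjacency = np.abs(unit @ unit.T)
    np.fill_diagonal(adjacency, 0.0)
    # Weighted degree centrality: how strongly a filter resembles all others.
    centrality = adjacency.sum(axis=1)
    n_prune = int(round(prune_ratio * n))
    # Keep the n - n_prune filters with the LOWEST centrality.
    order = np.argsort(centrality)
    return np.sort(order[: n - n_prune])


# Toy layer: filters 0-2 are near-duplicates, filter 3 is distinct.
toy = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.0, 0.05, 0.0],
    [0.0, 0.0, 1.0],
])
print(centrality_prune(toy, prune_ratio=0.25))  # → [0 1 3]
```

    A filter that closely resembles several others receives a high score and is removed first, while the filter unlike the rest (index 3) is kept. In a real network, the kept indices would be used to slice the layer's weights and the next layer's input channels, typically followed by fine-tuning.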

    A Singh, T Deacon, M D Plumbley (2024) "Environmental sound classification using raw-audio based ensemble framework", 53rd International Congress and Exposition on Noise Control Engineering (Internoise) 2024 (Nantes, France, 25/08/2024 - 29/08/2024)

    Environmental sound classification (ESC) aims to automatically classify audio recordings according to the environment in which they were recorded, such as "urban park" or "city centre". Most existing ESC methods represent audio recordings with hand-crafted time-frequency features such as the log-mel spectrogram. However, hand-crafted features rely on transformations that are defined beforehand and do not account for variability in the environment caused by differences in recording conditions or recording devices. To overcome this, we present an alternative representation framework that leverages SoundNet, a convolutional neural network pre-trained on a large-scale audio dataset, to represent raw audio recordings. We observe that the representations obtained from the intermediate layers of SoundNet lie in a low-dimensional subspace whose dimensionality is not known in advance. To address this, we use an automatic compact dictionary learning framework that yields the dimensionality of the underlying subspace. The low-dimensional embeddings are then aggregated by late fusion in an ensemble framework to incorporate the hierarchical information learned at various intermediate layers of SoundNet. We evaluate the approach experimentally on the publicly available DCASE 2017 and 2018 acoustic scene classification (ASC) datasets. The proposed ensemble framework improves performance by between 1 and 4 percentage points compared with existing time-frequency representations.
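    The subspace-dimension and late-fusion steps described above can be sketched in NumPy. This is not the paper's automatic compact dictionary learning method: as a plainly swapped-in stand-in, the subspace dimension is estimated as the smallest number of singular vectors capturing a chosen fraction of the energy (the `energy=0.95` threshold is an assumption), embeddings are obtained by projecting onto those directions, and late fusion is assumed to be a simple average of per-layer class scores. All function names are hypothetical.

```python
import numpy as np


def estimate_subspace_dim(features: np.ndarray, energy: float = 0.95) -> int:
    """Smallest k whose top-k singular values hold `energy` of the total energy.

    Stand-in for the paper's dictionary learning (assumption, not the original).
    features: (n_recordings, n_dims) representations from one SoundNet layer.
    """
    centered = features - features.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative energy reaches the threshold, plus one.
    return int(np.searchsorted(cumulative, energy) + 1)


def project(features: np.ndarray, k: int) -> np.ndarray:
    """Low-dimensional embeddings: coordinates along the top-k principal directions."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def late_fusion(layer_scores: list) -> np.ndarray:
    """Average per-layer (n_recordings, n_classes) scores, then pick the top class."""
    return np.mean(np.stack(layer_scores), axis=0).argmax(axis=1)


# Toy "layer output": 200 recordings in 10-D that actually lie in a 2-D subspace.
rng = np.random.default_rng(0)
layer = np.zeros((200, 10))
layer[:, 0] = rng.standard_normal(200)
layer[:, 1] = rng.standard_normal(200)
k = estimate_subspace_dim(layer)
print(k, project(layer, k).shape)  # → 2 (200, 2)

# Late fusion of three hypothetical per-layer classifiers for one recording.
scores = [np.array([[0.6, 0.4]]), np.array([[0.3, 0.7]]), np.array([[0.2, 0.8]])]
print(late_fusion(scores))  # → [1]
```

    Averaging scores lets layers that disagree (the first classifier above) be outvoted by the others, which is one simple way to combine information from several depths of the network.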

    Aryan Choudhary, Arshdeep Singh, Vinayak Abrol and Mark D Plumbley (2024) "Efficient CNNs with Quaternion Transformations and Pruning for Audio Tagging", Proceedings of Interspeech 2024, International Speech Communication Association (ISCA) (Kos Island, Greece, 01/09/2024 - 05/09/2024)

    This paper presents a novel approach to making convolutional neural networks (CNNs) efficient by reducing their computational cost and memory footprint. Although large-scale CNNs achieve state-of-the-art performance in many tasks, their high computational cost and large memory footprint make them resource-hungry, so deploying them on resource-constrained devices poses significant challenges. To address this, we propose using quaternion CNNs, in which quaternion algebra reduces the memory footprint. We further investigate reducing the memory footprint and computational cost by pruning the quaternion CNNs. Experimental evaluation on the audio tagging task, which involves classifying 527 audio event classes from AudioSet, shows that quaternion algebra and pruning together reduce the memory footprint by 90% and the computational cost by 70% compared with the original CNN model, while maintaining similar performance.
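    The memory saving from quaternion algebra comes from weight sharing in the Hamilton product, which a minimal NumPy sketch can make concrete. It assumes a fully connected quaternion layer (quaternion convolutions share weights in the same way) with the four weight components `r`, `i`, `j`, `k` stored as separate real matrices. The helper names are hypothetical and the paper's exact layer design may differ. A real-valued layer mapping 4N inputs to 4M outputs needs 16NM weights, whereas the quaternion layer needs only 4NM, a 75% reduction before any pruning.

```python
import numpy as np


def hamilton_weight(r, i, j, k):
    """Real (4M, 4N) matrix equivalent to left-multiplying by quaternion weights.

    r, i, j, k: (M, N) real matrices holding the four quaternion components.
    Each component is reused four times (with signs): this is the weight sharing.
    """
    return np.block([
        [r, -i, -j, -k],
        [i,  r, -k,  j],
        [j,  k,  r, -i],
        [k, -j,  i,  r],
    ])


def quaternion_linear(x, r, i, j, k):
    """x: (batch, 4N), laid out as [real | i-part | j-part | k-part]."""
    return x @ hamilton_weight(r, i, j, k).T


def param_counts(n_in: int, n_out: int) -> tuple:
    """Weights for a real layer (4*n_in -> 4*n_out) vs. its quaternion version."""
    real = (4 * n_in) * (4 * n_out)
    quaternion = 4 * n_in * n_out
    return real, quaternion


def scalar(v: float) -> np.ndarray:
    """Wrap a number as a 1x1 matrix (N = M = 1 quaternion layer)."""
    return np.array([[float(v)]])


# Sanity check against a known product: (1 + 2i + 3j + 4k)(5 + 6i + 7j + 8k).
x = np.array([[5.0, 6.0, 7.0, 8.0]])
print(quaternion_linear(x, scalar(1), scalar(2), scalar(3), scalar(4)).tolist())
# → [[-60.0, 12.0, 30.0, 24.0]]
print(param_counts(64, 128))  # → (131072, 32768), i.e. 4x fewer weights
```

    The output matches the hand-computed Hamilton product -60 + 12i + 30j + 24k, confirming that the block matrix implements quaternion multiplication. Pruning, as in the paper, would then remove whole quaternion filters on top of this 4x saving.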