Natural Language Processing

Our group focuses on Natural Language Processing (NLP), an interdisciplinary subfield of Artificial Intelligence (AI) concerned with how machines process, understand and generate human language.

Overview

We aim to advance NLP through both theoretical and applied approaches, developing algorithms and deployable solutions for interdisciplinary problems in various domains.

Our research has been applied to semantic knowledge representation and reasoning in AI systems, large language models, adaptive privacy-preserving models for social networks, metadata curation from longitudinal datasets, and credit risk analysis models for business ecosystems, among others.

Our research spans both language understanding and generation. It also addresses ethical considerations in AI, with the aim of ensuring inclusivity and fairness across diverse linguistic and cultural landscapes.

Meet the team

Dr Diptesh Kanojia

Lecturer in Artificial Intelligence for Natural Language Processing, lead of NLP theme

Professor H Lilian Tang

Professor in Artificial Intelligence

Dr Frank Guerin

Senior Lecturer

Dr Suparna De

Senior Lecturer in Computer Science

Dr Alaa Marshan

Senior Lecturer in Intelligent Data Analysis

Zhenhua Feng

(Visiting)

Dr Lu Yin

Lecturer

Publications

Lent, H., Tatariya, K., Dabre, R., Chen, Y., Fekete, M., Ploeger, E., Zhou, L., Armstrong, R.A., Eijansantos, A., Malau, C., Heje, H.E., Lavrinovics, E., Kanojia, D., Belony, P., Bollmann, M., Grobol, L., de Lhoneux, M., Hershcovich, D., DeGraff, M., Søgaard, A. and Bjerva, J., 2024. CreoleVal: Multilingual Multitask Benchmarks for Creoles. Transactions of the Association for Computational Linguistics, 12.
Wang, Z., Wang, Y., Zhang, H., Wang, W., Qi, J., Chen, J., Sastry, N., Johnson, J. and De, S., 2024. ICDXML: enhancing ICD coding with probabilistic label trees and dynamic semantic representations. Scientific Reports, 14(1).
Wang, Y., Wang, Z., Wang, W., Chen, Q., Huang, K., Nguyen, A. and De, S., 2024, June. Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness. SemEval-2024 - The 18th International Workshop on Semantic Evaluation. NAACL 2024.
Huang, Q., Liu, X., Ko, T., Wu, B., Wang, W., Zhang, Y. and Tang, L., 2024, August. Selective Prompting Tuning for Personalized Conversations with LLMs. In Findings of the Association for Computational Linguistics. ACL 2024.
Marshan, A., Almutairi, A.N., Ioannou, A., Bell, D., Monaghan, A. and Arzoky, M., 2024. MedT5SQL: a transformers-based large language model for text-to-SQL conversion in the healthcare domain. Frontiers in Big Data, 7:1371680.
Li, Y., Wang, S., Lin, C. and Guerin, F., 2023, July. Metaphor Detection via Explicit Basic Meanings Modelling. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. ACL 2023.
Deoghare, S., Choudhary, P., Kanojia, D., Ranasinghe, T., Bhattacharyya, P. and Orašan, C., 2023, July. A Multi-task Learning Framework for Quality Estimation. In Findings of the Association for Computational Linguistics: ACL 2023.
Tang, C., Zhang, H., Loakman, T., Lin, C. and Guerin, F., 2023, July. Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. ACL 2023.
Marshan, A., Mbedzi, M. and Ioannou, A., 2023, July. Exploring the Relationship Between News Articles and Stocks Market Movements: A Sentiment Analysis Perspective. In Science and Information Conference. Cham: Springer Nature Switzerland.
Huang, Q., Zhang, Y., Ko, T., Liu, X., Wu, B., Wang, W. and Tang, H., 2023. Personalized Dialogue Generation with Persona-Adaptive Attention. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11). https://doi.org/10.1609/aaai.v37i11.26518
Marshan, A., Nizar, F.N.M., Ioannou, A. and Spanaki, K., 2023. Comparing Machine Learning and Deep Learning Techniques for Text Analytics: Detecting the Severity of Hate Comments Online. Information Systems Frontiers.
Deoghare, S., Kanojia, D., Blain, F., Ranasinghe, T. and Bhattacharyya, P., 2023, December. Quality Estimation-Assisted Automatic Post-Editing. In Findings of the Association for Computational Linguistics: EMNLP 2023.
Li, Y., Dong, B., Guerin, F. and Lin, C., 2023, December. Compressing Context to Enhance Inference Efficiency of Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. EMNLP 2023.
Huang, Q., Fu, S., Liu, X., Wang, W., Ko, T., Zhang, Y. and Tang, L., 2023, December. Learning Retrieval Augmentation for Personalized Dialogue Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. EMNLP 2023.
De, S., Wang, W., Jassat, U. and Moessner, K., 2022. Usage Mining of the London Santander Bike-Sharing System. Computer, 55(12).

Funded Projects

METACURATE-ML: Extraction and Utilisation of Metadata from Non-machine-actionable Documents to Improve Data Curation and Discovery, 2024-25, Total grant award: £757k (UKRI ESRC; Leading Scientist: S. De)
Sponsored project: Emotion AI for trading on financial markets, 2021-2023, Total grant award: £190,488 (Innovate UK, KTP; Leading Scientist: A. Marshan).
Sponsored project: Lightweight Contextual Character-based Embeddings, 2023-2024, Total grant award: £98,200 (eBay Inc.; Leading Scientist: D. Kanojia)
Sponsored project: A Benchmark for Sentiment Sarcasm Classification for Dialects of English, 2024-2025, Total grant award: £47,241 (Google Research; Leading Scientist: D. Kanojia)
Sponsored project: Prompt-based Explainable Quality Estimation for Malayalam, 2024-2025. (European Association for Machine Translation; Leading Scientist: D. Kanojia)
Large Language Models for Open-Ended Dialogue in Computer Games, 2024-25, Total grant award: £36k (GAIN PoC Funding; Leading Scientist: F. Guerin)
Sponsored project: Quality Estimation for Low-resource Indic languages, 2023-2024. (European Association for Machine Translation; Leading Scientist: D. Kanojia)
Understanding the multiple dimensions of prediction of concepts in social and biomedical science questionnaires, 2021-22, £23k (Science and Technology Facilities Council (STFC) DiRAC; Leading Scientist: S. De)
Automating capturing structured content from questionnaires, 2021-22, £81.5k (UKRI ESRC; Leading Scientist: S. De)
Machine learning to enhance metadata in cohort studies, 2021, £20k (STFC DiRAC; Leading Scientist: S. De)

Other Achievements

Best Paper Honorable Mention at the 16th Conference of the European Chapter of the Association for Computational Linguistics (2021).
- Kanojia, D., Sharma, P., Ghodekar, S., Bhattacharyya, P., Haffari, G. and Kulkarni, M., 2021, April. Cognition-aware Cognate Detection. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume.