Shenbin Qian
About
My research project
Emotion preservation in neural machine translation
Despite the progress of neural machine translation in recent years, the translation of emotion-intensive texts such as microblogs on social media remains unsatisfactory. While current work in natural language processing focuses on developing larger general-purpose language models and using less parallel data, this project investigates the errors made by current neural machine translation systems and their linguistic causes, and then develops a neural machine translation system specifically targeted at emotion-intensive texts to better convey human emotions.
Supervisors
My qualifications
Research
Research interests
- Machine Translation;
- Sentiment Analysis;
- Topic Modeling;
- Text Mining;
- Topics Related to Language Technology
Publications
This paper summarises the submissions our team, SURREY-CTS-NLP, made for the WASSA 2022 Shared Task on the prediction of empathy, distress and emotion. In this work, we tested different learning strategies, such as ensemble learning and multi-task learning, as well as several large language models, but our primary focus was on analysing and extracting emotion-intensive features from both the essays in the training data and the news articles, to better predict empathy and distress scores from the perspective of discourse and sentiment analysis. We propose several text feature extraction schemes to compensate for the small number of training examples for fine-tuning pretrained language models, including methods based on Rhetorical Structure Theory (RST) parsing, cosine similarity and sentiment score. Our best submissions achieve an average Pearson correlation score of 0.518 for the empathy prediction task and an F1 score of 0.571 for the emotion prediction task, indicating that using these schemes to extract emotion-intensive information can help improve model performance.
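As a rough illustration of the cosine-similarity idea mentioned above, the sketch below selects the essay sentence most similar to the paired news article using simple bag-of-words vectors. The tokenisation, function names and example texts are hypothetical, not the paper's actual implementation, which may use different representations entirely.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_relevant_sentence(essay_sentences: list[str], article_text: str) -> str:
    """Return the essay sentence most similar to the news article text."""
    article_vec = Counter(article_text.lower().split())
    scored = [(cosine_similarity(Counter(s.lower().split()), article_vec), s)
              for s in essay_sentences]
    return max(scored)[1]

sentences = ["The weather is nice today",
             "The flood destroyed homes in the village"]
article = "A flood destroyed many homes in the region"
print(most_relevant_sentence(sentences, article))
```

In a feature-extraction pipeline of this kind, the selected sentence (or its similarity score) could then be fed to a fine-tuned language model alongside the full essay.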
In this paper, we focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts by evaluating outputs from Google Translate according to a framework proposed in this paper. We propose this evaluation framework based on the Multidimensional Quality Metrics (MQM) and perform a detailed error analysis of the MT outputs. From our analysis, we observe that about 50% of the MT outputs fail to preserve the original emotion. After further analysis of the errors, we find that emotion-carrying words and linguistic phenomena such as polysemy, negation and abbreviation are common causes of these translation errors.
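An MQM-style analysis of this kind boils down to tallying annotated error labels per MT output and measuring how many outputs carry emotion-related errors. The sketch below is a minimal, hypothetical illustration; the label scheme and example annotations are invented for demonstration and are not the paper's actual annotation data.

```python
from collections import Counter

# Hypothetical annotations: one list of MQM-style error labels per MT output.
annotations = [
    ["accuracy/mistranslation", "emotion/polysemy"],
    [],
    ["emotion/negation"],
    ["fluency/grammar"],
]

def emotion_error_rate(annotated_outputs: list[list[str]]) -> float:
    """Fraction of MT outputs with at least one emotion-related error."""
    flagged = sum(1 for errors in annotated_outputs
                  if any(label.startswith("emotion/") for label in errors))
    return flagged / len(annotated_outputs)

def error_type_counts(annotated_outputs: list[list[str]]) -> Counter:
    """Tally each error label across all annotated outputs."""
    return Counter(label for errors in annotated_outputs for label in errors)

print(emotion_error_rate(annotations))   # proportion of flagged outputs
print(error_type_counts(annotations))
```

Aggregating labels this way makes it straightforward to report both the overall failure rate and which linguistic phenomena (polysemy, negation, abbreviation) dominate the error distribution.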