Shenbin Qian
About
My research project
Emotion preservation in neural machine translation
Despite the progress of neural machine translation in recent years, the translation of emotion-intensive texts such as social media microblogs is still not satisfactory. While current studies in natural language processing focus on developing larger general-purpose language models and using less parallel data, this project investigates the errors made by current neural machine translation systems and their linguistic background, and then develops a neural machine translation system specifically targeted at emotion-intensive texts to better convey human emotions.
Supervisors
My qualifications
Research
Research interests
- Machine Translation;
- Sentiment Analysis;
- Topic Modeling;
- Text Mining;
- Topics Related to Language Technology
Publications
This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE). Segment-level QE is a challenging cross-lingual language understanding task that assigns a quality score (0-100) to the translated output. We comprehensively evaluate large language models (LLMs) in zero/few-shot scenarios and perform instruction fine-tuning using a novel prompt based on annotation guidelines. Our results indicate that prompt-based approaches are outperformed by the encoder-based fine-tuned QE models. Our error analysis reveals tokenization issues, along with errors due to transliteration and named entities, and argues for refinement in LLM pre-training for cross-lingual tasks. We publicly release the data and trained models for further research.
Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate MT quality. In addition, we investigate prompting techniques such as zero-shot, Chain of Thought (CoT) and few-shot prompting for eight language pairs covering high-, medium- and low-resource languages, using several LLM variants. Our findings indicate the importance of reference translations for LLM-based evaluation. While larger models do not necessarily fare better, they tend to benefit more from CoT prompting than smaller models. We also observe that LLMs do not always provide a numerical score when generating evaluations, which raises questions about their reliability for the task. Our work presents a comprehensive analysis of resource-constrained and training-less LLM-based evaluation of machine translation. We release the accrued prompt templates, code and data publicly for reproducibility.
This paper summarises the submissions our team, SURREY-CTS-NLP, made to the WASSA 2022 Shared Task on the prediction of empathy, distress and emotion. In this work, we tested different learning strategies, such as ensemble learning and multi-task learning, as well as several large language models, but our primary focus was on analysing and extracting emotion-intensive features from both the essays in the training data and the news articles, to better predict empathy and distress scores from the perspective of discourse and sentiment analysis. We propose several text feature extraction schemes to compensate for the small number of training examples available for fine-tuning pretrained language models, including methods based on Rhetorical Structure Theory (RST) parsing, cosine similarity and sentiment score. Our best submissions achieve an average Pearson correlation score of 0.518 for the empathy prediction task and an F1 score of 0.571 for the emotion prediction task, indicating that using these schemes to extract emotion-intensive information can help improve model performance.
In this paper, we focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts by evaluating outputs from Google Translate according to a framework proposed in this paper. We base this evaluation framework on the Multidimensional Quality Metrics (MQM) and perform a detailed error analysis of the MT outputs. From our analysis, we observe that about 50% of the MT outputs fail to preserve the original emotion. After further analysis of the errors, we find that emotion-carrying words and linguistic phenomena such as polysemous words, negation and abbreviation are common causes of these translation errors.