Kontext MT

  • Contact:

    Sai Arjun Koneru, Jan Niehues

  • Funding:

    Industry project

  • Partner:

    SAP

  • Start date:

    01.03.2022

  • End date:

    28.02.2025

Although existing Machine Translation (MT) systems have achieved impressive performance on many language pairs, several challenges remain unsolved. Domain mismatch between training and test data, limited amounts of labelled in-domain data, specialised terminology, and the translation of conversational content are a few examples of open problems in current MT research. In the SAP-KIT “Kontext-MT” project, we aim to address some of these problems to improve software localisation for SAP products.

The core idea of the project is to use additional contextual information to improve Neural MT (NMT) models. The majority of current MT systems rely only on the source sentence to generate a target translation. However, such systems ignore information that is necessary for producing an accurate translation. For example, consider the word “driver” as a user-interface text that we need to translate for a taxi app and for a software application. In the taxi app, the text should be translated so that it refers to somebody who drives a vehicle. In the software application, it should refer to a software driver instead. From the source sentence alone, we cannot know which translation is correct; we need additional context.
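As a minimal illustration of this idea, the sketch below prepends an application context to the source string before handing it to an off-the-shelf translation model. The model name (Helsinki-NLP/opus-mt-en-de), the simple concatenation scheme, and the translate_with_context helper are illustrative assumptions only and do not describe the project's actual method.

    # Sketch: supply extra context to an MT model by prepending it to the source.
    # The concatenation scheme and model choice are illustrative assumptions.
    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

    def translate_with_context(source: str, context: str) -> str:
        # Concatenate the application context with the UI string so the model
        # has a chance to disambiguate words such as "driver".
        contextual_source = f"{context}: {source}"
        return translator(contextual_source, max_length=64)[0]["translation_text"]

    print(translate_with_context("driver", "taxi booking app"))       # person driving a vehicle
    print(translate_with_context("driver", "device software setup"))  # software driver

In practice, an off-the-shelf sentence-level model may ignore such prepended context; the sketch only shows where contextual information could enter the translation process.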

Furthermore, we will also use other sources of contextual information to improve overall quality. We need to maintain consistency (the same translation for source sentences that mean the same thing), translate in-domain special terms (using dictionaries and additional resources), and follow length restrictions (using screenshots so that the generated translation does not exceed the text-box length while remaining of high quality). With the kinds of context described above, we hope to improve current MT systems and enable high-quality translations for SAP products.
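The following sketch illustrates two of these constraints as simple post-hoc checks on a list of candidate translations: a terminology dictionary and a text-box length limit. The select_candidate function, the candidate list, and the character limit are hypothetical; in a real system the candidates would come from an NMT model's n-best output, and the project's actual techniques may differ.

    # Sketch: pick a candidate translation that respects a terminology
    # dictionary and a UI length restriction. All names and values here
    # are illustrative assumptions.
    from typing import Dict, List

    def select_candidate(candidates: List[str],
                         terminology: Dict[str, str],
                         max_chars: int) -> str:
        """Return the first candidate that fits the text box and uses the
        required target terms; otherwise fall back to the shortest one."""
        for cand in candidates:
            fits_box = len(cand) <= max_chars
            uses_terms = all(target in cand for target in terminology.values())
            if fits_box and uses_terms:
                return cand
        return min(candidates, key=len)

    candidates = [
        "Treiber erfolgreich installiert und konfiguriert",
        "Treiber installiert",
    ]
    print(select_candidate(candidates, {"driver": "Treiber"}, max_chars=25))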

Publications

Koneru, S., Huck, M., Exel, M., & Niehues, J. (2024, November). Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies. In Proceedings of the Ninth Conference on Machine Translation (pp. 1467–1481). Miami, Florida, USA: Association for Computational Linguistics. Code

Koneru, S., Exel, M., Huck, M., & Niehues, J. (2024, June). Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (pp. 2711–2725).

Koneru, S., Huck, M., Exel, M., & Niehues, J. (2023, May). Analyzing Challenges in Neural Machine Translation for Software Localization. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 2442–2454). Code