At SAP, we develop and run our own machine translation (MT) cloud service. Its functionality is embedded in SAP Translation Hub and other products, and it is tailored to translating SAP-related content such as software documentation. Developing a machine translation solution involves classical software development, but it consists mainly of machine learning tasks and challenges: preparing and processing large amounts of data, training neural networks, and evaluating the resulting models.

Recently, we released a new generation of neural machine translation (NMT) models. They are based on a novel neural network architecture, called the Transformer. Transformers have enabled a considerable quality improvement compared to the previous MT models in production: the average increase in BLEU[1], a standard metric to evaluate the quality of machine-translated texts, is more than 5 percentage points across some of our most popular languages. In addition, we have substantially improved the placement accuracy for inline markup. In this blog post, we will look at these results in detail and explain how we achieved these improvements.

NMT technology updates

The Transformer is a neural network architecture that was originally proposed by researchers from Google; see Vaswani et al. (2017) for the publication. Since then, it has become a staple in the NMT community, as well as for other tasks in natural language processing (NLP).

For details about the Transformer, we refer you to dedicated online learning resources such as “The Illustrated Transformer”. Like other architectures for transducing input sequences into output sequences, the Transformer follows an encoder-decoder setup. It employs attention mechanisms in several places: between the words in the source and the words in the translation (cross-attention), but also within the source segment and within the generated translation (self-attention). Self-attention models relationships between words within one language (such as syntactic structure) and disambiguates words based on the other words in their sentential context, whereas cross-attention models translational relationships.
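To make the difference between self-attention and cross-attention more concrete, here is a minimal sketch of scaled dot-product attention as defined in Vaswani et al. (2017). It is not our production code: it uses plain Python lists, made-up toy vectors, and it leaves out the learned projections, multiple heads, and masking that a real Transformer uses. The names `attention`, `source`, and `target` are chosen only for this illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Toy word vectors (illustrative assumption, not real embeddings).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three source words
target = [[0.5, 0.5], [1.0, 0.0]]              # two words generated so far

# Self-attention: queries, keys, and values all come from the same segment.
self_attended = attention(source, source, source)

# Cross-attention: queries come from the translation, keys and values from the source.
cross_attended = attention(target, source, source)
```

The only difference between the two calls is where the queries come from, which is why the same mechanism can model both relationships within one language and relationships between source and translation.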

Improvements in translation quality

Using the Transformer architecture has led to considerable improvements in translation quality for SAP’s machine translation. We report the quality in BLEU, an automatic metric for MT output that is controversial but standard, widely accepted, and easy to calculate. The increase amounts to +5.2 BLEU percentage points on average across the following 14 language pairs: English (en) <> German (de), Spanish (es), French (fr), Japanese (ja), Brazilian Portuguese (pt), Russian (ru), Chinese (zh). In the charts, we compare the new Transformer-based models against the previous models for each language pair. The previous models use a rather shallow network architecture based on recurrent neural networks (RNNs).

Translation quality - from English

Translation quality - to English

The charts clearly show consistent and considerable improvements for all language pairs involved. The improvements apply to both the internal (int) and the external (ext) test data sets (on average +4.5 and +5.8 respectively). The gains are slightly larger for the translation directions into English (+5.8) than when translating from English (+4.5). Note that, for each language direction, the old and the new model were trained on the same data set; the improvement is therefore solely due to better neural network technology.

In the evaluation, internal test sets cover SAP data, for example from SAP product documentation and SAP product UIs. External test sets cover usage scenarios that are not related to SAP and include parallel data from European institutions, the United Nations, TED talks, news, Wikipedia, etc. Note that the test sets differ between language pairs, so the scores are not directly comparable across language pairs.
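For readers who have not worked with BLEU before, the following sketch shows roughly what the metric computes, following the definition in Papineni et al. (2002): clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is a simplified illustration and not our evaluation tooling. It assumes whitespace tokenization, exactly one reference per segment, and no smoothing, and the function name `corpus_bleu` as well as the example sentences are made up for this post.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (one reference per segment, no smoothing)."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, a missing n-gram order gives a score of 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


reference = ["the cat sat on the mat"]
exact = corpus_bleu(["the cat sat on the mat"], reference)
longer = corpus_bleu(["the cat sat on the mat today"], reference)
```

Here `exact` reaches the maximum score, while `longer` scores lower because the extra word reduces the n-gram precisions. The improvements reported above are differences between two such scores, computed for the old and the new model on the same test set.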

Improvements in inline elements transfer

Another great achievement with the new generation of NMT models is that we have been able to improve the accuracy of the placement of inline elements (also known as markup or tags) in the translation, for all 14 language pairs. An accurate transfer of inline elements is essential for the automatic translation of structured documents.[2] Think of a document in which you have highlighted certain words in bold font. Obviously, you’d like to see these highlighted correctly in the translation as well.

To be able to make statements about the accuracy of inline elements at all, we first had to come up with our own automatic evaluation scheme for the transfer of inline elements in MT. We can now report an inline elements F-measure, a combination of inline elements precision and recall, which compares the inline elements in the MT output to the inline elements in the reference translation. The chart below shows the detailed results side by side. While there have been improvements for all tested language pairs (+0.19 on average), the largest improvements have been achieved for English <> Japanese, Chinese, and Russian. The scores refer to dedicated test sets with inline elements that we created specifically for this purpose.

Quality of transfer of inline elements

The F-measure ranges between 0 and 1. A value of 1 means that all inline elements in the test data have been placed exactly as in the reference translation. Other tag positions might also have been acceptable, but the automatic metric cannot capture this; such cases can currently only be covered by human evaluation.
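To give an idea of how precision, recall, and F-measure can be applied to inline elements, here is a small sketch. Please note that this is not the evaluation scheme we actually use, which is not described in this post. It is a hypothetical simplification that assumes simple, non-nested tags such as `<b>...</b>` and counts an element as correct only if the same tag wraps exactly the same text as in the reference. The function names and the German example segments are invented for illustration.

```python
import re
from collections import Counter

# Matches simple, non-nested elements such as <b>text</b> (assumption for this sketch).
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")


def inline_elements(segment):
    """Collect (tag name, wrapped text) pairs from a tagged segment."""
    return Counter((name, text.strip()) for name, text in TAG_PATTERN.findall(segment))


def inline_f_measure(mt_output, reference):
    """F-measure over inline elements: harmonic mean of precision and recall."""
    mt_elements = inline_elements(mt_output)
    ref_elements = inline_elements(reference)
    matched = sum((mt_elements & ref_elements).values())
    if matched == 0:
        return 0.0  # no overlap (or no markup at all) scores 0 in this sketch
    precision = matched / sum(mt_elements.values())  # share of MT elements that are correct
    recall = matched / sum(ref_elements.values())    # share of reference elements recovered
    return 2 * precision * recall / (precision + recall)


reference = "Klicken Sie auf <b>Speichern</b>, um die <i>Datei</i> zu sichern."
well_placed = "Klicken Sie auf <b>Speichern</b>, um die <i>Datei</i> zu sichern."
misplaced = "Klicken Sie <b>auf</b> Speichern, um die <i>Datei</i> zu sichern."

print(inline_f_measure(well_placed, reference))  # 1.0
print(inline_f_measure(misplaced, reference))    # 0.5
```

The second example also illustrates the limitation mentioned above: a strict comparison against a single reference penalizes every deviating tag position, even in cases where a human evaluator might accept it.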

Next steps

This article covered the improvements for 14 language pairs that we achieved with the newest generation of machine translation models at SAP, which is based on the Transformer architecture. We did not stop there: we have been gradually replacing the models for all language pairs. For many of those, we not only updated the neural network architecture but also added more training data from various sources. That, however, could be the topic of another blog post.

Would you like to know more about what is behind the machine translation offerings of SAP? If you found this article interesting and wish for more updates, please let us know about it in the comment section below.

[1] See Papineni et al. (2002) for the original publication.
[2] We also published a blog post about the Document Translation service that we offer.