Neural Machine Translation

"Neural machine translation uses deep learning models to translate languages, overcoming limitations of rule-based approaches. The transformer model, with its self-attention mechanism, has revolutionized neural machine translation by allowing the model to focus on relevant parts of the source sentence when generating the translation. This has led to significant improvements in translation accuracy and fluency."- Gemini 2024

Humans have been translating works for millennia, so it's no surprise we've programmed computers to help with the task.

Common machine translation tasks

  • Translating information delivered via technology, such as web pages, email, and subtitles
  • On-demand translations within software applications (chat, menus, etc.)
  • First draft translation to assist a human translator

Challenges

There are thousands of human languages that may share some structural similarities, but are still different in many ways.

A key challenge in translating text from a source language to a target language is that the two languages may not agree in terms of the order or number of words required for an accurate translation.

This example illustrates just a few of the problems that can arise:

He saw a black cat under a ladder

Il a vu un chat noir sous une échelle.

Even this trivial example shows several differences in sentence structure: the adjective follows the noun (chat noir vs. black cat), the articles must agree in gender (un chat, une échelle), and the single-word past tense saw becomes the two-word passé composé a vu.

Another example involves word count and order in negation:

I am not tired.

Je ne suis pas fatigué.

Here the single English word not becomes the two-part French ne ... pas, wrapped around the verb.

These and other differences in linguistic typology contribute to translation challenges. The World Atlas of Language Structures (WALS) catalogs such typological features across the world's languages.

Solution Architecture

The standard architecture for machine translation is a sequence-to-sequence model, or more precisely, an encoder-decoder network architecture.
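
A minimal sketch of this architecture, using PyTorch's built-in nn.Transformer module. The vocabulary sizes, dimensions, and random inputs below are illustrative placeholders, and positional encodings are omitted for brevity:

    import torch
    import torch.nn as nn

    class ToyTranslator(nn.Module):
        """Encoder-decoder: embed source/target tokens, run them through
        a Transformer, and project decoder states back to vocabulary logits."""
        def __init__(self, src_vocab=8000, tgt_vocab=8000, d_model=256):
            super().__init__()
            self.src_embed = nn.Embedding(src_vocab, d_model)
            self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
            self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
            self.out = nn.Linear(d_model, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            # src_ids: (batch, src_len); tgt_ids: (batch, tgt_len)
            hidden = self.transformer(self.src_embed(src_ids),
                                      self.tgt_embed(tgt_ids))
            return self.out(hidden)  # (batch, tgt_len, tgt_vocab) logits

    model = ToyTranslator()
    src = torch.randint(0, 8000, (2, 10))  # two "sentences", 10 tokens each
    tgt = torch.randint(0, 8000, (2, 12))  # shifted target tokens
    print(model(src, tgt).shape)           # torch.Size([2, 12, 8000])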

Training a Model

Before we can use our encoder-decoder, we need to train a model.

  1. Input (training data): a parallel corpus of text in the source and target languages, often presented as aligned pairs of sentences
  2. Apply a tokenization algorithm (word, subword, character, etc.) - e.g. WordPiece, SentencePiece, et al. (Summary of Tokenizers on Hugging Face); a minimal sketch follows this list
  3. Build a fixed vocabulary from the source/target tokens and train the model on the tokenized sentence pairs
  4. Output: trained model ready to perform translation
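
For step 2 above, a subword tokenizer can be trained directly on the corpus. Here is a minimal sketch with the sentencepiece library; the file name corpus.txt and the vocabulary size are hypothetical placeholders:

    import sentencepiece as spm

    # Train a subword vocabulary on raw text (one sentence per line);
    # source and target sentences are typically concatenated into one file.
    spm.SentencePieceTrainer.train(
        input="corpus.txt",     # hypothetical training file
        model_prefix="nmt_sp",  # writes nmt_sp.model and nmt_sp.vocab
        vocab_size=8000,
        model_type="unigram",   # "bpe", "char", and "word" are also supported
    )

    sp = spm.SentencePieceProcessor(model_file="nmt_sp.model")
    print(sp.encode("He saw a black cat under a ladder.", out_type=str))
    # e.g. ['▁He', '▁saw', '▁a', '▁black', '▁cat', '▁under', '▁a', '▁ladder', '.']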

A deeper dive: Transformers-based Encoder-Decoder Models

Training in Low-resource Environments

An ongoing research question is how to perform quality translations when a source or target language does not have a large corpus of parallel training text available.

Solving this low-resource problem requires creative approaches. Here we list some common techniques.

Data Augmentation in NLP

In data augmentation we generate new synthetic data from available natural data. In the techniques that follow, it is important to consider the language pair in question and to avoid over-augmentation: choosing the wrong technique or augmenting too much can produce nonsensical training data.
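
One widely used augmentation technique is back-translation: translate monolingual target-language text back into the source language with a reverse-direction model, then use the resulting synthetic pairs as extra training data. A sketch using Hugging Face MarianMT models; the checkpoint name is an assumption, and any target-to-source model would do:

    from transformers import MarianMTModel, MarianTokenizer

    # Reverse-direction (target -> source) model; checkpoint name assumed
    # to be available on the Hugging Face Hub.
    name = "Helsinki-NLP/opus-mt-fr-en"
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)

    # Monolingual target-language (French) text we want to exploit.
    french = ["Je ne suis pas fatigué.", "Il a vu un chat noir sous une échelle."]

    batch = tokenizer(french, return_tensors="pt", padding=True)
    synthetic_english = tokenizer.batch_decode(
        model.generate(**batch), skip_special_tokens=True
    )

    # Each (synthetic English, real French) pair becomes extra training
    # data for an English -> French model.
    for src, tgt in zip(synthetic_english, french):
        print(src, "->", tgt)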

Multilingual Training

In bilingual translation, a model is trained to translate from one language to another. A model that is instead trained on parallel sentences in many languages is a multilingual model.

This particularly benefits low-resource languages that are similar to higher-resource languages in the training mix, since the model can transfer what it learns across related languages.
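
One common way to set this up (popularized by Google's multilingual NMT system) is to train a single model on all language pairs at once, prepending a tag that tells the model which target language to produce. A schematic sketch; the <2xx> tag format is a convention, not a standard:

    # One model, many language pairs, steered by a target-language tag.
    parallel_data = [
        ("en", "fr", "I am not tired.", "Je ne suis pas fatigué."),
        ("en", "de", "I am not tired.", "Ich bin nicht müde."),
        ("fr", "en", "Je ne suis pas fatigué.", "I am not tired."),
    ]

    # Prepend the target-language tag to every source sentence.
    training_pairs = [
        (f"<2{tgt_lang}> {src_text}", tgt_text)
        for _src_lang, tgt_lang, src_text, tgt_text in parallel_data
    ]

    for src, tgt in training_pairs:
        print(src, "=>", tgt)
    # <2fr> I am not tired. => Je ne suis pas fatigué.
    # <2de> I am not tired. => Ich bin nicht müde.
    # <2en> Je ne suis pas fatigué. => I am not tired.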

Evaluating a Model

Evaluation concerns in machine translation

  • Adequacy: how well the translation captures the exact meaning of the source sentence (aka, faithfulness or fidelity).
  • Fluency: how fluent the translation is in the target language (is it grammatical, clear, readable, natural, ...)

Evaluation options

  • Human Raters
  • Automated Evaluation

Pros and Cons of Human vs. Automated Evaluation of Machine Translation

Human Evaluation

  Pros:
  • High accuracy - can assess fluency, naturalness, and cultural appropriateness
  • Flexibility - can handle complex sentences and nuanced errors
  • Provides insights into user experience

  Cons:
  • Time-consuming and expensive
  • Subjectivity - evaluators may have different preferences
  • Scalability - difficult to evaluate large volumes of translations

Automated Evaluation

  Pros:
  • Fast and scalable - can evaluate large datasets quickly
  • Objective - uses consistent metrics
  • Cost-effective - no need for human evaluators

  Cons:
  • Limited accuracy - may not capture fluency or cultural appropriateness
  • Difficulty with complex sentences - may struggle with nuanced errors
  • Doesn't reflect user experience - doesn't assess naturalness for humans

The best approach to evaluation often involves a combination of human and automated methods. Automated evaluation can be used for initial screening and large datasets, while human evaluation can be used for more in-depth analysis and final judgment.

Evaluation Metrics

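The most widely reported automated metric is BLEU, which scores n-gram overlap between a candidate translation and one or more references (see Further Reading below). A minimal sketch with the sacrebleu package:

    import sacrebleu

    hypotheses = ["He saw a black cat under a ladder."]
    # One list per reference set; each set has one reference per hypothesis.
    references = [["He saw a black cat underneath a ladder."]]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.1f}")
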
Shortcomings
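
N-gram overlap metrics such as BLEU reward surface similarity to the reference, so they can penalize valid paraphrases and alternative word orders, and they correlate only loosely with human judgments of adequacy and fluency.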

Further Reading: BLEU on Google Cloud