Neural Machine Translation

"Neural machine translation uses deep learning models to translate languages, overcoming limitations of rule-based approaches. The transformer model, with its self-attention mechanism, has revolutionized neural machine translation by allowing the model to focus on relevant parts of the source sentence when generating the translation. This has led to significant improvements in translation accuracy and fluency."- Gemini 2024

Humans have been translating works for millennia, so it's no surprise we've programmed computers to help with the task.

Common machine translation tasks

  • Translating information delivered via technology, such as web pages, email, and subtitles
  • On-demand translations within software applications (chat, menus, etc.)
  • First draft translation to assist a human translator

Challenges

There are thousands of human languages that may share some structural similarities, but are still different in many ways.

A key challenge in translating text from a source language to a target language is that the two languages may not agree in terms of the order or number of words required for an accurate translation.

This example illustrates just a few of the problems that can arise:

He saw a black cat under a ladder

Il a vu un chat noir sous une échelle.

Even this trivial example shows several differences in sentence structure: the adjective follows the noun (chat noir vs. black cat), the articles must agree in gender (un chat, une échelle), and the single-word past tense saw becomes the two-word passé composé a vu.

Another example involves word count and order in negation:

I am not tired.

Je ne suis pas fatigué.

Here the single English word not becomes the two-part French ne ... pas, wrapped around the verb.

These and other differences in linguistic typology contribute to translation challenges. The World Atlas of Language Structures (WALS) catalogs such typological features across the world's languages.

Solution Architecture

The standard architecture for machine translation is a sequence-to-sequence model, or more precisely, an encoder-decoder network architecture.
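
A minimal sketch of this architecture, using PyTorch's built-in nn.Transformer module. The vocabulary sizes, dimensions, and random inputs below are illustrative placeholders, and positional encodings are omitted for brevity:

    import torch
    import torch.nn as nn

    class ToyTranslator(nn.Module):
        """Encoder-decoder: embed source/target tokens, run them through
        a Transformer, and project decoder states back to vocabulary logits."""
        def __init__(self, src_vocab=8000, tgt_vocab=8000, d_model=256):
            super().__init__()
            self.src_embed = nn.Embedding(src_vocab, d_model)
            self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
            self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
            self.out = nn.Linear(d_model, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            # src_ids: (batch, src_len); tgt_ids: (batch, tgt_len)
            hidden = self.transformer(self.src_embed(src_ids),
                                      self.tgt_embed(tgt_ids))
            return self.out(hidden)  # (batch, tgt_len, tgt_vocab) logits

    model = ToyTranslator()
    src = torch.randint(0, 8000, (2, 10))  # two "sentences", 10 tokens each
    tgt = torch.randint(0, 8000, (2, 12))  # shifted target tokens
    print(model(src, tgt).shape)           # torch.Size([2, 12, 8000])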

Training a Model

Before we can use our encoder-decoder, we need to train a model.

  1. Input (training data): a parallel corpus of text in the source and target languages, often presented as aligned pairs of sentences
  2. Apply a tokenization algorithm (word, subword, character, etc.) - e.g. WordPiece, SentencePiece, et al. (Summary of Tokenizers on Hugging Face); a minimal sketch follows this list
  3. Build a fixed vocabulary from the source/target tokens and train the model on the tokenized sentence pairs
  4. Output: trained model ready to perform translation
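
For step 2 above, a subword tokenizer can be trained directly on the corpus. Here is a minimal sketch with the sentencepiece library; the file name corpus.txt and the vocabulary size are hypothetical placeholders:

    import sentencepiece as spm

    # Train a subword vocabulary on raw text (one sentence per line);
    # source and target sentences are typically concatenated into one file.
    spm.SentencePieceTrainer.train(
        input="corpus.txt",     # hypothetical training file
        model_prefix="nmt_sp",  # writes nmt_sp.model and nmt_sp.vocab
        vocab_size=8000,
        model_type="unigram",   # "bpe", "char", and "word" are also supported
    )

    sp = spm.SentencePieceProcessor(model_file="nmt_sp.model")
    print(sp.encode("He saw a black cat under a ladder.", out_type=str))
    # e.g. ['▁He', '▁saw', '▁a', '▁black', '▁cat', '▁under', '▁a', '▁ladder', '.']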

A deeper dive: Transformers-based Encoder-Decoder Models

Training in Low-resource Environments

An ongoing research question is how to perform quality translations when a source or target language does not have a large corpus of parallel training text available.

Solving this low-resource problem requires creative approaches. Here we list some common techniques.

Data Augmentation in NLP

In data augmentation we generate new synthetic data from available natural data. In the techniques that follow, it is important to consider the language pair in question and to avoid over-augmentation: choosing the wrong technique or augmenting too much can produce nonsensical training data.
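
One widely used augmentation technique is back-translation: translate monolingual target-language text back into the source language with a reverse-direction model, then use the resulting synthetic pairs as extra training data. A sketch using Hugging Face MarianMT models; the checkpoint name is an assumption, and any target-to-source model would do:

    from transformers import MarianMTModel, MarianTokenizer

    # Reverse-direction (target -> source) model; checkpoint name assumed
    # to be available on the Hugging Face Hub.
    name = "Helsinki-NLP/opus-mt-fr-en"
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)

    # Monolingual target-language (French) text we want to exploit.
    french = ["Je ne suis pas fatigué.", "Il a vu un chat noir sous une échelle."]

    batch = tokenizer(french, return_tensors="pt", padding=True)
    synthetic_english = tokenizer.batch_decode(
        model.generate(**batch), skip_special_tokens=True
    )

    # Each (synthetic English, real French) pair becomes extra training
    # data for an English -> French model.
    for src, tgt in zip(synthetic_english, french):
        print(src, "->", tgt)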

Multilingual Training

In bilingual translation, a model is trained to translate from one language to another. A model that is instead trained on parallel sentences in many languages is a multilingual model.

This particularly benefits low-resource languages that are similar to higher-resource languages in the training mix, since the model can transfer what it learns across related languages.
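
One common way to set this up (popularized by Google's multilingual NMT system) is to train a single model on all language pairs at once, prepending a tag that tells the model which target language to produce. A schematic sketch; the <2xx> tag format is a convention, not a standard:

    # One model, many language pairs, steered by a target-language tag.
    parallel_data = [
        ("en", "fr", "I am not tired.", "Je ne suis pas fatigué."),
        ("en", "de", "I am not tired.", "Ich bin nicht müde."),
        ("fr", "en", "Je ne suis pas fatigué.", "I am not tired."),
    ]

    # Prepend the target-language tag to every source sentence.
    training_pairs = [
        (f"<2{tgt_lang}> {src_text}", tgt_text)
        for _src_lang, tgt_lang, src_text, tgt_text in parallel_data
    ]

    for src, tgt in training_pairs:
        print(src, "=>", tgt)
    # <2fr> I am not tired. => Je ne suis pas fatigué.
    # <2de> I am not tired. => Ich bin nicht müde.
    # <2en> Je ne suis pas fatigué. => I am not tired.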

Evaluating a Model

Evaluation concerns in machine translation

  • Adequacy: how well the translation captures the exact meaning of the source sentence (aka, faithfulness or fidelity).
  • Fluency: how fluent the translation is in the target language (is it grammatical, clear, readable, natural, ...)

Evaluation options

  • Human Raters
  • Automated Evaluation

Pros and Cons of Human vs. Automated Evaluation of Machine Translation

Human Evaluation

  Pros:
  • High accuracy - can assess fluency, naturalness, and cultural appropriateness
  • Flexibility - can handle complex sentences and nuanced errors
  • Provides insights into user experience

  Cons:
  • Time-consuming and expensive
  • Subjectivity - evaluators may have different preferences
  • Scalability - difficult to evaluate large volumes of translations

Automated Evaluation

  Pros:
  • Fast and scalable - can evaluate large datasets quickly
  • Objective - uses consistent metrics
  • Cost-effective - no need for human evaluators

  Cons:
  • Limited accuracy - may not capture fluency or cultural appropriateness
  • Difficulty with complex sentences - may struggle with nuanced errors
  • Doesn't reflect user experience - doesn't assess naturalness for humans

The best approach to evaluation often involves a combination of human and automated methods. Automated evaluation can be used for initial screening and large datasets, while human evaluation can be used for more in-depth analysis and final judgment.

Evaluation Metrics

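The most widely reported automated metric is BLEU, which scores n-gram overlap between a candidate translation and one or more references (see Further Reading below). A minimal sketch with the sacrebleu package:

    import sacrebleu

    hypotheses = ["He saw a black cat under a ladder."]
    # One list per reference set; each set has one reference per hypothesis.
    references = [["He saw a black cat underneath a ladder."]]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.1f}")
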
Shortcomings
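
N-gram overlap metrics such as BLEU reward surface similarity to the reference, so they can penalize valid paraphrases and alternative word orders, and they correlate only loosely with human judgments of adequacy and fluency.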

Further Reading: BLEU on Google Cloud