Machine Translation
Introduction
Approaches to machine translation
Rule-based translation
Statistical translation
Statistical translation (continued)
Rule-based vs. statistical
Rule-based translation
Dictionary-based (direct)
Direct translation example
Transfer-based machine translation
Interlingua machine translation
Transfer vs. interlingua
Hybrid machine translation 
MT software

Machine Translation

1. Machine Translation

MT

2. Introduction

sub-field of computational linguistics that investigates the use of
software to translate text or speech from one natural language to
another (http://en.wikipedia.org/)
Use: translation of large amounts of data in the shortest possible
time
Standard documents
Instructions and manuals
Web sites, multilingual search
Reference information (addresses, recipes, etc.)
Aim: to understand the main contents of the document in a
foreign language unknown to the user
NOT to be used instead of human translation!

3. Approaches to machine translation

Rule-based approach
Statistical approach
Example-based approach
Hybrid machine translation

4. Rule-based translation

Stages
Morphological analysis of the source language
Parsing the source language (syntactic groups)
Getting syntactic information about each word
Dictionary-based translation
Example:
A girl eats an apple. (Eng.-Ger.)
Stages of translation:
1st: getting basic part-of-speech information for each source word: a = indef. art.; girl = n.; eats = v.; an = indef. art.; apple = n.
2nd: getting syntactic information about the verb "to eat": here, eats = Present Simple, 3rd person singular, active voice
3rd: parsing the source sentence: (an apple) = the object of eat
4th: translating the English words into German: a (category = indef. article) => ein (category = indef. article); girl (category = noun) => Mädchen…
5th: finding the appropriate inflected forms: A girl eats an apple. => Ein Mädchen isst einen Apfel.
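
The five stages above can be sketched in a few lines of Python. This is only a toy illustration, not a real rule-based engine: the lexicon, the article table, the verb table and the "everything before the verb is the subject" parsing rule are all assumptions made up for this one sentence pattern.

```python
# Toy lexicon (assumed): English word -> (part of speech, German lemma, gender).
LEXICON = {
    "a": ("ART", "ein", None),
    "an": ("ART", "ein", None),
    "girl": ("N", "Mädchen", "neut"),
    "apple": ("N", "Apfel", "masc"),
    "eats": ("V", "essen", None),
}

# German indefinite article forms by (case, gender).
ARTICLE_FORMS = {
    ("nom", "masc"): "ein", ("nom", "neut"): "ein", ("nom", "fem"): "eine",
    ("acc", "masc"): "einen", ("acc", "neut"): "ein", ("acc", "fem"): "eine",
}

# 3rd person singular present forms for the verbs in the toy lexicon.
VERB_3SG = {"essen": "isst"}


def translate(sentence):
    words = sentence.rstrip(".").lower().split()
    # Stage 1: part-of-speech lookup for every source word.
    tagged = [(w,) + LEXICON[w] for w in words]
    # Stages 2-3 (naive parse): words before the verb form the subject
    # (nominative), words after it form the object (accusative).
    verb_index = next(i for i, item in enumerate(tagged) if item[1] == "V")
    output = []
    for i, (word, pos, lemma, gender) in enumerate(tagged):
        case = "nom" if i < verb_index else "acc"
        if pos == "ART":
            # Stage 5: the article agrees with the next noun in gender and case.
            noun_gender = next(item[3] for item in tagged[i + 1:] if item[1] == "N")
            output.append(ARTICLE_FORMS[(case, noun_gender)])
        elif pos == "V":
            # Stage 5: pick the inflected verb form (3rd person singular).
            output.append(VERB_3SG[lemma])
        else:
            # Stage 4: dictionary translation of the noun.
            output.append(lemma)
    result = " ".join(output) + "."
    return result[0].upper() + result[1:]


print(translate("A girl eats an apple."))  # Ein Mädchen isst einen Apfel.
```

Note that the accusative form "einen" only appears because the parse marks "an apple" as the object; a plain dictionary lookup would give "ein Apfel".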

5. Statistical translation

Translations are generated according to a probability distribution,
on the basis of statistical models whose parameters are derived
from the analysis of bilingual text corpora
Benefits
Better use of resources
More natural translations
No programmers or linguists* involved
Shortcomings
Corpus creation can be costly for users with limited resources.
Results can be unpredictable. Superficial fluency can be deceiving.
Statistical machine translation does not work well between
languages that have significantly different word orders.

6. Statistical translation (continued)

Basis: a parallel corpus
Probabilities are assigned by computing the most likely
translation variant
Probability estimates depend on the size and
quality of the training corpus
Linguistic information: sentence splitting,
graphematic analysis, morphology
Given a corpus, the simplest translation
system can be built in 2 weeks
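
How word translation probabilities can be estimated from a parallel corpus may be sketched as follows. The slides do not name a specific model; the sketch assumes an expectation-maximisation procedure in the style of IBM Model 1 (without the NULL word), and the three-sentence English-German corpus, `train_ibm1` and `best_translation` are illustrative names and data, not part of the original material.

```python
from collections import defaultdict

# Tiny assumed parallel corpus: (English tokens, German tokens).
CORPUS = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]


def train_ibm1(corpus, iterations=20):
    """Estimate P(target word | source word) with EM (IBM Model 1 style)."""
    source_vocab = {s for source, _ in corpus for s in source}
    target_vocab = {t for _, target in corpus for t in target}
    # Start from a uniform distribution over target words.
    prob = {(s, t): 1.0 / len(target_vocab)
            for s in source_vocab for t in target_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for source, target in corpus:
            for t in target:
                # E-step: share each target word among the source words
                # of the same sentence pair, in proportion to current estimates.
                norm = sum(prob[(s, t)] for s in source)
                for s in source:
                    share = prob[(s, t)] / norm
                    count[(s, t)] += share
                    total[s] += share
        # M-step: renormalise the expected counts per source word.
        prob = {(s, t): count[(s, t)] / total[s]
                for s in source_vocab for t in target_vocab}
    return prob


def best_translation(prob, word):
    """Return the target word with the highest estimated probability."""
    candidates = {t: p for (s, t), p in prob.items() if s == word}
    return max(candidates, key=candidates.get)


prob = train_ibm1(CORPUS)
print(best_translation(prob, "house"))  # Haus
print(best_translation(prob, "book"))   # Buch
```

With only three sentence pairs the estimates are driven entirely by co-occurrence, which illustrates why the quality of the probabilities depends on the size and quality of the training corpus.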

7. Rule-based vs. statistical

[Figure: comparison panels labelled "news" and "document"]

8. Rule-based translation

Types
Dictionary-based (direct)
Transfer-based
Interlingual

9. Dictionary-based (direct)

Word-by-word translation,
with or without morphological analysis or
lemmatisation
Application
translation of long lists of phrases on the
subsentential (i.e., not a full sentence) level,
e.g. lists, inventories or simple catalogs of products
and services.
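
A minimal sketch of direct translation, with and without lemmatisation, might look like this. The glossary and the crude plural-stripping `lemmatise` helper are assumptions for illustration only.

```python
# Hypothetical bilingual glossary: English lemma -> German lemma.
GLOSSARY = {"apple": "Apfel", "green": "grün", "table": "Tisch", "wooden": "hölzern"}


def lemmatise(word):
    """Very crude lemmatiser: strips a plural -s if that yields a known lemma."""
    w = word.lower()
    if w not in GLOSSARY and w.endswith("s") and w[:-1] in GLOSSARY:
        return w[:-1]
    return w


def translate_direct(phrase, use_lemmas=True):
    """Translate word by word; unknown words are passed through unchanged."""
    out = []
    for word in phrase.split():
        key = lemmatise(word) if use_lemmas else word.lower()
        out.append(GLOSSARY.get(key, word))
    return " ".join(out)


print(translate_direct("green apples"))                    # grün Apfel
print(translate_direct("green apples", use_lemmas=False))  # grün apples
```

The output is understandable but ungrammatical (no plural, no adjective agreement), which is why the approach suits product lists and catalogs rather than full sentences.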

10. Direct translation example

11. Transfer-based machine translation

1. Analyzing the input text for morphology and syntax (and sometimes semantics)
2. Creating an internal representation
3. Generating translation using both bilingual dictionaries and grammatical rules
Sentence in a source language => (analysis) => Source language structure => (transfer) => Target language structure => (synthesis) => Sentence in a target language
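
The analysis, transfer and synthesis steps can be illustrated with a small sketch. It assumes an English-to-Spanish noun phrase as the example (the slides give none), and the word lists, the single reordering rule (adjective + noun becomes noun + adjective) and the gender-agreement rule are all simplified assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    pos: str    # "DET", "ADJ" or "N"
    lemma: str


# Assumed toy resources for English -> Spanish.
EN_POS = {"the": "DET", "red": "ADJ", "white": "ADJ", "car": "N", "house": "N"}
EN_ES = {"the": "el", "red": "rojo", "white": "blanco", "car": "coche", "house": "casa"}
ES_GENDER = {"coche": "m", "casa": "f"}


def analyse(phrase):
    """Analysis: build the source language structure (a list of tagged lemmas)."""
    return [Node(EN_POS[w], w) for w in phrase.lower().split()]


def transfer(source):
    """Transfer: bilingual dictionary lookup plus one structural rule."""
    target = [Node(n.pos, EN_ES[n.lemma]) for n in source]
    i = 0
    while i < len(target) - 1:
        # Grammatical rule: ADJ N -> N ADJ.
        if target[i].pos == "ADJ" and target[i + 1].pos == "N":
            target[i], target[i + 1] = target[i + 1], target[i]
            i += 2
        else:
            i += 1
    return target


def synthesise(target):
    """Synthesis: inflect article and adjective to agree with the noun's gender."""
    gender = next(ES_GENDER[n.lemma] for n in target if n.pos == "N")
    words = []
    for n in target:
        if n.pos == "DET":
            words.append("el" if gender == "m" else "la")
        elif n.pos == "ADJ":
            words.append(n.lemma if gender == "m" else n.lemma[:-1] + "a")
        else:
            words.append(n.lemma)
    return " ".join(words)


def translate_transfer(phrase):
    return synthesise(transfer(analyse(phrase)))


print(translate_transfer("the red car"))      # el coche rojo
print(translate_transfer("the white house"))  # la casa blanca
```

The transfer step is specific to one language pair, so a system built this way needs separate transfer rules for every pair of languages it supports.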

12. Interlingua machine translation

the source language is
transformed into an
interlingua, i.e., an
abstract language-independent
representation
the target language is
generated from the
interlingua.
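
A rough sketch of the idea: one analyser maps the source sentence to a language-neutral frame, and independent generators produce each target language from that frame. The frame layout (`predicate`, `agent`, `patient`), the concept names and the small German and French tables are assumptions chosen for illustration, and the analyser only handles the pattern "a X eats a Y".

```python
# Assumed mapping from English words to language-neutral concepts.
EN_CONCEPTS = {"girl": "GIRL", "apple": "APPLE", "eats": "EAT"}


def analyse_english(sentence):
    """Source -> interlingua: build a language-independent frame."""
    words = [w for w in sentence.rstrip(".").lower().split() if w not in ("a", "an")]
    agent, predicate, patient = (EN_CONCEPTS[w] for w in words)
    return {"predicate": predicate, "agent": agent, "patient": patient}


# Assumed generation tables: surface forms per concept and semantic role.
GERMAN = {
    "EAT": "isst",
    "GIRL": {"agent": "ein Mädchen", "patient": "ein Mädchen"},
    "APPLE": {"agent": "ein Apfel", "patient": "einen Apfel"},
}
FRENCH = {
    "EAT": "mange",
    "GIRL": {"agent": "une fille", "patient": "une fille"},
    "APPLE": {"agent": "une pomme", "patient": "une pomme"},
}


def generate(frame, lexicon):
    """Interlingua -> target: no knowledge of the source language is needed."""
    subject = lexicon[frame["agent"]]["agent"]
    verb = lexicon[frame["predicate"]]
    obj = lexicon[frame["patient"]]["patient"]
    sentence = f"{subject} {verb} {obj}."
    return sentence[0].upper() + sentence[1:]


frame = analyse_english("A girl eats an apple.")
print(frame)                    # {'predicate': 'EAT', 'agent': 'GIRL', 'patient': 'APPLE'}
print(generate(frame, GERMAN))  # Ein Mädchen isst einen Apfel.
print(generate(frame, FRENCH))  # Une fille mange une pomme.
```

Adding a new target language here only requires a new generation table; the analyser is left untouched.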

13. Transfer vs. interlingua

14. Hybrid machine translation 

Hybrid machine translation: a method of machine translation
characterized by the use of multiple approaches within a single
machine translation system.
Types:
RBMT guided by statistics
Statistical method guided by RBMT

15. MT software

Name              Platform                          Freeware/commercial  Type
Google Translate  Cross-platform (Web application)  Freeware             Statistical
SYSTRAN           Cross-platform (Web application)  Commercial           Hybrid rules-based and SMT
Promt             Cross-platform                    Commercial           Hybrid rules-based and SMT