Machine Translation
1. Machine Translation (MT)
2. Introduction
Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another (http://en.wikipedia.org/)
Use: translating large amounts of data in the shortest possible time
Standard documents
Instructions and manuals
Web sites, multilingual search
Reference information (addresses, recipes, etc.)
Aim: to let the user understand the main content of a document written in a foreign language they do not know
NOT a substitute for human translation!
3. Approaches to machine translation
Rule-based approach
Statistical approach
Example-based approach
Hybrid machine translation
4. Rule-based translation
Stages:
Morphological analysis of the source language
Parsing the source language (syntactic groups)
Getting syntactic information about each word
Dictionary-based translation
Example:
A girl eats an apple. (Eng.-Ger.)
Stages of translation:
1st: getting basic part-of-speech information for each source word: a = indef. art.; girl = n.; eats = v.; an = indef. art.; apple = n.
2nd: getting syntactic information about the verb “to eat”: here, eats = Present Simple, 3rd person singular, active voice
3rd: parsing the source sentence: (an apple) = the object of eat
4th: translating the English words into German: a (category = indef. article) => ein (category = indef. article); girl (category = noun) => Mädchen…
5th: finding the appropriate inflected forms: A girl eats an apple. => Ein Mädchen isst einen Apfel.
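The five stages above can be sketched in Python for this one sentence pattern. This is a minimal illustration under stated assumptions, not a real rule-based system: the lexicon, the dictionaries, every table name, and the single supported pattern (article, noun, verb, article, noun) are hypothetical and cover only a handful of words.

```python
# Hypothetical toy lexicon and rules; they cover only the pattern "Art N V Art N."

# Stage 1 resource: part of speech of each English word
EN_POS = {"a": "art", "an": "art", "girl": "n", "woman": "n",
          "apple": "n", "eats": "v", "sees": "v"}

# Stage 2 resource: syntactic features of verbs (lemma, tense, person, number)
EN_VERB_FEATS = {"eats": ("eat", "present", 3, "sg"),
                 "sees": ("see", "present", 3, "sg")}

# Stage 4 resource: bilingual dictionary (German nouns carry their gender)
DE_NOUNS = {"girl": ("Mädchen", "n"), "woman": ("Frau", "f"), "apple": ("Apfel", "m")}
DE_VERBS = {"eat": "essen", "see": "sehen"}

# Stage 5 resources: German inflection tables
DE_INDEF_ART = {("nom", "m"): "ein", ("nom", "f"): "eine", ("nom", "n"): "ein",
                ("acc", "m"): "einen", ("acc", "f"): "eine", ("acc", "n"): "ein"}
DE_CONJ = {("essen", "present", 3, "sg"): "isst",
           ("sehen", "present", 3, "sg"): "sieht"}


def translate(sentence: str) -> str:
    """Translate an English sentence of the form 'Art N V Art N.' into German."""
    words = sentence.rstrip(".").lower().split()

    # Stage 1: part-of-speech tagging by lexicon lookup
    tags = [EN_POS[w] for w in words]

    # Stage 2: syntactic features of each verb
    verb_feats = {w: EN_VERB_FEATS[w] for w, tag in zip(words, tags) if tag == "v"}

    # Stage 3: parse against the only supported pattern, Art N V Art N
    if tags != ["art", "n", "v", "art", "n"]:
        raise ValueError(f"unsupported pattern: {tags}")
    subject, verb, obj = words[1], words[2], words[4]

    # Stage 4: dictionary-based translation of the content words
    subj_de, subj_gender = DE_NOUNS[subject]
    obj_de, obj_gender = DE_NOUNS[obj]
    lemma, tense, person, number = verb_feats[verb]
    verb_lemma_de = DE_VERBS[lemma]

    # Stage 5: inflection; the subject is nominative, the object accusative
    out = [DE_INDEF_ART[("nom", subj_gender)], subj_de,
           DE_CONJ[(verb_lemma_de, tense, person, number)],
           DE_INDEF_ART[("acc", obj_gender)], obj_de]
    text = " ".join(out)
    return text[0].upper() + text[1:] + "."


print(translate("A girl eats an apple."))  # Ein Mädchen isst einen Apfel.
```

Stage 5 is where the "an" => "einen" choice happens: the parse marks apple as the object, so the accusative masculine article is selected from the table.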
5. Statistical translation
Translations are generated according to a probability distribution based on statistical models whose parameters are derived from the analysis of bilingual text corpora
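One classic way to make "generated according to a probability distribution" concrete is the noisy-channel formulation from early statistical MT. This is a standard textbook formulation given for illustration, not necessarily the exact model the slide has in mind. To translate a source sentence f into a target sentence e, choose the e that maximizes P(e | f); Bayes' rule splits this into a translation model and a language model:

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\,P(e)}{P(f)}
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}}\;
                       \underbrace{P(e)}_{\text{language model}}
```

P(f) can be dropped because it does not depend on e. P(f | e) is estimated from the bilingual corpus, while P(e) can be estimated from monolingual target-language text.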
Benefits
Better use of resources
More natural translations
No programmers or linguists* involved
Shortcomings
Corpus creation can be costly for users with limited resources.
The results can be unpredictable, and superficial fluency can be deceptive.
Statistical machine translation does not work well between languages with significantly different word orders.
6. Statistical translation
Basis: a parallel corpus
Probabilities are assigned by counting, in order to find the most probable translation
Probability estimates depend on the size and quality of the training corpus
Linguistic information used: sentence segmentation, graphematic analysis, morphology
Given a corpus, the simplest translation system can be built in 2 weeks
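To illustrate how probabilities can be assigned by counting on a parallel corpus, here is a sketch of IBM Model 1, a classic statistical word-translation model trained with expectation-maximization (EM) over expected co-occurrence counts. It is named here as an illustrative technique, not as the system the slide describes. The three-sentence German-English corpus is a tiny hypothetical example, and the NULL source word of the full model is left out for brevity.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=20):
    """Estimate word-translation probabilities t[(e, f)] = P(e | f) with EM.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Simplification: the NULL source word of the full model is omitted.
    """
    target_vocab = {e for _, es in corpus for e in es}
    # Uniform start: every target word is equally likely for every source word
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for fs, es in corpus:
            for e in es:
                # E-step: share e's alignment mass among the source words of its sentence
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn expected counts into relative frequencies
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Most probable target word for source word f under the trained table."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = train_ibm_model1(corpus)
for f in ["das", "haus", "buch", "ein"]:
    print(f, "->", best_translation(t, f))  # das -> the, haus -> house, buch -> book, ein -> a
```

No word pair is given to the model in advance: "das" ends up paired with "the" only because it co-occurs with "the" in two sentences, which pulls probability mass away from the competing alignments.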
7. Rule-based vs. statistical
[Figure: rule-based vs. statistical comparison for a news text and a document]
8. Rule-based translation
Types:
Dictionary-based (direct)
Transfer-based
Interlingual
9. Dictionary-based (direct)
Word-by-word translation, with or without morphological analysis or lemmatisation
Application:
translation of long lists of phrases at the subsentential level (i.e., not full sentences), e.g. inventories or simple catalogs of products and services.
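A minimal sketch of direct translation, assuming a tiny hypothetical English-German dictionary and a toy lemmatiser that only strips a plural "-s" (both are illustrative assumptions). Each word is looked up on its own, with no reordering, agreement, or compounding.

```python
# Hypothetical mini bilingual dictionary (English -> German)
EN_DE = {"green": "grün", "tea": "Tee", "apple": "Apfel", "juice": "Saft"}


def lemmatise(word, dictionary):
    """Toy lemmatiser (an assumption): strip a plural -s if that gives a known entry."""
    if word not in dictionary and word.endswith("s") and word[:-1] in dictionary:
        return word[:-1]
    return word


def translate_direct(phrase, dictionary=EN_DE, use_lemmas=False):
    """Word-by-word lookup: no reordering, no agreement, no compounding."""
    out = []
    for word in phrase.lower().split():
        if use_lemmas:
            word = lemmatise(word, dictionary)
        # Words missing from the dictionary are copied through unchanged
        out.append(dictionary.get(word, word))
    return " ".join(out)


for item in ["green tea", "apple juice", "apples"]:
    print(item, "->", translate_direct(item), "|", translate_direct(item, use_lemmas=True))
```

The outputs show both the use and the limits of the approach: lemmatisation lets "apples" find "Apfel", but the plural (Äpfel) is lost, and a word-by-word lookup cannot produce the correct "grüner Tee" or the compound "Apfelsaft". For short catalog entries such rough output may still be acceptable.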
10. Direct translation example
11. Transfer-based machine translation
1. Analyzing the input text for morphology and syntax (and sometimes semantics)
2. Creating an internal representation
3. Generating the translation using both bilingual dictionaries and grammatical rules
Sentence in a source language => (analysis) => Source language structure => (transfer) => Target language structure => (synthesis) => Sentence in a target language
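The analysis, transfer, and synthesis steps can be sketched as three separate functions. This is a toy under stated assumptions: the lexicon and table names are hypothetical, and the one structural transfer rule shown is the German verb-second order in main clauses ("Today a girl eats an apple" => "Heute isst ein Mädchen einen Apfel"), chosen because a direct word-by-word lookup could not produce it.

```python
# Hypothetical toy lexicon; it covers only the example sentences.
EN_POS = {"today": "adv", "a": "art", "an": "art", "girl": "n", "apple": "n", "eats": "v"}
DE_NOUNS = {"girl": ("Mädchen", "n"), "apple": ("Apfel", "m")}
DE_VERBS = {"eats": "isst"}  # already inflected: 3rd person singular, present
DE_ADVS = {"today": "heute"}
DE_INDEF = {("nom", "m"): "ein", ("nom", "f"): "eine", ("nom", "n"): "ein",
            ("acc", "m"): "einen", ("acc", "f"): "eine", ("acc", "n"): "ein"}


def analyse(sentence):
    """Analysis: English sentence -> source-language structure (grammatical roles)."""
    words = sentence.rstrip(".").lower().split()
    tags = [EN_POS[w] for w in words]
    adverb = None
    if tags[0] == "adv":
        adverb, words, tags = words[0], words[1:], tags[1:]
    if tags != ["art", "n", "v", "art", "n"]:
        raise ValueError(f"unsupported pattern: {tags}")
    return {"adv": adverb, "subj": words[1], "verb": words[2], "obj": words[4]}


def transfer(src):
    """Transfer: lexical substitution plus German structural rules."""
    subj = ("np", *DE_NOUNS[src["subj"]], "nom")  # subject role -> nominative
    obj = ("np", *DE_NOUNS[src["obj"]], "acc")    # object role -> accusative
    verb = ("v", DE_VERBS[src["verb"]])
    if src["adv"]:
        # Verb-second rule: after a fronted adverb the finite verb precedes the subject
        return [("adv", DE_ADVS[src["adv"]]), verb, subj, obj]
    return [subj, verb, obj]


def synthesise(constituents):
    """Synthesis: inflect the articles and linearise the target structure."""
    words = []
    for c in constituents:
        if c[0] == "np":
            _, noun, gender, case = c
            words += [DE_INDEF[(case, gender)], noun]
        else:
            words.append(c[1])
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


def translate(sentence):
    return synthesise(transfer(analyse(sentence)))


print(translate("A girl eats an apple."))        # Ein Mädchen isst einen Apfel.
print(translate("Today a girl eats an apple."))  # Heute isst ein Mädchen einen Apfel.
```

Only `transfer` knows about both languages; `analyse` is English-only and `synthesise` is German-only, mirroring the source structure => target structure split in the diagram.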
12. Interlingua machine translation
The source language is transformed into an interlingua, i.e., an abstract, language-independent representation
The target language is generated from the interlingua.
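A minimal sketch of the interlingua idea, assuming a hypothetical concept inventory (GIRL, APPLE, EAT) as the language-independent representation. One English analyser produces it, and separate German and French generators read from it; French is added only to show that a new target language needs a new generator, not a new analyser. Tense, number and definiteness are left out of this toy representation.

```python
# Hypothetical concept inventory and lexicons (illustrative only).
EN_TO_CONCEPT = {"girl": "GIRL", "apple": "APPLE", "eats": "EAT"}

DE_NOUNS = {"GIRL": ("Mädchen", "n"), "APPLE": ("Apfel", "m")}
DE_VERBS = {"EAT": "isst"}
DE_INDEF = {("nom", "m"): "ein", ("nom", "f"): "eine", ("nom", "n"): "ein",
            ("acc", "m"): "einen", ("acc", "f"): "eine", ("acc", "n"): "ein"}

FR_NOUNS = {"GIRL": ("fille", "f"), "APPLE": ("pomme", "f")}
FR_VERBS = {"EAT": "mange"}
FR_INDEF = {"m": "un", "f": "une"}


def analyse_en(sentence):
    """English analyser: sentence -> language-independent representation."""
    words = sentence.rstrip(".").lower().split()
    if len(words) != 5 or words[0] not in ("a", "an") or words[3] not in ("a", "an"):
        raise ValueError("toy analyser only handles 'A N V a(n) N.'")
    return {"event": EN_TO_CONCEPT[words[2]],
            "agent": EN_TO_CONCEPT[words[1]],
            "patient": EN_TO_CONCEPT[words[4]]}


def sentence_case(text):
    return text[0].upper() + text[1:] + "."


def generate_de(il):
    """German generator: agent -> nominative, patient -> accusative."""
    def np(concept, case):
        noun, gender = DE_NOUNS[concept]
        return f"{DE_INDEF[(case, gender)]} {noun}"
    return sentence_case(f"{np(il['agent'], 'nom')} {DE_VERBS[il['event']]} "
                         f"{np(il['patient'], 'acc')}")


def generate_fr(il):
    """French generator: articles agree in gender only."""
    def np(concept):
        noun, gender = FR_NOUNS[concept]
        return f"{FR_INDEF[gender]} {noun}"
    return sentence_case(f"{np(il['agent'])} {FR_VERBS[il['event']]} {np(il['patient'])}")


il = analyse_en("A girl eats an apple.")
print(il)               # {'event': 'EAT', 'agent': 'GIRL', 'patient': 'APPLE'}
print(generate_de(il))  # Ein Mädchen isst einen Apfel.
print(generate_fr(il))  # Une fille mange une pomme.
```

The design choice this illustrates: with n languages, an interlingua system needs one analyser and one generator per language, whereas a transfer system needs a transfer module for every language pair.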
13. Transfer vs. interlingua
14. Hybrid machine translation
Hybrid machine translation: a method of machine translation characterized by the use of multiple approaches within a single machine translation system.
Types:
Rule-based MT (RBMT) guided by statistics
Statistical MT (SMT) guided by RBMT
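As one possible reading of "RBMT guided by statistics", the sketch below has a rule-based stage produce every grammatical candidate and then uses a statistical bigram language model to pick the most fluent one. The lexicon, the three-sentence German training corpus, and the add-one smoothing are all hypothetical choices made for illustration. The ambiguity resolved is German "isst" (essen, used of people) vs. "frisst" (fressen, used of animals).

```python
import math
from collections import Counter

# Hypothetical rule-based resources; "eats" has two German renderings.
NOUNS = {"girl": ("Mädchen", "n"), "dog": ("Hund", "m"), "apple": ("Apfel", "m")}
VERB_OPTIONS = {"eats": ["isst", "frisst"]}
INDEF = {("nom", "m"): "ein", ("nom", "n"): "ein",
         ("acc", "m"): "einen", ("acc", "n"): "ein"}

# Hypothetical target-language training text for the statistical component
CORPUS = ["ein mädchen isst brot",
          "das mädchen isst einen apfel",
          "der hund frisst fleisch"]


def rbmt_candidates(sentence):
    """Rule-based stage: every grammatical rendering of 'A N eats a(n) N.'"""
    words = sentence.rstrip(".").lower().split()
    subj, subj_gender = NOUNS[words[1]]
    obj, obj_gender = NOUNS[words[4]]
    return [f"{INDEF[('nom', subj_gender)]} {subj} {verb} {INDEF[('acc', obj_gender)]} {obj}"
            for verb in VERB_OPTIONS[words[2]]]


class BigramLM:
    """Add-one smoothed bigram language model (no sentence-boundary tokens)."""

    def __init__(self, sentences):
        self.bigrams = Counter()
        self.history = Counter()
        vocab = set()
        for s in sentences:
            tokens = s.lower().split()
            vocab.update(tokens)
            for w1, w2 in zip(tokens, tokens[1:]):
                self.bigrams[(w1, w2)] += 1
                self.history[w1] += 1
        self.vocab_size = len(vocab)

    def logprob(self, sentence):
        tokens = sentence.lower().split()
        return sum(math.log((self.bigrams[(w1, w2)] + 1) /
                            (self.history[w1] + self.vocab_size))
                   for w1, w2 in zip(tokens, tokens[1:]))


def translate_hybrid(sentence, lm):
    # Statistical stage: keep the candidate the language model finds most fluent
    best = max(rbmt_candidates(sentence), key=lm.logprob)
    return best[0].upper() + best[1:] + "."


lm = BigramLM(CORPUS)
print(translate_hybrid("A girl eats an apple.", lm))  # Ein Mädchen isst einen Apfel.
print(translate_hybrid("A dog eats an apple.", lm))   # Ein Hund frisst einen Apfel.
```

The rules alone cannot decide between the two verbs; the counts do, because the corpus has "mädchen isst" and "hund frisst" but never the reverse pairing.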
15. MT software
Name | Platform | Freeware/commercial | Type
Google Translate | Cross-platform (Web application) | Freeware | Statistical
SYSTRAN | Cross-platform (Web application) | Commercial | Hybrid rule-based and SMT
Promt | Cross-platform | Commercial | Hybrid rule-based and SMT