Venetan to English machine translation: issues and possible solutions
Jaber, Suhel; Delmonte, Rodolfo
2011-01-01
Abstract
In this paper we describe a prototype Venetan-to-English translation system developed under the STILVEN project, financed by the Regional Authorities of the Veneto Region in Italy. The general approach is statistical, with preprocessing operations at both training and translation time (orthographic normalization, and POS tagging to make use of factored models). These operations are needed especially to overcome two main problems: the scarcity of Venetan resources (our Venetan-English corpus contains only 13,000 sentences, amounting to 128,000 Venetan tokens excluding punctuation) and the diasystemic nature of Venetan, which is an ensemble of varieties rather than a single dialect. We present in detail the problems related to Venetan, our ideas for solving them, their implementation, and the results obtained so far.

| File | Size | Format |
|---|---|---|
| nlpcs-2011-jaber.pdf (open access; type: post-print; license: free access, view only) | 872.64 kB | Adobe PDF |
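The preprocessing named in the abstract (orthographic normalization followed by POS tagging for factored models) can be pictured with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the two spelling rules, the toy tag lexicon that stands in for a real tagger, and the function names `normalize` and `to_factored`. The `word|TAG` layout follows the pipe-separated factor convention used by factored-model toolkits such as Moses. Whether the authors used that exact layout is not stated in the abstract.

```python
import re

# Hypothetical rules that collapse spelling variants into one canonical form.
# They are illustrative only and are not the rules used in the paper.
NORMALIZATION_RULES = [
    (re.compile(r"ł"), "l"),        # treat the barred l as plain l
    (re.compile(r"\bze\b"), "xe"),  # unify two spellings of the same word
]

# Toy lexicon standing in for a real POS tagger (assumption).
TOY_TAGS = {"el": "DET", "gato": "NOUN", "xe": "VERB", "belo": "ADJ"}


def normalize(sentence: str) -> str:
    """Lowercase the sentence and apply each normalization rule in order."""
    text = sentence.lower()
    for pattern, replacement in NORMALIZATION_RULES:
        text = pattern.sub(replacement, text)
    return text


def to_factored(sentence: str) -> str:
    """Return whitespace tokens as pipe-separated surface|POS factors."""
    tokens = normalize(sentence).split()
    return " ".join(f"{tok}|{TOY_TAGS.get(tok, 'UNK')}" for tok in tokens)


if __name__ == "__main__":
    print(to_factored("El gato ze beło"))
```

Normalizing before tagging means that variant spellings reach the tagger and the translation model as a single form, which is one plausible way to reduce sparsity in a corpus of this size.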
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.