In this paper we propose a rule-based approach to extract dependency and grammatical relations from the Venice Italian Treebank (VIT) (Delmonte et al., 2007) with bracketed tree structure. To our knowledge, the only dependency annotated corpus for Italian available is the Turin University Treebank (Lesmo et al., 2002), which has 25,000 tokens and is about 1/10 of VIT. As manual corpus annotation is expensive and time-consuming, we decided to exploit an existing constituency-based treebank, the VIT, to derive dependency structures with lower effort. After describing the procedure to extract heads and dependents, based on a head percolation table for Italian, we introduce the rules adopted to add grammatical relation labels. To this purpose, we manually relabeled all non-canonical arguments, which are very frequent in Italian, then we automatically labeled the remaining complements or arguments following some syntactic restrictions based on the position of the constituents w.r.t to parent and sibling nodes. The final section of the paper describes evaluation results, carried out in two steps, one for dependency relations and one for grammatical roles. Since results are promising, we plan to use the dependency treebank to train a dependency-based parser and eventually a semantic role labelling system.

Enriching the Venice Italian Treebank with dependency and grammatical relations

TONELLI, Sara;DELMONTE, Rodolfo;BRISTOT, Antonella
2008

Abstract

In this paper we propose a rule-based approach to extract dependency and grammatical relations from the Venice Italian Treebank (VIT) (Delmonte et al., 2007) with bracketed tree structure. To our knowledge, the only dependency annotated corpus for Italian available is the Turin University Treebank (Lesmo et al., 2002), which has 25,000 tokens and is about 1/10 of VIT. As manual corpus annotation is expensive and time-consuming, we decided to exploit an existing constituency-based treebank, the VIT, to derive dependency structures with lower effort. After describing the procedure to extract heads and dependents, based on a head percolation table for Italian, we introduce the rules adopted to add grammatical relation labels. To this purpose, we manually relabeled all non-canonical arguments, which are very frequent in Italian, then we automatically labeled the remaining complements or arguments following some syntactic restrictions based on the position of the constituents w.r.t to parent and sibling nodes. The final section of the paper describes evaluation results, carried out in two steps, one for dependency relations and one for grammatical roles. Since results are promising, we plan to use the dependency treebank to train a dependency-based parser and eventually a semantic role labelling system.
Proceedings SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008
File in questo prodotto:
File Dimensione Formato  
490_paper.pdf

non disponibili

Tipologia: Abstract
Licenza: Licenza non definita
Dimensione 273.09 kB
Formato Adobe PDF
273.09 kB Adobe PDF   Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/10278/39627
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? 1
social impact