Predicate Argument Structures for Information Extraction from Dependency Representations: Null Elements are Missing

Delmonte, Rodolfo

doi:10.1007/978-3-642-40621-8_2

State of the art parsers are currently trained on converted versions of Penn Treebank into dependency representations which however don’t include null elements. This is done to facilitate structural learning and prevent the probabilistic engine to postulate the existence of deprecated null elements everywhere (see R. Gaizauskas, 1995). However it is a fact that in this way, the semantics of the representation used and produced on runtime is inconsistent and will reduce dramatically its usefulness in real life applications like Information Extraction, Q/A and other semantically driven fields by hampering the mapping of a complete logical form. What systems have come up with are “Quasi”-logical forms or partial logical forms mapped directly from the surface representation in dependency structure. We show the most common problems derived from the conversion and then describe an algorithm that we have implemented to apply to our converted Italian Treebank, that can be used on any CONLL-style treebank or representation to produce an “almost complete” semantically consistent dependency treebank.