Manual Annotation of Semi-Structured Documents for Entity-Linking

LUCCHESE, Claudio; ORLANDO, Salvatore
2014-01-01

Abstract

The Entity Linking (EL) problem consists of automatically linking short fragments of text within a document to entities in a given Knowledge Base such as Wikipedia. Due to its impact on several text-understanding tasks, EL is a hot research topic. The related problem of identifying the most relevant entities mentioned in a document, a.k.a. salient entities (SE), is also attracting increasing interest. Unfortunately, publicly available evaluation datasets that contain accurate, supervised knowledge about mentioned entities and their relevance ranking are currently scarce and of poor quality. This shortage makes it very difficult to compare different EL and SE solutions on a fair basis, as well as to devise innovative techniques that rely on these datasets to train machine learning models, which are in turn used to automatically link and rank entities. In this demo paper we propose a Web-deployed tool that allows users to crowdsource the creation of these datasets by supporting the collaborative human annotation of semi-structured documents. The tool, called ELIANTO, is an open-source framework that provides a user-friendly and reactive Web interface to support both EL and SE labelling tasks through a guided two-step process.
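The abstract does not describe ELIANTO's internal data model, so the following is only a minimal, hypothetical Python sketch of what one record of a combined EL and SE dataset might hold. It assumes that the first step links character-offset mentions to Knowledge Base entities (here, Wikipedia page titles) and that the second step assigns an integer saliency score per entity; the class names, method names, and scoring scale are all illustrative assumptions, not the tool's actual format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Mention:
    """A text fragment (character offsets) linked to a Knowledge Base entity."""
    start: int
    end: int
    text: str
    entity: Optional[str] = None    # hypothetical: a Wikipedia page title
    saliency: Optional[int] = None  # hypothetical graded score, set in step 2


@dataclass
class AnnotatedDocument:
    """Hypothetical annotation record; not ELIANTO's actual schema."""
    doc_id: str
    body: str
    mentions: List[Mention] = field(default_factory=list)

    def link(self, start: int, end: int, entity: str) -> Mention:
        """Assumed step 1 (EL): link the fragment body[start:end] to an entity."""
        if not (0 <= start < end <= len(self.body)):
            raise ValueError("mention offsets out of range")
        mention = Mention(start, end, self.body[start:end], entity)
        self.mentions.append(mention)
        return mention

    def rate(self, entity: str, score: int) -> None:
        """Assumed step 2 (SE): assign a saliency score to every mention of an entity."""
        for mention in self.mentions:
            if mention.entity == entity:
                mention.saliency = score

    def salient_entities(self) -> List[str]:
        """Distinct linked entities by decreasing saliency; unrated entities go last."""
        best: Dict[str, int] = {}
        for mention in self.mentions:
            if mention.entity is None:
                continue
            score = mention.saliency if mention.saliency is not None else -1
            best[mention.entity] = max(best.get(mention.entity, -1), score)
        # sorted() is stable, so ties keep their first-annotated order
        return sorted(best, key=lambda e: best[e], reverse=True)
```

For example, after `doc = AnnotatedDocument("d1", "Venice is a city in Italy.")`, calling `doc.link(0, 6, "Venice")` and `doc.link(20, 25, "Italy")`, then `doc.rate("Italy", 3)` and `doc.rate("Venice", 1)`, makes `doc.salient_entities()` return `["Italy", "Venice"]`. Keeping offsets rather than only surface strings lets repeated mentions of the same entity be told apart.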
2014
Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management - CIKM '14
Use this identifier to cite or link to this document: https://hdl.handle.net/10278/43941