A Model and a Language for Large Textual Databases

MAURIZIO, Marek;ORSINI, Renzo
2010

Abstract

The markup approach to representing and storing large corpora of annotated textual documents is criticized for several reasons: it poses problems in expressing non-hierarchical structures, it limits annotations in type and complexity, and it makes complex textual analysis programs difficult to write, since it requires generic query languages such as XQuery, which are not well suited to the specific needs of the domain. We present a model and a language, called Manuzio, designed as the basis for a new generation of textual document management systems that overcome these shortcomings. The model is object-based and specialized for the domain, with abstraction mechanisms that share some similarities with those of object-oriented database models. The language has query facilities and allows the development of sophisticated textual analysis applications. A prototype system has been designed and applied to several test cases.
Proceedings of the Eighteenth Italian Symposium on Advanced Database Systems, SEBD 2010
Files in this record:
paper.pdf (Adobe PDF, 435.08 kB), not available
Type: Pre-print document
License: Closed access (personal)

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/10278/31146