A Model and a Language for Large Textual Databases
MAURIZIO, Marek; ORSINI, Renzo
2010-01-01
Abstract
The markup approach to representing and storing large corpora of annotated textual documents has been criticized for several reasons: it poses problems in expressing non-hierarchical structures, it limits annotations in type and complexity, and it makes complex textual analysis programs difficult to write, since it requires generic query languages such as XQuery, which are not well suited to the special needs of the domain. We present a model and a language, called Manuzio, designed as the basis for a new generation of textual document management systems that overcome these shortcomings. The model is object-based, specialized for the domain, and has abstraction mechanisms that are similar in some respects to those of object-oriented database models. The language has query facilities and allows the development of sophisticated textual analysis applications. A prototype system has been designed and applied to several test cases.

File | Type | License | Size | Format
---|---|---|---|---
paper.pdf (not available) | Pre-print document | Closed access (personal) | 435.08 kB | Adobe PDF
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.