The markup approach to representing and storing large corpora of annotated textual documents has been criticized for several reasons. It has trouble expressing non-hierarchical structures. It limits the type and complexity of annotations. It also makes it difficult to write complex textual analysis programs, because it requires generic query languages such as XQuery, which are not well suited to the special needs of the domain. We present a model and a language, called Manuzio, designed to serve as the basis for a new generation of textual document management systems that overcome these shortcomings. The model is object-based and specialized for this domain, and its abstraction mechanisms bear some similarity to those of object-oriented database models. The language has query facilities and supports the development of sophisticated textual analysis applications. A prototype system has been designed and applied to several test cases.
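To make the first criticism concrete, the following sketch shows why overlapping structures cannot be encoded in a single markup tree. Here, verse lines and sentences in a short text cross each other's boundaries. It illustrates the problem only and is not the Manuzio model. The `Span` class, the `overlapping_pairs` function and the sample text are hypothetical, chosen for illustration. The sketch represents annotations as standoff character ranges and reports every pair that partially overlaps. Such a pair cannot become two properly nested XML elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical standoff annotation: a label over text[start:end]."""
    label: str
    start: int
    end: int


def span_of(label, text, fragment):
    """Build a Span covering the first occurrence of fragment in text."""
    start = text.index(fragment)
    return Span(label, start, start + len(fragment))


def overlapping_pairs(spans):
    """Return label pairs of spans that overlap without one containing the other.

    Each such pair would need interleaved tags (<a>..<b>..</a>..</b>),
    which a single well-formed XML tree does not allow.
    """
    result = []
    for i, a in enumerate(spans):
        for b in spans[i + 1:]:
            overlap = a.start < b.end and b.start < a.end
            a_in_b = b.start <= a.start and a.end <= b.end
            b_in_a = a.start <= b.start and b.end <= a.end
            if overlap and not (a_in_b or b_in_a):
                result.append((a.label, b.label))
    return result


# Hypothetical text in which the second sentence runs across a line break.
text = "The cat sat. On the mat it slept."
spans = [
    span_of("line1", text, "The cat sat. On"),
    span_of("line2", text, "the mat it slept."),
    span_of("sentence1", text, "The cat sat."),
    span_of("sentence2", text, "On the mat it slept."),
]

print(overlapping_pairs(spans))
```

Running it reports that `line1` and `sentence2` overlap. The other pairs are either disjoint or nested, so they could coexist in one hierarchy. A markup encoding has to break one of the overlapping elements into fragments or fall back on workarounds such as milestone elements. This is the kind of limitation the abstract refers to.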
|Publication date:||2010|
|Title:||A Model and a Language for Large Textual Databases|
|Book title:||Proceedings of the Eighteenth Italian Symposium on Advanced Database Systems, SEBD 2010|
|Publication type:||3.1 Article in book|