A Model and a Language for Large Textual Databases

Maurizio, Marek; Orsini, Renzo

The markup approach to representing and storing large corpora of annotated textual documents has been criticized for several reasons: it has difficulty expressing non-hierarchical structures, it limits the type and complexity of annotations, and it makes complex textual analysis programs difficult to write, since it requires generic query languages such as XQuery, which are not well suited to the special needs of the domain. We present a model and a language, called Manuzio, designed as the basis for a new generation of textual document management systems that overcome these shortcomings. The model is object-based, specialized for this domain, and has abstraction mechanisms that share some similarities with those of object-oriented database models. The language has query facilities and supports the development of sophisticated textual analysis applications. A prototype system has been designed and applied to several test cases.