Il corpus digitale AttiChiari: costruzione, analisi, strumenti di ricerca [The AttiChiari digital corpus: construction, analysis, research tools]

Daniele Fusi

2024

Abstract

This paper examines the problems involved in indexing legal documents to build corpora for linguistic and juridical analysis. Two requirements stand out: the original texts must be systematically pseudonymised without disrupting their linguistic texture, and the index must accommodate an open-ended set of metadata from heterogeneous sources, including document-level metadata, typographic features, and linguistic annotations produced by NLP tools. Meeting both calls for a higher level of abstraction in the representation of textual and metatextual data, so that the search process can work through a uniform interface. The paper presents the architecture of a new search engine, which originated in the linguistic and metrical analysis of Classical texts and addresses these requirements through a "dematerialization" of the text into a more abstract model. Unlike traditional systems, the engine operates on objects, that is, containers of metadata properties, rather than on character sequences. This permits richer and more varied metadata and extends search beyond the simple comparison of character sequences. The paper also outlines the modular components of the system that support interactive consultation of the texts, providing an environment suited to both research and reading. The whole process, from the original document to its indexing, is described as a modular sequence of steps that can integrate different text-analysis procedures and metadata from different sources, and that ultimately renders the document in a typographically rich format for presentation to the end user.
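The object-based idea in the abstract can be illustrated with a small sketch. It is not the engine's actual data model or API, which this record does not describe: the `Span` class, the property names (`value`, `pos`, `italic`), the stage functions, and the toy lexicon are all hypothetical, chosen only to show how a token treated as a container of properties lets typographic and linguistic metadata be queried through a single interface.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """A text unit (here, a token) modelled as a container of named properties.

    Hypothetical structure for illustration only.
    """
    position: int
    props: dict[str, str] = field(default_factory=dict)


def tokenize(text: str) -> list[Span]:
    # First stage of the sketch: split on whitespace and store the surface
    # form as just another property ("value"), not as a privileged string.
    return [Span(i, {"value": word}) for i, word in enumerate(text.split())]


def add_pos(spans: list[Span], lexicon: dict[str, str]) -> list[Span]:
    # Stand-in for an NLP tool: attach a part-of-speech property looked up
    # in a toy lexicon (an assumption; a real tagger would be used instead).
    for span in spans:
        tag = lexicon.get(span.props["value"].lower())
        if tag is not None:
            span.props["pos"] = tag
    return spans


def mark_italic(spans: list[Span], positions: set[int]) -> list[Span]:
    # Stand-in for typographic metadata coming from the source document.
    for span in spans:
        if span.position in positions:
            span.props["italic"] = "1"
    return spans


def find(spans: list[Span], **conditions: str) -> list[Span]:
    # Uniform search: every condition is a property/value pair, whatever
    # stage of the pipeline produced that property.
    return [
        span for span in spans
        if all(span.props.get(name) == value for name, value in conditions.items())
    ]


# Usage: run the stages in sequence, then query across metadata sources.
spans = tokenize("Il giudice rigetta il ricorso")
spans = add_pos(spans, {"il": "DET", "giudice": "NOUN",
                        "rigetta": "VERB", "ricorso": "NOUN"})
spans = mark_italic(spans, {4})
hits = find(spans, pos="NOUN", italic="1")
print([span.props["value"] for span in hits])  # ['ricorso']
```

In this sketch the query for italic nouns never compares character sequences: it matches on properties contributed by two independent stages, and further stages (for example, one flagging pseudonymised names) could add properties without changing `find`.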

Publication year: 2024

Volume title: La lingua e la scrittura forense: storia, temi, prospettive

Publication type: 3.1 Book chapter

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5060344
