Testi in maschera: nuovi strumenti per la sicurezza e l'analisi linguistica di corpora giuridici [Masked texts: new tools for the security and linguistic analysis of legal corpora]


Laura Clemenzi;Francesca Fusco;Daniele Fusi;Giulia Lombardi

2024

Abstract

The Atti Chiari project, which is collecting the first large Italian corpus of judicial acts, faces strict legal requirements as well as many peculiarities of language and content; to meet them, a number of processes and tools have been designed and implemented. The first requirement is to remove all personal data from the documents without destroying their linguistic form or compromising their readability. To this end, a pseudonymization procedure has been created, based on a preliminary annotation stage that adds information precisely in order to remove it in different ways according to different purposes (linguistic analysis, legal analysis, etc.). This light annotation provides data useful not only for pseudonymization but also for converting documents from their original presentational format into a semantic one based on TEI. Once prepared in this way, the documents are centralized in a corpus, ready to be indexed for linguistic research. Because multiple search criteria must be combined, whatever their origin and model, a new type of search engine, originally designed for the philological field, has been used to obtain the required openness and granularity of metadata.
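The idea of annotating personal data first and then masking it differently depending on the purpose can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the inline `<pd type="...">` tag, the function name `pseudonymise`, the two purposes `"linguistic"` and `"legal"`, and the masking policies attached to them.

```python
import re

# Hypothetical inline annotation: <pd type="person">Mario Rossi</pd>.
# The tag name, the "type" values, and both masking policies are assumptions
# made for this illustration, not the project's actual scheme.
TAG = re.compile(r'<pd type="(\w+)">(.*?)</pd>')


def pseudonymise(text: str, purpose: str) -> str:
    """Replace each annotated span with a mask chosen for the given purpose."""
    seen: dict[tuple[str, str], int] = {}  # (type, value) -> stable index

    def mask(match: re.Match) -> str:
        kind, value = match.group(1), match.group(2)
        # The same value always receives the same index, so references to
        # the same entity stay aligned after masking.
        index = seen.setdefault((kind, value), len(seen) + 1)
        if purpose == "linguistic":
            # One placeholder per original token, so the masked span keeps
            # the same number of words as the original.
            return " ".join(f"{kind.upper()}{index}" for _ in value.split())
        if purpose == "legal":
            # A single opaque label when only the entity's identity matters.
            return f"[{kind}-{index}]"
        raise ValueError(f"unknown purpose: {purpose}")

    return TAG.sub(mask, text)


sample = (
    '<pd type="person">Mario Rossi</pd> v. <pd type="org">Alfa S.p.A.</pd>, '
    'appeal by <pd type="person">Mario Rossi</pd>'
)
print(pseudonymise(sample, "linguistic"))
print(pseudonymise(sample, "legal"))
```

In this sketch the annotation is the only place where personal data is identified; each purpose is just a different rendering of the same annotated spans, which is one plausible reading of "adding information in order to remove it in different ways".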


Publication year: 2024
Journal title: UMANISTICA DIGITALE
Volume no.: 16
Appears in types: 2.1 Journal article


Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5060343
