A Benchmark Corpus for Topic Modeling on the Origins of Modern Antisemitism

The pace at which digitized collective knowledge accumulates has increased rapidly in recent years. As a result, vast amounts of information must be organized, searched, and understood, which is feasible only with automatic methods. For textual data, topic modeling, a machine learning method, is among the most widely used frameworks for uncovering latent topics in text documents. Applying topic modeling to textual sources is well established in many scientific and humanities fields, including historical research. In this paper, we present a benchmark corpus for topic models: an annotated, real-world collection of texts on the theme of antisemitism in 19th-century France. The benchmark corpus was developed to address a specific machine learning task, but it can also support other natural language processing-based studies, in particular those in the historical domain.

Giorgia Minello;Deborah Paci

2022

Publication year: 2022
Journal title: UMANISTICA DIGITALE
Volume no.: 13
DOI: https://dx.doi.org/10.6092/issn.2532-8816/14767
Appears in types: 2.1 Journal article

Files in this item:

File	Size	Format
14767-Article Text-60209-1-10-20221024.pdf (open access; type: publisher's version; license: free access, no restrictions)	402.33 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5011562
