Quantitative Aspekte der Modalpartikelverwendung. Untersuchungen zum automatisch annotierten Korpus für gesprochenes Deutsch FOLK

This article addresses quantitative aspects of the use of modal particles (MPs) in corpora of spoken German, i.e., the token rates of individual MPs and of all MPs combined, and the ranking of their frequencies. Since MPs consistently have heterosemes in other word classes (adverb, focus particle, interjection, etc.), such analyses previously had to be conducted manually. Only in 2017 was automatic POS tagging adapted to spoken language released for the FOLK corpus, which made an automatic search for MPs possible. The paper assesses how reliable the automatically generated MP data of FOLK are. It does so by comparing the automatic counts in FOLK with the frequency data of the manually analysed corpora of Hentschel (1986) and Brünjes (2014), and by checking the POS tagging of random samples extracted from FOLK. With regard to the list of lexemes considered in FOLK, errors are essentially limited to the MP eigentlich and to some quantitatively marginal cases. The overall frequency of MPs in FOLK (token rate 2.55%) also seems plausible. Major deviations from previous studies arise in the frequencies of some individual MPs, of which auch, mal and halt are analysed in more detail. While the discrepancies for auch are due to deficits in POS tagging, corpus characteristics (discourse types and survey periods) play a major role for mal and halt. However, when the adjusted frequencies found in the random samples are extrapolated to the whole corpus, the MP frequency rankings of FOLK correlate with those of the manual counts (r = 0.81/0.82) just as well as the manually determined MP rankings of different corpora correlate with each other.
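The quantitative measures named in the abstract can be illustrated with a minimal Python sketch. It assumes that the token rate is the share of MP tokens among all corpus tokens, that the sample-based adjustment scales an automatic count by the precision observed in a manually checked random sample (one plausible reading of "adjusted frequencies"), and that the reported r is a rank correlation, computed here as Spearman's rho (the abstract does not name the coefficient). The function names and all numbers are hypothetical toy values; they do not reproduce the article's method or any FOLK, Hentschel (1986) or Brünjes (2014) figures.

```python
"""Hypothetical sketch: MP token rate, sample-based correction, rank correlation.

All counts are invented toy numbers for illustration only; they are NOT
figures from FOLK or from any manually analysed corpus.
"""


def token_rate(mp_tokens: int, total_tokens: int) -> float:
    """Share of all corpus tokens that are MPs, in percent."""
    return 100.0 * mp_tokens / total_tokens


def extrapolate(tagged_count: int, sample_size: int, confirmed_in_sample: int) -> float:
    """Scale an automatic count by the precision seen in a checked sample.

    Assumption: this corrects only false positives; MP tokens the tagger
    missed entirely would need a separate recall estimate.
    """
    precision = confirmed_in_sample / sample_size
    return tagged_count * precision


def ranks(values: list[float]) -> list[float]:
    """Rank values in descending order (1 = most frequent); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as Pearson's r on the ranks (needs non-constant inputs)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Invented toy counts, NOT real corpus figures.
    print(token_rate(255, 10_000))  # → 2.55
    print(extrapolate(1000, 200, 150))  # → 750.0
    # Frequencies of five hypothetical MPs in two hypothetical corpora.
    print(spearman([5, 4, 3, 2, 1], [4, 5, 3, 1, 2]))  # → 0.8
```

Computing rho as Pearson's r on mean-ranked data keeps ties correct without a separate tie-correction formula; with no ties it equals the familiar 1 − 6Σd²/(n(n² − 1)).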


Paschke, Peter

2026



Publication year: 2026
Journal: LINGUISTIK ONLINE
Volume: 146
DOI: https://dx.doi.org/10.13092/9wv7n672
Appears in types: 2.1 Journal article

Files in this item:

File	Size	Format
Paschke (2026) Quantitative Aspekte der Modalpartikelverwendung (pub).pdf (open access; type: publisher's version; license: Creative Commons)	889.13 kB	Adobe PDF

Documents in ARCA are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5117438
