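Quantitative Aspekte der Modalpartikelverwendung. Untersuchungen zum automatisch annotierten Korpus für gesprochenes Deutsch FOLK [Quantitative aspects of modal particle use. Studies on FOLK, the automatically annotated corpus of spoken German]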

Paschke, Peter
2026

Abstract

This article examines quantitative aspects of the use of modal particles (MPs) in corpora of spoken German, i.e. the token rates of individual MPs or of all MPs together, and the ranking of their frequencies. Since MPs consistently have heterosemes in other word classes (adverb, focus particle, interjection, etc.), such analyses previously had to be conducted manually. Only in 2017 was automatic POS tagging adapted to spoken language released for the FOLK corpus, enabling an automatic search for MPs. By comparing the automatic counts in FOLK with the frequency data of the manually analysed corpora of Hentschel (1986) and Brünjes (2014), and by checking the POS tagging of samples randomly extracted from FOLK, the paper seeks to determine how reliable the automatically generated MP data of FOLK are. With regard to the list of lexemes considered in FOLK, errors are essentially limited to the MP eigentlich and to some quantitatively marginal cases. The overall frequency of MPs in FOLK (token rate 2.55%) also seems plausible. Major deviations from previous studies arise in the frequencies of some individual MPs, of which auch, mal and halt are analysed in more detail. While the discrepancies for auch are due to deficits in POS tagging, for mal and halt corpus characteristics (discourse types and survey periods) play a major role. When the adjusted frequencies found in the random samples are extrapolated to the whole corpus, however, the MP frequency rankings of FOLK correlate just as well with those of manual counts (r = 0.81/0.82) as the manually determined MP rankings of different corpora do with each other.
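The quantities named in the abstract (token rate, extrapolation of sample-checked frequencies, correlation of frequency rankings) can be illustrated with a minimal Python sketch. It is not the author's procedure: the function names are hypothetical, all counts are invented placeholders rather than figures from the paper, and the correlation is computed as a Pearson coefficient on ranks, which is an assumption, since the abstract does not say which coefficient underlies r = 0.81/0.82.

```python
from math import sqrt


def token_rate(mp_tokens: int, corpus_tokens: int) -> float:
    """Share of MP tokens among all corpus tokens, in percent."""
    return 100.0 * mp_tokens / corpus_tokens


def adjusted_count(tagged: int, sample_size: int, sample_correct: int) -> float:
    """Extrapolate the share of correct MP tags in a random sample to the full count."""
    return tagged * sample_correct / sample_size


def ranks(counts: dict[str, float]) -> dict[str, int]:
    """Rank lexemes by descending frequency (1 = most frequent); assumes no ties."""
    ordered = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {w: i + 1 for i, w in enumerate(ordered)}


def rank_correlation(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation of the two frequency rankings over the shared lexemes."""
    shared = sorted(set(a) & set(b))
    ra, rb = ranks({w: a[w] for w in shared}), ranks({w: b[w] for w in shared})
    xs, ys = [ra[w] for w in shared], [rb[w] for w in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented placeholder data (NOT from the paper): automatically tagged counts,
# and (sample size, tags judged correct) from a hypothetical manual check.
tagged = {"auch": 1000, "mal": 800, "halt": 600, "eigentlich": 400}
checked = {"auch": (100, 50), "mal": (100, 90), "halt": (100, 95), "eigentlich": (100, 60)}
adjusted = {w: adjusted_count(tagged[w], *checked[w]) for w in tagged}

# Invented placeholder counts standing in for a manually analysed corpus.
manual = {"auch": 300, "mal": 900, "halt": 200, "eigentlich": 250}

r = rank_correlation(adjusted, manual)
```

Working on ranks rather than raw counts keeps corpora of different sizes comparable, because only the order of the MPs enters the coefficient.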
146
Files in this item:
Paschke (2026) Quantitative Aspekte der Modalpartikelverwendung (pub).pdf

Open access

Type: Publisher's version
License: Creative Commons
Size: 889.13 kB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5117438