An algorithm for automatic collation of vocabulary decks based on word frequency

This study focuses on computer-based foreign language vocabulary learning systems. Our objective is to automatically build vocabulary decks whose relative difficulty levels stand in a desired relation to one another. To this end, we exploit the fact that word frequency is a good indicator of vocabulary difficulty. In composing the decks, we impose two requirements: uniformity and diversity. Namely, the difficulty levels of the cards in the same deck need to be uniform enough that they can be grouped together, and the difficulty levels of the cards in different decks need to be diverse enough that they belong in different decks. To assess uniformity and diversity, we use rank-biserial correlation and propose an iterative algorithm that helps attain desired levels of uniformity and diversity based on word frequency in daily language use. In experiments, we employed spaced repetition flashcard software and presented users with various decks built with the proposed algorithm, containing cards of different content types. From the users' activity logs, we derived several behavioral variables and examined the polyserial correlation between these variables and difficulty levels across different word classes. This analysis confirmed that the decks compiled with the proposed algorithm affect the behavioral variables in line with expectations. In addition, a series of experiments with decks involving varying content types confirmed that this relation is independent of word class.
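
To make the rank-biserial correlation mentioned in the abstract concrete, the following is a minimal sketch that compares the word frequencies of two decks using the simple pairwise-difference form of the statistic. It is not the authors' algorithm: the abstract does not state how the value is thresholded or how the iteration over decks proceeds, and the function name `rank_biserial`, the deck variables, and the frequency values are all hypothetical. A value near 1 suggests that the first deck consists of clearly more frequent (presumably easier) words than the second, while a value near 0 suggests that the two decks overlap heavily.

```python
from itertools import product


def rank_biserial(freqs_x, freqs_y):
    """Rank-biserial correlation between two groups of word frequencies.

    Computed as (f - u) / (n_x * n_y), where f is the number of pairs in
    which the word from freqs_x is more frequent and u is the number of
    pairs in which it is less frequent. Tied pairs count toward neither.
    """
    if not freqs_x or not freqs_y:
        raise ValueError("both decks must be non-empty")
    favorable = 0
    unfavorable = 0
    for x, y in product(freqs_x, freqs_y):
        if x > y:
            favorable += 1
        elif x < y:
            unfavorable += 1
    return (favorable - unfavorable) / (len(freqs_x) * len(freqs_y))


# Hypothetical word frequencies (assumed values, not from the paper).
deck_a = [950.0, 720.0, 610.0]
deck_b = [400.0, 650.0, 120.0]
deck_c = [15.0, 9.0, 30.0]

print(rank_biserial(deck_a, deck_c))  # fully separated decks
print(rank_biserial(deck_a, deck_b))  # partially overlapping decks
```

Under these assumptions, a deck-building procedure could compare such values against target levels when deciding whether two candidate decks are different enough to be kept apart.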

Yucel Z.; Supitayakul P.; Monden A.; Leelaprute P.

2020

Publication year: 2020
Journal title: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Volume number: E103D
DOI: https://dx.doi.org/10.1587/transinf.2019EDP7279
Appears in categories: 2.1 Journal article

Files in this record:

File	Size	Format
j_08_ieice.pdf (not available; type: publisher's version; license: publisher's copyright)	1.76 MB	Adobe PDF

Documents in ARCA are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5079366
