This paper explores the impact of ecologically and cognitively plausible data on the training of language models. It builds on prior work integrating child-directed speech, curriculum learning and instruction tuning to train Italian BabyLMs. To evaluate our BabyLMs, we compare their performance (trained on fewer than 100M words using various techniques) with that of native Italian Large Language Models using the Invalsi-ITA benchmark, designed to evaluate Italian students on text comprehension and linguistic abilities. The goal is to assess whether cognitively motivated training approaches (Curriculum Learning based on Child-Directed speech and child-friendly data), which are crucial for meaningful comparison between human learners and computational systems, yield greater efficiency than standard methods.
BAMBI Goes to School: Evaluating Italian BabyLMs with Invalsi-ITA
Alice Suozzi;Gianluca Lebani;Alessandro Lenci
2025-01-01
Abstract
This paper explores the impact of ecologically and cognitively plausible data on the training of language models. It builds on prior work integrating child-directed speech, curriculum learning and instruction tuning to train Italian BabyLMs. To evaluate our BabyLMs, we compare their performance (trained on fewer than 100M words using various techniques) with that of native Italian Large Language Models using the Invalsi-ITA benchmark, designed to evaluate Italian students on text comprehension and linguistic abilities. The goal is to assess whether cognitively motivated training approaches (Curriculum Learning based on Child-Directed speech and child-friendly data), which are crucial for meaningful comparison between human learners and computational systems, yield greater efficiency than standard methods.| File | Dimensione | Formato | |
|---|---|---|---|
|
2025.clicit-1.16.pdf
accesso aperto
Tipologia:
Versione dell'editore
Licenza:
Creative commons
Dimensione
1.4 MB
Formato
Adobe PDF
|
1.4 MB | Adobe PDF | Visualizza/Apri |
I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.



