BAMBI Goes to School: Evaluating Italian BabyLMs with Invalsi-ITA

Alice Suozzi;Gianluca Lebani;Alessandro Lenci
2025-01-01

Abstract

This paper explores the impact of ecologically and cognitively plausible data on the training of language models. It builds on prior work that integrates child-directed speech, curriculum learning, and instruction tuning to train Italian BabyLMs. We evaluate our BabyLMs, which are trained on fewer than 100M words using various techniques, by comparing their performance with that of native Italian Large Language Models on Invalsi-ITA, a benchmark designed to evaluate Italian students on text comprehension and linguistic abilities. The goal is to assess whether cognitively motivated training approaches (curriculum learning based on child-directed speech and child-friendly data), which are crucial for meaningful comparison between human learners and computational systems, yield greater efficiency than standard methods.
2025
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
Full text: 2025.clicit-1.16.pdf (publisher's version, open access, Creative Commons license, Adobe PDF, 1.4 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5108948