Design and collection of the Bimodal Italian Learner Corpus of Chinese (BILCC)
Alessia Iurato
In press
Abstract
This article presents a new methodological resource for L2 Chinese acquisition research: BILCC (Bimodal Italian Learner Corpus of Chinese). BILCC was created to address two major gaps in the literature: first, the absence of data from Italian learners in existing L2 Chinese corpora; second, the need for a resource to support research on L2 Chinese acquisition by Italian learners, given the growing interest in Chinese language learning and the flourishing research community devoted to L2 Chinese acquisition in Italy. BILCC was constructed according to strict design criteria and collects written and spoken data from 106 Italian L2 Chinese learners at beginner, intermediate and advanced levels. It is a specific-purpose corpus, designed to explore L1 Italian learners' pragmalinguistic knowledge of Chinese shì...de clefts. The written subcorpus consists of 53,437 Chinese characters and the spoken subcorpus of 54,212 Chinese characters. BILCC also includes an equivalent native speaker subcorpus, which collects data from 35 L1 Chinese speakers. The learner corpus is fully documented with rich metadata and is annotated at the error and pragmatic levels.
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.