Inference for big data assisted by small area methods: an application to OBEC (on-line based enterprise characteristics)
Bertarelli, Gaia;
2022-01-01
Abstract
The availability of huge amounts of data produced by a wide range of new technologies, so-called big data, is increasing. However, data obtainable from big data sources are often the result of a non-probability sampling process, and adjusting for the selection bias is an important practical problem. In this paper, we propose a novel method for reducing the selection bias associated with the big data source in the context of Small Area Estimation (SAE). Our approach is based on data integration and combines a big data sample with a probability sample. We present an application to OBEC (on-line based enterprise characteristics) that combines an Istat sample survey with web-scraped data.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
Sis-2022-A-333-340 (1).pdf | Open access | Post-print document | Free access (no restrictions) | 129.21 kB | Adobe PDF |