This paper shows how data science can contribute to improving empirical research in economics by leveraging on large datasets and extracting information otherwise unsuitable for a traditional econometric approach. As a test-bed for our framework, machine learning algorithms allow us to create a new holistic measure of innovation built on a 2012 Italian Law aimed at boosting new high-tech firms. We adopt this measure to analyse the impact of innovativeness on a large population of Italian firms which entered the market at the beginning of the 2008 global crisis. The methodological contribution is organised in different steps. First, we train seven supervised learning algorithms to recognise innovative firms on 2013 firmographics data and select a combination of those with best predicting power. Second, we apply the former on the 2008 dataset and predict which firms would have been labelled as innovative according to the definition of the law. Finally, we adopt this new indicator as regressor in a survival model to explain firms' ability to remain in the market after 2008. Results suggest that the group of innovative firms are more likely to survive than the rest of the sample, but the survival premium is likely to depend on location.
I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.