Pack light on the move: Exploitation and exploration in a dynamic environment
LI CALZI, Marco
2013-01-01
Abstract
This paper revisits a recent study by Posen and Levinthal (Man Sci 58:587–601, 2012) on the exploration/exploitation tradeoff for a multi-armed bandit problem, where the reward probabilities undergo random shocks. We show that their analysis suffers from two shortcomings: it assumes that learning is based on stale evidence, and it overlooks the steady state. We let the learning rule endogenously discard stale evidence, and we perform the long-run analyses. The comparative study demonstrates that some of their conclusions must be qualified.

Files in this item:
File | Size | Format |
---|---|---|
Pack-Light.pdf (open access; type: post-print document; license: free access, view only) | 346.08 kB | Adobe PDF |
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.