
Less is More: Boosting Coverage of Web Crawling through Adversarial Multi-Armed Bandit

Cazzaro L.; Calzavara S.; Kovalkov M.
2025-01-01

Abstract

Crawlers are critical for ensuring the dependability and security of web applications by maximizing the code coverage of testing tools. Reinforcement learning (RL) has recently emerged as a promising approach to improve crawler exploration. However, existing approaches based on Q-learning face two major limitations: being state-based, they rely on brittle state abstractions that fail to generalize across diverse applications, and they employ rewards that prioritize underused actions but are not necessarily proportional to the improvement in code coverage. In this paper, we first substantiate the limitations of two popular Q-learning-based crawlers. We then propose Multi-Armed Krawler (MAK), a new crawler based on the Adversarial Multi-Armed Bandit problem. MAK is stateless and does not require the definition of brittle state abstractions that do not generalize to new web applications. By modeling the crawling process through a traditional graph abstraction and introducing an extrinsic reward correlated with code coverage, MAK compensates for the loss of expressiveness coming from its stateless nature. Our experimental results on public web applications show that MAK achieves greater coverage and faster convergence than its counterparts.
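The record reproduces only the abstract; the paper itself is not available here. As background for the adversarial multi-armed bandit formulation the abstract refers to, the following is a minimal sketch of EXP3, the classical algorithm for that problem. This is not MAK's actual algorithm or reward: the per-arm reward functions below are hypothetical stand-ins for a coverage-based signal.

```python
import math
import random

def exp3(n_arms, rewards, gamma=0.1, rounds=2000):
    """Minimal EXP3 sketch: exponential weights over arms, mixed with
    uniform exploration at rate gamma. Each rewards[i]() returns a
    reward in [0, 1] for pulling arm i (e.g., a coverage gain)."""
    weights = [1.0] * n_arms
    total_reward = 0.0
    for _ in range(rounds):
        w_sum = sum(weights)
        # Sampling distribution: exploit weights, explore uniformly.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        r = rewards[arm]()
        total_reward += r
        # Importance-weighted estimate keeps the update unbiased
        # even though only the pulled arm's reward is observed.
        x_hat = r / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
    return total_reward, weights

# Hypothetical example: three "actions" whose average reward differs;
# arm 2 is the most rewarding and should accumulate the largest weight.
random.seed(0)
arms = [lambda p=p: random.random() * p for p in (0.2, 0.5, 0.9)]
total, w = exp3(3, arms, gamma=0.1, rounds=2000)
```

Because EXP3 makes no stochastic assumptions about how rewards are generated, it fits an adversarial setting in which the reward of an action (such as the coverage gained by firing a web-page event) can change over time as the application is explored.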
2025
Proceedings - 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2025
Files in this product:
dsn25.pdf (not available)
Type: Post-print document
License: Closed access (personal)
Size: 569.66 kB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5100968
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: n/a