Layout analysis of historical handwritten documents is a key pre-processing step in document image analysis that, by segmenting the image into its homogeneous regions, facilitates subsequent procedures such as optical character recognition and automatic transcription. Learning-based approaches have shown promising performances in layout analysis, however, the majority of them requires tedious pixel-wise labelled training data to achieve generalisation capabilities, this limitation preventing their application due to the lack of large labelled datasets. This paper proposes a novel unsupervised representation learning method for documents' layout analysis that reduces the need for labelled data: a sparse autoencoder is first trained in an unsupervised manner on a historical text document's image; representation of image patches, computed by the sparse encoder, is then used to classify pixels into various region categories of the document using a feed-forward neural network. A new training method, inspired by the ISTA algorithm, is also introduced here to train the sparse encoder. Experimental results on DIVA - HisDB dataset demonstrate that the proposed method outperforms previous approaches based on unsupervised representation learning while achieving performances comparable to the state-of-the-art fully supervised methods.

Ancient Document Layout Analysis: Autoencoders meet Sparse Coding

Fiorucci, Marco;Traviglia, Arianna
2021-01-01

Abstract

Layout analysis of historical handwritten documents is a key pre-processing step in document image analysis that, by segmenting the image into its homogeneous regions, facilitates subsequent procedures such as optical character recognition and automatic transcription. Learning-based approaches have shown promising performances in layout analysis, however, the majority of them requires tedious pixel-wise labelled training data to achieve generalisation capabilities, this limitation preventing their application due to the lack of large labelled datasets. This paper proposes a novel unsupervised representation learning method for documents' layout analysis that reduces the need for labelled data: a sparse autoencoder is first trained in an unsupervised manner on a historical text document's image; representation of image patches, computed by the sparse encoder, is then used to classify pixels into various region categories of the document using a feed-forward neural network. A new training method, inspired by the ISTA algorithm, is also introduced here to train the sparse encoder. Experimental results on DIVA - HisDB dataset demonstrate that the proposed method outperforms previous approaches based on unsupervised representation learning while achieving performances comparable to the state-of-the-art fully supervised methods.
2021
2020 25th International Conference on Pattern Recognition (ICPR)
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/3741187
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 4
social impact