HTRogène, Medieval Latin corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation [Data set]

Hermand, F.; Brootcorne, M.; Vlachou-Efstathiou, M.; Boschetti, F.; Fischer, F.; Chagué, A.; Clérice, T.

The dataset comprises carefully selected manuscripts, each containing approximately 10 columns of text (equivalent to 5 bi-column pages or 10 single-column pages). The data adheres to the SegmOnto guidelines, ensuring consistency and compatibility with other datasets following the same standards. Each image is accompanied by two XML files:

- Files suffixed with .chocomufin.xml are normalized for compliance with broader datasets.
- The other XML files contain repository-specific information.
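As a rough illustration of the file layout described above, the following Python sketch groups each image with its two XML files. The record only documents the .chocomufin.xml suffix. The idea that an image and its XML files share a base name, the list of image extensions, and the function name group_files are all assumptions made for this example and should be checked against the actual repository.

```python
from pathlib import Path

# Suffix documented in the dataset description.
NORMALIZED_SUFFIX = ".chocomufin.xml"
# Assumption: the description does not state which image formats are used.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def group_files(paths):
    """Group file paths by directory and base name (hypothetical helper).

    Returns a dict mapping a base path to a dict with up to three roles:
    "image", "normalized" (the .chocomufin.xml file) and "original"
    (the other XML file). Assumes all three share the same base name.
    """
    groups = {}
    for path in map(Path, paths):
        name = path.name
        # Check the double suffix first, since it also ends in ".xml".
        if name.endswith(NORMALIZED_SUFFIX):
            stem, role = name[: -len(NORMALIZED_SUFFIX)], "normalized"
        elif path.suffix.lower() == ".xml":
            stem, role = path.stem, "original"
        elif path.suffix.lower() in IMAGE_EXTENSIONS:
            stem, role = path.stem, "image"
        else:
            continue  # ignore anything that is neither image nor XML
        groups.setdefault(path.with_name(stem), {})[role] = path
    return groups


if __name__ == "__main__":
    # Point this at a local copy of the dataset; "." is only a placeholder.
    for base, files in sorted(group_files(Path(".").rglob("*")).items()):
        missing = {"image", "normalized", "original"} - files.keys()
        if missing:
            print(f"{base}: missing {sorted(missing)}")
```

Run against a local copy, the script lists any base name that lacks one of the three expected files, which is a quick way to check whether the assumed naming convention actually holds.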


2025
