Sharing HTR datasets with standardized metadata: the HTR-United initiative

By Alix Chagué and Thibault Clérice.

Since some scholars adopted Ocropy in the mid-2010s, the production of HTR and OCR ground truth has seen impressive and steady growth. However, few projects share their gold datasets, and those that do scatter them across many different hosting options (GitHub, Zenodo, GitLab, institutional repositories, etc.), which makes them very hard to find. Even when such datasets are “discovered”, their descriptions often lack details that are crucial for reuse. The HTR-United initiative is an answer to this problem: with a standardized metadata schema, a curated catalogue, and tools that help dataset owners through every step, owners can now easily publish their datasets and make them findable.
