- Dans Document Analysis and Recognition – ICDAR 2024 Workshops
- Éditeur : Springer Nature Switzerland
- Pages : 140-158
Résumé
Document layout analysis is essential in Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR), especially for historical and low-resource scripts. This study explores a novel data augmentation technique using Generative Adversarial Networks (GANs) to generate realistic document layouts from semantic masks, enhancing layout analysis without increasing human annotation effort. Our lightweight pipeline, tested on historical manuscripts (Latin, Arabic, Armenian, Hebrew), newspapers, and complex document layouts, shows that GAN-generated layouts are convincing and difficult to distinguish from real ones, even for paleographers. This method significantly boosts data augmentation, yielding a 3% point improvement in layout analysis metrics (precision, recall, mAP), and a 12 point increase in precision and recall for damaged documents. Additionally, masks with character information enhance image quality, boosting text recognition performance.
Partager sur les réseaux sociaux
Publications de chercheur
Enhancing Arabic Maghribi Handwritten Text Recognition with RASAM 2: A Comprehensive Dataset and Benchmarking
Publication de chercheur
Communication dans un congrès
- Date de parution : 2024
Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs
Publication de chercheur
Communication dans un congrès Nouveauté
- Date de parution : 2024
Yves Pérotin (1922-1981). L'archiviste inimitable
Publication de chercheur
Ouvrage Nouveauté
- Date de parution : 2024