- In: Document Analysis and Recognition – ICDAR 2024 Workshops
- Publisher: Springer Nature Switzerland
- Pages: 140–158
Abstract
Document layout analysis is essential for Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR), especially for historical and low-resource scripts. This study explores a novel data augmentation technique that uses Generative Adversarial Networks (GANs) to generate realistic document images from semantic layout masks, improving layout analysis without increasing human annotation effort. We tested our lightweight pipeline on historical manuscripts (Latin, Arabic, Armenian, Hebrew), newspapers, and documents with complex layouts. The GAN-generated documents are convincing and difficult to distinguish from real ones, even for paleographers. Used for data augmentation, the method yields a 3-percentage-point improvement in layout analysis metrics (precision, recall, mAP) and a 12-point increase in precision and recall on damaged documents. Additionally, masks that encode character information improve the quality of the generated images, which in turn boosts text recognition performance.