- Conference: Computational Humanities Research (CHR) (2024-12-04 to 2024-12-06)
Abstract
Recent advances in handwritten text recognition (HTR) for historical documents have achieved high performance on cursive Arabic scripts, with accuracy comparable to that obtained on Latin scripts. The initial RASAM dataset, built from three Arabic Maghribi manuscripts, made it possible to cover new documents rapidly through fine-tuning. However, applying HTR to Arabic scripts remains difficult because of the wide variety of spellings, ambiguities, and languages involved. To address these challenges, we present RASAM 2, an extended dataset of 3,750 lines from 15 manuscripts held by the BULAC library, covering a range of hands, layouts, and texts in Arabic Maghribi script. RASAM 2 aims to establish a new benchmark for training HTR models on both Maghribi and Oriental scripts, covering both text recognition and layout analysis. Preliminary experiments with a word-based CRNN approach show that the resulting models generalize well, reducing the Character Error Rate (CER) by nearly 40% on new in-domain and out-of-domain manuscripts.
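The results above are reported as Character Error Rate. The sketch below illustrates one common way to compute CER and a relative reduction between two CER values. It assumes CER is the character-level Levenshtein distance summed over all lines and divided by the total number of reference characters, and that "reduction" means relative rather than absolute change. The paper's actual evaluation script is not described here, and the function names are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, start=1):
        curr = [i]
        for j, hc in enumerate(hyp, start=1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Hypothetical helper: corpus-level CER = total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference text is empty")
    errors = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return errors / total


def relative_reduction(baseline: float, new: float) -> float:
    """Hypothetical helper: fraction by which `new` CER improves on `baseline` CER."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    print(cer(["kitten"], ["sitting"]))      # 3 edits / 6 reference chars = 0.5
    print(relative_reduction(0.10, 0.06))    # roughly 0.4, i.e. a 40% relative reduction
```

Under this reading, a drop in CER from 10% to 6% would count as a 40% reduction even though the absolute difference is only 4 percentage points. The values used here are illustrative and are not taken from the paper.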
Researcher publications
‘La Rochelle, notre commune patrie’: the World of the Rochelais Huguenots before the Revocation of the Edict of Nantes
Book chapter
- Publication date: 2025
Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs
Conference paper
- Publication date: 2024
Une sorcière à la bibliothèque ! (A Witch in the Library!)
Journal article
- Publication date: 2024