- Conference: Computational Humanities Research (CHR) (2024-12-04 - 2024-12-06)
Abstract
Recent advancements in handwritten text recognition (HTR) for historical documents have demonstrated high performance on cursive Arabic scripts, achieving accuracy comparable to Latin scripts. The initial RASAM dataset, focused on three Arabic Maghribi manuscripts, facilitated rapid coverage of new documents via fine-tuning. However, HTR application for Arabic scripts remains constrained due to the vast diversity in spellings, ambiguities, and languages. To overcome these challenges, we present RASAM 2, an extended dataset with 3,750 lines from 15 manuscripts in the BULAC library, showcasing various hands, layouts, and texts in Arabic Maghribi script. RASAM 2 aims to establish a new benchmark for HTR model training for both Maghribi and Oriental scripts, covering text recognition and layout analysis. Preliminary experiments using a word-based CRNN approach indicate significant model versatility, with a nearly 40% reduction in Character Error Rate (CER) across new in-domain and out-of-domain manuscripts.
Researcher publications
Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs
Researcher publication
Conference paper (new)
- Publication date: 2024
Une coopération archivistique. La mission d’Yves Pérotin en Algérie (avril-juillet 1964)
Researcher publication
Book chapter (new)
- Publication date: 2024
Yves Pérotin (1922-1981). L'archiviste inimitable
Researcher publication
Book (new)
- Publication date: 2024