Handling Heavily Abbreviated Manuscripts: HTR engines vs text normalisation approaches

Conference: International Conference on Document Analysis and Recognition (ICDAR) 2021
Pages: 306-316

Abstract

Although abbreviations are fairly common in handwritten sources, particularly in medieval and modern Western manuscripts, previous research on computational approaches to their expansion is scarce. Yet abbreviations pose particular challenges to computational tasks such as handwritten text recognition (HTR) and natural language processing. Pre-processing often aims ultimately to lead from a digitised image of the source to a normalised text in which the abbreviations are expanded. We explore different setups to obtain such a normalised text: either directly, by training HTR engines on normalised (i.e., expanded, disabbreviated) text, or by decomposing the process into discrete steps, each using specialist models for recognition, word segmentation and normalisation. The case studies considered here are drawn from the medieval Latin tradition.
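The decomposed setup (recognition, then word segmentation, then normalisation) can be illustrated with a toy sketch. It assumes the HTR step has already produced a diplomatic transcription with irregular word spacing. The lexicon, the abbreviation table and the functions `segment`, `normalise` and `pipeline` are hypothetical: simple dictionary lookups stand in for the trained specialist models the paper refers to, so this is not the authors' method.

```python
"""Toy sketch of a decomposed pipeline: recognised text -> word segmentation -> normalisation.

Everything here is hypothetical: a hand-written lexicon and abbreviation table
stand in for the trained specialist models discussed in the paper.
"""

TIRONIAN_ET = "\u204a"  # the Tironian sign, a common abbreviation for Latin "et"
DNS = "d\u00f1s"        # "dns" with a tilde, an abbreviated form of "dominus"

# Hypothetical lexicon of diplomatic (still abbreviated) word forms.
LEXICON = {"in", "nomine", "patris", "filii", "uobiscum", DNS, TIRONIAN_ET}

# Hypothetical abbreviation table: diplomatic form -> expanded form.
ABBREVIATIONS = {DNS: "dominus", TIRONIAN_ET: "et"}


def segment(text: str, lexicon: set[str], max_len: int = 12) -> list[str]:
    """Split a chunk with missing word boundaries by dynamic programming.

    A lexicon entry costs 1 and any unknown single character costs 10, so the
    cheapest segmentation prefers known words and falls back to characters.
    """
    n = len(text)
    # best[i] holds (cost, tokens) for the cheapest segmentation of text[:i].
    best: list[tuple[float, list[str]]] = [(0.0, [])] + [(float("inf"), [])] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in lexicon:
                cost = 1.0
            elif len(piece) == 1:
                cost = 10.0
            else:
                continue
            candidate = best[j][0] + cost
            if candidate < best[i][0]:
                best[i] = (candidate, best[j][1] + [piece])
    return best[n][1]


def normalise(tokens: list[str], abbreviations: dict[str, str]) -> list[str]:
    """Expand each token found in the abbreviation table; keep the others."""
    return [abbreviations.get(tok, tok) for tok in tokens]


def pipeline(recognised: str) -> str:
    """Segment each whitespace-delimited chunk, then expand abbreviations."""
    tokens: list[str] = []
    for chunk in recognised.split():
        tokens.extend(segment(chunk, LEXICON))
    return " ".join(normalise(tokens, ABBREVIATIONS))


if __name__ == "__main__":
    print(pipeline("innomine patris" + TIRONIAN_ET + "filii"))  # in nomine patris et filii
    print(pipeline(DNS + "uobiscum"))  # dominus uobiscum
```

Minimising a cost over all segmentations, rather than greedily taking the longest match, avoids committing early to a wrong split. In a realistic setting both lookups would be replaced by learned models, and expansion would need context, since one abbreviated form can stand for several different words.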

Disciplines

Digital humanities
