- Revue : Armeniaca - International Journal of Armenian Studies (1)
- Pages : 73-96
Résumé
Creating a digital corpus enriched by full linguistic annotations is a work which classically integrates several manual steps of acquisition, processing, and data display. Processing presupposes the existence of dedicated and specialised analysis tools, adapted to the state of the language used in the corpus. This paper describes a semi-supervised process for building Armenian corpora from scanned documents. This method is based on a chain of applications pre-trained by Calfa and GREgORI and enabling the complete processing of texts, from their automated input to their linguistic analysis and data display. We provide an assessment of this methodology and benefits of model specialisation, based on digitised copies of a 17th-century manuscript of the Four Gospels (Walters MS W541 = BAL W541, Amida Gospels, ff. 113v-117r: Lk 1:1‑78).
Partager sur les réseaux sociaux
Publications de chercheur
‘La Rochelle, notre commune patrie': the World of the Rochelais Huguenots before the Revocation of the Edict of Nantes
Publication de chercheur
Chapitre d’ouvrage
- Date de parution : 2025
Enhancing Arabic Maghribi Handwritten Text Recognition with RASAM 2: A Comprehensive Dataset and Benchmarking
Publication de chercheur
Communication dans un congrès Nouveauté
- Date de parution : 2024
Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs
Publication de chercheur
Communication dans un congrès Nouveauté
- Date de parution : 2024