- Congrès : Corpus-Based Research in the Humanities CRH-2 (2018-01-25 - 2018-01-26)
- Pages : 55-64
Résumé
Witnesses of medieval literary texts, preserved in manuscript, are layered objects , being almost exclusively copies of copies. This results in multiple and hard to distinguish linguistic strata – the author's scripta interacting with the scriptae of the various scribes – in a context where literary written language is already a dialectal hybrid. Moreover, no single linguistic phenomenon allows to distinguish between different scriptae, and only the combination of multiple characteristics is likely to be significant [9] – but which ones? The most common approach is to search for these features in a set of previously selected texts, that are supposed to be representative of a given scripta. This can induce a circularity, in which texts are used to select features that in turn characterise them as belonging to a linguistic area. To counter this issue, this paper offers an unsupervised and corpus-based approach, in which clustering methods are applied to an Old French corpus to identify main divisions and groups. Ultimately, scriptometric profiles are built for each of them.
Disciplines
Partager sur les réseaux sociaux
Publications de chercheur
CATMuS-Medieval: Consistent Approaches to Transcribing ManuScripts
Publication de chercheur
Communication dans un congrès
- Date de parution : 2024
Layout Analysis Dataset with SegmOnto
Publication de chercheur
Communication dans un congrès
- Date de parution : 2024
Les registres médiévaux de Notre Dame : une archive numérique ouverte de la vie du chapitre
Publication de chercheur
Communication dans un congrès
- Date de parution : 2024