From HTR to Critical Edition: A Semi-Automatic Pipeline

By Daniel Stoekl Ben Ezra.

This paper describes a pipeline for creating critical editions of literary texts from manually corrected handwritten text recognition (HTR) results for distinct manuscripts, as prepared in the Sofer Mahir project. The project produces manually corrected transcriptions of 16 large medieval Hebrew codices covering all six main works of Tannaitic Rabbinic literature, which were redacted in the third or perhaps fourth century CE in Galilee. These works comprise the Mishnah (~200k tokens), Tosefta (~300k tokens), Mekhilta deRabbi Yishmael (~80k tokens), Sifra (~120k tokens), Sifre Numbers (~60k tokens) and Sifre Deuteronomy (~60k tokens). Each work is extant in between 3 witnesses (Mishnah and Tosefta) and 5 witnesses (all others).
