SegmOnto – A Controlled Vocabulary to Describe Historical Textual Sources

By Simon Gabay, Ariane Pinche and Kelly Christensen.

Our initiative aims to design SegmOnto, a controlled vocabulary for describing the layout of textual sources. Following a codicological rather than a semantic approach, it is designed as a generic typology that covers as many cases as possible rather than answering specific needs. Systematising layout description serves two objectives. First, it facilitates the exchange of annotated data and therefore the training of better models for image segmentation, a crucial preliminary step for text recognition. Second, it allows the development of a shared post-processing workflow and pipeline for transforming ALTO or PAGE files into standard digital humanities (DH) formats such as RDF or TEI.
