Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Journal: Journal of Data Mining and Digital Humanities

Abstract

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly on comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent neural-network-based lemmatiser and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the models prove robust in out-of-domain tests, i.e. on texts up to 20th-century novels.

Disciplines

Digital Humanities
