- Conference: 4th International Conference on Natural Language Processing for Digital Humanities (2024-11-16)
- Editor(s): EMNLP 2024
Abstract
This paper evaluates lemmatization, POS tagging, and morphological analysis for four Armenian varieties: Classical Armenian, Modern Eastern Armenian, Modern Western Armenian, and the under-documented Getashen dialect. It compares traditional RNN models, multilingual models such as mDeBERTa, and large language models (ChatGPT) using supervised, transfer learning, and zero/few-shot learning approaches. The study finds that RNN models are particularly strong in POS tagging, while large language models demonstrate high adaptability, especially in handling previously unseen dialect variations. The research highlights the value of cross-variational and in-context learning for enhancing NLP performance in low-resource languages, offering crucial insights into model transferability and supporting the preservation of endangered dialects.
Researcher publications
Enhancing Arabic Maghribi Handwritten Text Recognition with RASAM 2: A Comprehensive Dataset and Benchmarking
Researcher publication
Conference paper
- Publication date: 2024
Une coopération archivistique. La mission d’Yves Pérotin en Algérie (avril-juillet 1964)
Researcher publication
Book chapter (new)
- Publication date: 2024
Yves Pérotin (1922-1981). L'archiviste inimitable
Researcher publication
Book (new)
- Publication date: 2024