• Conference: 4th International Conference on Natural Language Processing for Digital Humanities (2024-11-16)
  • Editor(s): EMNLP 2024

Abstract

This paper evaluates lemmatization, POS tagging, and morphological analysis for four Armenian varieties: Classical Armenian, Modern Eastern Armenian, Modern Western Armenian, and the under-documented Getashen dialect. It compares traditional RNN models, multilingual models such as mDeBERTa, and large language models (ChatGPT) under supervised, transfer-learning, and zero- and few-shot settings. The study finds that RNN models are particularly strong at POS tagging, while large language models adapt well, especially to previously unseen dialect variations. The research highlights the value of cross-variational and in-context learning for improving NLP performance in low-resource languages, offering insights into model transferability and supporting the preservation of endangered dialects.
