• In Ancient Manuscripts in Digital Culture
  • Publisher: BRILL
  • Pages: 87-114

Abstract

The task of automatically extracting semantic information from raw textual data is an increasingly important topic in computational linguistics and has begun to make its way into non-linguistic humanities research. Its acceptance as an important task is shown by its appearance in the standard textbooks and handbooks of the field, such as Manning and Schuetze's Foundations of Statistical Natural Language Processing and Jurafsky and Martin's Speech and Language Processing. And according to the Association for Computational Linguistics Wiki, 25 published experiments since 1997 have used the TOEFL (Test of English as a Foreign Language) standardized synonym questions to test the performance of algorithmic extraction of semantic information, with scores ranging from 20% to 100% accuracy.

The question addressed by this paper, however, is not whether semantic information can be automatically extracted from textual data; the studies listed in the preceding paragraph have already proven this. Nor is it about finding the best algorithm for the task. Instead, this paper aims to make this widely used and accepted task more useful outside of purely linguistic studies by considering how one can qualitatively assess the results returned by such algorithms. That is, it aims to move the assessment of the results returned by semantic extraction algorithms closer to the actual hermeneutical tasks carried out in, e.g., the historical, cultural, or theological interpretation of texts. We believe that this critical projection of algorithmic results back onto the hermeneutical tasks that stand at the core of humanistic research is largely a desideratum in the current computational climate. We hope that this paper can help to fill this gap in two ways. First, it will introduce an effective and yet easy-to-understand metric for parameter choice which we call Gap Score. Second, it will analyze three distinct sets of results produced by two different algorithmic processes to discover what type of information they return and, thus, for which types of hermeneutical tasks they may be useful. Throughout this paper, we will refer to the results produced by these algorithms as "language models" (or simply "models"), since what these algorithms produce is a semantic model of the input language which can then help answer questions about that language's semantics. Our purpose in doing this is to demonstrate that the accuracy of an algorithm on a specific test, or even a range of tests, does not tell the user everything about that algorithm: there are cases in which an algorithm that scores lower on a certain standardized test may actually be better for certain hermeneutical tasks than a better-scoring algorithm.
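
To make the evaluation setup mentioned above concrete: a TOEFL synonym question presents a target word and four candidate words, and a semantic model is scored by whether the candidate it judges most similar to the target is the intended synonym. The sketch below is not taken from the paper; it assumes a pre-computed set of word vectors and uses cosine similarity as the comparison measure, both of which are illustrative choices rather than details given in the abstract, and the toy words and vectors are invented.

    # Illustrative sketch (not the paper's code): scoring TOEFL-style synonym
    # questions against a distributional model represented as word vectors.
    from math import sqrt

    def cosine(u, v):
        # Cosine similarity between two equal-length vectors.
        dot = sum(a * b for a, b in zip(u, v))
        norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def answer(vectors, target, candidates):
        # The model's answer is the candidate most similar to the target word.
        return max(candidates, key=lambda w: cosine(vectors[target], vectors[w]))

    def toefl_accuracy(vectors, questions):
        # questions: list of (target, candidates, correct_answer) triples.
        right = sum(answer(vectors, t, cs) == a for t, cs, a in questions)
        return right / len(questions)

    # Hypothetical toy vectors and a single made-up question, for illustration only.
    toy_vectors = {
        "boundless":   [0.9, 0.1, 0.2],
        "unlimited":   [0.8, 0.2, 0.1],
        "occasional":  [0.1, 0.9, 0.3],
        "deliberate":  [0.2, 0.3, 0.9],
        "transparent": [0.4, 0.5, 0.5],
    }
    toy_questions = [
        ("boundless", ["unlimited", "occasional", "deliberate", "transparent"], "unlimited"),
    ]
    print(toefl_accuracy(toy_vectors, toy_questions))  # 1.0 on this toy question

Accuracy over a full question set is simply the proportion of questions answered correctly, which is the figure reported by the experiments cited above.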
