7 July 2021: Statistics in linguistics: Thoughts on recurrent issues and pedagogy
As part of the Oberseminar Computerlinguistik, a talk will take place on
Wednesday, 7 July 2021, 16:15-17:45,
to which we warmly invite everyone who is interested:
Dr. Bodo Winter (University of Birmingham)
Statistics in linguistics: Thoughts on recurrent issues and pedagogy
Afterwards, there will be an opportunity to discuss this fundamental question of corpus-linguistic methodology in more detail with Bodo Winter.
Meeting ID: 923 4348 2845
Dial-in by phone: +49 69 7104 9922, +49 30 5679 5800, +49 69 3807 9883, +49 695 050 2596
Kind regards from the
team of the Lehrstuhl für Korpus- und Computerlinguistik
It is safe to say that linguistics is undergoing a quantitative revolution. Thanks to widely available open-source programming languages such as R and Python, analyses in our field are becoming increasingly sophisticated. However, certain age-old issues persist despite these developments. In this talk, I want to openly reflect on what I personally see as the most pressing issues of statistical methodology in our field, based on my experience of teaching statistics workshops and consulting on projects in various subfields, from corpus linguistics and phonetics to typology. I will highlight that despite the increasing use of linear mixed effects models, violations of the independence assumption are still a persistent issue, particularly in corpus linguistics. I will review how linear mixed effects models are used across different subfields of linguistics, and discuss the fact that there are currently no standards whatsoever about what random effects structures are appropriate for corpus linguistics. Towards the end of my talk, I will argue that most issues we face in linguistics are ultimately rooted in a statistical pedagogy that is quite far removed from the complexities of linguistic datasets, and that still teaches classical significance tests even though these are rendered obsolete by the linear model framework. Instead of focusing on prefab procedures such as t-tests and ANOVAs, we should be teaching students how to reason about and build explicitly generative statistical models. I will close by discussing some of the pedagogical advantages of moving teaching in linguistics towards a framework that endorses Bayesian multilevel modeling, facilitated by the easy-to-use R package “brms”.
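For readers who want a feel for the independence problem mentioned in the abstract, here is a minimal simulation sketch in Python. It is not taken from the talk: the function names, the numbers of speakers and tokens, the standard deviations, and the approximate critical values are all illustrative assumptions. It simulates two groups of speakers with no true group difference, then compares a t-test that treats every token as an independent observation with one run on per-speaker means. Averaging per speaker is only a simple stand-in for the mixed effects models the abstract refers to, not an implementation of one.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def false_positive_rates(n_sims=500, n_speakers=10, n_tokens=20,
                         speaker_sd=1.0, noise_sd=1.0, seed=1):
    """Simulate data with NO true group effect and return the share of
    'significant' results for (token-level test, speaker-level test).

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    token_hits = 0
    speaker_hits = 0
    for _ in range(n_sims):
        groups = []
        for _group in range(2):
            speakers = []
            for _speaker in range(n_speakers):
                # Each speaker has their own baseline (a random intercept),
                # so tokens from the same speaker are correlated.
                baseline = rng.gauss(0.0, speaker_sd)
                speakers.append([baseline + rng.gauss(0.0, noise_sd)
                                 for _ in range(n_tokens)])
            groups.append(speakers)

        # Naive analysis: pool all tokens and ignore who produced them.
        tokens_a = [x for speaker in groups[0] for x in speaker]
        tokens_b = [x for speaker in groups[1] for x in speaker]
        if abs(welch_t(tokens_a, tokens_b)) > 1.97:  # approx. 5% cutoff, df near 398
            token_hits += 1

        # Aggregated analysis: one mean per speaker, so the unit of
        # analysis matches the unit that was actually sampled.
        means_a = [statistics.fmean(speaker) for speaker in groups[0]]
        means_b = [statistics.fmean(speaker) for speaker in groups[1]]
        if abs(welch_t(means_a, means_b)) > 2.10:  # approx. 5% cutoff, df near 18
            speaker_hits += 1

    return token_hits / n_sims, speaker_hits / n_sims


if __name__ == "__main__":
    naive, aggregated = false_positive_rates()
    print(f"token-level false-positive rate:   {naive:.2f}")
    print(f"speaker-level false-positive rate: {aggregated:.2f}")
```

Under these assumed settings, the token-level test should report a "significant" difference far more often than the nominal 5%, while the speaker-level test should stay close to it. Raising `speaker_sd` or `n_tokens` makes the gap larger, because more of the apparent sample size then comes from repeated measurements of the same speakers.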