- Joe Sventek
In this talk, I introduce three recent and/or ongoing projects that are representative of the work we do in my lab (Learner Corpus Research and Applied Data Science Lab). The first project investigates the features of academic language using multidimensional analysis (Biber, 1988; 2004) and a wide array of linguistic features extracted using an NLP pipeline with a number of post-processing steps. In particular, we explore the ways in which the language used in technology-mediated learning environments is similar to and distinct from the language used in more traditional learning environments. The results of this project (which was funded by Educational Testing Service) have important implications for the design of gatekeeping assessments such as the Test of English as a Foreign Language (TOEFL iBT), which relies on data from traditional learning environments. The second project investigates the degree to which the strength of association of dependency bigrams (e.g., verb-direct object; verb-adverb) affects raters' perceptions of proficiency in both argumentative essays and oral proficiency interviews. The results indicate that particular dependency bigrams meaningfully increase proficiency score prediction accuracy when included in multivariate models with more traditional variables (e.g., word frequency, word concreteness). In the third project (which is currently in progress), we examine the accuracy of NLP processes on out-of-domain second language texts and explore whether the addition of small, manually tagged training corpora can reasonably improve part-of-speech tagging and parsing models.
PhD Georgia State University (2016)
Assistant Professor, Department of Second Language Studies, University of Hawaii (2016-2019)
Assistant Professor, Department of Linguistics, University of Oregon (2019-)