On this page, you will find a list of the ongoing projects in the corpus lab.

At the bottom of the page, you'll find a list of questions that have yet to be answered in corpus linguistics.

Corpus-informed and corpus-based ESL/EFL teaching materials

The aim of this project is to bring corpora and corpus-informed tasks to the English language classroom. Corpora are large electronic collections of texts that can be analyzed automatically with corpus tools (e.g., AntConc, Sketch Engine, LancsBox) to identify patterns in language use. The patterns that corpus analysis reveals may result from sociolinguistic, psycholinguistic, historical, or semantic processes. In other words, corpus tools offer a way to explore linguistic phenomena across different areas of linguistics. The goal of this project is to use the results of corpus research to develop teaching materials that reflect real-life language use.

AITA Linguistic Characteristics

with Jordan Smith (University of North Texas) and Daniel Keller (Northern Arizona University)

Reddit is an increasingly popular social media website that hosts a large number of forums (or subreddits) on different topics. One of these subreddits is called Am I the Asshole (AITA), where users describe a personal situation in order to get feedback from other users on whether they were morally in the wrong (YTA, "You're the Asshole") or not (NTA, "Not the Asshole"). AITA posts have become popular even outside of Reddit, with podcasts, YouTube channels, and other social media accounts dedicated to discussing the most popular AITA posts of the week. In this study, we explore the language of AITA posts to determine which linguistic features characterize posts whose authors are voted "the asshole" versus "not the asshole."

The Role of Discipline and Register in Academic Spoken Discourse: A Structural Equation Modeling Study

with Maria Kostromitina (Northern Arizona University) and Tove Larsson (Northern Arizona University)

Recently, studies of written academic discourse, particularly multi-dimensional analyses (MDAs), have shown that both register and discipline affect the use of elaboration and compression features (see Egbert, 2015; Gray, 2015; Goulart, 2022). These studies demonstrated that register and discipline shape the use of elaboration features to different extents. For example, within the humanities, certain registers (e.g., argumentative and explanatory) may be more elaborated than others (e.g., procedural recounts and proposals). While the studies above have examined elaboration and compression features in academic writing, little is known about how register and discipline might affect the use of these features in spoken academic discourse. Moreover, the features that characterize the physical and life sciences are usually associated with academic writing, which raises the question of the extent to which this also holds in spoken academic discourse. Thus, the proposed study investigates the effects of register and discipline on two features of elaboration (verb complement clauses and noun complement clauses) and three features of compression (premodifying nouns, attributive adjectives, and prepositional phrases) in spoken academic discourse.


with Shelley Staples (University of Arizona) and Maria Kostromitina (Northern Arizona University)

The goal of this project is to use a multi-dimensional analysis to investigate the linguistic patterns that can be identified across registers and disciplines in a corpus of spoken academic English (the British Academic Spoken English corpus, BASE).

Meta-analysis of Multi-dimensional Analysis

with Margaret Wood (Northern Arizona University) and Doug Biber (Northern Arizona University)

Since Douglas Biber conducted the first multi-dimensional (MD) analysis in 1984, MD analyses have expanded considerably in scope. However, there has yet to be a comprehensive survey of this body of research. Building on a methodological synthesis of MD research (Goulart & Wood, in press), this meta-analysis will examine the results of these MD studies to test Biber's (2014) prediction that there are two 'universal' dimensions of variation in language: an oral vs. literate dimension and a narrative vs. non-narrative dimension. The study collection currently comprises an unprecedented 230 studies employing multi-dimensional analysis, including peer-reviewed articles, dissertations, master's theses, book chapters, and articles in press or in preparation.

You are welcome to join any of the projects described above. If you are looking for something to work on, you can also browse the resources below for project ideas:

Ideas for Final Papers and Theses: