Blog Post #1

Am I the A**hole? Lexical Bundles in Reddit's AITA posts

Thomas Sears and Thomas De Meola

“Am I the Asshole” (AITA) is a popular subreddit where users describe a situation and ask other users to vote on whether the original poster (OP) was the asshole in that situation (“You’re the Asshole”, YTA) or not (“Not the Asshole”, NTA). The aim of this blog post is to analyze a corpus of YTA and NTA texts and determine whether certain linguistic markers indicate whether a post will be voted YTA or NTA. Looking for lexical bundles (recurring sequences of words, or n-grams) in a corpus can reveal meaningful patterns in the way people use language: in this case, who is the asshole and who is not. In this short study, we used two corpora: a collection of 300 Reddit posts where the poster was deemed “the asshole” by their peers, and 300 posts where the poster was deemed “not the asshole”. The objective is to extract n-grams using AntConc to see whether any language is specific to the YTA posts compared to the language of those considered ‘less assholeish’. After experimenting with the parameters available in AntConc, we decided to search each corpus for case-insensitive 3-word n-grams. N-gram parameters were set to a minimum frequency of 20 and a minimum range of 10 files across both corpora so that the lexical bundles would be more representative of the sample corpora. The following table depicts the top 20 results for comparison between the YTA and the NTA corpora.

The NTA and YTA results seem similar at first glance. Looking at 3-word n-grams, the top five were:

NTA’s: I don’t, I didn’t, I told her, (don)t want to, and aita for not.
YTA’s: I don’t, I didn’t, am I the, I the asshole, and I m not. 
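The bundles above were extracted with AntConc. To illustrate what the frequency and range thresholds do, here is a minimal Python sketch, not AntConc’s own implementation. It assumes each post is stored as one text in a dictionary keyed by a file name, and that tokens are runs of letters and digits after lowercasing, so “don’t” splits into don and t (which would produce bundles like “I m not”). The function names `tokenize`, `ngrams`, and `lexical_bundles` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase (case-insensitive search), then keep runs of
    # letters/digits, so "don't" -> ["don", "t"] and "I'm" -> ["i", "m"].
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # All contiguous n-token sequences in one text.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_bundles(files, n=3, min_freq=20, min_range=10):
    """Return {bundle: (frequency, range)} for n-grams meeting both thresholds.

    files: dict mapping a (hypothetical) file name to the text of one post.
    frequency = total occurrences; range = number of files containing the n-gram.
    """
    freq = Counter()
    rng = Counter()
    for text in files.values():
        grams = ngrams(tokenize(text), n)
        freq.update(grams)
        rng.update(set(grams))  # a file adds at most 1 to an n-gram's range
    return {
        " ".join(gram): (freq[gram], rng[gram])
        for gram in freq
        if freq[gram] >= min_freq and rng[gram] >= min_range
    }


if __name__ == "__main__":
    toy = {"a.txt": "I don't know. I don't care.", "b.txt": "I don't think so."}
    print(lexical_bundles(toy, n=3, min_freq=2, min_range=2))  # {'i don t': (3, 2)}
```

Sorting the result by frequency, e.g. `sorted(bundles.items(), key=lambda kv: -kv[1][0])[:20]`, gives a top-20 list. Because range counts each file at most once, a bundle repeated many times in a single post cannot pass the range threshold on its own. To apply the thresholds across both corpora at once, the YTA and NTA files could be passed together in one dictionary.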
Both lists make heavy use of I, focusing on the self and on negation. NTA’s top five also contains the first pronoun other than the first-person singular: her, in I told her. Across the complete top 20, NTA has four bundles that bring up another person (I told her, I told him, she didn’t, told her that), while YTA has only two (I told her and she didn’t).

Both the NTA and YTA results showed a focus on the first person and on negation, with bundles such as I don’t, I didn’t, and I don’t think. This makes sense given that posters in both corpora are asking whether they are the a*hole and would prefer not to be. Another element common to both is the verb told: I told her, I told him, told her that. These bundles could indicate that posters argue they are not the a*hole by explaining that they told someone else to do something, possibly redirecting responsibility and blame for their actions to the other party.

An interesting pattern that emerged was that there were more third-person references in NTA than in YTA: more instances of she/her in NTA, and no instances of him in YTA at all. This could indicate a few things. Since there is at least one instance of him in NTA and none in YTA, it could possibly indicate that more men were posting questions in YTA than in NTA. However, NTA contains twice as many third-person bundles as YTA, and with greater frequency overall, which could indicate that people more often come to this subreddit intending to explain why they are not the a*hole rather than to seek confirmation that they are.

After exploring these 3-word n-grams, we lowered the minimum frequency to 3 to extract more bundles. This new threshold revealed a number of bundles containing the word “because” in the NTA corpus. More specifically, the n-grams “because i don’t”, “because i was”, and “because she was” occurred in the NTA list, and we decided to look at these in context.
Using the word “because” offers an explanation for an action or elaborates on the reason a situation took place. Looking at the words that follow these bundles, we found phrases like:
  • “I ended up doing it because I was not comfortable with what happened”
  • “I did not retaliate because I was too shocked at why she would be so…”
  • “...despite having a busy schedule because I don’t want her working excessively.”
  • “...tell me that I am a horrible mother because she was one too.”
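Lines like these come from a keyword-in-context (KWIC) view. The following is a minimal Python sketch of such a view, not AntConc’s Concordance tool: it finds each case-insensitive, whole-word occurrence of a phrase and keeps a fixed number of characters on either side. The function names and the `width` setting are hypothetical, and letting a straight apostrophe in the query also match a curly one (as in “don’t”) is an assumption about how the posts are typed.

```python
import re


def _word_pattern(word):
    # Let an apostrophe in the query match either a straight or a curly one.
    parts = word.replace("’", "'").split("'")
    return "['’]".join(re.escape(part) for part in parts)


def concordance(text, phrase, width=40):
    """Return (left, match, right) KWIC tuples for each case-insensitive hit of `phrase`.

    `width` is the number of characters of context kept on each side.
    """
    words = [_word_pattern(w) for w in phrase.split()]
    # Whole-word match; any run of whitespace may separate the words of the phrase.
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


if __name__ == "__main__":
    post = "I ended up doing it because I was not comfortable with what happened."
    for left, node, right in concordance(post, "because i was", width=25):
        print(f"{left:>25} [{node}] {right}")
```

Reading the right-hand context column is what reveals the reasons OPs give after “because”.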

Of course, a larger corpus would yield more reliable results, but it seems we can say that the OPs in the NTA corpus at least provide a reason behind some of the actions in their posts. Offering this reasoning may show that the OP has thought about the situation and has at least considered why and how it transpired. In turn, readers may conclude that, due in part to this minimal level of reasoning, the OP did not act on emotion; consequently, such OPs may be deemed NTA more frequently.
In addition to looking at 3-word bundles, we also extracted 4-word bundles. In these longer bundles, we observed that the entry “am i the asshole” occurred frequently in both corpora. We decided to investigate this in context as well, paying particular attention to what comes immediately after the entry. Here, we found contexts like:
  • NTA
      ○ “Am I the asshole for not just drinking the ruined tea?”
      ○ “Am I the asshole for wanting to stay at my dad’s during the week?”
      ○ “Am I the asshole for canceling on them because I selfishly want to spend my birthday celebrating myself and spoiling me?”
      ○ “Am I the Asshole for asking them to take responsibility for locking the door behind them regardless of if I’m home?”
  • YTA
      ○ “Am I the asshole for bugging him a lot about reading some webcomics?”
      ○ “Am I the asshole for not just ‘sucking it up and going’”
      ○ “Am I the Asshole for feeling betrayed about that?”
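To compare how often “am I the asshole” appears in each corpus, and whether it is followed by a “for ...” phrase or stands alone as a question, one could use something like the minimal Python sketch below. It is an illustration under stated assumptions, not the procedure used with AntConc: the posts are assumed to be stored in a dictionary keyed by a file name, “range” means the number of files containing the phrase, and a hit counts as standalone when nothing follows it or the next character is ?, . or !. Abbreviations such as “AITA” or censored spellings like “a**hole” are not matched. The names `AITA` and `classify_aita` are hypothetical.

```python
import re
from collections import Counter

# Case-insensitive "am i the asshole", then the non-space token that follows it (if any).
AITA = re.compile(r"\bam\s+i\s+the\s+asshole\b\s*(\S+)?", re.IGNORECASE)


def classify_aita(files):
    """Count what follows each "am i the asshole" and the range of the phrase.

    files: dict mapping a (hypothetical) file name to one post's text.
    Returns (Counter of follow-up types, number of files containing the phrase).
    """
    kinds = Counter()
    range_ = 0
    for text in files.values():
        followers = AITA.findall(text)  # one string per hit; "" if nothing follows
        if followers:
            range_ += 1
        for nxt in followers:
            if nxt.lower() == "for":
                kinds["for + action"] += 1
            elif nxt == "" or nxt[0] in "?.!":
                kinds["standalone"] += 1
            else:
                kinds["other"] += 1
    return kinds, range_
```

Running it separately on the YTA and NTA files would give the kind of per-corpus comparison discussed below (a range of 16 vs 8 files), along with the split between standalone questions and “for ...” phrases.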
While we expected to find meaningful differences between the YTA and NTA concordance lines of “am I the asshole”, we instead found that twice as many YTA posts as NTA posts contain this phrase (a range of 16 and 8 files, respectively). The YTA corpus also included more instances of OPs asking “Am I the asshole?” as a standalone question, without stating the action that may make them the asshole. We do not think that including this question as a sentence of its own makes a given post more asshole-like, or makes readers think the OP is more of an asshole, but in the YTA posts the question more often appears on its own rather than followed by a phrase describing the action (e.g., “for not just drinking the ruined tea”).

More analysis and larger corpora are needed to represent the subreddit as a whole. Yet, from this data, it seems that OPs who explain an action using subordinate clauses beginning with “because” are deemed NTA more often than OPs who do not. Moreover, it appears that OPs who explicitly write out the question “Am I the asshole” in their posts are more often judged to be the asshole, but it would be interesting to see whether these observations hold over a larger sample of posts.