Autoencoder Human Language Modeling

Check out my senior thesis I wrote on Autoencoder Human Language Modeling!

Language is inherently social, and each of us has our own individual style of writing and speaking. However, language models (LMs) typically treat language in isolation, with no information on who wrote the input text or what else that person has written. My project explores a new method for training non-generative (autoencoder) LMs so they can recognize when two documents (e.g., blog posts, tweets) were written by the same author or by different authors.
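To give a flavor of what a same-author objective might look like (this is a simplified, hypothetical sketch, not the actual method from the thesis): one common approach is to embed each document with the encoder, then train on a binary signal of whether a pair of embeddings came from the same author, e.g. via a sigmoid over their cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def same_author_loss(emb_a, emb_b, same_author):
    """Binary cross-entropy on sigmoid(cosine similarity).

    Illustrative only: a real model would learn the embeddings
    end-to-end and likely use a temperature or a learned head.
    """
    p = 1.0 / (1.0 + math.exp(-cosine(emb_a, emb_b)))  # P(same author)
    return -math.log(p) if same_author else -math.log(1.0 - p)

# Two toy document embeddings pointing in the same direction:
doc1 = [0.9, 0.1, 0.3]
doc2 = [0.8, 0.2, 0.3]

# Similar embeddings are penalized less when labeled same-author
# than when labeled different-author, pushing the encoder to
# cluster an author's documents together.
loss_same = same_author_loss(doc1, doc2, same_author=True)
loss_diff = same_author_loss(doc1, doc2, same_author=False)
```

Minimizing a loss like this encourages the encoder to place an author's documents near each other in embedding space, which is one way an LM could become "human-aware."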

This type of “human-aware” approach could help autoencoder LMs gain a better understanding of the individual differences in our language, which could improve an LM’s performance in social tasks like sentiment analysis and stance detection.

A big thanks to my PhD student mentors Nikita Soni and Adithya Ganesan, and to my advisor, Professor H. Andrew Schwartz, for their continuous guidance and support throughout an incredibly cool NLP project.

The thesis can be accessed through my LinkedIn post. I'd love it if you gave it a read!