AD·VNVM·DATVM: Down to a single bit of data

I am a PhD Graduand

In short:

I have successfully defended my dissertation in Psychology at the University of Oregon, and am a PhD graduand (i.e., awaiting the conferral of the degree). My dissertation and its code and supporting files are freely available.

On November 22nd, 2016, I successfully defended my dissertation, "The Axiology of Necrologies: Using Natural Language Processing to Examine Values in Obituaries," making me a graduand for my PhD in Personality and Social Psychology from the University of Oregon (i.e., having completed all requirements for the degree, and awaiting its conferral). (Writing this post marks the first time I've used the phrase "my PhD"; previously, and despite the possible motivating effect of referring to it in that way, I felt uncomfortable with claiming it before having earned it.)

A summary of my dissertation project

A researcher named Shalom Schwartz proposed that when it comes to values, cultures across the world all have the same 10 basic "taste buds." These are (grouped into 4 overarching categories):

  1. "Self-transcendence:" Universalism and benevolence
  2. "Conservation:" Security, tradition, and conformity
  3. "Openness to change:" Stimulation, self-direction, and hedonism
  4. "Self-enhancement:" Achievement, power, and hedonism

Schwartz's theory is that all cultures and subcultures value these same basic things, but that they value them to different degrees (similarly, humans around the world have the same basic taste buds, while different cultures prefer different flavors).

Thanks to the generosity of Legacy.com, which publishes obituaries from 1,500+ newspapers internationally and whose officers allowed me to scrape obituary text from their website, I analyzed 140,599 obituaries from 832 newspapers in the USA to understand how much obituaries in different areas talk about those 10 values. Specifically, I analyzed every word in each of those obituaries to see how distant in a thesaurus it was from prototype words for each of those 10 values. I then created or added to computer programs to read each obituary and attempt to categorize the gender and age at death of the deceased, so that I could understand whether values are talked about differently based on those characteristics. I also used US Census data to look at whether those values were talked about differently based on places' income levels, education levels, and ethnic demographics.
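To make the "distance in a thesaurus" idea concrete, here is a minimal sketch in Python. It is not the code from my dissertation: the toy thesaurus, the prototype words, and the function names (`thesaurus_distance`, `score_text`) are all made up for illustration, and the scoring rule (averaging 1 / (1 + distance) over the words of a text) is just one simple assumption about how distances could be turned into a score.

```python
from collections import deque

# Hypothetical toy thesaurus: each word maps to its listed synonyms.
TOY_THESAURUS = {
    "power": ["authority", "strength"],
    "authority": ["power", "command"],
    "command": ["authority", "lead"],
    "lead": ["command"],
    "strength": ["power"],
    "security": ["safety"],
    "safety": ["security", "shelter"],
    "shelter": ["safety"],
}

# Hypothetical prototype words, one per value (only two values, for brevity).
PROTOTYPES = {"power": "power", "security": "security"}


def thesaurus_distance(start, goal, thesaurus):
    """Return the fewest synonym hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    # Breadth-first search: the first time we reach the goal is the shortest path.
    while queue:
        word, hops = queue.popleft()
        for neighbor in thesaurus.get(word, []):
            if neighbor == goal:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


def score_text(text, prototypes, thesaurus):
    """Average 1 / (1 + distance) over the words in text, for each value.

    Words that cannot reach the prototype contribute 0 to that value's score.
    """
    words = text.lower().split()
    scores = {}
    for value, prototype in prototypes.items():
        total = 0.0
        for word in words:
            distance = thesaurus_distance(word, prototype, thesaurus)
            if distance is not None:
                total += 1.0 / (1 + distance)
        scores[value] = total / len(words) if words else 0.0
    return scores


if __name__ == "__main__":
    print(score_text("lead shelter", PROTOTYPES, TOY_THESAURUS))
```

In this toy setup, a word that sits closer to a prototype in the synonym graph contributes more to that value's score, and a word with no path to the prototype contributes nothing. A real analysis would need a full thesaurus or lexical database and more careful text cleaning than a plain `split()`.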

A summary of some of the findings

To my surprise, of those 10 Schwartz values, the ones most often indicated by the authors of the obituaries I analyzed were power, conformity, and security (I expected power to be least talked about, alongside hedonism). However, in line with what I expected, universalism, hedonism, and stimulation were least indicated. Unexpectedly, it turned out that when hedonism was talked about, conformity was often also talked about; I wonder whether this means that obituary authors "compensate" whenever describing a hedonistic quality of the deceased, since across the USA, hedonism is often seen as a less-desirable value (since it's linked with indulgence).

By far (more than any individual-level or community-level demographic predictors), the largest variance component in my statistical models was the newspaper in which an obituary had been published, indicating that newspapers "clump together" in the values-expressions of the obituaries they publish. This "clumping" may be attributable both to shared values among the authors of obituaries in a given community and to templating by obituary authors (either explicitly, as "Fill in these blanks," or implicitly, as "Other obituaries I've read used this phrase, so I'll use it, too").
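For readers unfamiliar with the idea of a variance component, the following Python sketch shows what it means for scores to "clump" by newspaper. It is a simplified, purely descriptive split of variance into a between-newspaper part and a within-newspaper part (using the law of total variance), not the statistical models from the dissertation. The scores, the newspaper names, and the function name `variance_shares` are invented for illustration.

```python
from statistics import mean, pvariance

# Hypothetical scores for one value, for obituaries from three made-up newspapers.
SCORES_BY_NEWSPAPER = {
    "paper_a": [1.0, 2.0, 3.0],
    "paper_b": [4.0, 5.0, 6.0],
    "paper_c": [7.0, 8.0, 9.0],
}


def variance_shares(groups):
    """Split total variance into between-group and within-group shares."""
    all_scores = [score for scores in groups.values() for score in scores]
    n = len(all_scores)
    grand_mean = mean(all_scores)
    # Between-group part: how far each group's mean sits from the grand mean,
    # weighted by the number of scores in that group.
    between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2 for scores in groups.values()
    ) / n
    # Within-group part: the size-weighted average of each group's own variance.
    within = sum(len(scores) * pvariance(scores) for scores in groups.values()) / n
    total = between + within
    if total == 0:
        raise ValueError("all scores are identical; shares are undefined")
    return {"between": between / total, "within": within / total}


if __name__ == "__main__":
    print(variance_shares(SCORES_BY_NEWSPAPER))
```

With these made-up numbers, most of the variation sits between newspapers rather than within them, which is the pattern "clumping" describes. A multilevel model estimates the analogous quantities while also accounting for predictors and unequal group sizes.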

The dissertation and its supporting files

The dissertation itself can be found here (until it gets a permanent address from the University of Oregon's libraries). It is released under a Creative Commons Attribution 4.0 (i.e., CC-BY 4.0) License.

The dissertation presentation can be found here. The presentation uses Reveal.js, and can be navigated with the arrow keys -- vertical for within-topic slide changes, horizontal for across-topic slide changes. Pressing the m key will display a menu of slides, while pressing o will show a slide overview. Pressing the s key will show speaker notes.

In addition, I've posted much of my code and some derived datasets (for example, the list of words found in the obituaries corpus) especially for other researchers to use at http://doi.org/10.7264/D3WC7S. The code can be cited at that DOI, and is openly licensed (license terms are provided alongside the files).

This project felt like a very special way to finish my graduate program, since it involved not just performing the research, but also working consciously to be ethical with the texts and respectful to the authors and subjects of the obituaries I was stewarding. The dissertation document includes a discussion about the privacy implications of this type of "morality mining" (a phrase coined by Christen, Alfano, Bangerter, and Lapsley (2013) for a type of data mining that specifically seeks to understand moral or value-relevant properties of its subjects). I welcome further discussion about that aspect especially, and about the use of similar methods for other ethical data analysis applications.

The image at the top of this post is a diagram of a medieval labyrinth, published in Nordisk Familjebok (cf. this original image). To me, it accurately symbolizes the nature of the dissertation-writing process as I experienced it. Unlike a maze, it moves toward a single goal on a single path; but it is convoluted in its progression, requiring patience not only to reach its center, but also to then traverse back to its starting point (with something new having been gained on the journey). I'll be writing more about this metaphor soon.
