Academy of Management 2014: New research techniques for computer-aided text analysis

by Tim Hannigan

The Content Analysis PDW on 1 August was a great event, sponsored by the CCR and planned by International Research Fellow Mike Pfarrer with Moriah Meyskens. This year’s PDW presentations were structured around the process of computer-aided text analysis.

Jason Kiley kicked things off with an overview of the issues that surround preparing content for computational text analysis, emphasizing key steps in cleaning and formatting texts from popular research sources such as Lexis-Nexis. He also covered more advanced topics, such as using Python to manipulate textual data and extract content features.
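To give a flavour of what that preparation can look like, here is a minimal Python sketch of cleaning exported articles; the folder name and header patterns are illustrative assumptions, not material shown at the PDW.

```python
import re
from pathlib import Path

def clean_article(raw_text):
    """Strip common database boilerplate and normalise whitespace."""
    # Drop header lines (copyright, load-date, length) that often appear in exports
    lines = [ln for ln in raw_text.splitlines()
             if not re.match(r"^\s*(Copyright|LOAD-DATE|LANGUAGE|LENGTH)\b", ln)]
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", " ".join(lines)).strip()

# Hypothetical folder of exported articles, one plain-text file per document
corpus = {path.name: clean_article(path.read_text(encoding="utf-8", errors="ignore"))
          for path in Path("lexisnexis_export").glob("*.txt")}
```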

I followed up with a joint presentation with Robert Vesco (Postdoc at Yale University) on “advanced approaches to content analysis”. We explored how four advanced techniques of text analysis (topic modelling, named entity recognition, advanced sentiment analysis and concept networks) could complement one another. We focused much of the presentation on a sample topic modelling exercise.
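As a small taste of one of these techniques, named entity recognition, here is a sketch using NLTK; the library choice and the example sentence are our own illustration rather than anything from the presentation.

```python
import nltk

# One-time downloads of the tokeniser, tagger and named-entity chunker models
for resource in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(resource, quiet=True)

sentence = "Mike Pfarrer organised the PDW in Philadelphia with Moriah Meyskens."

# Tokenise, part-of-speech tag, then chunk named entities into a tree
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))

# Collect the labelled spans (PERSON, GPE, ORGANIZATION, ...)
entities = [(" ".join(token for token, tag in subtree.leaves()), subtree.label())
            for subtree in tree if hasattr(subtree, "label")]
print(entities)
```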

We showed a sample topic modelling analysis of abstracts data-mined from the entire AOM 2014 online programme. This highlighted how a corpus of documents can be used to generate a series of “topics”: statistically derived clusters of words that tend to appear together. Topics are only useful insofar as they are interpretable, so the critical part of such an analysis is adjusting the parameters until the topics are semantically meaningful.
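Here is a minimal sketch of such a topic modelling run, using the gensim library (our choice for illustration); the abstracts list is a placeholder, and the number of topics is exactly the kind of parameter one adjusts until the topics read sensibly.

```python
from gensim import corpora, models

# Placeholder: in practice this would be the list of abstract strings mined from the programme
abstracts = ["...", "..."]

# Very light preprocessing: lowercase, split, drop short tokens (real work would also remove stopwords)
tokenised = [[w for w in doc.lower().split() if len(w) > 3] for doc in abstracts]

# Build the vocabulary and the bag-of-words representation of each document
dictionary = corpora.Dictionary(tokenised)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenised]

# Fit an LDA topic model; num_topics is the main knob to tune for interpretability
lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=20, passes=10)

# Inspect the top words of each topic to judge whether they are semantically meaningful
for topic_id, top_words in lda.print_topics(num_topics=20, num_words=8):
    print(topic_id, top_words)
```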

Once we had generated a statistical topic model, we reapplied it to each document, so that each document was represented as a distribution of topic proportions. This allowed us to think of documents in topical terms: each one exhibits proportions of one or more topics. Because each AOM 2014 abstract was linked to one or more AOM division sponsors, we were able to aggregate topic scores within each division and thereby derive a “topical signature” for each division. In a nutshell, this is quite powerful because we could see how similar or different the divisions were, simply based on the expression of topics across their associated AOM 2014 abstracts. Like cities on a map, we were able to show the divisions clustering in a multidimensional scaling (MDS) chart. As predicted, OMT, MOC, BPS, ENT and TIM were clustered close together. We were also able to see which topics were exhibited strongly in which divisions.
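Continuing the sketch above, the following illustrative Python shows how per-document topic proportions could be aggregated into divisional signatures and embedded with MDS; the doc_divisions mapping and the use of scikit-learn are our own assumptions about how one might implement the idea.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

num_topics = lda.num_topics

def topic_vector(bow):
    """Full topic-proportion vector for one document (zeros for absent topics)."""
    vec = np.zeros(num_topics)
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        vec[topic_id] = prob
    return vec

doc_topics = np.array([topic_vector(bow) for bow in bow_corpus])

# Placeholder: one sponsoring division code per abstract, aligned with bow_corpus
doc_divisions = ["OMT", "ENT"]

# Average the topic proportions within each division to get its "topical signature"
divisions = sorted(set(doc_divisions))
signatures = np.array([
    doc_topics[[i for i, d in enumerate(doc_divisions) if d == div]].mean(axis=0)
    for div in divisions
])

# Embed the divisions in two dimensions based on distances between their signatures
distances = squareform(pdist(signatures, metric="cosine"))
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(distances)
```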

Such advanced approaches to computational text analysis are amazing in what they promise, but there are also cautions and downsides. We emphasized that there is a high up-front cost to learning these methods, and that there are many opportunities to collaborate with computer scientists. However, with these methods it is also important not to stand too far from the data. Hence, we advocated iterating between high-level computational analyses and zooming in to read the texts and become immersed in them. There are several examples of ethnographers collaborating with computer scientists on studies that make some use of topic modelling.

The PDW closed with a presentation from Mike Bednar on the journey of getting content analysis work published. Mike shared his experiences convincing reviewers at organizational research journals to accept computer-aided text analysis work. Although that work used relatively simple computational approaches, many of the same critiques apply to more advanced approaches. As such, he offered valuable advice on how to structure such work so as to establish human-validated face validity for constructs. This helped round out the PDW, covering the full life-cycle of conducting computer-aided text analysis research.