Can Qualitative Data Be Quantified?

April 24, 2017

By Rachel Marias

Education researchers are responsible for decisions and policies that shape how education functions. Researchers often collect quantitative data (in the form of test scores, graduation rates, GPAs, course enrollments, etc.) to see if policies actually work. They then collect qualitative data to examine the mechanisms behind the policies (i.e., the “how” and “why”). Qualitative data is often thought to be better suited to answering questions about the experiences of students, but researchers may run into obstacles when faced with large qualitative datasets. This type of data requires a significantly greater investment of time for coding and analysis than quantitative data, which can be analyzed almost instantly with statistical software. However, recent advancements in technology seem to be opening up the prospect of more expedient qualitative research. It is worth asking, though, whether this is necessarily a good thing.


One Promising Way to Handle Large Amounts of Qualitative Data

Educational researchers must find ways to efficiently analyze large amounts of qualitative data.

The Spartan Persistence Project, on which I work as a researcher, collects massive amounts of data throughout a social psychological intervention study. Part of the intervention includes a writing assignment where incoming Michigan State University (MSU) students respond to a prompt about one of the following: their sense of belonging on campus, their understanding of growth mindset, or the architecture on campus. In this blog post, I focus specifically on the students who wrote about their sense of belonging on campus. Students who participated in the belonging treatment produced written responses that ranged from 5 words to multiple paragraphs. In this first year, 1,621 students provided written responses longer than 5 words. The sheer number of narratives created a barrier to rapid analysis.

Rather than read every response from every incoming student, I employed a systematic way of identifying themes in responses based on student language and word choice. This process began with reading a random sample of 50 responses. This sample provided evidence that students write about belonging in multiple ways. Following this reading, a larger random sample (20% of all responses) was coded for broad themes of belonging: both themes emergent from the students’ voices and themes rooted in the belonging literature. The language in each code was reduced to keywords and phrases. These keywords and phrases populated “dictionaries” in the Linguistic Inquiry and Word Count (LIWC) software I used. With these dictionaries, based entirely on student language, LIWC counted all instances of these words in the student responses fed into the software. This software allowed for analysis of the frequency of language used around particular ideas (such as belonging to a sports team), or of a specific phrase with a technical meaning in the MSU context (such as “failing forward”). While developing these codes, keywords, and phrases is somewhat time-consuming, it allowed for more rapid analysis of the full dataset.
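To make the idea of dictionary-based counting concrete, here is a minimal sketch in Python. It is not LIWC's implementation and not the project's actual dictionaries: the category names, the keyword lists, and the function names are all hypothetical, chosen only to illustrate how keywords and phrases drawn from student language might be tallied across a set of responses.

```python
import re
from collections import Counter

# Hypothetical dictionaries: each category maps to keywords and phrases.
# These lists are illustrative assumptions, not the project's real codes.
DICTIONARIES = {
    "sports": ["team", "intramural", "teammates"],
    "failing_forward": ["failing forward"],
}


def count_category_hits(response, dictionaries):
    """Count how often each category's terms appear in one response."""
    text = response.lower()
    counts = Counter()
    for category, terms in dictionaries.items():
        for term in terms:
            # Word boundaries keep "team" from matching inside "teammates".
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[category] += len(re.findall(pattern, text))
    return counts


def summarize(responses, dictionaries):
    """Sum category counts over every response in the dataset."""
    totals = Counter()
    for response in responses:
        totals.update(count_category_hits(response, dictionaries))
    return totals


if __name__ == "__main__":
    sample = [
        "I joined an intramural team and love my teammates.",
        "Failing forward helped me. I keep failing forward!",
        "I like the architecture on campus.",
    ]
    print(summarize(sample, DICTIONARIES))
```

Even in this toy version, the researcher's choices are visible: which words go into which category, and whether a phrase is matched exactly or loosely, determine what gets counted.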


Potential Drawbacks to Qualitative Software

While this methodology allowed for large-scale analysis by a single researcher, the ease of analysis is potentially a drawback. One warning sign is the methodology's reliance on a single researcher, whose own background, views, and attention may bias the results. For example, there may be parts of the students’ language to which the researcher is not sensitive and other parts that the researcher notices much more frequently. These may be regional nuances of language, such as the term “blind,” which MSU students use in reference to a randomly assigned dorm roommate. Additionally, there is the potential for culturally based bias when a single researcher is not aware of cultural indicators in specific language choices.

It may be assumed that quantifying qualitative data would rid the data of some ambiguity. Especially with the promise of expedient analysis, this kind of methodology can be appealing. If the analysis is to remain grounded in student language, it is imperative that the choices made in developing codes are rooted in that language. Quantifying this qualitative data does not make the data unambiguous; rather, it opens up the potential for ambiguity to hide behind quantitative data. If researchers wish to employ this methodology, special care must be taken to address the potential problems around bias and how they may affect the larger policy arena. The Spartan Persistence Project addresses these potential issues of bias with constant evaluation of the dictionaries through renewed analysis in successive years of implementation.