I’ve been silent on my blog for a while, but during that silence I have been working on an actual paper as part of my application to join the Santa Fe Complexity Institute in 2013. I think a better understanding of how to make qualitative (narrative, anecdotal) data yield quantitative results would be a valuable contribution to international development, and to social science more generally.
My definition of robust data is a set of information that yields the same conclusion even when subdivided into smaller parts. The smaller parts are typically time intervals and geographic regions.
Robust data yields the same conclusion across many regions, over time, and from many perspectives.
This definition is operationally useful: you can subject any set of qualitative narratives to the test and determine whether they are “robust”, and if not, how close to being “robust” they are.
It also points exactly to why most qualitative data cannot yield robust conclusions. Qualitative data cannot be used quantitatively if…
- it doesn’t cover a broad enough sample (geographic, demographic)
- it doesn’t cover a long enough timespan
- it lacks multiple perspectives (insiders – those “affected” – and outside “observers”)
- it consists of a few long stories instead of a larger number of shorter anecdotes.
That last point is significant – analyzing 30 long stories will more often yield conclusions that do not apply to the larger population than analyzing 300 shorter stories will.
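The robustness test itself can be sketched in a few lines. This is a minimal illustration, not the project’s actual analyzer: it stands in “most frequent word” for “the conclusion the data yields”, and assumes stories are dicts with a `text` field and a subdivision key such as `region`.

```python
from collections import Counter

def top_theme(texts):
    """The most frequent word across a list of story texts --
    a crude stand-in for 'the conclusion the data yields'."""
    words = [w for t in texts for w in t.lower().split()]
    return Counter(words).most_common(1)[0][0]

def robustness(stories, key):
    """Fraction of subsets (e.g. per region or per year) whose top
    theme matches the whole collection's: 1.0 means fully robust,
    lower scores say how close to 'robust' the data is."""
    overall = top_theme([s["text"] for s in stories])
    subsets = {}
    for s in stories:
        subsets.setdefault(s[key], []).append(s["text"])
    agree = sum(1 for texts in subsets.values() if top_theme(texts) == overall)
    return agree / len(subsets)
```

The same function applied with `key="year"` or `key="perspective"` covers the time and point-of-view criteria above.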
In time (and with the support of the Santa Fe Institute) I hope to build a robust-analyzer that anyone can use along these criteria. But in this post I focus on the power of comparing two perspectives. I have been using several algorithms to map out the trending themes for a particular subject within the storytelling project. I hope this demonstrates that central themes become clearer as you compare perspectives and clean up the data sources.
1: Themes from stories using wordles
Wordles are the simplest form of theme extraction: a wordle displays the frequency of words in one giant string of text, with no attempt to parse the data by structure (e.g. looking at word frequency within each story). For my two test cases – the Mrembo project (run by VAP, Vijana Amani Pamoja) and “rape” – this is what you see:
Wordles don’t give a very clear idea of how the most frequently used words are connected in sentences within the narratives.
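Under the hood a wordle is nothing more than a frequency count. A minimal sketch (the tiny stop-word list here is illustrative; real wordle tools use much larger ones):

```python
from collections import Counter

# Tiny illustrative stop list -- a real wordle filters far more words.
STOP = {"the", "a", "an", "and", "to", "of", "in", "is", "was"}

def wordle_counts(text, top_n=50):
    """All a wordle displays: word frequency in one giant string of text."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    counts = Counter(w for w in words if w and w not in STOP)
    return counts.most_common(top_n)
```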
2: Adding context with a phrase network – connecting adjacent words in narratives
In this version, the same source stories are used, but now words that appear next to each other in multiple stories are connected with a line. A word that appears frequently may still be absent from the map if the words beside it are never the same. This gives a clearer picture of the kinds of sentences that appear in the 92 Mrembo and ~780 rape stories, respectively:
At this level the ideas in the stories start to become clear. The main lessons of the Mrembo project branch into talk about HIV/AIDS and relationships with sex partners. Mrembo is an after-school programme in a Nairobi slum that teaches life skills to adolescent girls aged 11–14. You can clearly see which challenges they face in their lives and which skills matter. At this age, girls can only survive the slum by learning how to avoid HIV/AIDS and avoid being raped (two clearly connected challenges). I’ve described this project before here.
The larger set of stories about rape offers a different picture: “rape cases” are described more often, and another project, Sita Kimya, is mentioned the most. Other parts of the map cover drug abuse, street children, school fees, and stories about victims.
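The adjacency rule behind the phrase network can be sketched as follows – a hypothetical simplification of the actual algorithm, where an edge exists only if the same adjacent word pair occurs in at least two different stories:

```python
from collections import Counter

def phrase_network(stories, min_stories=2):
    """Edges of the phrase network: adjacent word pairs that
    occur in at least `min_stories` different stories."""
    pair_counts = Counter()
    for story in stories:
        words = story.lower().split()
        # count each adjacent pair once per story, not once per occurrence
        for pair in set(zip(words, words[1:])):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_stories}
```

Counting each pair once per story (rather than per occurrence) is what makes a single repetitive story unable to dominate the map.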
3: Extracting the main idea by analyzing stories as a group of narratives with wordtrees
In both the wordle and the phrase network methods, the stories were treated as one long string of words. In this version, which I call a wordtree, each story is treated as a data object, and a recursive algorithm extracts the main ideas:
- Look at all stories and find the top words (typically above the 88th percentile).
- Loop through the stories looking for each of these words. For the subset of stories that contain a given word X, look at all the other words and pick out the few used most often, above a certain threshold.
- In the next round, repeat the search, but this time fetch all stories that contain both word X and word Y, and find the most common words in that subset.
Eventually this recursive approach gives you a nicely structured list of words related to each other. You can also add additional filtering parameters, such as the point of view of the storyteller, as I’ll show in version 4 below.
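The recursion described above can be sketched roughly like this. It is a toy version under stated assumptions – stories as plain strings, story-level (not occurrence-level) word counts, and the 88th-percentile cutoff as a default parameter – not the production algorithm:

```python
from collections import Counter

def top_words(stories, percentile=0.88):
    """Words whose story-frequency sits above the given percentile."""
    counts = Counter(w for s in stories for w in set(s.lower().split()))
    if not counts:
        return []
    cutoff = sorted(counts.values())[int(percentile * (len(counts) - 1))]
    return [w for w, n in counts.items() if n > cutoff]

def wordtree(stories, path=(), depth=2, percentile=0.88):
    """Recursively branch: at each level, keep only the stories containing
    every word on `path` so far, then split on their most common words."""
    subset = [s for s in stories if all(w in s.lower().split() for w in path)]
    if depth == 0 or not subset:
        return {}
    return {w: wordtree(stories, path + (w,), depth - 1, percentile)
            for w in top_words(subset, percentile) if w not in path}
```

Raising `percentile` (e.g. to 0.96) thins the tree, which is the knob used below to produce the sparser maps.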
Example of Mrembo wordtree (92 stories) and Rape wordtree (749 stories):
In the above example, the coloring simply underscores which words are used most. Unlike the phrase network, this wordtree explicitly maps out what the Mrembo program taught (though it is an “undirected learning algorithm” that doesn’t “know” what you are looking for). The words “Mrembo taught” appear in the center and point to early pregnancy, marriage, relationships, children, rape, and bad boys. Nancy, whose job it is to refine the curriculum for the Mrembo Project, can make use of these themes in deciding what to focus on. Yesterday I met with her and she said she had been focusing more on rape in 2012 because it appeared as a major theme among girls’ stories, though it was not originally a focal point of the program.
Adding storyteller point of view to Mrembo wordtree:
Mrembo is now connected in the center of the map, as it should be. The red nodes are from stories where the storyteller describes herself as being affected by the events in her story, or as helping to make the events happen. The blue words are from observers. The pink nodes are a mixture of both perspectives. As one might expect, the actors and those affected talk about HIV/AIDS, sex, and avoiding rape. The observers talk about victims and what the program taught them. In this version of the algorithm, the blue and red perspectives are mapped separately and then merged, so the branching around “Mrembo taught” appears to have broken up. Mrembo clearly taught about HIV/AIDS, but rape has moved farther away because rape is only mentioned in the stories of those who were affected by the events they spoke about. Girls sharing “observer” stories don’t seem to talk about rape as much as they do about protection and victims.
Rape wordtree (749)
In this example with 749 stories mentioning rape, you see that Sita Kimya is still a major part of the map. You also see that raped girls are associated with men being arrested and with a mother or father intervening. In contrast, raped women are related to rights, (domestic) violence, and justice. Young boys and old men are also perceived as distinctly different problem groups of potential rapists. Child abuse and drug abuse also play a role. This map, which shows a broader perspective on the problem, could also help Nancy refine her Mrembo program.
Rape from different points of view (749 stories)
In this map, words from those affected by or involved in the events told in their story are red. Observer stories mentioning rape give rise to the blue words. There is actually a lot in this map, making it hard to point to any one trend of note. So I reran it using a higher threshold (words needed to be in the 96th percentile instead of the 88th to appear).
In the sparser map of rape stories, there is a shared perspective (between observers and those affected by events in their stories) that girls being raped may have something to do with school. Those affected talk about justice and violence against women, and about rape cases and victims. The word victim is interesting – because here it is associated with the story actors; in Mrembo stories victim is associated with observer stories. Observers also talk much more about stories involving police, child abuse, and hospital visits.
In the larger map, I did want to point out that Mrembo makes an appearance, and is most strongly associated with HIV/AIDS. Because of the way the algorithm works (comparing word associations and saving the strong ones) I can quantitatively say that the Mrembo project is doing more to teach young girls about the association between HIV/AIDS and rape than any other social intervention in Nairobi, with a sample of over 10,000 stories.
Can or should I call this definitive proof? Within the error limits of any typical evaluation of a complex social problem in a real-world geographic catchment area, this proof is as good as any other out there. But I should also point out that people define “proof” in different ways:
- Proof to a mathematician means a complete and logically defensible explanation.
- Proof to a scientist means that observed phenomenon Y is caused by X and will always be caused by X. (Scientists use statistics to demonstrate that X has caused Y so often that the only cases where X did not cause Y are due to experimental error. Few lab scientists publish correlational results.)
- Proof to an evaluator means there is good chance that X and Y are usually associated, based on a sample of the whole picture. Now we are talking about social science – where X does not cause Y all the time, only part of the time. People are not machines and behave and react in many different ways to the same situations. In explaining their behavior and attitudes, we are looking for meaningful associations. Impact Evaluations have a harder time – they want some clear evidence that X has had some positive effect on Y (their definition of Impact).
- Proof in the case of wordtrees means that given a sufficiently diverse sample, you should be able to identify the two or three things most strongly associated between X and Y. So it is not proof to a mathematician or a scientist, but it is comparable to proof as used by social scientists. We are looking for reasonably strong associations, not causality. Strong relationships between X and Y are useful for designing interventions. They are a much more valid basis for an intervention than gut instinct or an academic paper that studied the same subject ten years ago in a village far, far away.
So how do you know if the sample is “sufficiently diverse”? Well, we track sources of our stories, which allows me to re-calculate the same map using only scribes that collected at least 10 stories. I could also assign a pass-fail check based on the whole sample having enough storytellers, locations, and scribes. For our 749 rape stories we see:
We have 435 storytellers, 240 scribes, and between 144 and 456 unique locations. (Storytellers don’t always use the same word for the same place, so it is not exact). There are probably a few hundred more storytellers that did not give their phone number, so we can’t tell whether they are the same people or different people.
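The pass-fail check mentioned above could be sketched like this. The field names and threshold defaults are illustrative assumptions, not the project’s actual cutoffs:

```python
def diversity_check(stories, min_storytellers=100, min_scribes=50, min_locations=30):
    """Pass/fail check on source diversity. Threshold defaults here are
    illustrative, not the project's actual cutoffs."""
    metrics = {
        "storytellers": len({s["storyteller"] for s in stories if s["storyteller"]}),
        "scribes": len({s["scribe"] for s in stories}),
        # crude lower bound: location names are not spelled consistently
        "locations": len({s["location"].lower() for s in stories}),
    }
    passed = (metrics["storytellers"] >= min_storytellers
              and metrics["scribes"] >= min_scribes
              and metrics["locations"] >= min_locations)
    return passed, metrics
```

Lowercasing location names gives only a rough deduplication, which is why the unique-location figure above is a range rather than a single number.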
From this, and only using a more qualified subset of stories, we get these maps:
Rape wordtree – point of view, from only qualified data sources
This looks much like the last map; little has changed. So I would say that our analysis of “rape” stories is drawing on qualified (diverse) sources. I will attempt to define the threshold of sample diversity more precisely in the future, but it will take a lot more statistics to make such a threshold work across all story subsets and give you an up-or-down vote on the data.
UPDATE: If you want to see the error bars associated with each word in the map, here is the statistical treatment.
When we filter out the unqualified sources, the map of the Mrembo stories changes quite a bit: the two point-of-view perspectives clearly divide. This is likely because scribes were collecting either first-person actor/affected stories or observer reports – not a random sample. And I knew this was the case: Mrembo’s staff was purposefully collecting stories about their own project using our story forms in order to compare their program to Sita Kimya and other social issues. So it is good to see that the algorithm changes the map when we take the data sources into account.
If one of the organizations that was helping collect stories for our project had made a real effort to help people in the region where they work talk about the need for more water and wells, the map would have looked like this (a qualified, source-filtered map of “water” stories from 800 people):
Clearly, this set of organization-centered stories lacks any real detail about the nature of the water problem. And if you were to do your own qualitative project assessment without using our storytelling method, which aims at real diversity in perspectives, you would get back a similar picture lacking any real data. When most organizations talk about “qualitative” evaluations, they make no effort to get diverse perspectives from multiple locations and over a longer time. Doing a “one-off” story collection gives you a set of self-praising anecdotes. Following the “two-story rule” – where every storyteller, after talking about an organization in a first story, must talk about some other community effort in a second story – leads to quantitatively useful data. External stories become someone else’s baseline data when all stories are shared. That is what I am trying to systematically provide to NGOs around the world.
If you think this approach is a valuable contribution to improving evaluations in international development, I hope you’ll let me know, and endorse my application to join the Santa Fe Complexity Institute.