Until the Rosetta Stone was discovered in 1799, no one could read ancient Egyptian hieroglyphs. Because the stone carries the same decree in three scripts (hieroglyphic, Demotic, and ancient Greek), scholars were able to decode this ancient script.
By analogy, I am publishing a dictionary that allows us to understand what people on the receiving end of international aid really mean when they are given a chance to tell stories about how organizations have affected their lives. It works because the GlobalGiving Storytelling Project collected such a large sample of beneficiary feedback about every sort of community effort that we can reverse engineer what people mean in other contexts.
Building the word-tone dictionary
- Starting with the more than 60,000 stories we’ve already collected from Kenyans and Ugandans about NGO work, I pulled a dictionary of 100,000 English words and queried the collection for stories that contained each word.
- Each story is associated with a series of mapping questions about what happened. Was it positive or negative? These outcome mapping questions allow me to associate specific words with specific outcomes on a range from positive to negative. For example, if everybody who tells a story about “measles” marks the outcome as negative (the person wasn’t cured), the word “measles” would generally be a negative word in other NGO contexts.
- There are many kinds of positive and negative outcomes in the data already. We asked, “Who benefited?” (nobody, the wrong people, or the right people) and “How did you feel about your story?” So even with just 100 stories that use a word, we often have several hundred data points averaged. As you would expect, most words are not strongly positive or negative.
- I then filtered out any word that wasn’t used in at least 100 stories, so that the remaining 1944 words (of 100,000) are pretty reliable as a reference dictionary. I will probably publish a larger dictionary with the remaining words if people ask for it in comments below.
- I then normalized the scores on a range from roughly -500 to +500, centered around zero, by “turning up the gain” on the negative feedback. (Ask me how in the comments if you care.) This step allows the data set to be used in other contexts, such as the “import your own text” analysis tool, where we don’t know whether stories had a happy or sad ending.
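The steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the published dictionary. The story format (a text plus a list of numeric outcome answers, with positive answers above zero and negative answers below), the function names, and the final mean-centering step are all assumptions. In particular, the actual "gain" adjustment on negative feedback is not described in this post, so plain centering stands in for it.

```python
import re
from collections import defaultdict
from statistics import mean


def tokenize(text):
    """Lowercase a story and return the set of distinct words in it."""
    return set(re.findall(r"[a-z']+", text.lower()))


def build_word_tone(stories, vocabulary, min_stories=100):
    """Average outcome answers per word, keep well-attested words, center on zero.

    stories: iterable of (text, outcomes) pairs, where outcomes is a list of
             numbers (hypothetical coding: positive > 0, negative < 0).
    vocabulary: set of candidate English words to look up.
    """
    story_counts = defaultdict(int)   # how many stories use each word
    answers = defaultdict(list)       # every outcome answer tied to each word
    for text, outcomes in stories:
        for word in tokenize(text) & vocabulary:
            story_counts[word] += 1
            answers[word].extend(outcomes)

    # Keep only words used in enough stories to be a reliable reference.
    raw = {w: mean(answers[w]) for w, n in story_counts.items()
           if n >= min_stories and answers[w]}
    if not raw:
        return {}

    # Assumed stand-in for the real normalization: shift scores so the
    # average word sits at zero.
    center = mean(raw.values())
    return {w: score - center for w, score in raw.items()}
```

With three toy stories and `min_stories=2`, a word that mostly appears with negative answers ends up below zero and a word that mostly appears with positive answers ends up above it.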
This reference dictionary allows anyone to take a quick glance at the overall sentiment in any unstructured language from people who are affected by international development, based on how tens of thousands of people have used the same word previously.
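As a rough sketch of that "quick glance," one simple approach is to average the scores of whichever words in a text appear in the dictionary. Averaging is an assumption on my part here, not necessarily how the analysis tool works, and the function name `text_tone` and the toy scores in the usage note are made up for illustration.

```python
import re


def text_tone(text, word_tone):
    """Return (average score of matched words, list of matched words).

    word_tone: dict mapping lowercase words to sentiment scores.
    Texts with no dictionary words get a neutral score of 0.0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    matched = [w for w in words if w in word_tone]
    if not matched:
        return 0.0, []
    average = sum(word_tone[w] for w in matched) / len(matched)
    return average, matched
```

For example, with a toy dictionary such as `{"came": -400.0, "time": -300.0}`, calling `text_tone("One time these people came to our village", toy)` matches "time" and "came" and averages their two scores.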
Download your free word-tone dictionary
Click to download the 1944-word dictionary either as a CSV or as a Python pickled dictionary.
(Rename the files afterwards)
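Once downloaded, either file can be read into a plain Python dict. The sketch below assumes the CSV has the word in the first column and the score in the second, and that the pickle holds a `{word: score}` dict. I have not stated the exact layout here, so check the files and adjust. The function names and any path you pass in are placeholders.

```python
import csv
import pickle


def load_csv(path):
    """Read a word,score CSV into a dict.

    Rows whose second column is not a number (for example a header row)
    are skipped. Column order is an assumption.
    """
    tones = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue
            try:
                tones[row[0].strip().lower()] = float(row[1])
            except ValueError:
                continue
    return tones


def load_pickle(path):
    """Load a pickled dictionary. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

The CSV route is the safer default, since unpickling runs whatever the file tells Python to run.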
If you plot all the words and their sentiment scores in Excel, they look like this:
If you turn that plot on its side, it resembles a normal distribution, like this:
The corrected plot is centered around zero. That’s good. It means that these words are a good mix of positive and negative sentiments. I chose to adjust the raw positive-negative scores because there is such a huge positive bias in all NGO feedback that it becomes ridiculous just how skewed the stories are, compared to how much people’s lives are really affected by these efforts.
If people really benefited as much as they say they do, we would have no poor people left.
Case in point: What word is the outlier at the top of the chart?
Give up? This word has a positivity score of 10,125 after I corrected the data. The score of +10,125 is a measure of how consistently that word appears in positive success stories versus negative failure stories. A word with a score of zero is neutral, or used in stories with mixed positive-negative outcomes. And because only words used in at least 100 stories are included, these dots are not likely to switch sides (or signs) if we repeated this experiment with a different story collection from the aid world.
Still don’t know the mystery word?
Here is the answer:
The outlier is the word ‘organization.’ People are very eager to tell positive stories about organizations. They are literally thousands of times more likely to be positive in stories with the word ‘organization’ than in stories that contain the words that fall along the zero line of the chart.
Previously, my other means of measuring positive bias in stories concluded that people tell 10 to 30 positive stories for every negative story, across all 60,000 stories. By this measure, the positive sentiment in stories that include the word ‘organization’ is higher still. Positive bias is a real problem. But using the word dictionary I’ve published, you can find the negative sentiments within a sea of rosy feedback.
- The most negative words were “came” and “time.” As in, “one time these people came to our village…” That meta-pattern is quite alarming. I just finished reading Bill Easterly’s “The Tyranny of Experts” yesterday, which is all about getting the outsiders to leave people alone and instead focus on advocating for the rights of poor people. This pattern is consistent with the failure of outsiders to come into a place on a “one time” basis and make any sort of lasting positive change. The implications of these outlier words should be a wake-up call to the aid sector.
- People are more honest in Kibera than in Uganda as a whole. Slum life in Kibera is poor, and people are ready to talk about it honestly. But across Uganda, people are almost as positive about everything as are people who talk about an “organization.”
- Narratives are positively biased in the development world, but there is no reason to believe that numbers are somehow free of this bias. In any kind of survey, when someone asks a citizen, “how much money do you make?” or “how many kids do you have?” they get back wrong answers, and always wrong in the direction of what a person knows the organization wants to hear. There are documented examples of women under-reporting the number of kids they have to Millennium Development Goals surveyors in Uganda because they knew that “smaller families” was the outcome measure outsiders were looking for. Likewise, people lie about income and say they are poorer when money might be handed out, or overestimate their income in surveys from micro-loan foundations that use this “success metric” as the basis for granting them larger loans.
- You could throw up your hands and blame other people for lying, but I prefer to treat this as a symptom of the larger disease: Our programs are generally not making life better, and the only way to make life better is to play the game to get as much immediate aid as possible. No one has ever proven that putting money in a person’s pocket makes them poorer in the short term. Yes, in the long term, they could become poorer, but poverty has a tendency to focus people on short-term gains.
- The difficulty is that there are two kinds of positive stories – the ones where things really turned out great, and the ones where they are saying good things but they’re still not happy about the outcomes. This one method alone doesn’t do enough to tease out these two kinds of positive, but when combined with other lines of evidence, other structural aspects in the narratives, it is possible to tell the difference between authentic praise and manufactured praise. For example, check out my first attempt at this.
Quick scan of the most positive / most negative words
Most negative words
Most positive words