A Rosetta Stone is now available for NGO sentiment analysis

Until the Rosetta Stone was discovered in 1799, no one could read Ancient Egyptian hieroglyphs. Because the stone carries the same decree in three scripts (hieroglyphic, Demotic, and Ancient Greek), scholars were able to decode this ancient script.

rosetta_stone

By analogy, I am publishing a dictionary that allows us to understand what people on the receiving end of international aid really mean when they are given a chance to tell stories about how organizations have affected their lives. It works because the GlobalGiving Storytelling Project collected such a large sample of beneficiary feedback about every sort of community effort that we can reverse engineer what people mean in other contexts.

Building the word-tone dictionary

Starting with the over 60,000 stories we’ve already collected from Kenyans and Ugandans about NGO work, I pulled a dictionary of 100,000 English words and queried the collection for stories that contained each word.
Each story is associated with a series of mapping questions about what happened. Was it positive or negative? These outcome mapping questions allow me to associate specific words with specific outcomes on a range from positive to negative. For example, if everybody who tells a story about “measles” marks the outcome as negative (the person wasn’t cured), the word “measles” would generally be a negative word in other NGO contexts.
There are many kinds of positive and negative outcomes in the data already. We asked, “Who benefited?” (nobody, the wrong people, or the right people) and “How did you feel about your story?” So even with just 100 stories that use a word, we often have several hundred data points averaged. As you would expect, most words are not strongly positive or negative.
I then filtered out any word that wasn’t used in at least 100 stories, so that the remaining 1944 words (of 100,000) are pretty reliable as a reference dictionary. I will probably publish a larger dictionary with the remaining words if people ask for it in comments below.
I then normalized the scores on a range from roughly -500 to +500, centered around zero by “turning up the gain” on the negative feedback. (Ask me how in the comments if you care.) This step allows the data set to be used in other contexts, such as the import-your-own-text analysis tool, where we don’t know whether stories had happy or sad endings.
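The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the published dictionary: the function name `build_tone_dictionary`, the `(text, outcome)` input shape with outcomes between -1 and +1, and the simple mean-centering are all assumptions. In particular, the post does not describe how the gain on negative feedback was turned up, so plain mean-centering stands in for that step here.

```python
import re
from collections import defaultdict


def build_tone_dictionary(stories, min_stories=100):
    """Average the outcome of every story a word appears in.

    `stories` is an iterable of (text, outcome) pairs, where outcome is a
    number from -1 (negative) to +1 (positive). This input shape is an
    assumption made for the sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, outcome in stories:
        # Count each word once per story so long stories do not dominate.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            totals[word] += outcome
            counts[word] += 1

    # Keep only words that appear in at least `min_stories` stories.
    means = {w: totals[w] / counts[w] for w in counts if counts[w] >= min_stories}
    if not means:
        return {}

    # Assumed stand-in for the normalization step: shift scores so the
    # dictionary as a whole is centered on zero.
    center = sum(means.values()) / len(means)
    return {w: mean - center for w, mean in means.items()}


# Toy data, invented for illustration only.
toy_stories = [
    ("the clinic helped my child", 1.0),
    ("the clinic came one time", -1.0),
    ("they helped the school", 1.0),
    ("they came and left", -1.0),
]
toy_tones = build_tone_dictionary(toy_stories, min_stories=2)
```

With the toy data, "helped" ends up positive and "came" negative, while words used in only one story (such as "school") are dropped by the minimum-story filter.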

This reference dictionary allows anyone to take a quick glance at the overall sentiment in any unstructured language from people who are affected by international development, based on how tens of thousands of people have used the same word previously.
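As a rough sketch of that "quick glance", one simple option is to average the dictionary scores of the words in a piece of text. The function name `score_text` and the averaging rule are assumptions for illustration; the post does not specify how scores should be combined. The sample scores are taken from the word lists published further down.

```python
import re


def score_text(text, tone_dictionary):
    """Return the mean tone of the known words in `text`, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [tone_dictionary[w] for w in words if w in tone_dictionary]
    if not scores:
        return None  # nothing in the text is covered by the dictionary
    return sum(scores) / len(scores)


# A few entries copied from the published word lists.
sample_tones = {"came": -3189, "time": -2406, "helps": 1901, "children": 2501}

village_score = score_text("One time these people came to our village", sample_tones)
unknown_score = score_text("nothing matches", sample_tones)
```

Here only "time" and "came" are in the sample dictionary, so `village_score` is their average, -2797.5, and `unknown_score` is `None`.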

Download your free word-tone dictionary

Click to download the 1944-word dictionary either as a CSV or as a pickled Python dictionary.

(Rename the files afterwards)
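Once downloaded, either file could be loaded along these lines. This is a sketch under assumptions: the CSV is assumed to hold a word in the first column and its score in the second, the pickle is assumed to hold a plain word-to-score `dict`, and the helper names and the file name `tone_sample.csv` are hypothetical. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.

```python
import csv
import os
import pickle
import tempfile


def load_tone_csv(path, delimiter=","):
    """Read word,score rows into a dict, skipping rows without a numeric score."""
    tones = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) < 2:
                continue
            try:
                tones[row[0].strip().lower()] = float(row[1])
            except ValueError:
                continue  # e.g. a header row such as "word,score"
    return tones


def load_tone_pickle(path):
    """Load a pickled word-to-score dictionary (trusted files only)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip demo on a tiny stand-in file, using two scores from the post.
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "tone_sample.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("word,score\ncame,-3189\nhelps,1901\n")
    from_csv = load_tone_csv(csv_path)

    pickle_path = os.path.join(folder, "tone_sample.pkl")
    with open(pickle_path, "wb") as handle:
        pickle.dump(from_csv, handle)
    from_pickle = load_tone_pickle(pickle_path)
```

If the real CSV turns out to be tab-separated, pass `delimiter="\t"`.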

Results

If you plot all the words and their sentiment scores in Excel, they look like this:

normalized word distribution from tone dictionary

If you turn that plot on its side, it will become a normal distribution, like this:

normdist

The corrected plot is centered around zero. That’s good. It means that these words are a good mix of positive and negative sentiments. I chose to adjust the raw positive-negative scores because there is such a huge positive bias in all NGO feedback that it becomes ridiculous just how skewed the stories are, compared to how much people’s lives are really affected by these efforts.

If people really benefited as much as they say they do, we would have no poor people left.

Case in point: What word is the outlier at the top of the chart?

full scale normalized word distribution from tone dictionary

Give up? This word has a positivity score of 10,125 after I corrected the data. The score of +10,125 is a measure of how consistently that word appears in positive success stories versus negative failure stories. A word with a score of zero is neutral, or used in stories with mixed positive-negative outcomes. And because only words used in at least 100 stories are included, these dots are not likely to switch sides (or signs) if we repeated this experiment with a different story collection from the aid world.

Still don’t know the mystery word?

Here is the answer:

annotated full scale normalized word distribution from tone dictionary

The outlier is the word ‘organization.’ People are very eager to tell positive stories about organizations. Stories that contain the word ‘organization’ are literally thousands of times more likely to be positive than stories that contain the words that fall along the zero line of the chart.

Previously, my other means of measuring positive bias in stories concluded that people tell 10 to 30 positive stories for every negative story, across all 60,000 stories. By this measure, the positive sentiment in stories that include the word ‘organization’ is even higher still. Positive bias is a real problem. But using the word dictionary I’ve published, you can find the negative sentiments within a sea of rosy feedback.
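One way to go looking for those negative sentiments is to score every story in a batch and read the lowest-scoring ones first. The sketch below assumes a simple mean-of-known-words score and a hypothetical helper name, `rank_stories_by_tone`; it is an illustration rather than the method behind the analysis in this post. The sample scores come from the word lists published below, and the example stories are invented.

```python
import re


def rank_stories_by_tone(stories, tone_dictionary):
    """Return (score, story) pairs sorted from most negative to most positive.

    Stories with no words in the dictionary are left out, because they
    cannot be scored.
    """
    ranked = []
    for story in stories:
        words = re.findall(r"[a-z']+", story.lower())
        known = [tone_dictionary[w] for w in words if w in tone_dictionary]
        if known:
            ranked.append((sum(known) / len(known), story))
    return sorted(ranked, key=lambda pair: pair[0])


# Scores copied from the published lists; the stories are made up.
sample_tones = {
    "organisation": 10128,
    "children": 2501,
    "helps": 1901,
    "time": -2406,
    "came": -3189,
}
example_stories = [
    "The organisation helps children.",
    "One time these people came.",
    "We talked about the weather.",
]
ranked = rank_stories_by_tone(example_stories, sample_tones)
most_negative_story = ranked[0][1]
```

In this toy batch the "one time ... came" story sorts to the top of the reading pile, and the weather story is skipped because none of its words are in the sample dictionary.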

My interpretations:

The most negative words were “came” and “time.” As in, “one time these people came to our village…” That meta pattern is quite alarming. I just finished reading Bill Easterly’s “The Tyranny of Experts” yesterday, which is all about getting the outsiders to leave people alone and instead focus on advocating for the rights of poor people. This pattern is consistent with the failure of outsiders to come into a place on a “one time” basis and make any sort of lasting positive change. The implications of these outlier words should be a wake-up call to the aid sector.
People are more honest in Kibera and Uganda. Slum life in Kibera is poor, and people are ready to honestly talk about it. But across Uganda, people are almost as positive about everything as are people who talk about an “organization.”
Narratives are positively biased in the development world, but there is no reason to believe that numbers are somehow free of this bias. In any kind of survey, when someone asks the citizen, “how much money do you make?” or “how many kids do you have?” they get back wrong answers, and always wrong in the direction of what a person knows the organization wants to hear. There are documented examples of women under-reporting the number of kids they have to Millennium Development Goals surveyors in Uganda because they knew that “smaller families” was the outcome measure outsiders were looking for. Likewise, people lie about income and say they are poorer when money might be handed out, or overestimate their income in surveys from micro-loan foundations that use this “success metric” as the basis for granting them larger loans.
You could throw up your hands and blame other people for lying, but I prefer to treat this as a symptom of the larger disease: Our programs are generally not making life better, and the only way to make life better is to play the game to get as much immediate aid as possible. No one has ever proven that putting money in a person’s pocket makes them poorer in the short term. Yes, in the long term, they could become poorer, but poverty has a tendency to focus people on short-term gains.
The difficulty is that there are two kinds of positive stories – the ones where things really turned out great, and the ones where they are saying good things but they’re still not happy about the outcomes. This one method alone doesn’t do enough to tease out these two kinds of positive, but when combined with other lines of evidence, other structural aspects in the narratives, it is possible to tell the difference between authentic praise and manufactured praise. For example, check out my first attempt at this.

Quick scan of the most positive / most negative words

Most negative words

came	-3189
kibera	-2540
time	-2406
two	-2244
kenya	-2094
take	-1755
come	-1755
really	-1478
place	-1380
long	-1375
decided	-1368
years	-1339
person	-1292
lack	-1265
mother	-1208
green	-1206
father	-1182
able	-1138
man	-1116
ago	-1092
brought	-1081
certain	-1040
things	-1029
bad	-979
told	-943
water	-923
belt	-908
saw	-907
going	-907
hard	-902
bring	-867
see	-848
young	-848
girl	-839

Most positive words

organisation	10128
uganda	4192
provides	2774
providing	2652
children	2501
standards	2286
done	2194
development	2110
poor	2050
helps	1901
living	1899
support	1882
community	1808
orphans	1707
giving	1595
improve	1548
district	1446
aids	1409
agriculture	1373
health	1362
education	1349
farmers	1312
gives	1309
counselling	1293
mbarara	1284
provided	1274
given	1272
helping	1250
association	1245
materials	1241
women	1219
vision	1202
world	1190
role	1169
town	1113
seeds	1113
scholastic	1052
care	1038
save	1025

One thought on “A Rosetta Stone is now available for NGO sentiment analysis”

Joris Vandelanotte says:

February 13, 2019 at 6:30 am

Hi Marc, thanks so much for developing this lexicon – I was wondering if other similar lexicons (NGO, development aid, ODA) have been developed? Do you know of organisations that have used your lexicon?

Thanks again
Joris