A Rosetta Stone is now available for NGO sentiment analysis

Until the Rosetta Stone was discovered in 1799, no one could read Ancient Egyptian hieroglyphs. Because the stone carries the same text in three scripts (hieroglyphic, Demotic, and Ancient Greek), scholars were able to decode this ancient writing.


By analogy, I am publishing a dictionary that helps us understand what people on the receiving end of international aid really mean when they are given a chance to tell stories about how organizations have affected their lives. It works because the GlobalGiving Storytelling Project has collected such a large sample of beneficiary feedback, about every sort of community effort, that we can reverse-engineer what people mean in other contexts.

Building the word-tone dictionary

  1. Starting with the more than 60,000 stories we’ve already collected from Kenyans and Ugandans about NGO work, I took a dictionary of 100,000 English words and queried the collection for the stories containing each word.
  2. Each story comes with a series of outcome-mapping questions about what happened. Was it positive or negative? These questions let me associate specific words with specific outcomes on a range from positive to negative. For example, if everybody who tells a story about “measles” rates the outcome as negative (the person wasn’t cured), then “measles” would generally be a negative word in other NGO contexts as well.
  3. There are many kinds of positive and negative outcomes in the data already. We asked, “Who benefited? Nobody? The wrong people? The right people?” and “How did you feel about your story?” So even a word that appears in only 100 stories usually has several hundred data points averaged together. As you would expect, most words are not strongly positive or negative.
  4. I then filtered out any word that wasn’t used in at least 100 stories, so the remaining 1944 words (out of 100,000) are fairly reliable as a reference dictionary. I will probably publish a larger dictionary with the remaining words if people ask for it in the comments below.
  5. I then normalized the scores onto a range from roughly -500 to +500, centered around zero, by “turning up the gain” on the negative feedback. (Ask me how in the comments if you care.) A few outlier words land far outside that range, as the lists at the end of this post show. This step allows the data set to be used in other contexts, such as the import-your-own-text analysis tool, where we don’t know whether stories had a happy or sad ending.
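The steps above can be sketched in a few lines of Python. This is not the actual GlobalGiving pipeline, and the gain method is not the author’s (which isn’t described here): it assumes each story arrives as a `(text, answers)` pair with every mapping-question answer coded +1 (positive), -1 (negative) or 0 (mixed), and it stands in a simple re-weighting of negative answers for the “turning up the gain” step. The names `tokenize`, `build_tone_dictionary`, `min_stories` and `scale` are hypothetical, and because this version returns a bounded average it will not reproduce the published scores.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return the set of distinct words it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def build_tone_dictionary(stories, min_stories=100, scale=1000):
    """Hypothetical sketch of a word-tone dictionary.

    stories: list of (text, answers) pairs, where answers holds one coded
    outcome per mapping question: +1 positive, -1 negative, 0 mixed/neutral.
    This coding and the weighting below are assumptions for illustration.
    """
    # Stand-in for "turning up the gain": up-weight negative answers so that,
    # across the whole collection, positive and negative answers carry equal
    # total weight.
    n_pos = sum(a > 0 for _, answers in stories for a in answers)
    n_neg = sum(a < 0 for _, answers in stories for a in answers)
    neg_gain = n_pos / n_neg if n_neg else 1.0

    story_counts = defaultdict(int)     # number of stories using each word
    weighted_sum = defaultdict(float)   # sum of weighted outcome codes
    weight_total = defaultdict(float)   # sum of weights (for the average)
    for text, answers in stories:
        for word in tokenize(text):     # each story counts once per word
            story_counts[word] += 1
            for a in answers:           # every answer is one data point
                w = neg_gain if a < 0 else 1.0
                weighted_sum[word] += w * a
                weight_total[word] += w

    # Drop rare words, then rescale the weighted average outcome
    # from [-1, +1] to [-scale, +scale].
    return {
        word: scale * weighted_sum[word] / weight_total[word]
        for word, count in story_counts.items()
        if count >= min_stories and weight_total[word] > 0
    }


stories = [
    ("The organisation helped us", [1, 1]),
    ("The organisation gave seeds", [1]),
    ("They came one time", [-1]),
    ("They came and left", [-1, 0]),
]
print(build_tone_dictionary(stories, min_stories=2))
```

With this toy data (and `min_stories=2`), “organisation” scores +1000 and “came” scores -750, while “helped” is dropped because it appears in only one story.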

This reference dictionary lets anyone take a quick look at the overall sentiment of any unstructured text from people affected by international development, based on how tens of thousands of people have used the same words before.
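A quick glance like that might look as follows. This is a minimal sketch, assuming the dictionary has been loaded as a Python `dict` that maps lowercase words to scores. The function name `text_tone` and the choice to take a plain average of matched words are illustrative assumptions, not a prescribed method.

```python
import re


def text_tone(text, tone):
    """Average the tone scores of every dictionary word found in text.

    tone: dict mapping lowercase words to scores (assumed layout of the
    downloaded dictionary). Repeated words count each time they occur.
    Returns (average_score, number_of_matched_words); (0.0, 0) if none match.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [tone[w] for w in words if w in tone]
    if not scores:
        return 0.0, 0
    return sum(scores) / len(scores), len(scores)


# A few entries copied from the lists at the end of this post.
tone = {"organisation": 10128, "came": -3189, "time": -2406, "seeds": 1113}
print(text_tone("One time these people came to our village", tone))
# → (-2797.5, 2)
```

Only exact lowercase matches count, so “organization” will not pick up the “organisation” entry. Reporting the number of matched words alongside the average helps flag texts where only one or two dictionary words drive the score.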

Download your free word-tone dictionary

Click to download the 1944-word dictionary as either a CSV file or a Python pickled dictionary.

(Rename the files afterwards)
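Either file can be loaded with Python’s standard library. This sketch assumes each CSV row looks like `word,score` (any header row is skipped) and that the pickle holds a plain word-to-score `dict`. Neither layout is documented here, so check the files first. The function names and the filenames in the commented usage lines are hypothetical.

```python
import csv
import pickle


def load_tone_csv(path):
    """Load a word-tone CSV into a dict, assuming each row is: word,score.

    Rows whose second column is not a number (e.g. a header) are skipped.
    """
    tone = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue
            try:
                tone[row[0].strip().lower()] = float(row[1])
            except ValueError:
                continue  # header or malformed row
    return tone


def load_tone_pickle(path):
    """Load the pickled dictionary. Only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical filenames, after renaming the downloads:
# tone = load_tone_csv("word_tone.csv")
# tone = load_tone_pickle("word_tone.pkl")
```

The CSV route is the safer choice, because unpickling can execute arbitrary code if a file has been tampered with.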


If you plot every word’s sentiment score in Excel, the result looks like this:

[Figure: normalized word distribution from the tone dictionary]

If you turn that plot on its side and look at how many words fall at each score, the scores form a roughly normal distribution, like this:


The corrected plot is centered around zero. That’s good: it means these words are a balanced mix of positive and negative sentiment. I chose to adjust the raw positive-negative scores because NGO feedback has such a huge positive bias that the stories are absurdly skewed compared to how much people’s lives are actually affected by these efforts.
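To check the centering yourself without Excel, you can summarize the scores and bin them. Bin counts are the “plot turned on its side.” This sketch assumes the dictionary is loaded as a word-to-score `dict`. The 250-point bin width and the function names are arbitrary, hypothetical choices.

```python
from collections import Counter
from statistics import mean, median


def centering_summary(tone):
    """Mean, median and share of positive scores for a word-tone dict."""
    scores = list(tone.values())
    return {
        "mean": mean(scores),
        "median": median(scores),
        "share_positive": sum(s > 0 for s in scores) / len(scores),
    }


def histogram(tone, bin_width=250):
    """Count words per score bin, keyed by each bin's lower edge.

    A roughly bell-shaped set of counts around zero is what a centered
    dictionary should show.
    """
    counts = Counter(int(s // bin_width) * bin_width for s in tone.values())
    return dict(sorted(counts.items()))


tone = {"a": -100, "b": 0, "c": 100, "d": 300}
print(centering_summary(tone))
for edge, n in histogram(tone).items():
    print(f"{edge:>7} {'*' * n}")
```

A mean and median near zero, with roughly half the words positive, is the numerical version of “centered around zero.” A single outlier can pull the mean much further than the median.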

If people really benefited as much as they say they do, there would be no poor people left.

Case in point: which word is the outlier at the top of the chart?

[Figure: full-scale normalized word distribution from the tone dictionary]

Give up? This word has a positivity score of 10,125 after I corrected the data. The score measures how consistently the word appears in positive success stories rather than negative failure stories. A word with a score of zero is neutral, or is used in stories with mixed positive and negative outcomes. And because only words used in at least 100 stories are included, these dots are not likely to switch sides (or signs) if we repeated this experiment with a different story collection from the aid world.
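One way to test the claim that words won’t switch signs is a split-half check. Randomly split the stories into two halves, build a dictionary from each half with whatever scoring method you use, and measure how often a word’s sign agrees between the two. A second story collection works the same way. This is a sketch under those assumptions. The function names and the `min_abs` cutoff for near-neutral words are hypothetical.

```python
import random


def split_half(stories, seed=0):
    """Randomly split a story collection into two halves (reproducible via seed)."""
    shuffled = list(stories)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def sign_agreement(tone_a, tone_b, min_abs=0.0):
    """Fraction of shared words whose scores have the same sign in both dicts.

    Words whose score is within min_abs of zero in either dict are treated
    as neutral and skipped. Returns nan if no words qualify.
    """
    shared = [
        w for w in tone_a.keys() & tone_b.keys()
        if abs(tone_a[w]) > min_abs and abs(tone_b[w]) > min_abs
    ]
    if not shared:
        return float("nan")
    same = sum((tone_a[w] > 0) == (tone_b[w] > 0) for w in shared)
    return same / len(shared)


a = {"organisation": 5, "came": -3, "water": 1}
b = {"organisation": 2, "came": 4, "seeds": 1}
print(sign_agreement(a, b))  # → 0.5
```

Raising `min_abs` ignores words hovering near zero, which are the ones expected to flip. Agreement among the strongly scored words is the more meaningful number.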

Still don’t know the mystery word?

Here is the answer:

[Figure: annotated full-scale normalized word distribution from the tone dictionary]

The outlier is the word ‘organization.’ People are very eager to tell positive stories about organizations: stories containing the word ‘organization’ are literally thousands of times more likely to be positive than stories containing the words that fall along the zero line of the chart.

My earlier methods of measuring positive bias concluded that people tell 10 to 30 positive stories for every negative story across all 60,000 stories. By this new measure, stories that include the word ‘organization’ are even more positive than that. Positive bias is a real problem. But with the word dictionary I’ve published, you can find the negative sentiments within a sea of rosy feedback.

My interpretations:

  • Two of the three most negative words were “came” and “time,” as in “one time these people came to our village…” That meta-pattern is quite alarming. Yesterday I finished reading Bill Easterly’s “The Tyranny of Experts,” which is all about getting outsiders to leave people alone and instead focus on advocating for the rights of poor people. This pattern is consistent with outsiders coming into a place on a “one time” basis and failing to make any lasting positive change. The implications of these outlier words should be a wake-up call to the aid sector.
  • People are more honest in Kibera than in Uganda. Life in the Kibera slum is hard, and people are ready to talk about it honestly. But across Uganda, people are almost as positive about everything as people who talk about an “organization.”
  • Narratives are positively biased in the development world, but there is no reason to believe that numbers are somehow free of this bias. In any kind of survey, when someone asks a citizen “How much money do you make?” or “How many kids do you have?”, they get back wrong answers, always skewed toward what the respondent knows the organization wants to hear. There are documented examples of women in Uganda under-reporting the number of children they have to Millennium Development Goals surveyors because they knew that “smaller families” was the outcome measure outsiders were looking for. Likewise, people lie about their income, saying they are poorer when money might be handed out, or overstating their income in surveys from micro-loan foundations that use this “success metric” as the basis for granting them larger loans.
  • You could throw up your hands and blame people for lying, but I prefer to treat this as a symptom of a larger disease: our programs are generally not making life better, and the only way for people to make their lives better is to play the game and get as much immediate aid as possible. No one has ever proven that putting money in a person’s pocket makes them poorer in the short term. Yes, in the long term they could become poorer, but poverty tends to focus people on short-term gains.
  • The difficulty is that there are two kinds of positive stories: the ones where things really turned out great, and the ones where people say good things but are still not happy with the outcome. This method alone cannot separate these two kinds of positive, but combined with other lines of evidence, such as other structural aspects of the narratives, it is possible to tell authentic praise from manufactured praise. For example, check out my first attempt at this.

Quick scan of the most positive / most negative words

Most negative words

came -3189
kibera -2540
time -2406
two -2244
kenya -2094
take -1755
come -1755
really -1478
place -1380
long -1375
decided -1368
years -1339
person -1292
lack -1265
mother -1208
green -1206
father -1182
able -1138
man -1116
ago -1092
brought -1081
certain -1040
things -1029
bad -979
told -943
water -923
belt -908
saw -907
going -907
hard -902
bring -867
see -848
young -848
girl -839

Most positive words

organisation 10128
uganda 4192
provides 2774
providing 2652
children 2501
standards 2286
done 2194
development 2110
poor 2050
helps 1901
living 1899
support 1882
community 1808
orphans 1707
giving 1595
improve 1548
district 1446
aids 1409
agriculture 1373
health 1362
education 1349
farmers 1312
gives 1309
counselling 1293
mbarara 1284
provided 1274
given 1272
helping 1250
association 1245
materials 1241
women 1219
vision 1202
world 1190
role 1169
town 1113
seeds 1113
scholastic 1052
care 1038
save 1025
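Lists like the two above can be pulled from the downloaded dictionary in a few lines. This sketch assumes the dictionary is loaded as a word-to-score `dict`. The function name `extremes` and the default of 10 words per list are arbitrary, hypothetical choices. `heapq.nsmallest` and `heapq.nlargest` avoid sorting the whole dictionary.

```python
import heapq


def extremes(tone, n=10):
    """Return the n most negative and n most positive (word, score) pairs."""
    most_negative = heapq.nsmallest(n, tone.items(), key=lambda kv: kv[1])
    most_positive = heapq.nlargest(n, tone.items(), key=lambda kv: kv[1])
    return most_negative, most_positive


# A few entries copied from the lists above.
tone = {"came": -3189, "kibera": -2540, "organisation": 10128,
        "uganda": 4192, "water": -923}
negative, positive = extremes(tone, n=2)
print("Most negative words")
for word, score in negative:
    print(word, score)
print("Most positive words")
for word, score in positive:
    print(word, score)
```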

One thought on “A Rosetta Stone is now available for NGO sentiment analysis”

  1. Hi Marc, thanks so much for developing this lexicon – I was wondering if other similar lexicons (NGO, development aid, ODA) have been developed? Do you know of organisations that have used your lexicon?

    Thanks again
