Five Holy Books in five images

Since it is Holy Week, here are some rather intriguing visuals of the Quran and three competing perspectives on Jesus (the canonical Gospels, Paul’s attributions, and the non-canonical Gospel of Thomas):

The whole Holy Quran as a wordle


The Gospel of Thomas

The Gospel of John


All sayings attributed to Jesus in Paul’s Letters


The Gospel of Mark

A while back I wrote a simple Python script that produced differential wordles (like the ones I used in these two rape-prevention programs), but I lost it. If I rewrite it, you will be able to see an adjusted view of what these different texts emphasize about God, Allah, Jesus, etc.
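The original script is lost, so what follows is only a minimal sketch of what a differential wordle computes, assuming a smoothed log-ratio of relative word frequencies; the function names are hypothetical. Instead of sizing words by raw frequency, each word in one text is weighted by how much more often it appears there than in a reference text.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def differential_weights(target, reference, smoothing=1.0):
    """Weight each word of `target` by the log-ratio of its smoothed relative
    frequency in `target` versus `reference`. Positive weights mark words the
    target emphasizes more than the reference; negative weights, less."""
    t, r = word_counts(target), word_counts(reference)
    vocab = set(t) | set(r)
    t_total = sum(t.values()) + smoothing * len(vocab)
    r_total = sum(r.values()) + smoothing * len(vocab)
    return {
        w: math.log((t[w] + smoothing) / t_total) - math.log((r[w] + smoothing) / r_total)
        for w in t
    }


def top_words(weights, n=10):
    """The n words with the highest differential weight (drawn largest in the wordle)."""
    return sorted(weights, key=weights.get, reverse=True)[:n]
```

For example, `top_words(differential_weights(thomas_text, mark_text))` (hypothetical variables holding the two texts) would list the words Thomas emphasizes relative to Mark; feeding those weights to any word-cloud tool would give the adjusted view. The smoothing term keeps words that never appear in the reference from producing infinite scores.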


Or you can read my series on how the Passion Narrative relates to international development:

One: Empire – and the hierarchy of aid power

Story-centered learning: Gather “big data” before hypothesis testing

Reblogged from my ThinkNPC guest post:

In the last half-century, thousands of scientists have rigorously studied the causes and risk factors in heart disease, but a single longitudinal study has revealed more about this disease than any other approach.

In 1948, researchers began tracking health records from all participants in the town of Framingham, Massachusetts. This was an observational study; they did not formulate causal theories or test specific hypotheses, but simply let nature take its course and observed what happened.

In 1960, they found a link between smoking and heart disease. In 1961, they found a link with cholesterol. And in the decades that followed, they also found correlations with obesity, exercise, high blood pressure (hypertension), stroke, and diabetes: virtually everything that now matters to clinical treatment.

So why aren’t we in the philanthropy world copying this approach—observing what’s out there and looking for patterns over time?

As a neuroscientist, I have a confession to make. My kind have been responsible for propagating a lie that is still taught in schools: that scientists always devise a hypothesis and test it in controlled experiments. This is simply not true. The Human Genome Project mapped 3 billion base pairs before anyone understood what variation in the genetic code meant.

Many of the drugs you take were “discovered” by screening massive drug discovery libraries, a process that quickly runs millions of tests rather than starting from a hypothesis.

My point is that complex problems cannot be understood from a pre-defined framework; what matters emerges most efficiently from open-ended data collection that is later organised and then studied.

We already create more information every two days than existed in the first two millennia of human civilization, and this pace is accelerating. However, the rate at which we convert all this “information” into useful “knowledge” is slowing down.

It was with this problem in mind that we started the GlobalGiving Storytelling project. We needed to dissociate two requirements: collecting rich information about development in a flexible, easily restructurable way, and turning these stories into data so we can interpret and contextualize what we see. We’ve come up with a survey design tool that you can use to run a custom evaluation and compare your results to stories told by others, with the overall aim of helping everyone share knowledge and improve project design. The approach will save you time, but it will also let you get more back than you could ever put in.

So why do we use storytelling, you wonder? It turns out that managing this process with metrics, indicators, spreadsheets, and a numbers-only mindset is far more difficult and time-consuming. Narratives and a few survey questions are sufficient to see common patterns emerge from many perspectives.

Continue reading on ThinkNPC

Marc Maxson is an innovation consultant with GlobalGiving, where he manages their global storytelling project. Previously, he worked as a PhD neuroscientist and did Fulbright research on the impact of the internet on rural education in West Africa. He writes about evolution and international development at

When toys tell stories

I first learned about GoldieBlox from their Super Bowl ad, where they aggressively combat the toy industry’s stupid assumptions about what girls like (it’s not just about making it pink and putting a ponytail on it).

They are on a mission:

Only 13% of engineers are women and they believe that women innovators are our greatest untapped resource. 

They have a theory of change:

We inspire girls during a critical period, between age 6 and 13, and allow them to realize for themselves that building, creating, and owning their own ideas is what it means to be a girl.

Their latest ad campaign continues their message more thoughtfully:

(Note that it begins as a parody of a 1980s anti-drug commercial, so their ads are also targeting parents.)

How is GoldieBlox “for” girls? (From their website)

Our founder, Debbie, spent a year researching gender differences to develop a construction toy that went deeper than just “making it pink” to appeal to girls. She read countless articles on the female brain, cognitive development and children’s play patterns. She interviewed parents, educators, neuroscientists and STEM experts. Most importantly, she played with hundreds of kids. Her big “aha”? Girls have strong verbal skills. They love stories and characters. They aren’t as interested in building for the sake of building; they want to know why. GoldieBlox stories replace the 1-2-3 instruction manual and provide narrative-based building, centered around a role model character who solves problems by building machines. Goldie’s stories relate to girls’ lives, have a sense of humor and make engineering fun.

That was an “aha!” statement for me. “Finally, something I can sink my teeth into!” I thought. So building blocks can be thought of as a storytelling tool, like the magic cards I made earlier. I know about character-driven stories, and about putting conflict into scenes to move the story along and draw in the audience.

And in a way, GoldieBlox is using a conflict narrative to draw in their audience – girls. What a brilliant way to get girls on board, by reminding them from age 6 onwards that playing with these toys is an act of defiance against gender stereotypes.

And another company, play-i, offers a complementary approach to the same goal, for a younger audience:

I just wish they had similar toys for the teenage crowd. What will these Goldie girls do when they outgrow their blocks? Perhaps this?





A good proxy indicator for organizational learning culture

A recent Huffington Post article brought an interesting tool to my colleague Nick’s attention. Collusion helps you spy on the companies that are colluding to spy on you as you surf the internet. For example, every time you check the weather, all of these sites are informed about you:


The websites that receive information are listed on the left. About half are red and crossed out because Collusion (this Chrome plugin) blocked their access.

As you browse, Collusion creates a network map showing how the different sites you visit talk to each other. You can hover over any node in the network to see a site’s connections and automatically block the transmission of data to known tracking sites like Google ad services, etc. As you sift through your browsing’s connections, it quickly becomes clear that not all sites are created equal when it comes to tracking your metadata.

Our insight was that this tool could serve another purpose. You see, Nick and I are responsible for building up GlobalGiving’s database on organizational behavior and curiosity. This is used to measure each organization’s performance in a real-time, comprehensive way. If we could sort all organizations in the world into “good” and “bad” groups based on their habits, such as being responsive to the community they serve, demonstrating a tendency to learn from mistakes and remember what they’ve tried before (knowledge management), or making effective use of free performance tools in their daily work (agility), we could help more money reach better NGOs, and ultimately improve more lives with the same amount of resources.

This is the same as saying “we’re going to make the whole aid world more efficient,” but when we say it, we mean it – because we have a way to do what we say. In the “big data” era, information will be used to make thousands of little evidence-based decisions that will improve the system overall.

But on to specifics. What do organizations’ websites reveal about their agility? A lot.

Look at these organization websites:

Each of these has a hundred-million-dollar budget. So how much effort do they make to optimize learning about visitors to their homepages?









I see a correlation between how much an organization focuses on public donations (versus government or private support) and whether it uses free analysis software, such as Google Analytics. Of the ten organizations shown above (which are close to a top-ten list of worldwide organizations by size), only Save the Children, CARE, and World Vision made a serious effort to learn from their website traffic. Five out of ten at least have some kind of basic (free) analytics (Google Analytics and/or Google Tag Manager).

For the other half that do not, it is telling. These organizations don’t really need public support to survive, and are also (in my opinion) less accountable to community feedback because they are “too big to fail” in the aid world:

  • World Bank
  • BRAC
  • MSF
  • United Way
  • Heifer International

Types of 3rd party data collection sites

Analysis (curiosity)

  • GoogleTagManager
  • kissmetrics
  • vmmpxl – Quantcast web traffic demographics
  • mxpnl – Mixpanel is like Google Analytics, but you pay for it and it offers more features

Visualization or dissemination

  • mapbox
  • openlayers


  • anything in red (advertising)
  • youtube

Faster web loading and cloud data 

  • amazonws
  • visualwebsiteoptimizer
  • rackcdn – Rackspace cloud storage

Social Media Plugins

  • twimg – Twitter
  • facebook

Design iteration and testing (curiosity)

  • optimizely
  • omniture

For comparison, I took snapshots of GlobalGiving and various other online giving marketplaces or organizations we partner with:


[Images: website snapshots for betterplace.org, Razoo, GreatNonprofits, and GiveDirectly]


Clearly, all of these organizations take their web traffic seriously. Each of GlobalGiving, DonorsChoose, Kiva, BetterPlace, and Razoo uses at least one analytics tool, one cloud hosting tool to speed up website load times, and many use an iterative design and testing tool like optimizely.

The surprise here is that GiveDirectly (the recent darling of the aid world and the media world) does nothing to learn about their traffic. It makes me question how much of a learning focus their organization has internally.

And that is what this is all about. I believe that organizations stamp an imprint of their internal learning on their external websites.

Curious, learning, experimenting organizations use web-based tools that help them achieve their goals (and leave a trace for us to track).

Large bureaucratic “stick-in-the-mud” organizations do not use any of these tools, leave no trace of their learning, and thus are probably not focused on learning.

Web footprints for a few randomly chosen GlobalGiving partner orgs

These organizations are much smaller than the ones listed above, but they still use more learning tools than even the World Bank or BRAC, so in my assessment they are probably learning more with fewer resources:









Five out of seven local GlobalGiving partner organizations use Google Analytics.

That’s a small sample, but a larger fraction of this group (five of seven, versus five of ten) uses tools to learn about web traffic than of the big-budget organizations.

These are just screenshots to show that there is useful data out there. Once you realize that the tools exist to ask old questions in a new (and more efficient) way, you simply need to write a little code to gather all the information. This will be my take-home message at the Georgetown University master’s program class I’m teaching this week:
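As an illustration of that “little code,” here is a minimal sketch that scans a homepage’s already-downloaded HTML for `src` and `href` URLs pointing at known third-party domains. The domain list and its category labels are assumptions for this example, not Collusion’s actual tracker list, and the function names are hypothetical.

```python
import re

# Illustrative mapping of third-party domains to the categories used in this post.
# This list is an assumption for the sketch, not Collusion's tracker database.
TOOL_CATEGORIES = {
    "google-analytics.com": "analysis",
    "googletagmanager.com": "analysis",
    "mxpnl.com": "analysis",
    "optimizely.com": "design iteration and testing",
    "amazonaws.com": "cloud",
    "rackcdn.com": "cloud",
    "facebook.net": "social media",
    "twimg.com": "social media",
}

# Captures the value of every src="..." or href='...' attribute.
URL_ATTR = re.compile(r"""(?:src|href)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def find_third_party_tools(html):
    """Return {domain: category} for each known domain the page references."""
    found = {}
    for url in URL_ATTR.findall(html):
        # Strip the scheme ("https://" or protocol-relative "//") and keep the host.
        host = re.sub(r"^(?:https?:)?//", "", url).split("/")[0].lower()
        for domain, category in TOOL_CATEGORIES.items():
            if host == domain or host.endswith("." + domain):
                found[domain] = category
    return found


def uses_analytics(html):
    """Crude proxy for a learning culture: does the page load any analysis tool?"""
    return "analysis" in find_third_party_tools(html).values()
```

You could fetch each homepage with something like `urllib.request.urlopen(url).read().decode()` and pass the result in. One caveat: tools injected by JavaScript at runtime will not appear in the static HTML, which is why a browser-side tool like Collusion sees more than this kind of scan.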

Graduate School should help you learn how to ask better questions and to recognize when the status quo of information is insufficient to fix the problem.

We live in a world that clings to the “myth of evidence”: we think our leaders make decisions based on weighing evidence, but they do not. They never have. Throughout history they have instead made experience-based decisions, limited by their own wisdom and prior failures. This is about to change.

Decisions used to be made using tiny scraps of information, because that is all that was available. But this decade is the turning point when evidence becomes cheaper to aggregate and interpret than the cost of making decisions without it. Some giants will fall and others will rise to take their places, all because they understand the new calculus of “big data.”

And when the dust clears, a new kind of democracy will be possible* where in the past it was merely theoretical: policy decisions will reflect all people’s opinions where choices are a matter of preference, or rest on sound science and on observing human behavior at a macro scale (like Isaac Asimov’s psychohistory) where policy depends on truth rather than preference.

(*Though this kind of democracy will become possible, it will almost certainly be tried first somewhere outside North America or Europe. My guess: somewhere in the Middle East, where people want real democracy.)

How organizations are adapting the storytelling method to their local context

I am frequently asked for specific examples of how an organization can adapt the storytelling method to its own programs. Here are case studies from my recent visits to UK-based organizations that are about to implement listening projects to evaluate their programs.

Case #1

For decades this organization has sought to bring peoples together and foster cultural understanding. The impact of their programs focuses on bridging social gaps, exposing people to different cultures, and changing attitudes and perceptions about the “other.” But instead of using a blunt survey that might ask, “How do you feel about the other?” they arrived at this:
Share an experience where you had to work with someone different from yourself. 
This question will add context to the all-purpose story prompting question that we encourage all organizations to use:
Talk about a time when a person or organization tried to help someone or change something in your community.
So if you put them together, respondents will share a “community effort” story with a focus on their personal experience of working with someone different.
Out of this, they hope to glean insights about the way attitudes and behaviors are changing. They will ask both internal “beneficiary” and external “community” people to share stories for comparative analysis. Each person will share two stories: one focused on the difficulty of working with the “other,” and a more open-ended one about any meaningful community effort.
Who: They have a network of a dozen “alumni” that they will train as scribes. Then they plan to bring on groups in Syracuse, NY, Los Angeles, Indonesia, and Gaza.

Case #2

This organization helps thousands of teens in the big city. They measure impact as improved self-confidence, educational attainment, and long-term community involvement. Their programs help young people get “back on track” and help them find fulfilling careers. Though they manage dozens of community programs for youth, their storytelling question adds this flavor:
In your community effort story, talk about an event that personally changed you in some way.
They currently use a 24-question “life effectiveness questionnaire” that was validated by an academic expert:
  • Time Management: the extent to which an individual makes optimum use of time.
  • Social Competence: the degree of personal confidence and self-perceived ability in social interactions.
  • Achievement Motivation: the extent to which the individual is motivated to achieve excellence and put the required effort into action to attain it.
  • Intellectual Flexibility: the extent to which the individual adapts his/her thinking and accommodates new information from changing conditions and different perspectives.
  • Task Leadership: the extent to which the individual leads other people effectively when a task needs to be done and productivity is the primary requirement.
  • Emotional Control: the extent to which the individual maintains emotional control when faced with potentially stressful situations.
  • Active Initiative: the extent to which the individual initiates action in new situations.
  • Self Confidence: the degree of confidence the individual has in his/her abilities and the success of his/her actions.
Clearly, the standard approach is rigorous and defensible, because it has been used in over 20 studies, but it isn’t very flexible. It prescribes the factors to be measured and then measures them in a cumbersome, not-so-fun way. Our storytelling form fits on the front and back of a single sheet of paper and takes just a few minutes to complete, with most of that time devoted to a personal narrative.
I’m most excited that in one of their three programs, they will test an approach we’re borrowing from the book “The Secret Life of Pronouns” by James Pennebaker. This program pairs youth with older volunteers who work together to revitalize the neighborhood. At regular intervals, the partners in each pair will interview each other for the storytelling/listening project. Later, we will analyze each pair’s stories as a conversation and look for language mirroring.
Mirroring is a measure of engagement. In this context, when young and old start to adopt the other’s way of speaking in their stories, we infer that they are building a relationship with some intimacy:
[Image: conversation language mirroring]
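The mirroring analysis could look something like the sketch below, which follows the language style matching (LSM) score described by Pennebaker and colleagues: for each category of function words, compare the percentage of words each speaker uses from that category with 1 - |pa - pb| / (pa + pb + 0.0001), then average across categories. The tiny word lists are stand-ins made up for illustration (published work uses the LIWC dictionaries), and the function names are hypothetical.

```python
import re

# Tiny stand-in function-word categories; real studies use the LIWC dictionaries.
# These short lists are assumptions for illustration only.
CATEGORIES = {
    "personal_pronouns": {"i", "me", "my", "we", "us", "our", "you", "your",
                          "he", "she", "they", "them"},
    "articles": {"a", "an", "the"},
    "prepositions": {"in", "on", "at", "to", "with", "for", "of", "from"},
    "conjunctions": {"and", "but", "or", "so", "because"},
    "negations": {"no", "not", "never"},
}


def category_rates(text):
    """Percentage of the text's words that fall into each function-word category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return {c: 100.0 * sum(w in vocab for w in words) / total
            for c, vocab in CATEGORIES.items()}


def style_matching(text_a, text_b):
    """Average per-category LSM score: 1.0 means identical function-word usage,
    values near 0 mean the two speakers' styles do not mirror each other."""
    ra, rb = category_rates(text_a), category_rates(text_b)
    scores = [1 - abs(ra[c] - rb[c]) / (ra[c] + rb[c] + 0.0001) for c in CATEGORIES]
    return sum(scores) / len(scores)
```

Scoring each pair at every interview round and plotting `style_matching` over time would show whether young and old partners converge. Function words are used because they reflect how people speak rather than what they talk about.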
Even without this mirroring measure, the broader 2-question approach is more likely to reveal community needs than the narrower life effectiveness questionnaire.

Case #3

This organization works with disabled youth, giving them opportunities to do something wonderful, much like the Make-A-Wish Foundation. After some debate, they settled on adding this context to the storytelling question:
Talk about a childhood experience where you were able to do something you never thought you could have done.

They can use this with four different populations they serve: children, parents, donors (to build empathy), and volunteers/public/schools. This is exciting because, unlike other evaluation frameworks, it gives them a deeper understanding of the difference they make in a disabled child’s life through the many others who are affected by that child’s experience.

“Our impact is much more than mere ‘fun’,” the director said. “Providing the inspiration to achieve more is what our events are all about.”

To that end, they are excited that one of the benchmarking follow-up questions in our design is:

“What would have made a difference in this story?”

That allows them to learn how to expand and refine their programs in an open-ended way. Asking this question of four groups will refine their messaging and grant writing, as well as improve their programs and build relationships with the volunteer network they will need to sustain this listening project.

Case #4

This organization will bring storytelling to the 30 schools where they do life skills training. They define success in much the same way Case #2 does. They want to use the open-ended storytelling question to look at how youth define the soft skills they receive, as well as build up an evidence base of the needs that these children have.

They expect it will be very difficult to get children to participate. I suggested that they engage teachers by offering to share the learning that emerges from stories with them. Teachers would probably like to know what their students think about, and this storytelling project offers them a lens into that. They may also explore a young-old mentoring program with the conversation mirroring approach.

Case #5

This organization runs a network of business startup incubators around the world. And while they would like to eventually find a common framework for measuring impact everywhere, they plan to start with the local hubs.

They plan to ask business leaders and aspiring entrepreneurs to share two stories. One will be about “any community effort” they know/care about, and the other is their own community effort:

Talk about your journey of trying to start a business.
Through this journey narrative, they hope to see what elements define success and failure in an open-ended way. Perhaps their first 100 stories won’t reveal much, but they will have a benchmark of over 1,250 stories from East Africa about people trying to start a business there. As they grow their narrative collection, they’ll also be forced to build relationships with people outside their narrow pool of incubator companies. Because all of these companies aim to deliver some social benefit, the broader “community effort” stories will necessarily be a useful business intelligence database for future aspiring entrepreneurs to mine for ideas.
They were worried they wouldn’t find volunteers who wanted to interview these entrepreneurs. The next day I heard the friend I was staying with complain that no clubs offered him a way to meet like-minded people who are trying to start their own business. My friend tried starting three businesses in Kenya over the years, so I connected him with this organization and suggested they advertise a “meet up” to find more of these kinds of people.
In effect, the evaluation scheme forces the organization to build relationships with the community. That is what should be happening: evaluation improves design.

Case #6

This organization helps a half million volunteers find places to work. They too decided to pilot this storytelling with older volunteers. They use volunteering to improve life quality for the elderly and reduce social isolation. They added this context to the storytelling prompt:

Talk about an event that happened long ago and how that affects your life today.

By mining these narratives for emotion words they can quantify reduced social isolation. Isolated people use pronouns and articles differently than highly socialized people. By collecting stories monthly, they can plot the “journey” and look for trends across their volunteers, regardless of what else is talked about.
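As a rough sketch of the “journey” plot, the code below tracks first-person singular (“I”, “me”) and plural (“we”, “us”) pronoun rates per month. It covers only pronouns, not emotion words or articles, and it assumes, as a hypothesis to test rather than a settled fact, that a shift from “I” toward “we” would accompany reduced isolation. The (month, text) input format and the function names are hypothetical.

```python
import re
from collections import defaultdict

SINGULAR = {"i", "me", "my", "mine", "myself"}
PLURAL = {"we", "us", "our", "ours", "ourselves"}


def pronoun_profile(text):
    """Return (first-person singular rate, first-person plural rate) per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1
    return (100.0 * sum(w in SINGULAR for w in words) / n,
            100.0 * sum(w in PLURAL for w in words) / n)


def monthly_trend(stories):
    """stories: iterable of (month, text) pairs, e.g. ("2014-03", "...").
    Returns {month: (mean singular rate, mean plural rate)}, ordered by month."""
    by_month = defaultdict(list)
    for month, text in stories:
        by_month[month].append(pronoun_profile(text))
    return {
        month: tuple(sum(values) / len(values) for values in zip(*profiles))
        for month, profiles in sorted(by_month.items())
    }
```

Averaging within each month keeps one unusually long or short story from dominating that month’s point on the trend line.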

Topically, these stories will reveal life-transformation events that can be useful for designing future programs.

“And the requirement that we drop off and pick up story forms monthly will give our project managers an excuse to get out and visit these places,” the head person said happily.

They also have hundreds of narratives that they plan to import into our system and explore for more meaning.

One data system with many frameworks

I believe this is a real step forward in fixing our approach to impact evaluation. Instead of 6 organizations with 6 different ways to measure their “impact” we have 6 approaches that share a common back-end data collection system. Each of these organizations must collect as many open-ended narratives as they will of the more constrained questions outlined here.

They will have benchmarking. Even among the constrained questions, we see that there are some likely clusters for comparison:

[Image: storytelling context map]

That is the beginning of a storytelling context map. With just five organizations, we see that three will likely have some overlap with each other’s themes, and the remaining two have reasonable overlap with similar stories from our existing collection of 57,000.

As dozens of organizations try this out, we may find that evaluation frameworks emerge from the choices that individual organizations make as they take their specific objectives up to a higher level of abstraction. The essential trick is to flip the design: instead of asking for exactly what you want to know, ask communities to react thoughtfully to the core elements that define our struggles to be more human to each other.

Already proven to work: From Gay Rights to Marriage Equality

Today I attended a talk at NTEN titled “How RED changed everything [for marriage equality].” For a generation, the gay rights movement lost every ballot referendum it poured money into fighting. After 30 straight losses, they decided that their messaging wasn’t working. (Yes, 0 for 30 seems obvious in retrospect, but the nonprofit/advocacy world is very afraid to admit failure.) They hired a media company and started running focus groups with straight people who opposed gay marriage.

They eventually got to the heart of the matter:

Tell us why you got married?

Straight people described how they fell in love. But when these people talked about gay marriage, they perceived the issue to be exactly what decades of pro-gay messaging had told them: They thought gay people wanted to be married for the legal benefits, or for tax breaks, or to prove that their lifestyle was acceptable because the government condoned it.

The movement took a hard look at its own messages. They started featuring actual gay people in their ads (instead of judges and legal experts). They told stories. They focused on families and love. And they flipped the public from 60% opposed to 60% in support in just 5 years. I’m going to take this approach to my local church, which is trying to do the same in its fight against voter suppression in North Carolina this year.

This is an example of the power of storytelling. When the prompting question is broad enough to allow surprises to emerge, an idea that begins as “gay rights” becomes a story of “marriage equality.” Reframing an idea starts by asking the people whose mindset and behavior you want to change to speak openly about it. As much as possible, our job is to listen.


Follow this thread: Examples of story analysis

Information is not knowledge

Shannon’s information theory defines information in a specific way: information is the amount of “surprise” in a communication. If I gave you a printout of this blog post, covered up part of a word in it, and asked you to predict the word after showing you just its first two letters (“th”), you might answer “therapist,” but you’re more likely to answer “the.”


That is a very common word, and easily predictable. Hence, the “the” in this post doesn’t carry much information, certainly a lot less than the word “Theroux,” which might refer to a specific person, like the novelist Paul Theroux.
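In Shannon’s terms, the information (surprisal) of a word is I(w) = -log2 p(w) bits, so likely words carry little and rare words carry a lot. A minimal sketch, assuming word probabilities are estimated from counts in some sample text and optionally conditioned on the visible prefix; the function name is hypothetical.

```python
import math
import re
from collections import Counter


def surprisal_bits(word, text, prefix=""):
    """Self-information of `word` in bits, I(w) = -log2 p(w), where p(w) is the
    word's share of all words in `text` that start with `prefix`."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w.startswith(prefix.lower())]
    total = len(words)
    p = Counter(words)[word.lower()] / total if total else 0.0
    if p == 0:
        return math.inf  # never seen: unboundedly surprising under this naive estimate
    return -math.log2(p)
```

Conditioning on the prefix mirrors the guessing game: once you know the word starts with “th,” only words with that prefix compete, so a common word like “the” becomes even less surprising.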

The most information-dense communication would be a string of random characters: you cannot predict the next character from the previous one. But practically speaking, a bunch of random letters is meaningless.
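The same idea applied to characters gives Shannon entropy, H = -Σ p(c) log2 p(c), the average information per character. A minimal sketch, assuming each character is treated independently of the ones before it (a simplification; real English is even more predictable once context is taken into account); the function name is hypothetical.

```python
import math
from collections import Counter


def char_entropy_bits(text):
    """Shannon entropy H = -sum p(c) * log2 p(c) of the character distribution
    in `text`: the average bits of information per character, treating each
    character as independent of the ones before it."""
    n = len(text)
    return -sum((k / n) * math.log2(k / n) for k in Counter(text).values())
```

Uniformly random lowercase letters approach the maximum of log2 26 ≈ 4.70 bits per character, while ordinary English text scores lower because some characters are far more common than others.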

One reason the storytelling project can better inform the world is that it allows more information to flow from communities and provides a better way to filter out the noise, helping people find the knowledge in all that information. Instead of this:


It allows this:


Normally, too much information is a problem. Evaluators design narrow, specific surveys with tightly defined questions because they want the most knowledge to come out of the least information entered in. They seek to achieve a 1:1 information:knowledge conversion. The top diagram represents the way evaluators collect information with community surveys.

But if you have better filtering tools, you can instead maximize the information flow and rely on those filters to decide which pieces of the information are meaningful. You can tolerate noise. You can fetch only the knowledge you need from a ton of information, and the next person with a different need can also retrieve the knowledge he needs. Google search does this for the web, and the Framingham Heart Study did it for medical risk factors. So why hasn’t anyone succeeded in doing this for poverty and social problems?

This would allow us to learn without starting over each time. Suddenly one set of information has two users, and eventually hundreds, all because the information “firehose” was opened and the filtering was good.

This is smarter design. Maximum information input plus reasonably good filtering yields more knowledge to more people.
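To make the “filtering” idea concrete, here is a minimal relevance filter that ranks stories against a query using TF-IDF weighting. This is a standard information-retrieval technique swapped in for illustration, not a description of how GlobalGiving’s system actually filters, and the function names are hypothetical. Each new query reuses the same story collection, which is the point: one information set, many users.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def rank_stories(query, stories, top_n=3):
    """Return up to top_n stories with a positive TF-IDF score for the query,
    best match first."""
    docs = [Counter(tokenize(s)) for s in stories]
    n = len(docs)
    terms = set(tokenize(query))
    # Smoothed inverse document frequency: terms found in fewer stories weigh more.
    idf = {t: math.log((n + 1) / (sum(t in d for d in docs) + 1)) + 1 for t in terms}
    scored = []
    for i, d in enumerate(docs):
        length = sum(d.values()) or 1
        score = sum(d[t] / length * idf[t] for t in terms)  # term frequency x idf
        if score > 0:
            scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [stories[i] for _, i in scored[:top_n]]
```

A water engineer and an education funder can query the same collection for different needs; nothing has to be re-collected for the second user.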

I encourage you to go back and read examples I posted on the knowledge we’ve been able to extract from stories with good relevance filtering.

Using big data to infer how people would’ve answered

I recently wrote an algorithm that would use the answers from 57,000 stories to predict what three topics people might choose for a story with similar words in it.

How does it work?

People tell a lot of stories, and the words they use are correlated with the topics they choose. So if the correlation is strong enough, a computer algorithm can correctly “guess” the topic the person would have chosen. The guess is based on (1) generating a dictionary of words and their frequency of use in stories that a human has assigned to one of ten topics, then (2) scoring a test story by adding up the relevance of each word in that story to the topic, based on that topic dictionary.
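The two steps can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the author’s code: “relevance” is taken to mean a word’s relative frequency within a topic’s stories, stories are tokenized with a simple regex, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def build_topic_dictionaries(labeled_stories):
    """Step 1: from (text, topic) pairs tagged by people, build
    {topic: {word: relative frequency of that word in the topic's stories}}."""
    counts = {}
    for text, topic in labeled_stories:
        counts.setdefault(topic, Counter()).update(tokenize(text))
    dictionaries = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        dictionaries[topic] = {w: c / total for w, c in counter.items()}
    return dictionaries


def predict_topics(text, dictionaries, k=3):
    """Step 2: score the story against each topic by adding up each word's
    relevance (its relative frequency in that topic) and return the k best topics."""
    words = tokenize(text)
    scores = {t: sum(d.get(w, 0.0) for w in words) for t, d in dictionaries.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because the score simply adds up relative frequencies, words that are common in every topic (like “school” or “community”) push all topics up together, which is one reason some topics are harder to separate than others.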

The rigorous way to do this is to set aside 10–20% of the data to test the algorithm and use the rest to “train” it, then run the algorithm on the test set to estimate how often it will choose the correct topic from among these 10 choices:

[Image: topic question from the story form]
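A held-out evaluation along those lines might look like the sketch below. It repeats a compact version of a dictionary scorer so it runs on its own; the 15% test fraction, the fixed random seed, the hypothetical function names, and counting a “match” when the human-chosen topic appears among the top k predictions are all assumptions for illustration.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(pairs):
    """Topic -> {word: relative frequency} from (text, topic) pairs."""
    counts = defaultdict(Counter)
    for text, topic in pairs:
        counts[topic].update(tokenize(text))
    model = {}
    for topic, c in counts.items():
        total = sum(c.values())
        model[topic] = {w: n / total for w, n in c.items()}
    return model


def predict(text, model, k=3):
    words = tokenize(text)
    scores = {t: sum(d.get(w, 0.0) for w in words) for t, d in model.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def holdout_accuracy(pairs, test_fraction=0.15, k=3, seed=42):
    """Shuffle, hold out `test_fraction` of the labeled stories, train on the
    rest, and report per-topic accuracy: the percentage of held-out stories
    whose human-chosen topic appears among the k predicted topics."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_test = max(1, int(len(pairs) * test_fraction))
    test, training = pairs[:n_test], pairs[n_test:]
    model = train(training)
    hits, totals = Counter(), Counter()
    for text, topic in test:
        totals[topic] += 1
        hits[topic] += int(topic in predict(text, model, k))
    return {t: round(100.0 * hits[t] / totals[t], 1) for t in totals}
```

The reported percentages depend heavily on k: allowing three guesses per story makes every topic look more accurate than requiring the single best guess to match.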

I was surprised to see that the reliability of this approach depends on which topic you mean:

    Fetched 19343 records, 1 fields, with 8010659 characters. Conn: Closed food
    Fetched 15743 records, 1 fields, with 6517898 characters. Conn: Closed sec
    Fetched 22009 records, 1 fields, with 9192587 characters. Conn: Closed fam
    Fetched 19246 records, 1 fields, with 8186335 characters. Conn: Closed fre
    Fetched 24335 records, 1 fields, with 10342187 characters. Conn: Closed phy
    Fetched 30365 records, 1 fields, with 12079326 characters. Conn: Closed know
    Fetched 16717 records, 1 fields, with 6985293 characters. Conn: Closed self
    Fetched 8678 records, 1 fields, with 3274556 characters. Conn: Closed resp
    Fetched 14378 records, 1 fields, with 5838550 characters. Conn: Closed cre
    Fetched 5633 records, 1 fields, with 2050559 characters. Conn: Closed fun

Accuracy rates (percent match between the algorithm and what people choose):

    {'kno': 95.7,
     'fre': 6.2,
     'res': 67.5,
     'cre': 16.8,
     'phy': 85.5,
     'sec': 2.8,
     'fam': 47.2,
     'fun': 67.2,
     'slf': 0.4,
     'foo': 6.1}

That means that I can accurately predict stories about “knowledge” 96% of the time, but only 2.8% of “security” stories. Accuracy is only weakly related to the number of stories tagged with a topic: “fun” is a seldom-used topic but matches with 67% accuracy, while “self-esteem” is only 0.4% accurate despite being tagged in 3X as many stories as “fun.”
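The claim about story counts can be checked directly from the numbers above with a Pearson correlation. A minimal sketch; pairing the log labels (food, know, self, resp) with the abbreviated accuracy keys (foo, kno, slf, res) is an assumption.

```python
import math

# Stories per topic (from the "Fetched ... records" log above) and accuracy (%)
# from the first run. Label pairing, e.g. "know" -> "kno", is assumed.
records = {"foo": 19343, "sec": 15743, "fam": 22009, "fre": 19246, "phy": 24335,
           "kno": 30365, "slf": 16717, "res": 8678, "cre": 14378, "fun": 5633}
accuracy = {"kno": 95.7, "fre": 6.2, "res": 67.5, "cre": 16.8, "phy": 85.5,
            "sec": 2.8, "fam": 47.2, "fun": 67.2, "slf": 0.4, "foo": 6.1}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


topics = sorted(records)
r = pearson([records[t] for t in topics], [accuracy[t] for t in topics])
print(f"r = {r:.2f}")
```

With these ten pairs the coefficient comes out around 0.22, a weak positive relationship; with only ten topics it is a rough check rather than a firm result.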

Next I thought, “Maybe the most common words in each reference dictionary are too similar among all 10 topics.” Indeed, the top words are similar in many of the 10 topics. Words like ‘school’, ‘organization’, and ‘community’ appear in stories of every topic, and so offer no ability to tell topics apart. I should remove them.

    creativity [('organization', 5200.40795559667), ('school', 3543.777062566668),
        ('community', 3248.152150989258), ('child', 2862.176422375521),
        ('day', 1558.2406172604306), ('helped', 1528.985994397759),
        ('village', 1518.7758112094393), ('area', 1459.6429306441655),
        ('organisation', 1431.8204927035933), ('aid', 1339.7938144329896), ...]

    security [('organisation', 1797.8478738427743), ('helped', 839.5979011322839),
        ('hiv', 758.4263051629651), ('pupil', 757.4545341769011),
        ('school', 749.9667855960569), ('month', 731.5803097814555),
        ('provides', 578.9633375474084), ("i'am", 544.4925373134329),
        ('child', 522.5630079912575), ('business', 519.3864168618267),
        ('standard', 480.0096525096525), ('money', 464.8109119558795),
        ('aid', 460.0247422680413), ('just', 455.86785009861933),
        ('happy', 422.7314842729374), ('mzesa', 405.6),
        ('thanks', 402.25015556938394), ('gulu', 395.3125763125763), ...]

    knowledge [('child', 36082.86353391162), ('school', 33818.189588161),
        ('community', 32907.04868545692), ('helped', 32814.49078786444),
        ('organisation', 32659.84693237094), ('group', 18383.11439114391),
        ('life', 16962.78238448316), ('woman', 14369.672232361278),
        ('money', 14049.368721686034), ('good', 13293.343451864701),
        ('youth', 13202.397977609246), ('food', 12707.99451382372),
        ('living', 12395.504079003864), ('poor', 12331.987821235045),
        ('parent', 11596.92557475659), ('education', 11534.22393346681),
        ('aid', 11186.01649484536), ...]

When you exclude all words that are in the 60th percentile of frequency or above, you get the opposite pattern for accuracy:

{'kno': 0.8,
'fre': 51.8,
'res': 2.6,
'cre': 24.4,
'phy': 0.8,
'sec': 79.8,
'fam': 2.9,
'fun': 3.3,
'slf': 97.3,
'foo': 59.1}

Well, that won’t do either. So I decided I needed to get serious. Oddly, in Python that means writing a whopping five more lines of code, instead of the usual single line, to do something amazing like “take all the words in all dictionaries and drop the words that are present at the 60th percentile or greater.”
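For comparison, the percentile filter from the second attempt might be sketched as below. How “frequency” was pooled and how the percentile was computed aren't stated, so both are assumptions (here: each word's weight summed across all topic dictionaries, with a simple rank-based cutoff), as is the helper name `drop_frequent_words`.

```python
def drop_frequent_words(topic_dicts, pct=60):
    """Drop every word whose pooled frequency is at or above the pct-th percentile.

    Assumptions: "frequency" is a word's weight summed across all topic
    dictionaries, and the cutoff is a simple rank-based percentile.
    """
    pooled = {}
    for weights in topic_dicts.values():
        for word, w in weights.items():
            pooled[word] = pooled.get(word, 0) + w
    if not pooled:
        return {topic: {} for topic in topic_dicts}
    ranked = sorted(pooled.values())
    cutoff = ranked[min(len(ranked) - 1, len(ranked) * pct // 100)]
    return {
        topic: {w: f for w, f in weights.items() if pooled[w] < cutoff}
        for topic, weights in topic_dicts.items()
    }

toy_counts = {"a": {"school": 10, "hiv": 1},
              "b": {"school": 8, "pupil": 2, "gulu": 3, "mzesa": 4}}
print(drop_frequent_words(toy_counts))
# {'a': {'hiv': 1}, 'b': {'pupil': 2, 'gulu': 3}}
```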

Here is the Python code:

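    # topic_dicts maps each topic to its word:frequency dictionary (built earlier)

    def inall(key, topic_dicts):
        # Returns True if key is present in every dict of topic_dicts, else False
        in_all = 0
        for k, v in topic_dicts.items():
            if key in v:
                in_all += 1
        if len(topic_dicts) == in_all:  # if every dictionary has the word, these will match
            return True
        return False

    # Build new topic dicts that drop any word present in all topics
    alt_topic_dicts = {}
    for k, v in topic_dicts.items():
        alt_topic_dicts[k] = {x: y for x, y in v.items() if not inall(x, topic_dicts)}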
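Assuming `topic_dicts` maps each topic to its word:frequency dictionary, the same “present in all topics” filter can also be written with a set intersection. This works out the shared vocabulary once rather than re-checking every word against every dictionary. It is an alternative sketch (`drop_shared_words` is a hypothetical name), not the code used for the results below.

```python
def drop_shared_words(topic_dicts):
    """Remove every word that appears in all of the topic dictionaries.

    Same result as filtering with inall(), but the shared vocabulary is
    computed once with a set intersection instead of once per word per topic.
    """
    if not topic_dicts:
        return {}
    shared = set.intersection(*(set(d) for d in topic_dicts.values()))
    return {
        topic: {w: f for w, f in d.items() if w not in shared}
        for topic, d in topic_dicts.items()
    }

# Weights rounded from the creativity, security, and knowledge lists above.
toy_topics = {
    "cre": {"school": 3543.8, "village": 1518.8},
    "sec": {"school": 750.0, "hiv": 758.4},
    "kno": {"school": 33818.2, "youth": 13202.4},
}
print(drop_shared_words(toy_topics))
# {'cre': {'village': 1518.8}, 'sec': {'hiv': 758.4}, 'kno': {'youth': 13202.4}}
```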

On my third try, I decided to exclude any words that are present in all 10 topics from each of the 10 respective topic (word:frequency) dictionaries. It took 75 seconds to rerun all the analysis, and the accuracy was much better:

{'kno': 86.7,
'fre': 83.8,
'res': 73.1,
'cre': 57.3,
'phy': 64.8,
'sec': 58.9,
'fam': 62.4,
'fun': 43.2,
'slf': 61.1,
'foo': 60.8}

So, with the exception of stories tagged with the topic “fun,” I can use this simple algorithm to predict the topic of a story (from a list of ten topics representing the hierarchy of human needs) correctly over 50% of the time. The probability of randomly picking the right topic would be one in ten (10% success), so I’m quite happy with this result.
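The “65% accuracy (on average)” figure is the unweighted mean of the ten per-topic rates above; it does not weight topics by how many stories carry them. A quick check:

```python
# Per-topic accuracy (percent) from the third attempt, copied from above.
accuracy = {'kno': 86.7, 'fre': 83.8, 'res': 73.1, 'cre': 57.3, 'phy': 64.8,
            'sec': 58.9, 'fam': 62.4, 'fun': 43.2, 'slf': 61.1, 'foo': 60.8}

mean_accuracy = sum(accuracy.values()) / len(accuracy)
below_half = [topic for topic, pct in accuracy.items() if pct < 50]

print(f"mean accuracy: {mean_accuracy:.1f}%")  # mean accuracy: 65.2%
print(f"below 50%: {below_half}")              # below 50%: ['fun']
```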

But is 65% accuracy (on average) “good”?

In 2009 we ran this experiment with humans. This is what storytellers chose:

What people talked about in stories from Kenya

And this is what human “experts” predicted:


When we asked 65 aid experts to pick the top 6 out of 12 topics in that survey question and rank-order them, only one out of 65 got #1 correct! He later admitted in an email that he had just guessed. Overall, people performed worse than chance (8%, or 1 in 12) at this task, because they were biased by what they thought the main topics would be for everyone.

So in that context, this algorithm does surprisingly well, and much better than humans for this specific task.

By another measure, in the sense of Shannon information theory, it provides 3X to 6X more information than we would have about the story had we not included this new “meta data.” The exact number is tricky to calculate (at 3am) because storytellers were asked to choose 3 of 10 topics on the form, and if the algorithm’s #1 choice is among those 3, I count that as a hit. A rigorous result would only count cases where all three topics matched the human’s choices. That’s a bit more involved than what I care about. This does bring up an interesting point about surveys. Most questions on forms allow only one right answer, but we required 3 of 10 answers. That makes it easier for the algorithm to “learn” how to be mostly right, because each story has multiple overlapping topics. It’s worth doing this on more surveys in the coming Big Data era.
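Under this scoring rule, and assuming every storyteller picked exactly three of the ten topics, a uniformly random #1 pick lands in the chosen set 3 times in 10. That puts the fair chance baseline for these accuracy figures closer to 30% than 10%, and the 65% average still clears it comfortably. A quick enumeration (the helper `chance_of_hit` is illustrative) confirms the arithmetic:

```python
from fractions import Fraction
from itertools import combinations

def chance_of_hit(n_topics=10, n_chosen=3):
    """Probability that one uniformly random guess lands among the
    storyteller's chosen topics, found by enumerating every possible choice."""
    topics = range(n_topics)
    hits = total = 0
    for chosen in combinations(topics, n_chosen):
        for guess in topics:
            total += 1
            hits += guess in chosen
    return Fraction(hits, total)

print(chance_of_hit())        # 3/10
print(chance_of_hit(10, 1))   # 1/10
```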

The Big Idea Behind Big Data

This topic prediction approach works because of some very simple math and a huge, rather complete amount of empirical data (57,000 stories about the types of things people talk about when they describe community efforts in East Africa). International Development suffers from having the smallest and most disconnected data systems on Earth. This is a rather large training data set, where poverty is concerned. But once you have this, you can do a lot more with it – such as categorize future narratives along a hierarchy of needs with about 65% accuracy – without having to collect more data and waste more people’s time.

Learning can happen faster.

People can take action quicker.

It’s not a replacement for listening, but it can aid our understanding.

And importantly, this approach can work with other questions that we included in our survey.

Read more: The future of big data is quasi-unstructured

It was quoted in this Wired blog post: The growing importance of natural language processing

This is the kind of thing described in the book, “The Secret Life of Pronouns.”


Predicting GlobalGiving Project Report Topics

I extended this test by applying the ten topic dictionaries to a totally new set of narratives: 24,392 project reports on GlobalGiving from 2006-2013. All of these are about real project work, though the words people use are different. According to these topic dictionaries, the breakdown of topics among the GlobalGiving project reports is as follows:

Sum of top three assigned topics:

{'knowledge': 24045,
'freedom': 33,
'respect': 19467,
'creativity': 181,
'physical needs': 18675,
'security': 24,
'family': 1035,
'fun': 9689,
'self-esteem': 10,
'food & shelter': 17}
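Reading “sum of top three assigned topics” as a count of how often each topic lands among a report's three highest-scoring topics, the tally might look like this (`tally_top_three` and the scores are hypothetical). That reading fits the numbers: the ten totals add up to 73,176, exactly three times the 24,392 reports.

```python
from collections import Counter

def tally_top_three(report_scores):
    """report_scores: one {topic: score} dict per project report.

    Counts how often each topic is among a report's three highest scores.
    """
    counts = Counter()
    for scores in report_scores:
        counts.update(sorted(scores, key=scores.get, reverse=True)[:3])
    return counts

# Hypothetical scores for two reports.
reports = [
    {"knowledge": 0.9, "respect": 0.7, "fun": 0.5, "security": 0.1},
    {"knowledge": 0.8, "physical needs": 0.6, "respect": 0.4, "freedom": 0.2},
]
print(tally_top_three(reports))

# The ten totals above sum to three per report.
totals = [24045, 33, 19467, 181, 18675, 24, 1035, 9689, 10, 17]
print(sum(totals) == 3 * 24392)  # True
```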

Clearly, this method does not assign topics to updates in the same proportion that people assigned these topics to their stories. This could be because the narrative words are quite different for the subjects that are underrepresented. These scores measure both how similar the language (words) is between reports and stories on a topic, and how many reports contain these topics.

Organizations probably use very different language to describe security, freedom, self-esteem, and food-shelter projects on GlobalGiving from the way people talk about them in stories.

Knowledge (education) and physical needs are described similarly in both places.

Respect is overrepresented in project-speak. There is no corresponding project theme on GlobalGiving, although “women” and “children” projects are the largest categories on the site.

Food & shelter is described in terms of disaster relief on GlobalGiving, but appears more in the context of poverty in stories.

Freedom in stories maps to human rights and democracy projects on GlobalGiving.

Coherence between story role and predicted story point of view based on pronoun use

In general, people use “I” and “me” in stories where they were affected or played an active part. And they use fewer personal pronouns in observer stories:

Fetched 39714 records, 1 fields, with 15281132 characters.
'Saw it happen','Heard about it happening'
third plural 39.1%
first plural 20.4%
third singular 17.5%
fourth 17.1%
first singular 5.9%
Fetched 13346 records, 1 fields, with 6136291 characters.
'Was affected by what happened'
first singular 29.3%
first plural 24.1%
third plural 22.9%
third singular 12.5%
fourth 11.2%
Fetched 7756 records, 1 fields, with 3468508 characters.
'Helped make it happen'
third plural 26.4%
first plural 23.7%
first singular 18.8%
third singular 17.3%
fourth 13.8%

“Fourth POV” is my shorthand for stories that contain more organization words than pronouns. They are impersonal and lacking in detail, more like press releases. Luckily, they are not too common overall.
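A minimal sketch of such a point-of-view classifier, assuming a story is labelled by its most frequent pronoun group, and as “fourth” when organization words outnumber all pronouns combined. The word lists and the name `story_pov` are illustrative assumptions; the lists behind the counts above aren't given here.

```python
import re
from collections import Counter

# Hypothetical word lists; the real tool's lists are not given here.
POV_WORDS = {
    "first singular": {"i", "me", "my", "mine", "myself"},
    "first plural": {"we", "us", "our", "ours", "ourselves"},
    "third singular": {"he", "she", "him", "her", "his", "hers"},
    "third plural": {"they", "them", "their", "theirs"},
}
ORG_WORDS = {"organization", "organisation", "project", "program",
             "foundation", "ministry", "association", "cooperative"}

def story_pov(text):
    """Label a story by its most frequent pronoun group, or "fourth" when
    organization words outnumber all pronouns combined."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for pov, vocab in POV_WORDS.items():
            if w in vocab:
                counts[pov] += 1
    org = sum(w in ORG_WORDS for w in words)
    if org > sum(counts.values()):
        return "fourth"
    if not counts:
        return "unknown"  # no pronouns and no organization words
    return counts.most_common(1)[0][0]

print(story_pov("I went to school and my mother paid the fees"))      # first singular
print(story_pov("The organization runs a project with the ministry"))  # fourth
print(story_pov("They built a well and their children drink water"))   # third plural
```

Tallying `story_pov` over each group of stories (observers, affected, helpers) would give percentage breakdowns like the ones above.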

This analysis continues elsewhere: it turns out that telling a story from a different point of view can make a project report more compelling, leading to more donations.

Examples of meta stories from narrative analysis

In my previous post, Narrative analysis with benchmarking, I explained how you can search and filter among tens of thousands of stories in the GlobalGiving Storytelling project in a few steps:

story-exploring-babyblue

My hope is that by making it easy to explore the rich data we already have, we will encourage project leaders, community activists, entrepreneurs, researchers, and other curious globally-minded people to think about our world and continuously refine their ideas:

story-exploring-curiosity

This behavior is the essence of a knowledge feedback loop; you learn things that help you, so you keep trying to learn more. As the diagram also shows, the tool requires your own curiosity, ideas, and sweat to work.

As a tool builder, I can help by creating a simpler interface and the means to manage knowledge. I have started baking in controls that hide data when the quality is poor, so that you can trust what you see. Future upgrades will allow users to import any data set from a spreadsheet (CSV), google spreadsheet, or RSS. And more advanced statistics are coming. The system is already extensible, for those who are thoughtful and creative in how they filter stories.

But this tool will only change projects — and improve lives — when the people who use it are free to work within their organizations in a true idea-experiment-cycle:


As of today, I am happy to announce that everything one needs for the analysis and experiment parts of the loop is available online, for free, and has been extensively tested:


We’re just looking for the missing ingredients – thoughtfulness and curiosity – that only you can provide. If you work for an organization, you should sign up for a training program that will not only help you “jump tracks” onto the innovation cycle shown here, but might also help you win more funding grants:

Apply to the storytelling-grantwriting programme

Owen Barder recently wrote an essay which underscores the importance of putting more tools like these into the hands of those who will change the world, because “solutions” cannot be directly copied. They must be reinvented for each local context:

Where it is not possible to replicate success directly, it may be possible to support systems to enable them evolve more rapidly and more surely towards the desired goals. – Owen Barder

“Evolve” is precisely the right word, as I’ve explained previously. If you’re looking for ways to boost the rate at which your organization learns, you may find these next illustrations inspiring.

This approach is about applying simple rules to semi-structured content, with complex consequences. The compare tool allows you to search for two collections of stories. You “build” a collection by choosing which answers to questions matter to you, and which words in stories people share are relevant to your idea.



Example: Female Circumcision vs Female Genital Mutilation (FGM)

There is an organization in Kisii, Kenya that rescues girls from families and gives them a home in a boarding school, so that they can escape female genital mutilation (FGM). The language they use is very different from the language that Kisii tribe members use to describe the same thing.

On the left: stories about “female circumcision,” excluding the word “hiv” (male circumcision has been shown to reduce HIV infection rates, so I’ve excluded those stories).

On the right: “genital mutilation” or FGM:


The size of each person icon represents the proportion of stories that come from that demographic group. The color (red-yellow-green) represents how negative or positive stories were compared to what we expected (based on all stories collected). This is how you read it:

reading demographics icons - and school

The teenage boy icon is larger because they are more likely to talk about “female circumcision”; the teen woman icon is smaller because they are less likely. No girl icon appears on the left because no girls used those words at all. Some girls did talk about FGM, and these icons are red because these stories are associated with negative emotions. (We asked them how they felt about the story they told and they checked the box for a negative emotion):


Upon merging (dividing the left by the right), you see that women and men have very different perspectives on the issue.


Men are very positive about “circumcision,” whereas women are negative. The women icon is smaller because women are more likely to talk about FGM and not “circumcision.” And looking back at the demographics that were merged, men were less likely to talk about this than about other topics. It seems to be unimportant to older men and women, and more important to younger people.
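The merge step is described only as “dividing left by the right.” One plausible reading, sketched below with hypothetical names and data, is a ratio of each demographic group's share of stories in the two collections. A similar observed-vs-expected ratio could drive the red-yellow-green colors, but that is also an assumption.

```python
from collections import Counter

def group_shares(stories, key="group"):
    """Fraction of a collection's stories that come from each demographic group."""
    counts = Counter(story[key] for story in stories)
    return {g: c / len(stories) for g, c in counts.items()}

def merge(left, right, key="group"):
    """Divide each group's share on the left by its share on the right.

    Values above 1 mean the group is over-represented on the left. Groups
    absent from the right are skipped to avoid dividing by zero.
    """
    l, r = group_shares(left, key), group_shares(right, key)
    return {g: l[g] / r[g] for g in l if g in r}

# Hypothetical collections: 3 of 4 left stories come from men, 1 of 4 on the right.
left = [{"group": "men"}] * 3 + [{"group": "women"}]
right = [{"group": "men"}] + [{"group": "women"}] * 3
print(merge(left, right))
```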

How is this useful? If you are Kakenya’s Dream, you could use this in a grant to underscore just how divided the community is about FGM. It would also be fair to include narratives from those who vehemently oppose your work, so you can talk about your efforts to reconcile “tradition” with the rights of women.

While I’m at it, why don’t we just broaden our search and see how people talk about those two ideas? I’ve searched for two new collections. On the top: stories with (women and rights) or (FGM or mutilation). On the bottom: (tribal tradition ethnic kisii) and “practice”:

women-rights vs traditional practices

Explore narratives with bubble plots

The bubble plot tool puts all the words that get used “enough” into bubbles and sorts them up or down, depending on how often they tend to appear in either the top or the bottom collection. Words more likely to appear in the overall 57,220 stories are excluded. Common two-word phrases (“human rights”) also appear and gobble up the individual words (“human” and “rights”).
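The up-or-down sorting can be approximated by comparing how often each word appears in stories from each collection. This sketch (hypothetical names, add-one smoothing, and a `min_stories` threshold standing in for “used enough”) ranks words by the log ratio of their story shares. It leaves out the tool's exclusion of words common across all 57,220 stories and its handling of two-word phrases.

```python
import math
from collections import Counter

def doc_freq(stories):
    """Number of stories each word appears in (each story counted once)."""
    df = Counter()
    for text in stories:
        df.update(set(text.lower().split()))
    return df

def bubble_order(top, bottom, min_stories=2):
    """Rank words by the log ratio of their share of top vs bottom stories.

    Positive scores float above the line (top-collection words); negative
    scores sink below it. Words found in fewer than min_stories stories
    overall are dropped.
    """
    df_top, df_bot = doc_freq(top), doc_freq(bottom)
    scores = {}
    for word in set(df_top) | set(df_bot):
        if df_top[word] + df_bot[word] < min_stories:
            continue
        # Add-one smoothing so a word missing on one side doesn't divide by zero.
        p_top = (df_top[word] + 1) / (len(top) + 2)
        p_bot = (df_bot[word] + 1) / (len(bottom) + 2)
        scores[word] = math.log(p_top / p_bot)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

top = ["women rights human", "rights human women", "school fees"]
bottom = ["old men tradition", "tradition early marriage", "school fees"]
for word, score in bubble_order(top, bottom):
    print(f"{word:10s} {score:+.2f}")
```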

Bubbles are more like the tea leaves of understanding people and cultures. Sometimes the patterns are meaningless, and sometimes they offer deep insights. It is up to the reader to decide what to focus on, but the basic computer filtering ensures that anything you see appeared in a good portion of the stories. When you understand what you are looking at with the basic bubble plots, click the [CUSTOMIZE] button next to the bubble button and change how it calculates and displays patterns. Like with the compare tool, I tried to hide the full barrage of options. This is a new way to interact with data, and it may take an hour of playing iteratively before you can really get the most out of it.

So what do these bubbles say about women’s rights vs traditional, tribal practices? Well, for starters, women’s rights are human rights (to those who talk about it) and female circumcision is NOT a part of it. Also, the other side is very concerned about HIV/AIDS, and stories about “practices” include specific mention of “old men,” “early marriages,” and “young girls.” So I would venture to say that any successful program needs to deal head-on with the practice of old men marrying young girls under the guise of “tradition.” Rescuing girls from homes may not do much to improve their lives if they later return to the village and are forced into marriage to old men.

Example: What is “food security,” really?

By comparing stories with the NGO speak “food security” against a much larger collection of farm-grow-plant stories, we see who talks about it, and what words they use:

food security mostly adult women and positive biased

Adult women are more likely to talk about food security, and these stories are generally more positive than typical stories. Below: words above the dividing line are more likely to appear in stories about “food security” than in other stories about farming or growing.

food security vs grow-plant-farm

What do girls dream about, hope for, or want?

By searching the texts for phrases “I dream” “I hope” or “I want” and then splitting left/right by female-male, you can see…

what kenyan and ugandan females dream hope for or want

Girls are much more likely than boys to frame their aspirations in an “if I work hard… then…” mindset. (Words above the line appear more often in female narratives; words below are male-centric.) Boys talked about World Vision much more. And both sexes talked about education and starting a business equally (bubbles on the line).

Taking a broader step, you can see cognitive patterns change in women throughout life in rather interesting ways:

women hope and think

In stories where women talk about “hope” and use at least one thinking word, they tend to be more negative than women who hope for things without thinking much about it. As women get older, stories of hope are more likely to be negative, but especially so if they have also thought and written about examining it.

The author of the book, The Secret Life of Pronouns, finds a similar pattern as we see here – critical thinkers are more negative about the events:

people are less introspective as they grow older

But the other trend is something unique to our international development storytelling: People become less likely to describe a story with introspection as they age. I’ve speculated this is because government and civil society don’t listen.

Returning to the “hope, dream, want” collection: after you merge both collections, dividing patterns on the left by those on the right, fun, freedom, and respect are talked about more by women than men. And whereas women are more positive than men in their stories tagged with fun and freedom, respect is neutral. From the two stand-alone data sets (upper left and upper right), it is clear that these aspirational stories tend to be more negative than the typical East African story collected.


School Uniforms in Busia

Innovations for Poverty Action ran a randomized controlled trial in Busia a decade ago, proving that providing school uniforms improves school outcomes and is more cost-effective than school fees. Looking only at stories from Busia and comparing “uniforms” stories to “school” stories (the tool automatically removes overlapping stories from the benchmark for you), we see:

uniform-vs-school-busia merge

Teen women talk about uniforms positively, but younger girls are slightly negative, compared to stories from them about school. Adult women are also negative, and men 17-30 do not talk about uniforms at all. The topic analysis shows that uniform stories are much more about security, and less about knowledge (the books are smaller). Looking at Busia, then Kenya, then East Africa – I find that uniforms are a much less talked about problem than school fees.

Point of view: When ” I ” go to school

Words above the line come from stories that mention “school” and include first person words, such as “I” or “my”. Below: a random selection of school stories. Note how the top is about who the person has to thank for education (mother, father, god, family) and include a lot of positive words. Below, impersonal groups appear (orphans, students, village, pupils, schools, youth, teachers, needy, girls). “I” stories are much richer (higher quality data) because they can teach us more from specific anecdotes than the generalized observations of the stories below the line.

Analyzing tip: search for ” I ” instead of just “I” so that the search finds the space before and after the “I”. Otherwise, every story containing a word with an “i” in it would be included.

school i-words or without
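To see why the padding matters, compare a plain substring search, the padded search from the tip, and a word-boundary regular expression on a few made-up stories. The padded search misses an “I” at the very start of a story, which a regex with `\b` boundaries catches. Whether the story search box accepts regular expressions isn't stated here, so treat the regex as an offline check.

```python
import re

stories = [
    "I paid my school fees",
    "The girls finished primary school",
    "Then I went home",
]

# A plain substring search for "i" matches every story (paid, girls, finished...).
naive = [s for s in stories if "i" in s.lower()]

# Padding with spaces, as the tip suggests, only matches a standalone "I"
# in the middle of a story; it misses the "I" that opens the first one.
padded = [s for s in stories if " I " in s]

# A case-sensitive word-boundary regex also catches "I" at the start.
regex = [s for s in stories if re.search(r"\bI\b", s)]

print(len(naive), padded, regex)
```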

When you look across two of these examples (people who are thoughtful and introspective in their stories vs. those who tell a school story from their own perspective), we see nearly opposite patterns in what is positive and negative:

school i-words vs why stories icons

Children (especially boys) are more likely to ask “why” in a story about anything, and these stories are always more negative. Telling a story about school and putting yourself in the story is typically neutral. But when it comes to stories about the topic of respect, both groups are more positive than the rest.

Program-specific benchmarking

The next four examples use stories about specific organizations or projects and compare them to their respective issues.

  • The Mrembo project was designed to train adolescent girls about life skills and avoid teen pregnancy and early marriage. It eventually focused on preventing sexual assault and rape because of the storytelling project.
  • Tysa is a youth-sports organization. Compared to their benchmark (youth sports stories), stories about them show that they do a lot more to pay school fees, target young girls more, and work with parents more.
  • Retrak is an organization that works with street children in Kampala, Uganda. These stories show that family problems are a major influence in why kids run away.
  • Comparing the “Street children” stories to a random sample, we see which issues in all of development are least related to street children: water, health, HIV/AIDS, development, business, and women. All these other issues come up more often in stories NOT about street children.



Extensibility is the degree to which an existing system can accommodate new features with a minimum of changes. It’s a word that never escapes the lips of monitoring and evaluation experts, because evaluations rarely boast this feature (by rarely I mean never, period. Until now). Not that they couldn’t, mind you – it just requires reworking the way we gather evidence and rethinking the way we organize it.

I made these tools extremely flexible, both in how data gets in and how we pull insights out, because it is much easier to innovate by changing your own world than to wait for others to change theirs. But enough philosophizing. Here are examples of new, meaningful ways to interpret story data that are as powerful as if we’d asked users more survey questions. In every case, you can simply cut and paste the “how to ask it” text into the story text search box, and it is as if you are filtering by answers to the question in the “question” column. You can combine them with specific topics (e.g. (“thank you” “to thank” ) and school):



 How to ask it in

gratitude words Is this story about thanking an organization for their effort? (“thank you” “to thank” )
cognitive words How thoughtful were you in the story you just told? (know knew realize understand understood think thought consider ponder wonder remember cogn conceive believe speculate why )
exclusives (but without except however )
aspirational words Did the storyteller hope for more than what actually happened? (hope aspir promise predict ambition )
organization words Words associated with narratives where an organization was involved. (organization organisation admin accountable addressing collaborating development association “women group” “self help” cooperative constituent intervention “youth group” ministry foundation project program initiative )
negative  words How bad did you feel about the story you told? (” no ” ”  not ” never noone nobody )
negative emotion words (angry depressed confused helpless irritated upset enraged disappointed doubtful alone hostile discouraged uncertain paralyzed insult shame indecisive fatigued powerless perplexed useless annoyed “not happy” embarrassed inferior upset guilty hesitant vulnerable hateful dissatisfied empty unpleasant miserable offensive detestable disillusioned hesitant bitter despair despicable skeptical frustrated resentful disgusting distrustful distressed terrible pathetic despair unsure tragic infuriated uneasy ” bad ” pessimistic indignant )
positive emotion words How good did you feel about the story you told? (open happy good great playful calm confident courageous peaceful reliable joyous energetic “at ease” easy lucky liberated comfortable amazed fortunate optimistic pleased delighted provocative encouraged sympathetic overjoyed joy impulsive clever interested glee surprised satisfied thankful frisky content receptive important animated accepting festive spirited certain kind ecstatic thrilled relaxed satisfied wonderful serene glad cheerful bright sunny blessed merry reassured elated jubilant love strong loving eager considerate keen affectionate fascinated earnest sure sensitive intrigued intent certain tender absorbed devoted inquisitive inspired unique attracted determined dynamic passion excited tenacious admir engrossed enthus hardy warm curious bold secure touched brave sympathy daring challenged loved optimistic comforted drawn confident hopeful )
question words Did the storyteller ask a question in the story? why
discrepancy words Did the storyteller talk about what could have happened? (could would should )
tentative (maybe perhaps sometimes might almost “more or less” )
first person “ I “ is used more by followers than leaders, more by truth-tellers than liars, (” I ” “I’m” “I’ll” “I’ve” “I’d” )
cause-effect Story shows cause-effect thinking (because reason effect ” if ” )
analytical (but without except) and (because reason effect) and [cognitive words above]
black-white thinking Does he/she see world in absolutes? (always never absolutely surely )
relationships (mother father sister brother son daughter grandfather grandmother parent friend lover husband wife relative uncle aunt )
time-space words Associated with truthfulness (day time started year morning evening night) and (after before while next around above often )
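The search strings in this table follow a simple pattern. The terms inside one set of parentheses are alternatives, and parenthesized sets joined with “and” must all match. Stems like “aspir”, “cogn”, and “admir” suggest partial-word matching. Here is a minimal sketch of that logic, assuming case-insensitive substring matching (`matches`, `GRATITUDE`, and `ASPIRATION` are hypothetical names; the tool's actual query parser may behave differently):

```python
def matches(text, groups):
    """True if the text contains at least one term from every group.

    Each group is a list of alternatives (one parenthesized set in the
    table); groups are combined with AND. Terms are matched as lowercase
    substrings, so stems such as "aspir" also match "aspirations".
    """
    lowered = text.lower()
    return all(any(term.lower() in lowered for term in group) for group in groups)

GRATITUDE = ["thank you", "to thank"]
ASPIRATION = ["hope", "aspir", "promise", "predict", "ambition"]

# (“thank you” “to thank” ) and school
print(matches("I want to thank the teachers at my school", [GRATITUDE, ["school"]]))  # True
print(matches("Our aspirations grew after the training", [ASPIRATION]))              # True
print(matches("We hope for rain", [GRATITUDE]))                                       # False
```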

My “Claimer” (i.e. the opposite of a disclaimer)

If this kind of analysis seems too abstract to be useful in international development, I’d caution you to try using community feedback to think about the root causes of the problem before jumping to the conclusion that by better measuring the countable goods and services delivered (the “outcomes”), we solve the problem. Today you can study the root causes more easily than ever before, and our understanding of the problem is ultimately going to be the less complex part of the problem to “fix.” As Anaïs Nin says,

We don’t see things as they are…

…we see things as WE are.

Outcomes vs monitoring: While the logistics of every intervention requires a quantitative measurement and real-time tracking approach, this is not that. USPS, UPS, and FedEx are masters of logistics, but they can’t tell you what to buy your mother for Christmas. This is a tool to understand your mother.

Impact evaluations answer the questions, “What would have happened if we did nothing?” and “What tangible improvements will we make?” This, also, is not that. It doesn’t want to be that. Take education, for example. If educators applied the “impact question” they would ask, “How will this lesson plan change a student’s income 25 years from now? How will it make them more likely to vote, to volunteer, to avoid breaking the law or cheating on their taxes?” This is a performance monitoring system, with aspirations to be a real-time feedback loop system between citizens and civil society/government/corporations/media. I take my lessons from educators who learned long ago that the “impact question” cannot be answered quickly enough to provide course-correction (pun intended). Instead, they ask, “What are students retaining from this lesson plan?” and “Can they apply what they learned today to real world problems tomorrow?”

Yeah, we’re doing that too. This is part of a larger program GlobalGiving is launching next month to provide all of our partner organizations with real-time feedback on their performance. Specifically, how well they listen, act, and learn in cycles as they do the work they are already doing. By interacting with a website ( for a few years, they generate a behavioral profile that they can learn from, especially when benchmarked against similar organizations.

If the aggregate-filter-contextualize-benchmark-visualize features of that system resemble this system, it’s because I helped to create both. I think these may become the future steps of all big data learning feedback systems, but what do I know? We in the aid world are still talking about samples in the hundreds when corporations are talking about the coming brontobyte era – where we archive more data each day than we created in the past 2,000 years. Data, mind you, is not knowledge. You need to aggregate-filter-contextualize-benchmark-visualize it before you can listen, act, and learn from it.

Better narratives are simply better data: The problems in international development are going to be an order of magnitude easier to solve if we have richer data. To get it, simply (1) Ask people, on a large scale, what they want. (2) Demand that they get involved in the process if they want it. Many will volunteer to improve their own lives. (3) Work with them to make sense of their own world, and fix the work being done “to them” and not “for them.” (4) When you get it, you’ll “get it.”

This iterative approach requires more aggregation and less structure in the data itself. My next batch of tools released will support that need exactly.

Quality control: When working with narratives, “quality control” is not as hard as you think – but you need to follow the rules.

  1. Collect enough: Minimum viable sample size seems to be around 100 stories (when all answer the same prompting question). Collections of stories can be used to build a meta-narrative (and are viable for statistical significance testing), whereas individual stories can only be trusted as anecdotal evidence to support an idea.
  2. Calculate statistical power: Power is the chance of seeing a difference, if there is a difference there to see, and too few people pay attention to it. Power is related to sample diversity. Did you get enough independent sources? If you think this is “staff work” and not “community work” then you will fail. There are no experts you can outsource the evaluation to. You must engage your community for it to work. Future tool upgrades will auto-calculate the “statistical power” of collections for you. When you survey both the community and your organization’s beneficiaries you will have more power to detect patterns.
  3. Diversity makes it a meta analysis: The tool tells you how many scribes, storytellers, organizations, and locations are represented in each story collection you build with Future versions will have the power to calculate meta analyses, with the power to provide results as rigorous as randomized controlled trials, if you have the power in your sample to detect differences, but without depriving people of what they deserve. In the school classroom example, to truly measure the impact of education, you would need to randomize the classroom and deny half the class an education, just to prove that it matters. We can’t do that. Instead, we have to aggregate real world data and look for natural experiments, such as comparing the thousands of narratives of people already denied an education to those who got the opportunity. It’s not as rigorous, but it will reveal the same answer. The way we “control for other factors” is to have a colossal sample of narratives so that these other factors are in the story but differences in them cancel out. This is the future. And the greater the diversity of sources, the more likely any differences are to be real, robust, valid predictors on a vast scale.
  4. “Ahem, they can see your raw data”: Because the data is public, we can check claims people make against their data in minutes. Wildly speculative claims will be easy to refute. And smart advocates will make more public use of the data in their arguments and reasoning to lend credibility to their claims in a trackable way. It’s like “open-sourcing” the evidence-based-decision-making of the aid world (not that I actually think they make evidence-based decisions, but now they can stop pretending and start attending to the needs, opinions, and insights of citizens.)
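Until the tool calculates power automatically, here is one standard way to estimate it for a difference in proportions between two story collections (for example, the share of negative stories). This is a textbook normal approximation, not the tool's planned method, and the 40% vs 25% figures are hypothetical. It suggests that around 100 stories per collection gives only moderate power to detect a 15-point difference:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-proportion z-test.

    Normal approximation: power ~ Phi(|p1 - p2| / SE - z_crit), which
    ignores the tiny chance of rejecting in the wrong direction.
    """
    std = NormalDist()
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = std.inv_cdf(1 - alpha / 2)
    return std.cdf(abs(p1 - p2) / se - z_crit)

# Hypothetical: 40% vs 25% of stories are negative in two collections.
print(round(power_two_proportions(0.40, 0.25, 100, 100), 2))  # 0.63
print(round(power_two_proportions(0.40, 0.25, 300, 300), 2))  # 0.98
```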

Why this is better:

  1. Easier to manage than “quantitative” indicators: Collections are extensible, aggregatable, and comparable.
  2. We can detect and correct bias with narratives, as explained in The Secret Life of Pronouns (James Pennebaker).
  3. Emergence: narratives and brief surveys provide “enough”.
  4. Focused on listening and collecting multiple perspectives.


Why aid fails (syndicated on how-matters)

These are the images that appear in my guest blog on

I highly recommend you leave my blog and visit how-matters to read it :)

- Marc


fig1 how to use stories to compare what groups of people think

fig2 adults and men more likely to be affected and less involved

fig3 learning lies at the heart of involvement and failure

This will blow your mind: As people get older, they become less introspective in the stories they tell, asking “why” less and less.

fig4 story perspective - the older you get the less you think critically


The curious aid worker




