Changing your point of view to tell a more compelling story

For years I’ve wanted to write an algorithm that would predict whether a story is emotionally compelling or not. This would be a major breakthrough for natural language processing. It would also allow us to automatically rate most of the narrative content on the Internet.

While I am not there yet, I am making progress. Using the wisdom of James Pennebaker's The Secret Life of Pronouns, I was able to write a story point-of-view detector that finally seems to work. Not only does it identify the story's point of view, it also assigns a confidence score to its prediction, and it reliably detects stories that lack a dominant point of view (result: "none") or that alternate between two points of view (result: "mixed").

That four-way split is what goes into any good algorithm. If asked to decide between A or B (the simplest choice), there are actually four possible answers: A, B, both, or neither.

Storytelling: Seven Points of View

After many rounds of testing, I discovered 7 points of view:

pov_chart

This may come as a surprise to anyone who was taught that there are only three points of view (POV). But the evidence shows that people respond differently to each of these, and that makes them distinct.

Emotionally Compelling: Mixed or “I” stories

The most powerful point of view for telling an emotionally compelling story, according to The Secret Life of Pronouns, is the "mixed" perspective, followed by first person singular ("I") stories. "Mixed" perspectives alternate between two points of view. Once you realize this, it's obvious: if you want people to connect with you and find your point of view credible, you need to spend a little time telling the story from their point of view.

What 98,447 stories can teach us

Below is a chart showing what fraction of stories are told from each of these perspectives for three large bodies of narratives. GlobalGiving requires that every project leader report back to donors four times a year for every project. The report is supposed to be informal, conversational, emotionally engaging blog-type writing. And since 2010 we’ve been collecting stories in East Africa written by regular citizens about some specific community effort they witnessed — the Storytelling Project.

Lastly, for the last two years I've been getting a "story of the day" by email from a project of my favorite artist, Jonathan Harris of "I Want You to Want Me" fame. His storytelling site, Cowbird.com, manually curates good stories from thousands of submissions. Their 812 stories are the positive control group in this experiment, answering the question:

“From what point of view should an emotionally compelling story be told?”

It stands to reason that all 812 Cowbird stories are good, and that their point-of-view (POV) patterns reflect what makes for good storytelling as a rule. Let's compare these three groups:

GlobalGiving Project Reports (N=35,689):
fourth "this org": 0.35
first plural "we": 0.268
third plural "they": 0.126
third singular "he": 0.098
second "you": 0.069
first singular "I": 0.046
None: 0.04
mixed: 0.003

East African Community Stories (N=61,946):
fourth "this org": 0.29
third plural "they": 0.197
None (no pronouns): 0.18
third singular "he": 0.117
first plural "we": 0.084
first singular "I": 0.078
mixed: 0.049
second "you": 0.007

Cowbird.com Story of the Day (N=812):
first singular "I": 0.514
third singular "he": 0.112
fourth "it": 0.108
None (no pronouns): 0.078
first plural "we": 0.07
second "you": 0.057
third plural "they": 0.033
mixed: 0.028

As you can see from the table, there are dramatic differences. A graph of this makes the differences clearer:

pov chart project reports vs community stories vs cowbird

How POV affects story quality: Three major conclusions

First, 51% of Cowbird stories are first person singular (I, me, my, mine), compared to 4.6% of GlobalGiving project reports and 7.8% of East African stories. If you want to reach people emotionally, only your own story will work. Instead of telling his story for him, have him tell his own story with "I" pronouns.

Second, not enough GlobalGiving project reports (only 1 in 300) are told from a mixed point of view. About 4.9% of East African stories and 2.8% of Cowbird stories of the day have a more complex, mixed, alternating point of view. Had these reports been written to better reflect the beneficiary's viewpoint, they could have raised 50% more money from donors (see below).

Third, too many GlobalGiving project leaders write from a "fourth person" perspective. "Fourth person" is my name for stories that lack pronouns altogether, or that lean heavily on articles (a, an, the). They tend to focus on objects over people and relationships. Fourth person (in my algorithm) also uses more organizational jargon (words like "ngo", "cbo", and "foundation") than pronouns. All of this makes for reports that read like cold bureaucratic documents rather than warm, personal, emotionally compelling stories. And if you read on, you'll see these reports raise 30% less money.
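At its core, the detector boils down to counting category-specific words. Here is a minimal sketch of that idea in Python; the word lists, the thresholds, and the "mixed"/"none" cutoffs are simplified stand-ins I made up for illustration, not the actual lists my algorithm uses:

```python
from collections import Counter
import re

# Hypothetical word lists -- the real detector's vocabulary and cutoffs differ.
POV_WORDS = {
    "first singular": {"i", "me", "my", "mine"},
    "first plural": {"we", "us", "our", "ours"},
    "second": {"you", "your", "yours"},
    "third singular": {"he", "she", "him", "her", "his", "hers"},
    "third plural": {"they", "them", "their", "theirs"},
    "fourth": {"it", "its", "ngo", "cbo", "foundation", "organization"},
}

def detect_pov(text, min_hits=3, mixed_ratio=0.8):
    """Return (pov, confidence) for a story.

    'none'  -- too few pronoun hits to make a call.
    'mixed' -- two POVs are nearly tied (runner-up >= mixed_ratio of leader).
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for pov, vocab in POV_WORDS.items():
            if w in vocab:
                counts[pov] += 1
    total = sum(counts.values())
    if total < min_hits:
        return "none", 0.0
    ranked = counts.most_common()
    top_pov, top_n = ranked[0]
    if len(ranked) > 1 and ranked[1][1] >= mixed_ratio * top_n:
        return "mixed", ranked[1][1] / top_n
    return top_pov, top_n / total
```

For example, a story built on "I" and "my" comes back as first person singular with high confidence, while a story that alternates "I" and "you" sentences trips the "mixed" rule.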

Since the point of communication is to affect each other's lives, we should drop the old-style reports in favor of just telling the truth and being authentic. But changing your pronouns won't make your story better if it was never your story to begin with. You need to actually help people tell their own stories, and be a steward of their words. For too long we've let organizations harvest the words of others to further their (organizational) objectives, and this algorithm will finally let me out the worst of the bunch and force them to shape up.

chart how point of view POV affects emotion in story

Your English teacher mistaught you; get over it.

When we want to inspire, engage, comfort, challenge and connect with each other, we use short, personal, evocative writing, with a good deal of “I” words. Yet from an early age we are exposed to bad writing, reflecting outdated “beliefs” about what makes writing good. The evidence here shows that good writing is less “professional.”

Which world do you want to build today? "Professionalized" language gave us global poverty, a financial crisis, and broken politics.

Creative and informal language gave us The Muppets, Neil Degrasse Tyson, and Doctor Who.

Follow-up conclusions:

Changing your point of view really DOES affect your ability to raise money with a project report

I took those project reports from thousands of GlobalGiving partner organizations and compared the dominant point of view in each report with the amount of donations that came from people clicking on the GIVE BUTTON in those reports. The results were striking:

Effectiveness of project reports in raising money:

POV                      Total $$ raised   Donations per report   Average $$ per donation   Reports (N)
None                     78                0.9                    24.9                      611
third plural (they)      220               2.5                    46.7                      2519
fourth (this org, it)    267               2.8                    53.9                      7413
first plural (we)        292               3.1                    55.8                      5881
third singular (he)      302               3.5                    52.4                      2184
first singular (I)       329               3.8                    51.9                      1056
second (you)             421               4.8                    58.0                      1449
mixed                    567               6.5                    60.5                      98

Notes: N = 25,337 published reports. Data includes cases where nobody gave any money after reading a report (23% of the total). While reports don't generate a ton of revenue (50% of reports raised less than $100), $1,077,000 was raised between 2007 and 2014 in precisely this way. This data is the best example I know of giving tied directly to feedback loops in international development.

The results show that the best POV ("mixed") is more than twice as effective as the most common POV ("fourth"):

Project reports with a "mixed" perspective raise 112% more money and get 132% more donations than reports with a "fourth", org-centric point of view.

Some caveats: these are not true controlled experiments. Nobody forced these organizations to adopt a first- or third-person perspective. Nor did we randomize what donors saw, as a true researcher would. It could be that people who are naturally better at raising money use pronouns differently from those who aren't. It also turns out that women write these reports 2:1 over men. And what people talk about has a big influence on how much money they can raise. Here's an estimate of how project theme affects donor giving after donors read a fresh report:

Theme       Total $$   Reports (N)
animals     1707       928
gender      1656       2820
disaster    1194       1315
children    1044       4542
hunger      1079       226
finance     944        544
health      809        3397
climate     806        712
edu         757        4502
rights      700        618
econ devt   375        1185
sport       578        284

The smartest way to fix your point of view is to talk to others and share their stories, instead of writing only from your own perspective. GlobalGiving has for years been helping organizations listen, act, and learn better. In fact, we're giving away money to encourage organizations to do this.

donations-volume-size-vs-pov-projrept

The Gap

There is a huge gap between how most organizations speak and what donors respond to. The green line near the center shows what fraction of stories have each of six points of view. The blue and red lines show that donors give more often, and give more money, to reports with a "you", "I", or mixed "you and I" perspective.

(2) Humans are not very good at determining a story’s point of view

To validate the accuracy of this algorithm, I ran 406 of the 812 Cowbird stories through an experiment on Crowdflower, a distributed tasking site where you pay people a few pennies each to do a batch of simple tasks.

In my task, a person would read two Cowbird stories, select the point of view for each, and then choose which story was the more "emotionally compelling" one. The Secret Life of Pronouns predicts that "mixed" perspectives and "I" stories are more compelling to readers than "you", "we", "he", or "they" stories. So I tested our data set, with three people doing the task for each comparison. Inter-rater agreement is an important part of seeing whether this task is easy or hard for humans.

Now I know from reading Cowbird that most of the stories actually are "I" stories, and my algorithm labeled 51% of them first person singular, as I expected. The "mixed" perspective was much rarer, only about 2%. But these are very short stories, and switching perspective isn't easy in 100 words, so 2% sounded reasonable.

The results from 406 human story comparisons:

Q: Select the story’s point of view (POV) from these 6 choices:

POV                        Human picks   Human fraction   Algorithm fraction
"I" (first singular)       118           0.29             0.514
"we" (first plural)        100           0.25             0.07
"he" (third singular)      64            0.16             0.112
"they" (third plural)      79            0.19             0.033
"the org"/"it" (fourth)    35            0.09             0.108
"mixed"                    10            0.02             0.028

  • The humans were 40% LESS likely to choose first person singular than the algorithm, and three times MORE likely to assign first person plural to stories.
  • Both humans and the algorithm agreed when assigning “mixed” and 4th person perspectives.
  • Humans tended to want to assign stories to each POV more equally than a computer. (If given 6  choices, we seem to think that the stories SHOULD match up with categories equally. Same bias is seen on standardized tests.)
  • These raters were not very reliable: all three agreed with each other on only 11 of 406 stories, and two out of three agreed on the perspective about 50% of the time.
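The agreement numbers above can be computed mechanically. A small sketch, assuming the ratings arrive as one label triple per story (the function name and data shape here are my own, not Crowdflower's):

```python
from itertools import combinations

def agreement_stats(ratings):
    """ratings: a list of per-story label tuples, one label per rater.

    Returns (unanimous_rate, pairwise_rate): the fraction of stories on
    which every rater agrees, and the fraction of rater pairs that agree
    across all stories.
    """
    unanimous = sum(1 for r in ratings if len(set(r)) == 1)
    pair_hits = pair_total = 0
    for r in ratings:
        for a, b in combinations(r, 2):
            pair_total += 1
            pair_hits += (a == b)
    return unanimous / len(ratings), pair_hits / pair_total
```

With three raters per story, 11 unanimous stories out of 406 gives the 2.7% unanimous rate reported above, while pairwise agreement captures the looser "2 out of 3" figure.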

Q: Of these two, which story was more compelling?

Same result: they agreed with each other 36% of the time. Choosing randomly, they would agree about 33% of the time, which confirms that these Crowdflower workers were essentially random, and not worth the $16 I paid to test this data set on them. Had I asked five interns to do this, I would have gotten more agreement, because interns care more about agreeing with each other than these workers cared about the $0.05 I was paying for a simple (though enjoyable) task.

It also confirms that seeing a story's point of view is not so easy; if it were trivial, they would have agreed with each other more. Agreeing on which of two stories is more emotionally compelling is much harder, and likely impossible for any algorithm to do well, because even "human algorithms" are terrible at predicting what other humans will like.

A good story is more a matter of taste than of process, but people DO give to projects more often when stories are told from the right point of view – the beneficiary’s.

Try it yourself!

I created a simple tool for anyone to use. Paste your text into the box and it will analyze your point of view.

At djotjog.com/c/report/.

screenshot-by-nimbus (21)

Practicing what I preach

Old habits die hard. I ran my own algorithm against this blog post, and it predicted that I am writing from a "fourth person" perspective, with 80 percent confidence.

OUCH! I soooo suck as a writer. Or so my computer tells me.

So I went back into this and changed some of my “you” and “we” statements to “I” statements and ran it again.

The result: "fourth person", 92% sure, 108 pronouns, 6.3% of text is pronouns

Pronoun counts by POV type:
[('fourth', 40), ('first singular', 31), ('second', 17), ('first plural', 10), ('third plural', 7), ('third singular', 3)]

The reason I failed? I used too many "it"s and "these"s and "those"s, and not enough "I"s.

Oh well. [I'm] Hitting the publish button now.

:)

Who's Who of Organizations Ranked by Website Traffic

Alexa.com ranks all websites in the world by how much traffic they get. I pulled a list of 3,600 organizations and looked up their Alexa rankings. These are the top-ranked sites:

(Lower is better; e.g. Facebook = 2 and Google = 1.)

Alexa Rank Site
10008 kiva.org
29295 wikimediafoundation.org
29719 autismspeaks.org
30470 worldwildlife.org/
35952 crc.uri.edu
42261 unicefusa.org
43454 nationalmssociety.org
45993 donorschoose.org
48806 oxfam.org.uk
52434 tigweb.org
53778 worldvision.org
54869 nature.org/
57005 rotary.org/endpolio
60860 livestrong.org
63450 globalgiving.org
65156 carleton.edu
68667 stbaldricks.org
70790 habitat.org/default.aspx
70862 nwf.org/
71303 japan.ashoka.org
74814 feedingamerica.org
75675 doctorswithoutborders.org
76757 bhf.org.uk/
83023 savethechildren.org
85042 inotherwords.org
85268 defenders.org
88988 nols.edu/
93188 thetech.org
100190 bestfriends.org/
101997 laneta.apc.org/desmiac
105183 uopeople.org
106004 pathfinder.org
110170 care.org
111217 alzheimers.org.uk
111265 us.movember.com/
118500 mercycorps.org
123953 teachforindia.org
130336 cry.org/index.html
144080 cff.org
144795 ccfa.org
149263 iucn.org/
156040 isa.org/
163908 lls.org
170456 psoriasis.org
177934 princes-trust.org.uk/
177965 heifer.org
184671 ijm.org
199416 bbbs.org/memphis
200255 worldpulse.com
202038 wcs.org
211427 americanhumane.org
240543 path.org/
242017 internationalmedicalcorps.org
245309 oxfamamerica.org/
263635 teriin.org
271126 documentary.org
274407 girlswhocode.com
280964 ineesite.org
281032 sustrans.org.uk
283065 kipp.org/
285384 us.tzuchi.org
291488 notforsalecampaign.org
292427 roomtoread.org
294625 janegoodall.org
300621 unfoundation.org
300746 womenforwomen.org
320616 liverfoundation.org
322227 humanityhealing.org
338673 sfaf.org/
354992 cityyear.org
357158 mariecurie.org.uk

That list is a little different from the typical who's-who lists for international development organizations. You won't find BRAC or Chemonics or a whole host of UN agencies, or basically any organization that depends primarily on government support. This is a who's who of organizations that depend on the public for support.

Five Holy Books in five images

Since it is Holy Week, here are some rather intriguing visuals of the Quran and of three competing perspectives on Jesus (the canonical Gospels, Paul's attributions, and the non-canonical Gospel of Thomas):

The whole Holy Quran as a wordle

whole quran wordle

The Gospel of Thomas
gospel thomas wordle

The Gospel of John

gospel john wordle

All sayings attributed to Jesus in Paul’s Letters

pauls letters - all sayings attributed to jesus - wordle

The Gospel of Mark

gospel mark wordle

A while back I wrote a simple python script that performed differential wordles (like the ones I used for these two rape-prevention programs), but I lost it. If I rewrite it, you'll be able to see an adjusted view of what these different stories emphasize about God, Allah, Jesus, etc.

Source: http://www.utoronto.ca/religion/synopsis/meta-6gv.htm
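The core of a differential wordle is just a normalized word-frequency subtraction between two texts. A rough sketch of that idea (my own simplification; a real version would strip stop words and feed the output to a wordle renderer):

```python
from collections import Counter
import re

def differential_wordle(text_a, text_b, top=10):
    """Rank words by how much more frequent they are in text_a than text_b.

    Frequencies are normalized to occurrences per 1,000 words so texts of
    different lengths can be compared; returns (word, difference) pairs,
    most over-represented in text_a first.
    """
    def freq(text):
        words = re.findall(r"[a-z']+", text.lower())
        n = max(len(words), 1)
        # Counter gives a default of 0 for words missing from one text.
        return Counter({w: 1000 * c / n for w, c in Counter(words).items()})

    fa, fb = freq(text_a), freq(text_b)
    diffs = {w: fa[w] - fb[w] for w in set(fa) | set(fb)}
    return sorted(diffs.items(), key=lambda kv: -kv[1])[:top]
```

Run on two of the gospel texts, the top of the list would show what one account emphasizes that the other downplays; the bottom of the list shows the reverse.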

Or you can read my series on how the Passion Narrative relates to international development:

One: Empire – and the hierarchy of aid power

Story-centered learning: Gather “big data” before hypothesis testing

Reblogged from my ThinkNPC guest post:

In the last half-century thousands of scientists have rigorously studied the causes and risk factors in heart disease, but a single longitudinal experiment has revealed more about this disease than any other approach.

In 1948, researchers began tracking health records from all participants in the town of Framingham, Massachusetts. This was an observational study; they did not formulate causal theories or test specific hypotheses, but simply let nature take its course and observed what happened.

In 1960, they found a link between smoking and heart disease. In 1961, they found a link with cholesterol. And in the coming decades, they also found correlations with obesity, exercise, high blood pressure, hypertension, stroke, diabetes—virtually everything that now matters to clinical treatment.

So why aren’t we in the philanthropy world copying this approach—observing what’s out there and looking for patterns over time?

As a neuroscientist, I have a confession to make. My kind have been responsible for propagating a lie that is still taught in schools: that scientists always devise a hypothesis and test it in controlled experiments. This is simply not true. The Human Genome Project mapped 3 billion base pairs before anyone understood what variation in the genetic code meant.

human-genome

The drugs you take were “discovered” in massive drug discovery libraries using a screening process that quickly conducts millions of tests, rather than hypothesizing. 

My point is that complex problems cannot be understood from a pre-defined framework; what matters emerges most efficiently from open-ended data collection that is later organised and then studied.

We already create more information every two days than existed in the first two millennia of human civilization, and this pace is accelerating. However, the rate at which we convert all this "information" into useful "knowledge" is slowing down.

all-story-topics-2011

It was with this problem in mind that we started the GlobalGiving Storytelling project. We needed to satisfy two requirements: to collect rich information about development in a flexible, easily re-structurable way, and to turn these stories into data so we can interpret and contextualize what we see. We've come up with a survey design tool which you can use to run a custom evaluation and compare your results to stories told by others, with the overall aim of helping everyone share knowledge and improve project design. The approach will save you time, but it will also enable you to get more back than you could ever put in.

So why do we use storytelling, you wonder? It turns out that managing this process with metrics, indicators, spreadsheets, and a numbers-only mindset is far more difficult and time-consuming. Narratives and a few survey questions are sufficient to see common patterns emerge from many perspectives.

Continue reading on ThinkNPC

Marc Maxson is an innovation consultant with GlobalGiving, where he manages their global storytelling project. Previously, he worked as a PhD neuroscientist and did Fulbright research on the impact of the internet on rural education in West Africa. He writes about evolution and international development at chewychunks.wordpress.com.

When toys tell stories

I first learned about GoldieBlox from their Super Bowl ad, where they aggressively combat the toy industry's stupid assumptions about what girls like (it's not just about making it pink and putting a ponytail on it).

They are on a mission:

Only 13% of engineers are women and they believe that women innovators are our greatest untapped resource. 

They have a theory of change:

We inspire girls during a critical period, between age 6 and 13, and allow them to realize for themselves that building, creating, and owning their own ideas is what it means to be a girl.

Their latest ad campaign continues their message more thoughtfully:

(Note that it begins as a parody of a 1980s anti-drug commercial, so their ads are also targeting parents.)

How is GoldieBlox “for” girls? (From their website)

Our founder, Debbie, spent a year researching gender differences to develop a construction toy that went deeper than just “making it pink” to appeal to girls. She read countless articles on the female brain, cognitive development and children’s play patterns. She interviewed parents, educators, neuroscientists and STEM experts. Most importantly, she played with hundreds of kids. Her big “aha”? Girls have strong verbal skills. They love stories and characters. They aren’t as interested in building for the sake of building; they want to know why. GoldieBlox stories replace the 1-2-3 instruction manual and provide narrative-based building, centered around a role model character who solves problems by building machines. Goldie’s stories relate to girls’ lives, have a sense of humor and make engineering fun.

That was an “aha!” statement for me. “Finally, something I can sink my teeth into!” I thought. So building blocks can be thought of as a storytelling tool, like the magic cards I made earlier. I know about character driven stories, and putting conflict into scenes to move it along and draw in the audience.

And in a way, GoldieBlox is using a conflict narrative to draw in their audience – girls. What a brilliant way to get girls on board, by reminding them from age 6 onwards that playing with these toys is an act of defiance against gender stereotypes.

And another company, play-i, offers a complementary approach to the same goal, for a younger audience:

https://www.play-i.com/

I just wish they had similar toys for the teenage crowd. What will these Goldie girls do when they outgrow their blocks? Perhaps this?

Python_For_Kids-2

http://python4kids.wordpress.com/

 

goldieblox-logo

fry_shut_up_and_take_my_money.jpg

A good proxy indicator for organizational learning culture

A recent Huffington Post article brought an interesting tool to my colleague Nick’s attention. Collusion helps you spy on the companies that are colluding to spy on you as you surf the internet. For example, every time you check the weather all of these sites are informed about you:

the-weather-channel-collusion

A list of websites that receive information from weather.com is shown on the left. About half are red and crossed out because Collusion (a Chrome plugin) blocked their access.

As you browse, collusion creates a network map showing how the different sites you visit talk to each other. You can hover over any node in the network to see a site’s connections and automatically block the transmission of data to known tracking sites like Google ad services, Doubleclick.net, etc. As you sift through your browsing’s connections, it quickly becomes clear that not all sites are created equal when it comes to tracking your metadata.

Our insight was that this tool could serve another purpose. Nick and I are responsible for building up GlobalGiving's database on organizational behavior and curiosity, which is used to measure each organization's performance in a real-time, comprehensive way. If we could sort all the organizations in the world into "good" and "bad" groups based on their habits, such as being responsive to the community they serve, demonstrating a tendency to learn from mistakes and remember what they've tried before (knowledge management), or making effective use of free performance tools in their daily work (agility), we could help more money reach better NGOs, and ultimately improve more lives with the same amount of resources.

This is the same as saying “we’re going to make the whole aid world more efficient,” but when we say it, we mean it – because we have a way to do what we say. In the “big data” era, information will be used to make thousands of little evidence-based decisions that will improve the system overall.

But on to specifics. What do organizations’ websites reveal about their agility? A lot.

Look at these organization websites:

Each of these has a hundred-million-dollar budget. So how much effort do they make to learn from visitors to their homepages?

agile-world-vision

agile-care-orgagile-save-the-children-org

agile-brac

agile-msf-org

agile-united-way

agile-world-bank

agile-helen-keller

agile-oxfamagile-heifer-international

I see a correlation between how much an organization focuses on public donations (versus government or private support) and whether it uses free analytics software, such as Google Analytics. Of the ten organizations shown above (roughly a top-ten list of worldwide organizations by size), only Save the Children, Care, and World Vision make a serious effort to learn from their website traffic. Five out of ten at least have some kind of basic (free) analytics (Google Analytics and/or Google Tag Manager).

For the half that do not, the absence is telling. These organizations don't really need public support to survive, and are also (in my opinion) less accountable to community feedback because they are "too big to fail" in the aid world:

  • World Bank
  • BRAC
  • MSF
  • United Way
  • Heifer International

Types of 3rd party data collection sites

Analysis (curiosity)

  • google-analytics.com
  • GoogleTagManager
  • kissmetrics
  • vmmpxl – quantcast web traffic demographics
  • mxpnl — mixpanel is like google analytics, but you pay for it and it offers more features

Visualization or dissemination

  • mapbox
  • uservoice.com
  • chartbeat.com
  • openlayers

Marketing

  • anything in red (advertising)
  • youtube

Faster web loading and cloud data 

  • amazonws
  • visualwebsiteoptimizer
  • rackcdn — rackspace cloud storage

Social Media Plugins

  • twimg — twitter
  • facebook

Design iteration and testing (curiosity)

  • optimizely
  • omniture

For comparison, I took snapshots of GlobalGiving and various other online giving marketplaces or organizations we partner with:

agile-globalgivingagile-donorschooseagile-kiva

agile-betterplace-org agile-razooagile-great-nonprofits agile-give-directly

agile-development-gateway

Clearly, all of these organizations take their web traffic seriously. Each of GlobalGiving, DonorsChoose, Kiva, BetterPlace, and Razoo uses at least one analytics tool, one cloud hosting tool to speed up website load times, and many use an iterative design and testing tool like optimizely.

The surprise here is that GiveDirectly (the recent darling of the aid world and the media world) does nothing to learn about their traffic. It makes me question how much of a learning focus their organization has internally.

And that is what this is all about. I believe that organizations stamp an imprint of their internal learning on their external websites.

Curious, learning, experimenting organizations use web-based tools that help them achieve their goals (and leave a trace for us to track).

Large bureaucratic “stick-in-the-mud” organizations do not use any of these tools, leave no trace of their learning, and thus are probably not focused on learning.

Web footprints for a few randomly chosen GlobalGiving partner orgs

These organizations are much smaller than the ones listed above, but they still use more learning tools than even the World Bank or BRAC, so in my assessment they are probably learning more with fewer resources:

agile-wildlife-alliance-org

agile-mountains-of-hope-uganda

agile-ayni-education

agile-afghan-institute-of-learning

agile-outreach-uganda

agile-ouelessobougou

agile-vision-africa

agile-african-rainforest-conservancy

Five out of seven of these local GlobalGiving partner organizations use Google Analytics.

That's a small sample, but a larger fraction of this group uses tools to learn about web traffic than the million-dollar orgs do.
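Gathering this signal at scale is mostly string matching. Here is a hedged sketch, using a short made-up list of tracker hostnames; a real crawl would fetch each homepage (e.g. with urllib.request) and use a much longer signature list like Collusion's:

```python
# Hypothetical signatures -- real trackers use many more hostnames.
CURIOSITY_SIGNS = {
    "google-analytics.com": "analytics",
    "googletagmanager.com": "tag manager",
    "kissmetrics": "analytics",
    "mixpanel": "analytics",
    "optimizely": "A/B testing",
}

def learning_footprint(html):
    """Scan a page's HTML for third-party learning tools.

    Returns the set of tool categories found -- a rough proxy for how
    curious the organization is about its own web traffic.
    """
    found = set()
    lowered = html.lower()
    for host, category in CURIOSITY_SIGNS.items():
        if host in lowered:
            found.add(category)
    return found
```

An empty result for a hundred-million-dollar organization's homepage is exactly the "stick-in-the-mud" signal described above.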

These are just screenshots to show that there is useful data out there. Once you realize that tools exist to ask old questions in a new (and more efficient) way, you simply need to write a little code to gather the information. This will be my take-home message for the Georgetown University master's program class I'm teaching this week:

Graduate School should help you learn how to ask better questions and to recognize when the status quo of information is insufficient to fix the problem.

We live in a world that clings to the "myth of evidence": we think our leaders make decisions by weighing evidence, but they do not. They never have. Throughout history they have instead made experience-based decisions, limited by their own wisdom and prior failures. This is about to change.

Decisions used to be made using tiny scraps of information, because that is all that was available. But this decade is the turning point when evidence becomes cheaper to aggregate and interpret than the cost of making decisions without it. Some giants will fall and others will rise to take their places, all because they understand the new calculus of “big data.”

And when the dust clears, a new kind of democracy will be possible* where in the past it was merely theoretical: policy decisions will reflect all people's opinions where choices are a matter of preference, or will rest on sound science and observation of human behavior at a macro scale (like Isaac Asimov's psychohistory idea) where policy depends on truth rather than preference.

(*Though this kind of democracy will be made possible, it will almost certainly be tried somewhere outside of North America or Europe first. My guess: somewhere in the Middle East, where people want real democracy.)

How organizations are adapting the storytelling method to their local context

I am frequently asked for specific examples of how an organization can adapt the storytelling method to its specific programs. Here are case studies from my recent visits to UK-based organizations that are on the verge of implementing listening projects to evaluate their programs.

Case #1

For decades this organization has sought to bring peoples together and foster cultural understanding. The impact of their programs focuses on bridging social gaps, exposing people to different cultures, and changing attitudes and perceptions about the "other." But instead of using a blunt survey question like "How do you feel about the other?" they arrived at this:
Share an experience where you had to work with someone different from yourself. 
This question will add context to the all-purpose story prompting question that we encourage all organizations to use:
Talk about a time when a person or organization tried to help someone or change something in your community.
So if you put them together, respondents will share a “community effort” story with a focus on their personal experience of working with someone different.
Out of this, they hope to glean insights about the ways attitudes and behaviors are changing. They will ask both internal "beneficiary" and external "community" people to share stories for comparative analysis. Each person will share two stories: one will focus on the difficulty of working with the "other," and the other will be more open-ended, about any meaningful community effort:
org-case-1-benchmarks
Who: They have a network of a dozen “alumni” that they will train as scribes. Then they plan to bring on groups in Syracuse, NY, Los Angeles, Indonesia, and Gaza.

Case #2

This organization helps thousands of teens in the big city. They measure impact as improved self-confidence, educational attainment, and long-term community involvement. Their programs help young people get “back on track” and help them find fulfilling careers. Though they manage dozens of community programs for youth, their storytelling question adds this flavor:
In your community effort story, talk about an event that personally changed you in some way.
They currently use a 24-question “life effectiveness questionnaire” that was validated by an academic expert [pdf]:
- Time Management: the extent to which an individual makes optimum use of time.
- Social Competence: the degree of personal confidence and self-perceived ability in social interactions.
- Achievement Motivation: the extent to which the individual is motivated to achieve excellence and put the required effort into action to attain it.
- Intellectual Flexibility: the extent to which the individual adapts his/her thinking and accommodates new information from changing conditions and different perspectives.
- Task Leadership: the extent to which the individual leads other people effectively when a task needs to be done and productivity is the primary requirement.
- Emotional Control: the extent to which the individual maintains emotional control when faced with potentially stressful situations.
- Active Initiative: the extent to which the individual initiates action in new situations.
- Self Confidence: the degree of confidence the individual has in his/her abilities and the success of his/her actions.
Clearly, the standard approach is rigorous and defensible (it has been used in over 20 studies), but it isn’t very flexible. It prescribes the factors to be measured and then uses a cumbersome instrument to measure them in a not-so-fun way. Our storytelling form fits on the front and back of a single sheet of paper and takes just a few minutes to complete, with most of that time devoted to a personal narrative.
I’m most excited that in one of their three programs, they will test an approach we’re borrowing from James Pennebaker’s book, “The Secret Life of Pronouns.” This program pairs youth with older volunteers who work together to revitalize the neighborhood. At regular intervals, these pairs will interview each other in the storytelling/listening project. Later, we will treat these paired stories as conversations and look for language mirroring.
Mirroring is a measure of engagement. In this context, when young and old start to adopt the other’s way of speaking in their stories, we infer that they are building a relationship with some intimacy:
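The mirroring score itself can be sketched in a few lines of Python. This is a minimal, hypothetical version of the language style matching (LSM) idea from Pennebaker’s research: for each function-word category, compare the two speakers’ usage rates, then average across categories. The word lists below are illustrative stand-ins, not the real LIWC categories:

```python
# Toy function-word categories; a real analysis would use the full
# LIWC word lists that Pennebaker's research is based on.
CATEGORIES = {
    "pronouns": {"i", "we", "you", "he", "she", "they", "it"},
    "articles": {"a", "an", "the"},
    "prepositions": {"in", "on", "at", "with", "for", "of", "to"},
}

def category_rate(text, words):
    """Fraction of tokens in `text` belonging to one function-word category."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in words for t in tokens) / len(tokens)

def lsm(text_a, text_b):
    """Language style matching score: 1.0 = identical function-word style."""
    scores = []
    for words in CATEGORIES.values():
        p1, p2 = category_rate(text_a, words), category_rate(text_b, words)
        scores.append(1 - abs(p1 - p2) / (p1 + p2 + 0.0001))
    return sum(scores) / len(scores)
```

Pairs whose scores drift upward from one interview to the next would be the ones building rapport.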
[Image: conversation language mirroring]
Even without this mirroring measure, the broader 2-question approach is more likely to reveal community needs than the narrower life effectiveness questionnaire.

Case #3

This organization works with disabled youth, providing them with opportunities to do something wonderful, like the Make-A-Wish Foundation does. After some debate, they settled on adding this context to the storytelling question:
Talk about a childhood experience where you were able to do something you never thought you could have done.

They can use this with four different populations they serve: children, parents, donors (to build empathy), and volunteers/public/schools. This is exciting because, unlike other evaluation frameworks, it gives them a deeper understanding of the difference they are making in the life of a disabled child through the many others who are affected by that child’s experience.

“Our impact is much more than mere ‘fun’,” the director said. “Providing the inspiration to achieve more is what our events are all about.”

To that end, they are excited that one of the benchmarking follow-up questions in our design is:

“What would have made a difference in this story?”

That allows them to learn how to expand and refine their programs in an open-ended way. Asking this question of four groups will refine their messaging and grant writing, as well as improve their programs and build relationships with the volunteer network they will need to sustain this listening project.

Case #4

This organization will bring storytelling to the 30 schools where they do life skills training. They define success in much the same way Case #2 does. They want to use the open-ended storytelling question to look at how youth define the soft skills they receive, as well as build up an evidence base of the needs that these children have.

They expect it will be very difficult to get children to participate. I suggested that they engage teachers by offering to share the learning that emerges from stories with them. Teachers would probably like to know what their students think about, and this storytelling project offers them a lens into that. They may also explore a young-old mentoring program with the conversation mirroring approach.

Case #5

This organization runs a network of business startup incubators around the world. And while they would like to eventually find a common framework for measuring impact everywhere, they plan to start with the local hubs.

They plan to ask business leaders and aspiring entrepreneurs to share two stories. One will be about “any community effort” they know/care about, and the other is their own community effort:

Talk about your journey of trying to start a business.
Through this journey narrative, they hope to see what elements define success and failure in an open-ended way. Perhaps their first 100 stories won’t reveal much, but they will have a benchmark of over 1,250 stories from East Africa about people trying to start businesses there. As they grow their narrative collection, they’ll also be pushed to build relationships with people outside their narrow pool of incubator companies. And since all of these companies aim to deliver some social benefit, the broader “community effort” stories will become a useful business intelligence database for future aspiring entrepreneurs to mine for ideas.
They were worried they wouldn’t find volunteers who wanted to interview these entrepreneurs. The next day I heard the friend I was staying with complain that no clubs offered him a way to meet like-minded people who are trying to start their own business. My friend tried starting three businesses in Kenya over the years, so I connected him with this organization and suggested they advertise a “meet up” to find more of these kinds of people.
So, in effect, the evaluation scheme forces the organization to build relationships with the community. That is what should be happening: evaluation improves design.

Case #6

This organization helps half a million volunteers find places to work. They too decided to pilot storytelling with older volunteers. They use volunteering to improve quality of life for the elderly and reduce social isolation. They added this context to the storytelling prompt:

Talk about an event that happened long ago and how that affects your life today.

By mining these narratives for emotion words, they can quantify reduced social isolation. Isolated people use pronouns and articles differently than highly socialized people do. By collecting stories monthly, they can plot the “journey” and look for trends across their volunteers, regardless of what else is talked about.
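Counting those language markers is straightforward. A minimal sketch, with illustrative word lists standing in for the validated LIWC categories:

```python
import re

# Illustrative word lists; a real analysis would use validated LIWC categories.
FIRST_SINGULAR = {"i", "me", "my", "mine", "myself"}
ARTICLES = {"a", "an", "the"}

def social_markers(story):
    """Rates (per 100 words) of first-person pronouns and articles in a story."""
    tokens = re.findall(r"[a-z']+", story.lower())
    n = len(tokens) or 1  # avoid dividing by zero on empty stories
    return {
        "first_singular": 100 * sum(t in FIRST_SINGULAR for t in tokens) / n,
        "articles": 100 * sum(t in ARTICLES for t in tokens) / n,
    }
```

Computing these rates for each volunteer’s monthly story and plotting them over time gives the “journey” described above.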

Topically, these stories will reveal life-transformation events that can be useful for designing future programs.

“And the requirement that we drop off and pick up story forms monthly will give our project managers an excuse to get out and visit these places,” the head person said happily.

They also have hundreds of narratives that they plan to import into our system and explore for more meaning.

One data system with many frameworks

I believe this is a real step forward in fixing our approach to impact evaluation. Instead of 6 organizations with 6 different ways to measure their “impact,” we have 6 approaches that share a common back-end data collection system. Each of these organizations will collect just as many open-ended narratives as answers to the more constrained questions outlined here.

They will have benchmarking. Even among the constrained questions, we see that there are some likely clusters for comparison:

[Image: storytelling context map]

That is the beginning of a storytelling context map. With just five organizations, we see that three will likely have some overlap with each other’s themes, and the remaining two have reasonable overlap with similar stories from our existing collection of 57,000.

As dozens of organizations try this out, we may find that evaluation frameworks emerge from the choices individual organizations make as they raise their specific objectives to a higher level of abstraction. The essential trick is to flip the design: don’t ask for exactly what you want to know; instead, ask communities to react thoughtfully to the core elements of what defines our struggle to be more human to each other.

Already proven to work: From Gay Rights to Marriage Equality

Today I attended a talk at NTEN titled “How RED changed everything [for marriage equality].” For a generation, the gay rights movement lost every ballot referendum that they poured money into fighting. After 30 straight losses, they decided that their messaging wasn’t working. (Yes, an 0-30 record makes that seem obvious in retrospect, but the nonprofit/advocacy world is very afraid to admit failure.) They hired a media company and began running focus groups with straight people who opposed gay marriage.

They eventually got to the heart of the matter:

Tell us why you got married.

Straight people described how they fell in love. But when these people talked about gay marriage, they perceived the issue to be exactly what decades of pro-gay messaging had told them: They thought gay people wanted to be married for the legal benefits, or for tax breaks, or to prove that their lifestyle was acceptable because the government condoned it.

The movement took a hard look at its own messages. They started featuring actual gay people in their ads (instead of judges and legal experts). They told stories. They focused on families and love. And they flipped the public from 60% opposed to 60% in support in just 5 years. I’m going to take this approach to my local church, which is trying to do the same in the fight against voter suppression in North Carolina this year.

This is an example of the power of storytelling. When the prompting question is broad enough to allow surprises to emerge, an idea that begins as “gay rights” becomes a story of “marriage equality.” Reframing an idea starts by asking the people whose mindset and behavior you want to change to speak openly about it. As much as possible, our job is to listen.

[Image: marriage equality emergence]

Follow this thread: Examples of story analysis

Information is not knowledge

Shannon information theory defines information in a specific way: information is the amount of “surprise” in a communication. If I gave you a printout of this blog post, covered up part of a word in it, and asked you to predict the word after showing just the first two letters…

th…

You might answer therapist, but you’re more likely to answer

“the”

That is a very common word, and easily predictable. Hence, the “the” in this post doesn’t carry much information. Certainly a lot less than the word “Theroux,” which might refer to a specific person, like the novelist Paul Theroux.

The most information-dense communication would be a string of random characters. You cannot predict the next character from the previous ones. But practically speaking, a bunch of random letters is meaningless.
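This notion of surprise is easy to compute: a word’s information content is -log2 of its probability. A minimal sketch, using raw corpus frequency as the probability estimate:

```python
import math
from collections import Counter

def surprisal_bits(word, corpus_tokens):
    """Shannon information of a word: -log2 of its relative frequency."""
    counts = Counter(corpus_tokens)
    p = counts[word] / sum(counts.values())
    return -math.log2(p)

tokens = "the cat sat on the mat near the door".split()
common = surprisal_bits("the", tokens)   # ~1.6 bits: highly predictable
rare = surprisal_bits("door", tokens)    # ~3.2 bits: more surprising
```

The common word carries fewer bits, exactly as the covered-word thought experiment suggests.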

One reason the storytelling project can better inform the world is that it allows more information to flow from communities, and it provides a better way to filter out the noise and help people find the knowledge in all that information. Instead of this:

[Image: information-knowledge]

It allows this:

[Image: more information, better filtering, better knowledge]

Normally, too much information is a problem. Evaluators design narrow, specific surveys with tightly defined questions because they want the most knowledge to come out of the least information entered in. They seek to achieve a 1:1 information:knowledge conversion. The top diagram represents the way evaluators collect information with community surveys.

But if you have better filtering tools, you can instead maximize the information flow and rely on better filters to control which pieces of that information are meaningful. You can tolerate noise. You can fetch only the knowledge you need from a ton of information, and the next person with a different need can retrieve the knowledge he needs. Google search did this for the web, and the Framingham Heart Study did this for medical risk factors. So why hasn’t anyone succeeded in doing this for poverty and social problems?

This would allow us to learn without starting over each time. Suddenly one set of information has two uses, and eventually hundreds of users – all because the information “firehose” was opened and the filtering was good.

This is smarter design. Maximum information input plus reasonably good filtering yields more knowledge to more people.

I encourage you to go back and read examples I posted on the knowledge we’ve been able to extract from stories with good relevance filtering.

Using big data to infer how people would’ve answered

I recently wrote an algorithm that uses the answers from 57,000 stories to predict which three topics people might choose for a story with similar words in it.

How does it work?

People tell a lot of stories, and the words they use are correlated with the topics they choose. So if the correlation is strong enough, a computer algorithm can correctly “guess” the topic the person would have chosen. The guess works by (1) generating a dictionary of words and their frequency of use in stories a human has assigned to each of ten topics, then (2) scoring a test story by adding up the relevance of each word in that story to the topic, based on that topic’s dictionary.
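Those two steps can be sketched in a few lines. This is a simplified, hypothetical version (the real dictionaries are built from 57,000 stories and weight words by relevance; this toy just sums raw frequencies):

```python
from collections import Counter, defaultdict

def build_topic_dicts(labeled_stories):
    """Step 1: build a word-frequency dictionary per human-labeled topic."""
    topic_dicts = defaultdict(Counter)
    for text, topic in labeled_stories:
        topic_dicts[topic].update(text.lower().split())
    return topic_dicts

def predict_topic(story, topic_dicts):
    """Step 2: score the story against each topic dictionary; highest wins."""
    words = story.lower().split()
    scores = {topic: sum(freqs[w] for w in words)
              for topic, freqs in topic_dicts.items()}
    return max(scores, key=scores.get)
```

With a large enough labeled collection, the word-topic correlations dominate and the highest-scoring dictionary is usually the topic a human would have picked.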

The rigorous way to do this is to set aside 10-20% of the data to test the algorithm, use the rest to “train” it, then run the algorithm on the test set to estimate how likely it is to choose the correct topic from among these 10 choices:

[Image: topic question from the story form]

I was surprised to see that the reliability of this approach depends on which topic you mean:

Fetched 19343 records, 1 fields, with 8010659 characters. Conn: Closed food
Fetched 15743 records, 1 fields, with 6517898 characters. Conn: Closed sec
Fetched 22009 records, 1 fields, with 9192587 characters. Conn: Closed fam
Fetched 19246 records, 1 fields, with 8186335 characters. Conn: Closed fre
Fetched 24335 records, 1 fields, with 10342187 characters. Conn: Closed phy
Fetched 30365 records, 1 fields, with 12079326 characters. Conn: Closed know
Fetched 16717 records, 1 fields, with 6985293 characters. Conn: Closed self
Fetched 8678 records, 1 fields, with 3274556 characters. Conn: Closed resp
Fetched 14378 records, 1 fields, with 5838550 characters. Conn: Closed cre
Fetched 5633 records, 1 fields, with 2050559 characters. Conn: Closed fun

Accuracy rates (percent match between the algorithm and what people choose)
{'kno': 95.7, 
'fre': 6.2, 
'res': 67.5, 
'cre': 16.8, 
'phy': 85.5, 
'sec': 2.8, 
'fam': 47.2,
'fun': 67.2, 
'slf': 0.4, 
'foo': 6.1}

That means I can accurately predict stories about “knowledge” 96% of the time, but only 2.8% of “security” stories. The correlation between accuracy and the number of stories tagged with a topic is low: fun is a seldom-used topic but matches with 67% accuracy, while self-esteem is only 0.4% accurate despite being tagged in three times as many stories as fun.

Next I thought, “maybe the most common words in each reference dictionary are too similar across all 10 topics.” Indeed, words like ‘school’, ‘organization’, and ‘community’ appear in stories of every topic, and so offer no differentiating ability. I should remove them.

creativity [('organization', 5200.40795559667), ('school', 3543.777062566668), ('community',
3248.152150989258), ('child', 2862.176422375521), ('day', 1558.2406172604306), ('helped',
1528.985994397759), ('village', 1518.7758112094393), ('area', 1459.6429306441655), ('organisation',
1431.8204927035933), ('aid', 1339.7938144329896),...]

security [('organisation', 1797.8478738427743), ('helped', 839.5979011322839), ('hiv',
758.4263051629651), ('pupil', 757.4545341769011), ('school', 749.9667855960569), ('month',
731.5803097814555), ('provides', 578.9633375474084), ("i'am", 544.4925373134329), ('child',
522.5630079912575), ('business', 519.3864168618267), ('standard', 480.0096525096525), ('money',
464.8109119558795), ('aid', 460.0247422680413), ('just', 455.86785009861933), ('happy',
422.7314842729374), ('mzesa', 405.6), ('thanks', 402.25015556938394), ('gulu', 395.3125763125763), ...]

knowledge [('child', 36082.86353391162), ('school', 33818.189588161),
('community', 32907.04868545692), ('helped', 32814.49078786444),
('organisation', 32659.84693237094), ('group', 18383.11439114391), ('life', 16962.78238448316), ('woman', 14369.672232361278), ('money', 14049.368721686034),
('good',13293.343451864701), ('youth', 13202.397977609246), ('food', 12707.99451382372), ('living',
12395.504079003864), ('poor', 12331.987821235045), ('parent', 11596.92557475659),
('education', 11534.22393346681), ('aid', 11186.01649484536), ...]

When you exclude all words at or above the 60th percentile of frequency, you get the opposite pattern for accuracy:

{'kno': 0.8,
'fre': 51.8,
'res': 2.6,
'cre': 24.4,
'phy': 0.8,
'sec': 79.8,
'fam': 2.9,
'fun': 3.3,
'slf': 97.3,
'foo': 59.1}

Well, that won’t do either. So I decided I needed to get serious. Oddly, in Python that means writing a whopping five more lines of code instead of just the usual single line to do something amazing like “take all the words in all dictionaries and drop the words that are present at the 60th percentile or greater.”

Python code typically looks like this:

    def inall(key, topic_dicts):
        # Return True if the key is present in every topic dictionary.
        in_all = 0
        for k, v in topic_dicts.items():
            if key in v:
                in_all += 1
        return len(topic_dicts) == in_all  # every dictionary has the word

    # Keep only the words that are NOT shared by all topics.
    alt_topic_dicts = {}
    for k, v in topic_dicts.items():
        alt_topic_dicts[k] = {x: y for x, y in v.items() if not inall(x, topic_dicts)}


On my third try, I decided to exclude, from each of the 10 topic (word: frequency) dictionaries, any word that is present in all 10 topics. It took 75 seconds to rerun all the analysis, and the accuracy was much better:

{'kno': 86.7,
'fre': 83.8,
'res': 73.1,
'cre': 57.3,
'phy': 64.8,
'sec': 58.9,
'fam': 62.4,
'fun': 43.2,
'slf': 61.1,
'foo': 60.8}

So with the exception of stories with the topic “fun,” I can use this simple algorithm to predict the topic of a story (from a list of ten topics representing the hierarchy of human needs) correctly over 50% of the time.  The probability of randomly picking the right topic would be one in ten — 10% success — so I’m quite happy with this result.

But is 65% accuracy (on average) “good”?

In 2009 we ran this experiment with humans. This is what storytellers chose:

[Image: what people talked about in stories from Kenya]

And this is what human “experts” predicted:

[Image: human experts’ topic predictions]

When we asked 65 aid experts to pick the top 6 of the 12 topics in that survey question and rank-order them, only one out of 65 got #1 correct! And he later admitted in an email that he had just guessed. Overall, people performed worse than chance (8%) at this task, because they were biased by what they thought the main topics would be for everyone.

So in that context, this algorithm does surprisingly well, and much better than humans for this specific task.

By another measure, in the sense of Shannon information theory, it provides 3X to 6X more information than we would have had about the story without this new “metadata.” The exact number is tricky to calculate (at 3am) because storytellers were asked to choose 3 of 10 topics on the form, and if the algorithm’s #1 choice is in the storyteller’s top 3, I count that as a hit. A rigorous result would only count cases where all three topics matched the human’s choices. That’s a bit more involved than what I care about right now. This does bring up an interesting point about surveys: most questions allow only one right answer, while we required 3 of 10. That makes it easier for the algorithm to “learn” how to be mostly right, because each story has multiple overlapping topics. Something to think about for more surveys in the future Big Data Era.

The Big Idea Behind Big Data

This topic prediction approach works because of some very simple math and a huge, rather complete body of empirical data (57,000 stories about the kinds of things people talk about when they describe community efforts in East Africa). International development suffers from having the smallest and most disconnected data systems on Earth, so where poverty is concerned, this is a rather large training data set. But once you have it, you can do a lot more with it (such as categorize future narratives along a hierarchy of needs with about 65% accuracy) without having to collect more data and waste more people’s time.

Learning can happen faster.

People can take action quicker.

It’s not a replacement for listening, but it can aid our understanding.

And importantly, this approach can work with other questions that we included in our survey.

Read more: The future of big data is quasi-unstructured

Which was quoted in this wired blog: The growing importance of natural language processing

This is the kind of thing described in the book, “The Secret Life of Pronouns.”

Postscript

Predicting GlobalGiving Project Report Topics

I extended this test by applying the ten topic dictionaries to a totally new set of narratives: 24,392 project reports on GlobalGiving from 2006-2013. All of these are about real project work, though the words people use are different. According to these topic dictionaries, the breakdown of topics among the GlobalGiving project reports is as follows:

Sum of top three assigned topics:

{'knowledge': 24045,
'freedom': 33,
'respect': 19467,
'creativity': 181,
'physical needs': 18675,
'security': 24,
'family': 1035,
'fun': 9689,
'self-esteem': 10,
'food & shelter': 17}

Clearly, this method does not assign topics to reports in the same proportion that people assigned topics to their stories. This could be because the narrative words are quite different for the underrepresented subjects. These scores are both a measure of how similar the language (words) is between reports and stories on a topic, and a measure of how many reports contain these topics.

Organizations probably use very different language to describe security, freedom, self-esteem, and food-shelter projects on GlobalGiving from the way people talk about them in stories.

Knowledge (education) and physical needs are described similarly in both places.

Respect is overrepresented in project-speak. There is no corresponding project theme on GlobalGiving, although “women” and “children” projects are the largest category on the site.

Food & shelter is described in terms of disaster relief on GlobalGiving, but appears more in the context of poverty in stories.

Freedom in stories maps to human rights and democracy projects on GlobalGiving.

Coherence between story role and predicted story point of view based on pronoun use

In general, people use “I” and “me” in stories where they were affected or played an active part, and they use fewer personal pronouns in observer stories:


Fetched 39714 records, 1 fields, with 15281132 characters.
'Saw it happen','Heard about it happening'
third plural 39.1%
first plural 20.4%
third singular 17.5%
fourth 17.1%
first singular 5.9%
Fetched 13346 records, 1 fields, with 6136291 characters.
'Was affected by what happened'
first singular 29.3%
first plural 24.1%
third plural 22.9%
third singular 12.5%
fourth 11.2%
Fetched 7756 records, 1 fields, with 3468508 characters.
'Helped make it happen'
third plural 26.4%
first plural 23.7%
first singular 18.8%
third singular 17.3%
fourth 13.8%

“Fourth POV” is my shorthand for stories that contain more organization words than pronouns. They are impersonal and lacking in detail, more like press releases. But luckily, they are not too common overall.
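The core of this point-of-view classification can be sketched as pronoun counting. This is a hypothetical, simplified version; the real detector also assigns a confidence score and detects “mixed” and “none” results:

```python
import re

PRONOUNS = {
    "first singular": {"i", "me", "my", "mine", "myself"},
    "first plural": {"we", "us", "our", "ours", "ourselves"},
    "third singular": {"he", "she", "him", "her", "his", "hers"},
    "third plural": {"they", "them", "their", "theirs", "themselves"},
}
# Illustrative organization-word list for the impersonal "fourth" POV.
ORG_WORDS = {"organization", "organisation", "program", "project", "committee"}

def point_of_view(story):
    """Guess a story's dominant POV from pronoun vs. organization-word counts."""
    tokens = re.findall(r"[a-z']+", story.lower())
    counts = {pov: sum(t in words for t in tokens)
              for pov, words in PRONOUNS.items()}
    org = sum(t in ORG_WORDS for t in tokens)
    if org > sum(counts.values()):
        return "fourth"  # impersonal, press-release style
    return max(counts, key=counts.get)
```

Run against the three story roles above, this kind of counting is what produces the first-singular spike in “was affected” stories.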

This analysis continues elsewhere: it turns out that telling a story from a different point of view can make a project report more compelling, leading to more donations.
