Using BigML to dissect trends in 43,388 stories

BigML is a machine learning data analysis website that Rick Davies (author of the Most Significant Change Technique) recommended I try with our storytelling narratives and metadata. After all, we have the largest set of structured narratives in international development (that I know of), so if anyone can apply machine learning to narratives and their story elements, we ought to be able to.

Machine learning (per Wikipedia) is a process for recognizing complex patterns in data. From what I’ve seen, it is a lot like principal components analysis (PCA) in statistics. PCA, again per Wikipedia, transforms data so that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (i.e., uncorrelated with) the preceding components. In practice, PCA and BigML allow one to see which of the dozens of story characteristics are most strongly associated with success or failure of the community efforts they describe.
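To make the PCA definition a bit more concrete, here is a minimal sketch for the simplest case of two variables, using only the Python standard library. The function name `pca_2d` and the sample points are made up for illustration; this is not what BigML runs, just the textbook idea that the first component points along the direction of greatest variance and the second is orthogonal to it.

```python
import math

def pca_2d(points):
    """Return ((var1, axis1), (var2, axis2)) for a list of (x, y) points.

    var1 >= var2 are the variances along the two principal axes,
    and axis1 / axis2 are unit vectors that are orthogonal to each other.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix
    centre = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    var1, var2 = centre + spread, centre - spread

    # Angle of the first principal axis, then rotate 90 degrees for the second
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis1 = (math.cos(angle), math.sin(angle))
    axis2 = (-math.sin(angle), math.cos(angle))
    return (var1, axis1), (var2, axis2)

# Hypothetical data: two loosely correlated measurements
sample = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.3), (4.0, 3.8), (5.0, 5.1)]
(first_var, first_axis), (second_var, second_axis) = pca_2d(sample)
print("share of variance on first component:",
      round(first_var / (first_var + second_var), 3))
```

With dozens of story characteristics instead of two, the same idea applies, but the eigenvectors would come from a linear algebra library rather than a closed-form formula.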

Strongest patterns in all of our stories:

Feelings predict the story outcome. People who felt ‘horrible’ about their story shared something about a failed community effort, and those who shared inspiring, important, or happy stories talked about success. Nothing shocking here.

Two other things stand out. Short and long stories describe different kinds of outcomes (success, failure, or mixed results).  How the person answered this first question on the form defines success and failure:

When filled in, many storytellers’ answers would look like this (image generated using SenseMaker(R) from

That’s pretty intuitive stuff. And if these three story elements didn’t hang together there would be something seriously wrong with our survey. But we do have a total of 87* story elements that BigML can split, so when these are combined with up to 7 other components of stories, you can detect some unexpected patterns. (*note: not all 87 elements are ‘orthogonal’, so this is not a true PCA matrix yet.)
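For readers wondering what "splitting" on a story element means, here is a rough sketch of one common way a decision tree picks its first split: choose the element that most reduces the uncertainty (entropy) about the outcome. This is an assumption about the general technique, not a description of BigML's internals, and the field names (`feeling`, `age`, `outcome`) and the six toy stories are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of outcome labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, feature, target="outcome"):
    """How much splitting on `feature` reduces entropy of `target`."""
    before = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after

def best_split(rows, features, target="outcome"):
    """Return the feature with the largest information gain."""
    return max(features, key=lambda f: information_gain(rows, f, target))

# Hypothetical toy stories, not real data from the project
stories = [
    {"feeling": "happy",    "age": "31-45", "outcome": "success"},
    {"feeling": "happy",    "age": "16-21", "outcome": "success"},
    {"feeling": "inspired", "age": "31-45", "outcome": "success"},
    {"feeling": "inspired", "age": "16-21", "outcome": "success"},
    {"feeling": "horrible", "age": "16-21", "outcome": "failure"},
    {"feeling": "horrible", "age": "31-45", "outcome": "failure"},
]

print(best_split(stories, ["feeling", "age"]))  # prints "feeling"
```

In this toy set, feeling separates success from failure perfectly while age tells you nothing, so feeling becomes the top-level split. That mirrors the pattern described above, where feelings come first and other elements refine the picture further down the tree.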

Success, as BigML sees it

BigML displays the parsing of variables this way, with the darker, traced path explained to the right:

This pattern applies to 3.18% of all stories (n=1,379 of 43,388):

  1. Story made person feel happy
  2. Storyteller was 31-45 years old
  3. Story length was less than 242 characters
  4. Person saw (witnessed) events in the story
  5. Stories are all about success.

Here is another success pattern, explaining 2.45% of the data (1,063 stories):

What this says is that all of these stories share the following factors:

  1. The person saw what happened in the story
  2. The story was less than 400 characters long
  3. It was about a need 
  4. The person was 31-45 years old
  5. The story took place several years ago
  6. All of these were success stories.
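Each pattern above is one path through the tree: a chain of conditions that must all hold at once, plus the share of stories that land at the end of it. A minimal sketch of that idea follows. The field names (`feeling`, `age_group`, `role`, `text`) are hypothetical stand-ins for however the story data is actually stored; only the counts and thresholds come from the patterns listed above.

```python
TOTAL_STORIES = 43_388  # total number of stories, from the post

def matches_first_pattern(story):
    """True if a story satisfies every condition on the first success path."""
    return (story["feeling"] == "happy"
            and story["age_group"] == "31-45"
            and len(story["text"]) < 242
            and story["role"] == "witnessed")

def support_percent(n_matching, total=TOTAL_STORIES):
    """Share of all stories covered by a pattern, as a percentage."""
    return 100 * n_matching / total

# A made-up story record, for illustration only
example = {
    "feeling": "happy",
    "age_group": "31-45",
    "role": "witnessed",
    "text": "Our group repaired the village well together.",
}
print(matches_first_pattern(example))    # True
print(f"{support_percent(1_379):.2f}%")  # 3.18%
print(f"{support_percent(1_063):.2f}%")  # 2.45%
```

Because every condition is joined with `and`, each extra factor can only shrink the group, which is why even the strongest individual paths cover just a few percent of the stories.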

There are many other patterns that represent smaller clusters of similar stories (but still hundreds of stories each). I’ve thumbnailed them here, along with brief titles I’ve given to each group:

  • 551 SUCCESS stories = Happy, organization-specific, regional-in-scope
  • 724 SUCCESS stories = Hopeful, younger eye-witnesses
  • 675 SUCCESS stories = Inspired teens (16-21) share longer “epic” stories that took place years ago
  • 663 SUCCESS stories = Important, teens (16-21) share regional stories they witnessed
  • 463 SUCCESS stories = Inspiring stories related to region, told by eye-witnesses, about solutions
  • 304 SUCCESS stories = Inspiring stories related to region and NOT related to food and shelter, told by people aged 21-30 whose lives were affected by the events


Two strongest failure patterns:

  1. Story told made the person feel horrible
  2. Story was a mix of need, problem, and solution
  3. Storyteller was under 30 years old
  4. Storyteller thought the story was relevant to family
  5. Story was NOT related to self-esteem. This one surprised me: I’ve read a lot of stories and never noticed that something was missing from them. Computers are better at catching omission patterns like this one.

Other dominant failure pattern:

  1. Story makes person feel horrible
  2. Story is about a solution
  3. Storyteller is under 21
  4. Story is about social relations
  5. Story length is at least 148 characters

Both of these patterns cover only a small share of the total data, but only 1 in 30 stories was negative to begin with.

Story length matters

In case you didn’t catch that, success stories tend to be shorter than failure stories or those with mixed, muddled outcomes. So I re-ran this whole data set with story length as the characteristic to be described. The one factor that best predicted whether a story was short or long was how the person felt, followed by age:

Stories with mixed outcomes

Once again, how a person felt was the top-level factor, followed by something different: what type of entity was the main actor in the story (regional group, civic organization, family, geographic region, or tribal/ethnic group):

This story loyalty question was the fuzziest thing, and we could never phrase it quite right. What we’re after is an answer to “who or what is the community in your community effort story?” It’s tied up in identity, and how one perceives power structures, and who deserves credit for successes and blame for failures.

What this tells me is that we can’t abandon this question simply because it is hard to explain or ask clearly – because the poorly worded version of the question is one of the strongest components among the mixed-outcomes stories. Instead, we need to explore the concept deeper in the future, with multiple questions.

Another conclusion from this is that stories defined as regional or geographic are more prominent than stories dominated by family or ethnic group concerns. I don’t know what it means – but luckily we can drill down and read a bunch of these stories sharing these characteristics to learn what they are about.

You can search through stories yourself using the bubble-izer and search tools.



Next: [Part 5] Changing your point of view to tell a more compelling story

Go back: [Part 3] Scribes give feedback on the storytelling project
