Mega reports without the jargon?

Vision by committee

I harbor this notion that if you could take a giant, complicated, 160-page pile of committee-approved text and filter out all the jargon, what you would be left with would be far more illuminating about what the leaders really know and don’t know.

Here I’ve applied this idea to two big fat documents that are supposed to be guiding the thinking and spending of billions (or trillions) of dollars:

  1. The Millennium Development Goals progress report of 2010 (revision #14, 80 pages long), and
  2. The Kenyan Vision 2030 Statement (complete version, 160 pages).

Using Python (the pyPdf module) to extract all the text from these PDFs, I then compared that text against thousands of community stories from across Kenya and Uganda. I assume that if people don’t use a word in common speech, then that word is jargon and can be excluded from the overall picture of the text.

Here is what you see:

MDG 2010 report jargon:

Kenya Vision 2030 report jargon:

In both examples above, I built sets of words from the report and generated wordles that contain only words unique to the report, that is, words not found anywhere across thousands of community stories from Kenya and Uganda. Some of these “words” are junk, artifacts of the converted PDF headers, but much of it is jargon.
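For the curious, the set comparison can be sketched in a few lines of Python. This is a minimal illustration and not the actual script: the function names `words` and `unique_to` and the two sample sentences are made up for the example, and it assumes the report text has already been extracted from the PDF into a plain string.

```python
import re

def words(text):
    """Lowercase a text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def unique_to(source_text, reference_text):
    """Words that appear in source_text but never in reference_text."""
    return words(source_text) - words(reference_text)

# Hypothetical stand-ins for a report and a pile of community stories.
report = "Interventions in the water subsector reduce disparities."
stories = "The women in our village helped dig a well for clean water."

jargon = unique_to(report, stories)       # report words nobody uses in stories
story_only = unique_to(stories, report)   # story words missing from the report
```

Swapping the two arguments gives the other direction of the comparison, the words that stories use and the report does not.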

The other half of this parsed data is the set of words that are emphasized in community stories and largely absent from government reports:

Stories, excluding MDG words:

Stories, excluding Vision 2030 words:

What are the patterns?

  • The MDG report emphasizes comparative statistics at a continent-to-continent level. Each world region is compared with the others, and the trends are visualized in a lot of graphs.
  • The MDG report avoids assigning root causes to global problems, and de-emphasizes prescribing what should be done in each case, unless there is strong global consensus. (The few “safe” examples include saying that biodiversity loss is a growing problem, and that global climate change is a problem caused by humans.) Poverty is a problem, but the report doesn’t provide any insight into what to do, where, how much, and why.
  • Vision 2030 really emphasizes the role of women, even though women are a major theme in both the stories and the Kenyan report. Much of the rest of the extracted words are government-specific jargon: inequalities, subsector, disparities, per capita, interventions, elasticity, evaluation, flagship, etc.
  • Stories emphasize the role of people and how individuals either help or were helped by others. The word organization/organisation is something absent from Vision 2030, but common to stories about help. Community is also absent from big reports, but present to some extent in stories.

This is a follow-up on a previous attempt at wordle-izing MDGs and development jargon. The big improvement with this method is that word set comparisons are now adjusted automatically for differences in the overall length of the documents when the program runs. So true frequencies are compared here, not just the raw word usage counts.
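One simple way to make that length adjustment is to divide each word count by the total number of words in its document before comparing. The sketch below assumes that approach; the names `relative_frequencies` and `overused`, the `ratio` threshold, and the sample texts are all hypothetical, and the real program may normalize differently.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word to its share of all words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}

def overused(report_text, story_text, ratio=2.0):
    """Words whose share of the report is more than `ratio` times
    their share of the stories (words absent from stories always qualify)."""
    report = relative_frequencies(report_text)
    stories = relative_frequencies(story_text)
    return {w for w, f in report.items() if f > ratio * stories.get(w, 0.0)}

# Hypothetical texts of different lengths (4 words vs. 8 words).
report_text = "poverty targets poverty water"
story_text = "water people help water people water poverty friends"

emphasized = overused(report_text, story_text)
```

Here "water" shows up once in the short report and three times in the longer stories, so after normalizing it is not counted as a report word, while "poverty" is.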


4 thoughts on “Mega reports without the jargon?”

  1. I forgot to add one more sorta OBVIOUS difference – there’s a WHOLE LOT MORE WORD density in the word cloud of words unique to the MDG 2010 report. Hence, MDG report has a lot more jargon in general, even compared to Kenya’s Vision 2030 report.

    Both reports contain a lot more technical jargon than the stories contain slang and non-standard English. I find this surprising, and a little bit suspicious. Probably the absence of slang in stories is because I excluded words used fewer than twice from the millions of words in the story set.
