Mapping advocacy networks

If you are an advocacy organization, wouldn’t it be nice if you could tell who among your network of contacts is taking up your message and effectively spreading it as your champions? Earlier this year, I did a consulting project for the Open Contracting Partnership (full story here) to help them measure the growth of their advocacy, using office email and twitter as proxies for impact.

This post explains how I arrived at a colorized, annotated network map and klout scores for OCP’s key influencers.

Example of OCP’s social network map, colored by type of advocacy work, based on 2018 twitter data.

The approach, in brief:

  1. Given a bunch of emails, I pulled out the 300 people the organization had talked to the most over the last 6 months.
  2. I had their team categorize each person by the community they belonged to (out of a dozen different ones they track).
  3. I then mined twitter to build a social network map of everyone connected to the organization and related organizations (explained below).
  4. I then found those 300 known people on the map and used them to categorize the rest: any person directly connected to one of them (through @mentions in recent tweets) was assumed to belong to the same community. People tied to two or more communities were labeled “mixed.”
  5. This revealed the overall map, segmented by different communities – shown above.
  6. Then – (this step took a lot longer) – I mined twitter for the top 5,000 people in the map to generate klout scores. Klout is a measure of a person’s influence; scores range from 1 to 100. In this case, a person’s klout score was specific to a community: a person could have great klout with “open data” folks and no klout with “open government” leaders. This is where the work started to yield specific information about the leaders of each type of advocacy work, and gave OCP targets to focus their collaboration.

Deriving the Network Map

  1. Gathering emails: To pull all of an organization’s email together as a data source for this type of analysis, I used a tool I published in 2016: http://zendoscope.djotjog.com/. This produced an organization-wide record of over 100,000 emails to mine.
  2. Mining twitter: After a number of false starts, I eventually wrote an efficient algorithm for pulling down all the twitter accounts that are influential in the sphere of chatter around a particular screen name (@opencontracting). The algorithm is nice because it is short and recursive and can run for over a dozen iterations of discovery. Most crawlers stop after a single pass unless you teach them to read their own output as their next input – which this one does (see http://zendoscope.djotjog.com/twitter, and the first sketch after this list). The tool also avoids the parts of twitter’s API that are severely rate-limited.
    Note: users are only connected when they @mention each other in tweets. Tweets containing any of a set of words relevant to OCP are prioritized when following links out through the network.
    Using this, I scanned about 44,000 users to find the core of OCP’s network.
  3. Generating a map: I generated the maps with a popular library called networkx and uploaded them to my server so others could interact with them. The script decides who to include or exclude from the total map based on what a human can read on a screen — typically 250 to 500 names per map. The less-connected people are excluded (second sketch below).
  4. Categorizing people: I wanted to map email contacts onto twitter accounts directly, but this wasn’t possible. I used a variety of heuristic tricks to extract whatever names were available in the “TO” lines of emails, such as `”Bob Jones” <bjones23@hotmail.com>` (including Python’s nameparser.HumanName class to judge whether a string of characters looks like a name or not – third sketch below), but only 330 of 44,246 twitter names matched OCP’s email names. Most twitter screen names don’t list a real name.
    Instead, I gave OCP’s staff a list of the core twitter names that were also known to them (because they matched in email) and asked their team to assign each one to a category. I used this starter list to categorize over half of the network (tens of thousands of people) from a small number (~300) of influencers known to OCP (fourth sketch below).
  5. Fun stuff! Generate a corpus to define each category: Given tons of people and tweets, it was rather easy to use natural language processing to create a dictionary, or corpus, of the words each community uses on twitter to speak to itself. I pulled down recent content from the 986 most influential twitter users out of the 23,277 names in OCP’s full twitter network — about 36MB of text across 7 categories. I excluded non-English tweets and ran them through my wordtree algorithm (feedbackcommons.org/wordtree) — a convenient way to extract a dictionary of keywords and key phrases for a given set of documents, along with relative weights for each word or phrase. The weights measure how central a word or phrase is to the collection as a whole, and can be visualized as a network map of words with the most commonly shared words in the center. Examples are here: https://chewychunks.wordpress.com/2012/08/03/visual-aid-for-the-story-based-program-evaluation-method/ and here: https://medium.com/@marcmaxmeister/evolution-of-the-metoo-conversation-on-twitter-63731634bce. The API provides both a visual summary and a machine-readable JSON dictionary of keywords, weights, and how each word maps to the rest of the dictionary. Because there was so much data, my script chunked the content into smaller pieces, mapped each piece, then merged the resulting mappings back together for each category (fifth sketch below). Scores were normalized on a 0 to 100 scale.
  6. Pseudo-klout: I used each category corpus to assign a score to every user, then sorted all users by a combination of weighted factors to determine the key influencers within a particular context, such as “open data” or “anticorruption” (final sketch below). Conveniently, this relevance-adjusted klout score filters out most celebrities from the rankings. (Celebrities would otherwise appear influential because of the number of followers and retweets they get, despite rarely talking about a relevant issue.)
  7. Summarize each community’s narrative: Finally, I contrasted the language each community uses with that of the others to generate a list of keywords that uniquely define each context. As a non-manager, non-activist, I find this result the most interesting: it tells you how people talk about the thing they all agree to work towards. It answers the question, “What does ‘open data’ mean to open data advocates?”
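Below are minimal sketches of the moving parts, in the order of the steps above. First, the frontier-based twitter discovery of step 2. This assumes the tweepy library and the v1.1 user_timeline endpoint; the keyword list, round counts, and error handling are illustrative stand-ins, not the zendoscope tool’s actual code.

```python
import tweepy

KEYWORDS = {"open contracting", "procurement", "open data"}  # hypothetical seed terms

def discover(api, seed="opencontracting", rounds=12, per_round=50):
    """Each round's newly seen screen names become the next round's
    input, so the crawl keeps going for a dozen or more iterations
    instead of stopping after one pass."""
    edges, seen, frontier = set(), {seed.lower()}, [seed]
    for _ in range(rounds):
        next_frontier = []
        for name in frontier[:per_round]:
            try:
                tweets = api.user_timeline(screen_name=name, count=200,
                                           tweet_mode="extended")
            except tweepy.TweepyException:
                continue  # protected or suspended account: skip it
            for t in tweets:
                on_topic = any(k in t.full_text.lower() for k in KEYWORDS)
                for m in t.entities["user_mentions"]:
                    other = m["screen_name"].lower()
                    edges.add((name.lower(), other))
                    if on_topic and other not in seen:
                        seen.add(other)  # expand through relevant chatter first
                        next_frontier.append(other)
        frontier = next_frontier
    return edges
```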
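Second, step 3’s include/exclude filter can be as simple as keeping the highest-degree nodes; the 400-name cap below is an assumption within the 250-to-500 range mentioned above.

```python
import networkx as nx

def readable_subgraph(edges, max_nodes=400):
    """Keep only the best-connected names so the rendered map stays legible."""
    G = nx.Graph()
    G.add_edges_from(edges)
    ranked = sorted(G.degree, key=lambda pair: pair[1], reverse=True)
    keep = {node for node, _ in ranked[:max_nodes]}
    return G.subgraph(keep).copy()
```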
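Third, the name extraction of step 4. The standard library’s email.utils plus the nameparser package handle the example header above; requiring both a first and a last name is a minimal stand-in for the fuller set of heuristics.

```python
from email.utils import getaddresses
from nameparser import HumanName

def names_from_to_header(to_header):
    """Pull plausible human names out of a raw 'TO' header."""
    names = []
    for display, addr in getaddresses([to_header]):
        parsed = HumanName(display)
        if parsed.first and parsed.last:  # crude "looks like a name" test
            names.append((parsed.first, parsed.last, addr))
    return names

names_from_to_header('"Bob Jones" <bjones23@hotmail.com>')
# -> [('Bob', 'Jones', 'bjones23@hotmail.com')]
```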
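Fourth, the seed categorization amounts to one-hop label spreading over the mention graph: a neighbor of exactly one seeded community inherits that label; a neighbor of several becomes “mixed.”

```python
def propagate(G, seeds):
    """seeds: {screen_name: community}, the ~300 names labeled by OCP staff."""
    labels = dict(seeds)
    for node in G.nodes:
        if node in labels:
            continue
        nearby = {seeds[n] for n in G.neighbors(node) if n in seeds}
        if len(nearby) == 1:
            labels[node] = nearby.pop()
        elif nearby:
            labels[node] = "mixed"  # tied to two or more communities
    return labels  # nodes with no seeded neighbor stay unlabeled
```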
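Fifth, the chunk-map-merge bookkeeping of step 5. A plain word count stands in here for the wordtree API call (which actually returns keyword and phrase weights as JSON); the merge-then-normalize pattern is the point.

```python
from collections import Counter

def chunks(text, size=1_000_000):
    """Split a large corpus into pieces small enough to map one at a time."""
    for i in range(0, len(text), size):
        yield text[i:i + size]

def keyword_weights(category_text):
    merged = Counter()
    for piece in chunks(category_text):
        merged += Counter(piece.lower().split())  # stand-in for one wordtree call
    if not merged:
        return {}
    top = merged.most_common(1)[0][1]
    # normalize so the most central word scores 100
    return {word: round(100 * count / top, 1) for word, count in merged.items()}
```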
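Finally, the pseudo-klout of step 6, sketched under stated assumptions: the 0.7/0.3 blend and the log-dampened follower count are illustrative, not the exact weighting used for OCP. The idea is that relevance to the community’s lexicon dominates raw reach, which is what pushes off-topic celebrities down the rankings.

```python
import math

def pseudo_klout(user_text, followers, lexicon):
    """lexicon: {word: weight 0-100} for one community, e.g. from keyword_weights()."""
    words = user_text.lower().split()
    if not words:
        return 0.0
    relevance = sum(lexicon.get(w, 0.0) for w in words) / len(words)
    reach = min(100.0, 10 * math.log10(followers + 1))  # dampen celebrity followings
    return 0.7 * relevance + 0.3 * reach  # illustrative weights
```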

Community (words in lexicon), followed by the keywords or phrases that embody conversations in that community (relative weights):

open data (1113): european commission (12.4), trust (21.8), story (37.6), datos abiertos (33.5), fuel (21.8), fake news (15.9), open contracting (22.9), hub (14.1), public procurement (61.2), open data (47.6), open government (27.6), institutions (16.5)

open government (726): united states (15.2), story (19.7), capacity building (20.1), civil (14.9), climate change (19.4), cooperative (23.2)

anticorruption (257): impact (22.5), social media (22.5), south africa (13.2), energy (13.2), informacion (10.6), challenges (86.1), information (60.3), development (11.3), open data (100.0), sustainable (13.9), open government (21.9), public (15.9)

As a control experiment, I also mined twitter networks for the following: Feedback Labs, Power2Change, IAF, UUA, and #metoo, and calculated a number for each – the network average connectivity. Roughly, this measures how many other people in a defined network each person shares connections with (one way to compute it is sketched after the list). Healthy, interconnected networks fall in a range from 8 to 16:

  1. #metoo – 2.3
  2. Open Contracting – 10.7
  3. FeedbackLabs – 10.7
  4. IAF (Inter American Foundation) – 13.6
  5. Power2Change – 14.5
  6. UUA (Unitarian Universalist Association) – 25.6
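The exact formula behind this connectivity number isn’t spelled out here; one reading consistent with the #metoo case below (each person connected to 2 other people, on average) is the mean degree of the mention graph, which is a one-liner with networkx:

```python
import networkx as nx

def network_average_connectivity(G: nx.Graph) -> float:
    """Mean number of distinct people each person in the network is
    connected to. (An assumption about the metric, consistent with the
    #metoo figure of ~2 connections per person.)"""
    degrees = [d for _, d in G.degree]
    return sum(degrees) / len(degrees) if degrees else 0.0
```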

Oddly, FeedbackLabs’ and OpenContracting’s respective twitter networks have the same connectivity, but FeedbackLabs sits at the edge of its network while OpenContracting sits towards the center. This implies that OCP has done more to seed the conversation, and has in turn been retweeted and mentioned by more of the community.

At the extreme, UUA is the absolute center of its network; every user that mentions @UUA gravitates around it, and the network is a bit isolated. @UUA is so central, with so few other actors around it, that it skews the whole network: there is an absence of users who mention @UUA and also mention any other well-connected accounts.

Power2Change and IAFgrassroots are more typical network maps, with similar scores of 14.5 and 13.6, respectively. Both are well connected but not at the center of their networks. Both feature a lot of funders on the map, and act like funders themselves. If you ever needed proof that funders are a clique, these maps provide it.

#metoo was a special case, as it was based on a hashtag rather than a screen name. It is huge and largely disconnected: any person who mentions #metoo is connected, on average, to only 2 other people who mention #metoo. #metoo is not a community of people interacting, the way the other networks are.

If you’d like to know more, you can contact me. And I’m a consultant; I can certainly be paid to run this analysis for your organization too, if you need it.