My buddy Nick and his team just finished their master’s in data science. For their final project, they had to do something amazing with an open data set. They chose to fix the thousands of public finance reports curated by the International Aid Transparency Initiative, or IATI.
IATI is the first (and only) standard for how organizations and governments are supposed to publish information about how they spend their money, who it goes to, and what it’s for. Based on my experience, IATI data is mostly garbage. Government agencies like USAID, DFID, and the World Bank eventually publish lists of transactions years after the money moves, and there is often no text description of the money’s intended purpose. It’s like trying to make sense of your monthly budget by reading your bank balance, but in this obscure XML format:
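To give a feel for what that XML looks like, here is a tiny, simplified activity record parsed with Python’s standard library. The element names follow the general shape of the IATI activity standard (`iati-activity`, `transaction`, `value`), but this sample record is invented and real files are far larger and messier:

```python
# Parse a minimal, invented IATI-style activity record.
import xml.etree.ElementTree as ET

IATI_XML = """
<iati-activities>
  <iati-activity>
    <title><narrative>HIV drug procurement</narrative></title>
    <transaction>
      <value currency="USD" value-date="2014-03-01">250000</value>
      <description><narrative/></description>
    </transaction>
  </iati-activity>
</iati-activities>
"""

root = ET.fromstring(IATI_XML)
for activity in root.iter("iati-activity"):
    title = activity.findtext("title/narrative")
    for tx in activity.iter("transaction"):
        value = tx.find("value")
        # Often this is all you get: a title, an amount, and a currency --
        # the <description> narrative is frequently empty in practice.
        print(title, value.get("currency"), value.text)
```

Notice the empty `<narrative/>` inside the description: a transaction with money attached and nothing explaining what it was for.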
As of today, IATI’s own website says a mere 472 organizations (out of four million worldwide) have registered their published data sets (about 0.01% global coverage).
If you wanted to take all this data and add up the money spent on HIV drugs, for example, or see all the actors relevant to fighting child abuse in East Africa – it would take you days of work. And after you did all that work, how would you know whether the answer was any good?
IATI data sets are incomplete, poorly annotated, and rarely interconnected.
There’s also a lot of this in the “official” records:
That’s the page where they tell you “key considerations” for “getting started” with IATI data. Embarrassing!
Nick and his pals scoured the Internet and found thousands of IATI documents, then did amazing things to improve this data’s usefulness.
They gobbled up all the data everywhere and got it into a single database (MongoDB) with a slick search engine, hosted at aidsight.org.
Then they explored the data using algorithms, machine learning, and their brains – to “generate features” that would allow them to connect all the data together.
They devised heuristics to fill in the missing pieces of data sets. For example, if two organizations mention working together, one describes the work, and both seem to be funded by the same government agency, that one description of the work is a reasonable approximation of the other organization’s work.
They validated the world’s data. Then they applied a battery of tests to every organization’s data and gave each a letter grade (A, B, C, D, F) on its quality.
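A “battery of tests mapped to a letter grade” can be sketched like this. The specific checks and cutoffs below are invented for illustration; AidSight’s real rubric covers completeness, standard compliance, and utility in far more depth:

```python
# Illustrative quality checks: each takes a dataset dict and returns pass/fail.
CHECKS = [
    ("has_dates",        lambda d: bool(d.get("dates"))),
    ("has_descriptions", lambda d: bool(d.get("descriptions"))),
    ("has_locations",    lambda d: bool(d.get("locations"))),
    ("has_budgets",      lambda d: bool(d.get("budgets"))),
    ("valid_currency",   lambda d: d.get("currency") in {"USD", "EUR", "GBP"}),
]

def grade(dataset):
    """Run every check and map the pass rate to a letter grade."""
    passed = sum(1 for _, check in CHECKS if check(dataset))
    pct = passed / len(CHECKS)
    for cutoff, letter in [(0.9, "A"), (0.75, "B"), (0.6, "C"), (0.45, "D")]:
        if pct >= cutoff:
            return letter
    return "F"

# A sparse dataset -- amounts and a currency, but no dates, locations,
# descriptions, or budgets -- passes only one of five checks:
grade({"currency": "USD"})  # -> "F"
```

The same machinery doubles as feedback: the names of the failed checks tell an organization exactly what to fix.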
This required a diverse set of skills and tools. That’s why it would be essentially impossible for any government to muster the will to do what four people with full-time jobs did on nights and weekends over three months. I myself have only used half of these in my last six years working as a science-tech person.
Radical simplicity always requires a lot of complexity underneath it. The search engine lets you name an organization, or place, or type of aid work, or some combination and it quickly builds a network map of everyone who matches:
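Under the hood, a network map like this boils down to turning co-appearances into graph edges. Here is a minimal sketch of the idea, with invented data and field names (not the site’s real model): every activity that names two organizations on the same topic adds an edge between them.

```python
# Build an adjacency map of organizations that co-appear on matching activities.
from collections import defaultdict
from itertools import combinations

activities = [
    {"orgs": ["CDC", "Ministry of Health"],   "topic": "HIV/AIDS"},
    {"orgs": ["CDC", "Global Fund"],          "topic": "HIV/AIDS"},
    {"orgs": ["UNICEF", "Save the Children"], "topic": "child protection"},
]

def build_graph(acts, topic):
    """Return {org: set of partner orgs} for activities matching the topic."""
    graph = defaultdict(set)
    for act in acts:
        if act["topic"] != topic:
            continue
        # Every pair of orgs on the same activity gets an undirected edge.
        for a, b in combinations(act["orgs"], 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

hiv_graph = build_graph(activities, "HIV/AIDS")
```

Each key becomes a dot on the map; each set member becomes a line to another dot.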
You can click on any of those dots and see a report on the quality in terms of completeness, compliance (with the IATI formatting standard), and utility (whether it contains the useful types of information).
In just 60 seconds, I was able to find all the HIV/AIDS-related funding data for Africa and decide that this CDC data set is not worth downloading. It is missing everything except project titles, dollar amounts, currency, and language (English). I’ll never know where the money went (besides Africa) or when it was spent (no dates). There’s no documentation, no results, and no budget data here – so the context is missing.
Keystone Accountability (where I am the Chief Innovator) has long trained organizations to carry out successful work by getting multiple perspectives, listening to the people served, and looking at reference data when deciding whether to act. AidSight’s IATI report card gives us a badly needed benchmark score for all organizations. It also identified some 30,000 organizations (implicit in the data from the 2,000 that reported) – giving us a much fuller view of international development and “Official Development Assistance.” The average score is a C-. We finally have definitive evidence that organizations have been making a rather pathetic effort to publish useful data about the way the world spends over $200 billion each year to help the poorest of the poor.
The benchmarks also give useful, specific feedback to every organization on what it can do to improve.
Here, the organization Hivos has a pretty good score, but there are specific things it omits. And in the network map (not shown), it doesn’t work with any of the other 94 named organizations that address the problem of “child abuse” in Uganda. Really? We ought to be more aware of what others are doing, because Keystone Accountability’s INGO partnership survey shows that when organizations work together, they accomplish more of what the people we aim to serve ask of us.
This is also a wonderful example of how combining data sets multiplies what they can be used for. Taken individually, IATI data sets reveal very little about the problems and solutions in the world. But collectively, they quickly expose the bad actors that do as little as possible to share vital information that could help us fight poverty, disease, and abuse. Collectively, the data set is powerful enough to create new data (data scientists call this “imputing” data) from patterns in existing data. AidSight provides a superior structure to an outdated XML data exchange format.
It’s an inspiration for the work I’ve been doing at FeedbackCommons.org (Keystone Accountability’s feedback loops manager) to make merging surveys from many NGOs into a common data set as easy as clicking a button.
Thanks to the AidSight team that did this work.
Glenn “Ted” Dunmire