Four weeks ago I attended a stunning talk by Jean Ensminger, hosted by the Center for Global Development. She presented her findings about a vast network of corruption within the Arid Lands project of northern Kenya. One of her approaches was to compare sets of numbers in documents against the Benford distribution. This works because the leading digits in batches of numbers that count real objects or expenses tend to follow a logarithmic pattern; when people make up numbers they intend to look random, the digits deviate from that pattern in telltale ways.
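Benford's law predicts that the leading digit d appears with probability log10(1 + 1/d), so about 30% of real-world amounts begin with a 1 while fewer than 5% begin with a 9. A minimal sketch of such a check in Python (the function names and the deviation measure are illustrative, not Ensminger's actual method):

```python
import math
from collections import Counter

def benford_expected():
    """Benford's law: P(leading digit = d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_freqs(numbers):
    """Observed frequency of each leading digit 1-9 in a batch of numbers."""
    digits = []
    for n in numbers:
        s = str(abs(n)).lstrip("0.")  # drop sign, leading zeros, decimal point
        if s and s[0].isdigit():
            digits.append(int(s[0]))
    counts = Counter(digits)
    total = sum(counts.values()) or 1
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_deviation(numbers):
    """Sum of absolute gaps between observed and expected digit frequencies.

    Larger values suggest the batch is less Benford-like; any pass/fail
    threshold would need calibrating against known-legitimate documents.
    """
    expected = benford_expected()
    observed = leading_digit_freqs(numbers)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10))
```

A batch of genuine expenses should score near zero, while a column of invented figures (people tend to over-use middle digits like 5 and 6) scores noticeably higher.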
Her talk stunned and inspired me – stunned because of the scope of what her team uncovered, and inspired because I realized that one of the methods was so simple even a computer could do it. So the next day I built that tool and blogged about it.
I called it a heuristic auditor. It looks for patterns in documents the way anti-virus programs detect new malicious code. And like Sesame Street, it operates on the “which of these things is not like the other” principle. If you upload a document and it resembles a batch of known, legitimate documents, it passes. It’s as simple as that.
This is not a forensic auditing tool. Forensics is about determining the exact cause of death for one specimen. In contrast, a heuristic autopsy would tell you the probable cause of death, but not always the right one. Heuristics are, however, a form of high-throughput analysis. All deaths are declared by someone using a heuristic, because performing an autopsy on every patient would waste time and money; coroners only perform autopsies when the signals are unclear. This distinction matters both for how one should interpret the results and for explaining the time- and cost-saving potential of inspecting all documents heuristically (for free) versus inspecting a tiny fraction of them forensically (as is currently done).
The response to this tool was very positive. Sam Lee from the World Bank invited me to lead a team at their next DataKind Hackathon event, where we made it both simpler and more sophisticated – simple enough that anyone could upload a document and understand whether it passed or failed, even with limited English. It grew more sophisticated because it now analyzes all the words and phrases in the document in addition to the numbers. Soon it will also look at dates and show the user where they cluster along a timeline. The results are visual: your document appears as a dot on a “reference” bell curve, colored red or green depending on whether its position is a good or bad sign.
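The bell-curve display described above amounts to a standard score: how many standard deviations a document’s statistic sits from the mean of the reference batch. A hypothetical sketch (the tool’s actual scoring and the 2-sigma threshold here are assumptions for illustration):

```python
def z_score(value, reference_values):
    """How many standard deviations `value` sits from the reference mean."""
    n = len(reference_values)
    mean = sum(reference_values) / n
    std = (sum((v - mean) ** 2 for v in reference_values) / n) ** 0.5
    return (value - mean) / std if std else 0.0

def dot_color(value, reference_values, threshold=2.0):
    """Green if the document sits within `threshold` standard deviations
    of the reference batch, red otherwise."""
    return "green" if abs(z_score(value, reference_values)) <= threshold else "red"
```

Any per-document statistic could feed this: a Benford deviation score, average sentence length, or the ratio of round numbers to precise ones.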
I believe that the victims of fraud have the strongest incentive to report it. And yet, most victims in the developing world are poor, under-educated, and disconnected from power. While it would be nice for those with technical skills to be looking out for them, I believe it is far more practical to transform fraud detection tools into something any person can use to look out for himself or herself. Some of the poor still have access to documents, but until now they had to convince a politician, journalist, or organization worker to take their allegation seriously. That job just got a whole lot easier with this simple tool.
This could transform the victims of fraud into agents of change. Here are some use cases:
A journalist in Zimbabwe gets handed a CD with 500 financial documents that allegedly show fraud. His article deadline is 36 hours away. Where does he start? Instead of sifting through them for hours, he can scan a sample and get a picture of how sketchy they are in minutes. He then knows whether the lead is credible, and therefore how best to spend his remaining time. He also has data from which to begin a conversation with an “expert” economist of the kind who typically provides color commentary in articles – but this time the question is pointed and specific: how does that person interpret the data in this particular case?
A grandmother in Kenya pays school fees for two granddaughters, matched 50:50 by a local organization. One month she realizes the organization has stopped paying. After getting the runaround from the organization, she suspects misconduct and gets her nephew to go online and find the organization’s documents on its website. Running the budgets through the tool, she sees that they fail the test. She can now bring credible allegations to journalists or funders.
A policeman in Afghanistan believes his boss is siphoning off his paycheck and those of many other patrolmen. He “borrows” a sample of financial records the boss has signed off on and runs them through the tool, finding that, judged against Benford’s law, specific digits in the numbers appear to have been doctored.
Read the full story: Thanks for the Raise!
An NGO program manager has been receiving fishy reports from another country post for months. She knows it is remote and hasn’t been visited in a year. She runs all of the post’s reports through the tool to make the case to her boss for a surprise inspection – both the expense and the need for secrecy.
A census team suspects that a few of their surveyors are not going door to door, but merely sitting in a Starbucks sipping lattes and making up numbers. Using the tool’s API, they run every report through this check and archive the results as soon as each form is uploaded. A few of their surveyors stand out as producing consistently bad data, and are terminated.
An NGO is about to submit its report, but first runs all its receipts and budget proposals through the system to be sure they pass, because it knows the funder will be doing the same thing. This form of “defensive self-auditing” could become standard behavior when both sides of a financial relationship know the other is going to auto-check numbers heuristically.
A funding organization vets thousands of potential partner grantee organizations’ budgets each year by hand. Using this tool, it can instead predict the likelihood that each uploaded batch of financial documents will pass muster as soon as it hits the server – before a human even reads them. Where documents are incomplete, or simply lack the kinds of detail found in hundreds of other approved budgets, the system rejects the document and sends an instant feedback email asking the partner to provide a more detailed budget. This not only reduces turnaround time; it also reduces staff time spent on each application and increases the rate at which organizations learn.
This last example is why GlobalGiving – where I work – is going to benefit from this idea. While it is just a small part of the larger and more sophisticated “heuristic due diligence” process I helped develop there, it is exactly the sort of innovation that, in the aggregate, helped this non-profit achieve a 100% cost-recovery-from-services model in just 10 years.
The business case
(Thanks to Dominick and Dennis from my DataKind team for this part):
- Turns the victims of fraud into agents of change
- Contains fraud at the source
- Cheap, easy, scalable, automatable process
- Easier to analyze unfiltered data (not the aggregated reports that get sent to the central office).
- Incredibly simple to use
Next steps
- Write a simple tutorial, with examples of where it is appropriate to use this tool. (Some guidelines found here)
- Gather other large “reference” data sets for other financial document types, including receipts, invoices, and contract bids.
- Engage the end user and gather feedback on how to improve the tool and the mechanisms for reporting corruption.
However, this tool doesn’t solve the incentive problem.
With any innovation there are four criteria that determine whether the masses adopt or ignore a new tool, idea, process, or technology. If a person answers YES to these four questions:
- “I care about this.” – relevance
- “It is easy to use.” – simplicity
- “I believe it will change things.” – agency
- “I feel like I’m being heard now.” – democracy
It gets adopted. Even three out of four is good. Victims of fraud care about fraud, may find this tool easy to use, and believe that their action can change things. Even if things don’t change quickly, they will feel heard if there are obvious mechanisms for reporting what they find (e.g. IPaidABribe.com).
I can’t improve the environment in which victims of fraud find themselves, but I hope this gives them “agency.” We must ask ourselves: who listens to the victims of fraud? And who acts on allegations?
If the answers are unclear to us, you can be sure they are unclear to the victims as well.
This story: A completed feedback loop in 30 days
I am quite satisfied that within a month of the talk at CGD, there is something tangible that has changed. The CGD talk raised questions about whether our current system has the ability to keep itself honest and catch fraud. It inspired actions that produced a tool that would have given the very victims of fraud in that system – village leaders alarmed at the unfair distribution of goats and cash within the village – the power to detect, inspect, and correct it (by raising the alarm with journalists and government leaders who would not ignore allegations backed by evidence). This does not solve the incentive problem (i.e. these leaders could gain more by ignoring the problem than they could by reporting it), but it does give the power to good people who are driven by a morality that yields greater riches than wealth itself.
Try it out! http://djotjog.com/audit - heuristic auditing tool
Related post: The Weekend I audited the World
Related video from the DataKind Hackathon: