Reframing the problems of big data:
When the implications of big data are debated, the problem most often presented is that of data surplus. Questions about which applications produce which data, or which companies have access to it, all rest on the assumption that a great deal of data is available for analysis.
These questions are not insignificant, but it’s worth considering the reverse condition: the absence of data. For big data to make a difference in people’s lives, data needs to exist. For the poor at the base of the pyramid, this is rarely the case. As a result, they risk being subject to what I call “data poverty.”
According to Viktor Mayer-Schönberger and Kenneth Cukier, co-authors of “Big Data: A Revolution That Will Transform How We Live, Work, and Think”, big data “refers to things one can do at a large scale that cannot be done at a smaller one”, and is characterized by:
- The ability to get a whole set rather than relying on a supposedly representative sample;
- A preference for more, “messy data” over limited clean data; and
- A shift away from knowing exactly why something happens and an increased focus on knowing what has happened.
These changes can help unravel otherwise intractable problems. In “Big Data”, the authors give the example of New York City’s crusade against illegal conversions. Initially, the inspectors acted only in response to complaints, conducting many inspections but rarely homing in on the buildings that needed to be closed. They then took a big data approach, drawing not only on complaints but also on sources ranging from construction permits to geo-tagging in order to identify the buildings likely to have the most serious problems. Their success rate in finding housing poor enough to merit evacuation increased from 13 to 70 percent.
Now imagine inspectors in Dhaka who want to take the same approach. Where would they get the data? Buildings are often unregistered or built in illegal locations. There may not be a complaints line, or if there is, complaints are almost certainly not logged regularly. Construction permits wouldn’t help either. The inspectors may not even have access to data analysis software. As a result, their success rate in finding the buildings that need to be closed would likely remain very low.
Big data relies on large amounts of data from a large number of sources. However, for many of the poor, there is no data at all.
The poorest often are not registered at birth, marriage, or death. The data available on their lives is often limited, outdated, and of dubious quality. This, combined with governance and market failures, makes bad situations worse. As big data becomes the go-to tool for everyone from policy makers to business leaders, this absence risks widening the gap between the knowledge, services, and opportunities available for the data rich and the data poor.
There are three ways that the challenge of data poverty can be overcome.
- Create more forms of data: For those who do not leave a heavy data footprint, it will be particularly important to find or create innovative ways to reuse what data does exist. One example of this, albeit at a small scale, is that of micro-finance institutions that use mobile purchases as evidence to establish a credit history.
- Encourage “data donations”: In many emerging economies, the private sector, not the public sector, may hold the most data. To unleash it, companies should identify what data can be openly shared and distributed.
- Gather more data: Big data relies on data from a number of sources, but solid numbers for populations and other key statistics could vastly expand the number of available applications. Censuses aren’t normally accorded high developmental impact, but perhaps in the future they will be.
Big data will quite literally transform the way we work, how we make decisions, and what services are available to us. It may be a threat, but it is also an important tool. We need to bring this tool to the base of the pyramid and protect against data poverty.