Paul Pillar

Big Data, Public and Private

The collection and maintenance of huge files of information on our communications, our movements, our online searching, and much else about our individual lives is, as Laura Bate notes, hardly something that the National Security Agency or any other arm of government originated. By far the greater share of the assembling, and the exploitation, of storehouses of data about the activities of individual Americans occurs in the private sector. So why should there be so much fuss about what a government agency may be doing along this line, while there is equanimity about the much greater amount of such activity by non-government enterprises? Is there something intrinsic to government that ought to make us more worried about such data mining? Let us consider the possible bases for concluding that there might be.

Potentially the strongest such basis has to do with the presence or absence of a free market, and related to that, whether or not the activity of the individuals on whom data are being collected is voluntary. When I use a search engine on the Internet I am voluntarily using a free service in return for being exposed to some advertising and allowing the operator of the search engine or my Internet service provider to collect, and exploit, data about my interests. Most interactions with government agencies and especially security agencies do not involve as much voluntarism. So maybe it is logical to be more persnickety for this reason about what government entities are doing.

That makes sense as far as it goes. But in practice the logic quickly runs up against the fallacy of equating the private sector with free markets and free will. If I want land-line telephone service at my home (and I very much do), I'm stuck with Verizon. I am forced to let Verizon collect comprehensive records of my calls—the “metadata” we've heard so much about. And of course, if someone at Verizon wanted to listen in on the substance of my calls that could be done as well, although it is a reputable company and I would be surprised if that were happening. The point is that there is much less free will and free choice in private sector data-generating activity than we might like to think, and in many cases little or no more free choice than when a government agency is involved.

This is true not just of local utility monopolies such as land-line telephone systems but to a large degree of other services in the Internet age. Some such services, including online access itself, have quickly transitioned from being seen as nifty innovations to being regarded as necessities. And again, free choice is often much less than we would like. This fact was recognized with the antitrust action against Microsoft, which was using its commanding position in operating systems to muscle into a bigger share of the market for browsers and other applications.

Even when there is enough market competition for users theoretically to vote with their feet (or with their fingers on the keyboard) if they are worried about what is being done with data collected on them, in practice any market correction mechanism would be very slow and clumsy. Imagine that a rogue employee at Google started using information about embarrassing web searches to ruin the reputations of particular people he was out to get. If that sort of abuse happened enough times, then perhaps significant numbers of users would abandon Google's wonderfully effective search engine in favor of Bing or something else, and Google would become less able to sell as much advertising as it does now. But the corrective process would be slow and awkward, and in the meantime a bunch of people would have their reputations ruined.

Another possible basis for distinguishing the amassing of data in the public and private sectors is to ask what controls or checks apply to each. Here there is indeed a big difference, and the difference is in the direction of there being far more controls and checks applied to government agencies than to private sector enterprises. For the security agencies there is the whole legal structure, dating back to the 1970s and strengthened since then, of restrictions and Congressional oversight. Nothing remotely resembling those sorts of external controls exists for data mining in the private sector. Then there are all the internal checks and controls, which as Bate mentions in the case of NSA are extensive. These include compartmentation of information—second nature to the security agencies, which use compartmentation to protect sensitive national security information even if there is no issue of the personal privacy of U.S. citizens. NSA senior management says publicly that only 22 people at their agency are able to query the telephone metadata that are of concern. How many people at Verizon can do something with the comprehensive record of my telephone calls? I don't have the faintest idea, and probably no one else outside Verizon does either.
