NSA Data Mining: Three Points to Remember

What's still missing from the debate about domestic surveillance.

On Thursday, the Washington Post printed yet another above-the-fold headline pulled from leaked National Security Agency documents. According to the report, the NSA has been collecting nearly 5 billion cell phone location records a day, amounting by one account to some 27 terabytes of data. The data can then be analyzed to flag unknown individuals traveling with known targets. As in previous cases, the NSA has asserted that its programs are not used to monitor Americans, though it acknowledges that some incidental collection occurs.
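
The co-traveler analysis described above can be illustrated with a toy sketch. It is not the NSA's method, which has not been published; it assumes hypothetical records of (device, cell tower, minute) and simply counts how often another device shows up in the same cell during the same time window as a known target. The names `Ping`, `co_travelers`, `window`, and `min_shared` are all illustrative.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Ping(NamedTuple):
    """One hypothetical location record: which device was seen where, and when."""
    device: str
    cell: str    # e.g. a cell-tower identifier
    minute: int  # timestamp in minutes


def co_travelers(pings: Iterable[Ping], target: str,
                 window: int = 10, min_shared: int = 3) -> dict[str, int]:
    """Return devices seen in the same cell and time window as `target`
    at least `min_shared` times, mapped to their number of shared sightings."""
    # Group every sighting under a (cell, time-window) key.
    seen_at: dict[tuple[str, int], set[str]] = defaultdict(set)
    for p in pings:
        seen_at[(p.cell, p.minute // window)].add(p.device)

    # Count how many of the target's (cell, window) buckets each other device shares.
    shared: dict[str, int] = defaultdict(int)
    for devices in seen_at.values():
        if target in devices:
            for d in devices - {target}:
                shared[d] += 1

    # Keep only devices that co-occur often enough to be flagged.
    return {d: n for d, n in shared.items() if n >= min_shared}
```

Fixed time windows are a deliberate simplification: two sightings a minute apart can fall into different buckets, and a real system would also have to cope with noisy, sparse, and overlapping cell coverage.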

Certainly, the scope of the project is considerable, as is the NSA's audacity in undertaking it. However, it should hardly come as a surprise that the NSA has the technical means to conduct such a program; the private sector already employs technologies that are potentially much more intrusive. One takeaway from this latest revelation, then, is that it is just one instance of a larger pattern of NSA activity, and more data-mining programs will almost certainly follow. As long as the technology exists to build new surveillance programs, national security practitioners will likely use it to some extent.

This latest revelation provides an excellent opportunity to think critically about the use of big data in national security. Below are three points to keep in mind as the debate unfolds:

1. Your movements are being watched—and that is not news.

The state of the art in extracting novel information from massive data sets has moved far beyond merely identifying associates by their concurrent locations, as the NSA reportedly did. Private companies use location services to sell advertising, to learn about users' movements, and otherwise to turn a profit in ways that make the NSA's activities seem markedly less impressive. For example, iOS 7, Apple's new iPhone operating system, dramatically expands Apple's location services and has recently begun telling users how long their commute to and from work is expected to take. The impressive part: users do not need to enter their home or work address. The phone learns to identify home, office, and other frequently visited locations, all of which can be viewed on a map showing how often the phone is at each place. Location data is used to enhance the user experience, but also to automatically crowdsource information such as traffic patterns and Wi-Fi hotspots.
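
The kind of inference iOS 7 performs can be approximated with a simple frequency heuristic. Apple has not published how its feature works, so this is only a sketch under assumed rules: "home" is the place most often seen overnight (10 p.m. to 6 a.m.) and "work" is the place most often seen during weekday business hours (9 a.m. to 5 p.m.). `Visit`, `infer_home_and_work`, and the hour cutoffs are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Visit(NamedTuple):
    """A hypothetical location sample: a place label and when it was recorded."""
    place: str
    weekday: int  # 0 = Monday ... 6 = Sunday
    hour: int     # 0-23


def most_common_place(visits: Iterable[Visit]) -> Optional[str]:
    """Return the most frequently seen place, or None if there are no samples."""
    counts = Counter(v.place for v in visits)
    return counts.most_common(1)[0][0] if counts else None


def infer_home_and_work(history: list[Visit]) -> tuple[Optional[str], Optional[str]]:
    """Guess 'home' as the place most often seen overnight and 'work' as the
    place most often seen during weekday business hours (assumed cutoffs)."""
    night = [v for v in history if v.hour >= 22 or v.hour < 6]
    office_hours = [v for v in history if v.weekday < 5 and 9 <= v.hour < 17]
    return most_common_place(night), most_common_place(office_hours)
```

Even this crude rule needs no address from the user: a few days of timestamped samples are enough to single out the two places where a phone spends its nights and its working days.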

Regardless of the sector, user location data is becoming an increasingly popular and powerful tool, yet many users never notice. Cell phone monitoring is just one of many ways to obtain troves of data on individuals' locations. E-ZPass tags, the RFID devices affixed to car windshields to pay tolls automatically, are being surreptitiously scanned at many locations in New York City other than toll plazas. According to New York Department of Transportation officials, the scans are used to monitor traffic. Although law enforcement is ostensibly not the program's purpose, the unannounced program is sparse on details, and users are not told that their passes will be used to track location and travel data.

If a standard iPhone can infer a user's daily work habits from the location data it collects by default, and an E-ZPass tag can be shanghaied into monitoring the location of individual vehicles, it is easy to imagine how location data could support any number of other inferences about users. In fact, location-based profiling was patented back in 2002; the private sector has been mining location data for more than a decade. It is no surprise that the government has the same capability.

2. The NSA is probably doing a pretty good job of self-monitoring, but oversight of data mining programs is still a really tricky problem.
