Federal Data Aggregation: A Bad Idea

by Jeffrey Barlow <barlowj@pacificu.edu>
Editor, Interface


Index:

.01 Introduction
.02 Origins of the Issue
.03 Purposes of Data Aggregation
.04 Creeping Revelations Erode Trust in Government
.05 Massive Data Aggregation is Unnecessary and Counterproductive
.06 Notes

.01 Introduction: (Return to Index)

Since late 2005, and continuing into the present, it has been revealed that the U.S. federal government has been engaging in massive aggregation [1] of citizens' data. By and large, these data sets either were created and used by private businesses or consist of materials hitherto not collected and believed to have been private.

These facts have set off a continuing discussion and no few legal actions. [2] After following this discussion closely for some months, we now wish to argue that the federal actions are ill advised. Many currently available alternatives are much less threatening to the status quo. In addition, the federal programs are unlikely to be effective, and their possible adverse outcomes far outweigh the potential positive ones. [3] We leave the discussion of potential adverse effects to a subsequent editorial, to be published in September and found at XXXX.

.02 Origins of the Issue: (Return to Index)

The origins of the federal initiatives, like so much else dealing with contemporary electronic issues, lie in the tragic events of 9/11. The federal government proposed a number of initiatives, beginning with the Defense Department's “Total Information Awareness” (TIA) in November of 2002. A 2004 General Accounting Office report to the Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, of the United States Senate, [4] made public the scope and number of these programs, both proposed and operative.

Privacy wonks and digital security experts noted these events with interest and often with alarm, but the issue became a spectacularly public one with the revelation in USA Today in May 2006 that the data banks had been expanded by federal requests for the telephone records of private citizens from the giant telecoms such as AT&T and BellSouth. [5]

Since those revelations, the issue has settled down to the usual grinding war of words and ideas between those who support and those who oppose continued federal intrusions into areas hitherto marked as “private” in American life. [6] Each new revelation of potential or actual misuse inflames the privacy guardians, and each indication of a terrorist threat or a success against terrorism heartens the intrusionists. [7]

.03 Purposes of Data Aggregation: (Return to Index)

The purposes of these operations, known variously as data aggregation or data mining, are praiseworthy ones, given our much-heightened concerns about security. The concept comes out of the long history of intelligence gathering and simply holds that the more data that can be aggregated in one place and analyzed for its interconnections, the more that can be learned, and the more likely it is that potential threats can be neutralized.

The total scope of the U.S. federal government's actions is breathtaking. Some supporters proposed nothing less than the centralized aggregation of ALL digital data, taken in real time from the data banks of all transportation, communications, and financial transactions: from so many sources that listing them could go on virtually indefinitely.

The premise of these programs is that once sufficient data are aggregated and analyzed, terrorist plots can be pre-empted. The assumption is that the data will reveal interconnections between the actors and their preparations, such as purchasing tickets or communicating with each other and with other known terrorists.

.04 Creeping Revelations Erode Trust in Government (Return to Index)

I do not propose here to take a side in the privacy vs. intrusion debate, because it seems to me to be so tied to one's expectations, and the balance of fear vs. idealism in the minds of individuals. However, it is appropriate to point out the less than full disclosure that the government, and particularly the executive branch, has offered us, the citizens of a democracy, so far. These creeping disclosures justifiably erode trust in government.

Each revelation has usually been denied initially; then a series of grudging admissions of the increasingly larger scope of these operations gets underway. [8] These too often climax with negative characterizations of the doubters, such as when Senator Trent Lott asked Democrats in Congress who were requesting a hearing on the NSA programs, "What are people worried about? What is the problem?" and "Are you doing something you're not supposed to?" [9]

Moreover, the admissions of the executive branch are often perhaps literally true, but seemingly aimed at a naïve public audience which fails to understand the implications of some of the revelations. As an example, I would cite the initial protestations from the executive branch that the National Security Agency, while it was indeed collecting phone records, was not really “listening” to phone calls or indexing them to individuals. [10] One has to ask how in the world supposed connections can possibly be explored if no records are kept at some point of actual conversations. Certainly, if the records are not a trigger for an actual recording process at some point, then they are of no conceivable utility.

A related issue, of course, is federal requests for records of web searches. This data, too, which was provided by some Internet Service Providers, was supposedly stripped of data that could lead back to the searcher. Recently, researchers followed the data trail back to one such AOL subscriber with relative ease. [11]

.05 Massive Data Aggregation is Unnecessary and Counterproductive: (Return to Index)

There are other process-related reasons for doubting the efficacy of such gargantuan efforts. The root process for which this data is supposedly accumulated is “social network analysis.” Calls and communications, as well as observed physical meetings, enable an analyst to define not only relationships between “nodes” (i.e., individuals) in the social network but also degrees of closeness. The most noted post-hoc analysis of the 9/11 hijackers, by Valdis Krebs, clearly revealed not only the group involved but also its leader, Mohamed Atta. [12]

Krebs, however, has repeatedly pointed out that simple police work followed up by legal processes such as warrants for wire-taps easily accomplishes the same thing. In fact, such alternative methods are more effective than massive accumulations of data. Those accumulations give no real entry point for analysis if they do not follow upon painstaking police investigations. Krebs goes so far as to argue that increasing the body of data to be analyzed makes useful outcomes less likely, not more likely. [13]
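To make the idea of social network analysis described above more concrete, here is a minimal sketch in Python. It is an illustration only, not Krebs's method or any agency's actual system. The contact pairs and the labels A through F are hypothetical placeholders, and the function names are our own. The sketch counts each node's direct contacts and then measures how many steps separate the other nodes from a chosen starting point.

```python
from collections import deque

# Hypothetical contact records: each pair stands for one observed call
# or meeting. The labels A-F are placeholders, not real data.
contacts = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("D", "E"), ("E", "F"),
]


def build_graph(pairs):
    """Turn contact pairs into an undirected adjacency map."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of direct contacts for each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def hops_from(graph, start):
    """Breadth-first search: fewest links from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


graph = build_graph(contacts)
degrees = degree(graph)
most_connected = max(degrees, key=degrees.get)
```

Note that `hops_from` needs a starting node. In this toy setting that mirrors the point about entry points: the analysis begins from an individual already identified by other means, rather than from the whole body of data.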

Conclusion:

Here we have examined the case for centralized federal collection and aggregation of vast bodies of digital data. We have argued that such efforts are unnecessary and counterproductive. In a subsequent editorial, we argue that such processes in fact expose us to many more negative outcomes than they can possibly provide positive ones. See it at: (The Next Issue of Interface)

.06 Notes. (Return to Index)

[1] By aggregation we mean simply that data collections which earlier existed as proprietary collections belonging to individual agencies or corporations and independently held (“siloed”) are now being pulled into a larger collection controlled by the federal government. The ultimate purpose is to combine these silos into one large aggregate which can be simultaneously analyzed. The federal government has necessarily been reluctant to discuss its means and ends in this process, and, lacking specifics, various news reports have often featured worst-case scenarios.

[2] For a good analysis of the issues around U.S. federal government data-mining, and a very useful time line of events discussed here, see Mark Clayton, “US plans massive data sweep: Little-known data-collection system could troll news, blogs, even e-mails. Will it go too far?” The Christian Science Monitor, Feb 9, 2006. http://www.csmonitor.com/2006/0209/p01s02-uspo.html

[3] In preparing this piece I have been informed by presentations I attended at the 2006 conference held in Victoria, British Columbia, the 7th Annual Security and Privacy Conference, “What Can You Trust? Privacy and Security is Everyone's Responsibility” http://www.mser.gov.bc.ca/privacyaccess/Conferences/Feb2006/ConfIndex.htm I am indebted to the conference organizers for their kind invitation to me to present at the conference, and to the many other participants who much increased my understanding of the issues presented here.

[4] For the 2004 General Accounting Office report, see: www.gao.gov/highlights/d04548high.pdf

[5] See http://www.usatoday.com/news/washington/2006-05-10-nsa_x.htm for an updated version of the original story.

[6] See an interesting series of citizen responses to the query “Should the NSA look at phone records?” on CNN.com at: http://www.cnn.com/2006/US/05/12/feedback.phone.records/index.html

[7] As an example here, it is impossible not to cite the August 2006. For a recent news analysis discussing the politicization of the issue and nicely highlighting the two poles of the conflict, see:

Maura Reynolds, “GOP Sees Strategic Advantage in Court Loss on Wiretapping,” Los Angeles Times. Reposted at Topix.net: http://www.topix.net/content/trb/3174166173118920908517687724210846036088

[8] For a record of some of the switches in position taken by the executive branch see: Dan Eggen and Walter Pincus, For the Record, “Varied Rationales Muddle Issue of NSA Eavesdropping,” Washingtonpost.com at: http://www.washingtonpost.com/wp-dyn/content/article/2006/01/26/AR2006012601990.html

[9] At: http://www.cnn.com/2006/POLITICS/05/11/nsa.phonerecords/index.html see background for this oft-repeated remark. At this site you can also view video clips of some of the exchanges over these issues.

[10] If you need to brush up on this controversy, you might start with a CNN piece, “Bush defends NSA spying program,” found at: http://www.cnn.com/2006/POLITICS/01/01/nsa.spying/

[11] At http://www.nytimes.com/2006/08/09/technology/09aol.html?ex=1156046400&en=c06ff6a2c7708bb4&ei=5070 see Michael Barbaro and Tom Zeller, “A Face Is Exposed for AOL Searcher No. 4417749”

[12] For a real-world example of tracking relational networks via data mining see: Valdis Krebs, “Connecting the Dots -- Tracking Two Identified Terrorists” at: http://www.orgnet.com/tnet.html See also Valdis E. Krebs, “Uncloaking Terrorist Networks” at: http://www.firstmonday.org/issues/issue7_4/krebs/

[13] For a good discussion of these issues, including Krebs's perspective, see: “NSA Sweep "Waste of Time," Analyst Says” at: http://www.defensetech.org/archives/002399.html