The use of data mining reportedly helped unmask
a terrorist leader months before 9/11, but there are concerns about
coordination and privacy
By DAVID MUNNS, Assistant Editor
Recent reports by The New York Times and Fox News
that the Pentagon identified 9/11 ringleader Mohammed Atta as part
of a U.S.-based terrorist cell months prior to the attacks on Washington
and New York have sparked new interest — and controversy — about
the Defense Department’s relatively nascent abilities to assess
huge volumes of data for patterns of behavior that are indicative
of terrorists and their activities.
According to press reports, Atta was identified
in early 2000 by several military officers, including Navy Capt.
Scott J. Phillpott, who managed a Pentagon program called “Able
Danger” that employed an analytical process called “data
mining.” The process allows intelligence analysts armed with
specially designed software to aggregate multiple data sources, such
as lists of terrorists and decades of reporting by the Associated
Press, and search for specific patterns of behavior, anomalies and
relationships. The findings become the basis for refined analyses
by intelligence specialists.
The New York Times reported in August that Defense
Department lawyers forced the cancellation of three meetings at which
military officials involved with “Able Danger” were to report
Atta’s name to the FBI after the program identified him. These
claims have not been confirmed by the Pentagon.
U.S. Rep. Curt Weldon, R-Pa., who arranged a meeting
between the news agencies and Phillpott, released a statement in
late August describing the program’s objective as “to
identify and target al Qaeda on a global basis, and, through the
use of cutting-edge technology … to manipulate, degrade or
destroy the global al Qaeda infrastructure.”
After the public speculation about “Able Danger,” the
9/11 Commission stated Aug. 12 that it had learned about the program
in October 2003. Initial informants did not mention Atta or any other
future hijackers. In July 2004, a different informant knowledgeable
about “Able Danger” told the Commission he had seen Atta’s
name and photo in another analyst’s notes. However, this informant
was not able to substantiate that assertion to the satisfaction of
the Commission, and “Able Danger” was not mentioned in
the Commission’s final report.
The alleged identification of Atta has attracted
high-profile attention to the potential of data mining technologies
and processes as intelligence tools. However, the usage and processes
of data mining remain relatively immature in the military arena.
One official told Seapower that coordination of
data-mining efforts and requirements between federal agencies should
be much improved. Also, implementation and oversight issues remain
a key challenge in balancing the use of data-mining tools with privacy
concerns.
Data mining is not new. Industry has reaped benefits
from it in sectors such as health care, insurance and banking. But
the lack of coordination between government agencies sometimes creates
barriers that prevent valuable intelligence from reaching the proper
authorities.
At the forefront of acquisition and development
of Navy data-mining tools are the Space and Naval Warfare Systems
Command, the Naval Research Laboratory and the Office of Naval Intelligence
(ONI). There is little to no coordination among these commands
in acquiring data-mining tools, a Navy official said, adding
that one of the biggest problems with Navy data-mining tools is the
number of commands working to acquire them, “some
of which overlap, and it’s not always as well coordinated as
it could be.”
The official suggested establishing a maritime domain
awareness program executive office as a means to “deconflict” some
of the divergent acquisition of data-mining tools among commands,
which leads to conflicting data and makes data sets hard to
compare. The Navy had no comment on the plausibility of this suggestion.
“There have been times where ONI needed information
that existed in other agencies’ data sources” and it
was not available, the Navy official said. “It’s certainly
not seamless and it’s not as well integrated as it could be.
Today, there are still lots of places where things can fall through
the cracks and where connections might not be made.
“For example, there is not a single source
of, or a single list of, terrorists” that all intelligence
commands share, the official said. “If someone boards a ship
in the Mediterranean and gets a crew list of people who are on that
ship and that ship’s en route to the United States, we can
take that crew list but we have to run it against multiple lists
to see if anybody who’s on that ship pops up as a bad guy. … It
could be easy to not check against somebody’s database.”
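The multiple-list problem the official describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the list names and the sample data are all invented, and exact name matching after simple normalization stands in for whatever matching the real systems use. The point is that iterating over every list in one place makes it harder to skip somebody's database.

```python
def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not hide a match."""
    return " ".join(name.lower().split())


def screen_crew(crew, watch_lists):
    """Return {crew member: [names of lists containing that member]}.

    `watch_lists` maps a hypothetical list name to a collection of names.
    Every list is checked for every crew member, so none is skipped.
    """
    normalized_lists = {
        list_name: {normalize_name(n) for n in names}
        for list_name, names in watch_lists.items()
    }
    hits = {}
    for member in crew:
        key = normalize_name(member)
        matched = [ln for ln, names in normalized_lists.items() if key in names]
        if matched:
            hits[member] = matched
    return hits


# Invented example data; these are not real names or real lists.
watch_lists = {
    "agency_a": ["John  Doe", "Alex Example"],
    "agency_b": ["alex example"],
    "agency_c": [],
}
crew = ["Alex Example", "Sam Sample"]
result = screen_crew(crew, watch_lists)
print(result)
```

A real system would also have to cope with transliterated names, aliases and partial records, which exact matching does not address.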
ONI shares a working relationship with Naval Networks
Commander Vice Adm. James McArthur, who wears a lesser-known hat
as the assistant chief of naval operations for Information Technology.
McArthur’s office provides oversight and guidance to validate
ONI’s information technology spending on tools such as data
mining.
McArthur’s office was reluctant to discuss
these tools because of the “Able Danger” controversy,
citing their immaturity and the relative lack of “concrete” examples
of how they can be used successfully, according to a Navy spokesperson.
Several experts told Seapower that data mining is
destined to be a valuable asset in the war on terror, but should
be viewed as a capability with advantages and limitations rather
than a cure-all for the nation’s growing intelligence requirements.
Jeffrey W. Seifert, an analyst in information science
and technology policy for the Resources, Science and Industry division
of the Congressional Research Service, released an overview of data
mining last December. The report notes that one limitation of data
mining is its inability to determine the value or significance of
intelligence. It also notes that data-mining tools cannot determine
causal relationships.
“For example, an application may identify
that a pattern of behavior, such as the propensity to purchase airline
tickets just shortly before a flight is scheduled to depart, is related
to characteristics such as income, level of education and Internet
use. However, that does not necessarily indicate that the ticket
purchasing behavior is caused by one or more of these variables,” the
report states.
Regardless of the particular data-mining tool or
its limitations, the first step in data mining is to concentrate
data into a single, normalized architecture or data model. That can
be done physically, by actually moving all the data into a common
disk form, or “disk warehouse,” so it can then be digested
to resolve ambiguities, or the sorting can be done automatically
by a computer. For example, if one set of data is recorded in meters
and one is recorded in feet, then the data-mining process would initially
make a conversion so that when the actual tools are run against the
data set a consistent outcome would be produced. Once data is normalized,
the tools scan through it and create a statistical model.
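The meters-and-feet example can be shown as a minimal Python sketch. The record layout, field names and vessel labels below are assumptions made for illustration; only the conversion factor (one foot is exactly 0.3048 meters) is a fixed fact.

```python
METERS_PER_FOOT = 0.3048  # exact by definition


def to_meters(value: float, unit: str) -> float:
    """Convert a length to meters; reject units this sketch does not know."""
    if unit == "m":
        return float(value)
    if unit == "ft":
        return value * METERS_PER_FOOT
    raise ValueError(f"unknown unit: {unit!r}")


# Hypothetical records from two sources that disagree on units.
records = [
    {"vessel": "A", "length": 100.0, "unit": "m"},
    {"vessel": "B", "length": 328.0, "unit": "ft"},
]

# After this step every record uses the same field and the same unit.
normalized = [
    {"vessel": r["vessel"], "length_m": to_meters(r["length"], r["unit"])}
    for r in records
]
print(normalized)
```

Raising an error on an unknown unit, rather than guessing, keeps a bad record from silently skewing the statistical model built afterward.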
Data-mining tools look through the existing data
and identify patterns. From those patterns, anomalies, or out-of-place
data patterns, are recognized and then analyzed. One notable outcome
from the analysis of these patterns is the ability to make predictions
about what is missing in the data, or what elements of data are not
included.
This, however, is an extremely difficult task when
working with 26 terabytes of active data on a daily basis, an amount
that would fill up about 85 high-end 300-gigabyte hard drives each
day. The quantity of information processed by the Navy is
also growing at a rate of 10 percent per year, according to ONI.
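The scale of those figures can be checked with a short calculation. The sketch below assumes decimal units (1 terabyte = 1,000 gigabytes) and assumes the 10 percent growth compounds annually; neither assumption is stated by ONI. Under them, 26 terabytes comes to roughly 87 drives a day, in line with the approximate figure above, and the daily volume would double in a little over seven years.

```python
import math

DAILY_TB = 26        # daily volume cited in the article
DRIVE_GB = 300       # drive size cited in the article
GROWTH = 0.10        # annual growth rate cited in the article

# Assumes 1 TB = 1,000 GB (decimal units).
drives_per_day = DAILY_TB * 1000 / DRIVE_GB


def volume_after(years: float) -> float:
    """Daily volume in terabytes after `years`, assuming compound growth."""
    return DAILY_TB * (1 + GROWTH) ** years


# Years until the daily volume doubles under compound growth.
doubling_years = math.log(2) / math.log(1 + GROWTH)

print(round(drives_per_day, 1), round(doubling_years, 1))
```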
Nonetheless, data mining is an asset to government
agencies that have taken on new roles in the aftermath of 9/11.
A new interest of the Navy and other government
agencies is to track the movement of more than 130,000 commercial
vessels and the 17 million cargo containers they carry, which could
be used by terrorists as a means of attack against U.S. ports, or
to smuggle arms or people into the country. ONI looks at transit
plans, bills of lading, intelligence reports, and years of reporting
by internal analysts and news agencies to identify vulnerabilities
or suspicious activity within the shipping industry. Today, the Navy
is shifting its focus from the ships themselves to terrorist use
of the commercial shipping network, according to a Navy source.
“Many of the problems that we’re looking
at in the commercial shipping industry are very much analogous to
fraud detection; we want to track norms and we want to identify things
that are outside of the norm,” said the Navy official.
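The idea of tracking norms and flagging what falls outside them can be sketched with a simple statistical test. This is an illustration only, not a description of any Navy tool: the transit times are invented, and the three-standard-deviation threshold is a common rule of thumb chosen here as an assumption.

```python
from statistics import mean, stdev


def find_outliers(history, new_values, threshold=3.0):
    """Return the new values that deviate from the historical norm.

    The norm is the mean of `history`; a value is flagged when it lies
    more than `threshold` sample standard deviations from that mean.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: anything different is unusual.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > threshold]


# Hypothetical transit times, in days, for one shipping route.
history = [10.0, 11.0, 9.5, 10.5, 10.0, 9.0, 11.5, 10.5]
new_values = [10.2, 19.0, 9.8]

flagged = find_outliers(history, new_values)
print(flagged)
```

Here only the 19-day transit is flagged. A flag of this kind says a voyage is unusual, not that it is hostile; deciding what it means remains the analyst's job.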
Data-mining tools take some of the manpower out
of the loop, but they are unlikely ever to replace analysts.
The tools handle some of the data manipulation that data-entry
analysts have historically had to do by hand, which frees analysts
to focus on the actual threats and on disseminating them to the
appropriate authorities for mitigation.
There are typically 10,000 messages on an analyst’s
desk at ONI every morning. One tool ONI has been exploring, and is
deploying this fall to approximately three-dozen workstations, is
Project Rockwell. Derived from another agency and an industry partner,
Project Rockwell allows analysts to go through open wire news feeds,
such as Reuters or the Associated Press, and run queries against
the feeds in the areas that they have highlighted.
If there is a subject an analyst has particular
interest in, they can highlight it, and pertinent information will
be color-coded on their desktop. For example, if there is a topic
of concern that normally has one news feed pertaining to it and suddenly
there are hundreds of feeds, Project Rockwell brings that information
to the analyst’s attention and directs them to that topic or
subject of interest.
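The kind of alert described here, a topic that jumps from one item to hundreds, can be sketched as a comparison of today's count against a usual count. This is not how Project Rockwell works internally, which the article does not describe; the function, the topics, the baseline counts and the tenfold threshold are all assumptions for illustration.

```python
from collections import Counter


def spiking_topics(baseline, todays_items, watched, ratio=10.0):
    """Return {topic: today's count} for watched topics that have spiked.

    `baseline` maps a topic to its usual daily count. A topic spikes when
    today's count is at least `ratio` times that usual count (with the
    usual count treated as at least 1).
    """
    counts = Counter()
    for text in todays_items:
        lowered = text.lower()
        for topic in watched:
            if topic.lower() in lowered:
                counts[topic] += 1

    alerts = {}
    for topic in watched:
        usual = max(baseline.get(topic, 0), 1)
        if counts[topic] >= ratio * usual:
            alerts[topic] = counts[topic]
    return alerts


# Invented wire items: one topic surges, the other stays near normal.
watched = ["port closure", "piracy"]
baseline = {"port closure": 1, "piracy": 5}
todays_items = [f"Port closure reported at harbor {i}" for i in range(200)]
todays_items += ["Piracy incident off coast"] * 4

alerts = spiking_topics(baseline, todays_items, watched)
print(alerts)
```

Simple substring matching is used to keep the sketch short; it would miss rephrased stories and is only a stand-in for real text analysis.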
“What it allows them to do is go through the
thousands of messages that they would get normally in a day and does
it four times faster,” said the Navy official. “That’s
not taking the man out of the loop, but it’s certainly freeing
up the man to do more analysis and less data sorting and initial
review.”
In the homeland security realm, there are some legal
privacy constraints, not necessarily restrictions, on sharing information
outside of Department of Defense boundaries, depending on what that
information is. Intelligence commands, for example, have limitations
on how and how long they can retain information on U.S. persons or
companies.
“What we’re hoping to build is a capability
that, if we can’t keep the data, will allow us to connect the
data that might be held by the FBI or by the U.S. Coast Guard, as
examples of law enforcement agencies, so they can easily extract
value from our data,” said the official.