What will result from the persistence and depth of geospatial data?
A near-clairvoyant ability to develop knowledge on everything, everywhere, all the time is fictitiously portrayed in TV shows such as 24, Person of Interest, The Wire, Alias, and Homeland. However, the recent proliferation of new sensors, the integration of humans and machines, and the advent of big data analytics provide new opportunities to translate the drama and excitement of intelligence fiction into intelligence fact. The persistence and depth of data now readily available allow new products to be woven together out of three basic threads or classes of information: vector-based knowledge (everything), locational knowledge (everywhere), and temporal knowledge (all the time).
As we move to an era of ubiquitous, real-time information, economists, first responders, business intelligence analysts, scientific researchers, intelligence officers, and many other analysts have the potential to answer questions previously unimagined. However, reaching this potential future vision will require the geospatial intelligence (GEOINT) Community to overcome several distinct challenges.
New GEOINT Sources for a New World
Where analysts previously relied on only a few sources, today’s GEOINT professionals have a plethora of new, non-traditional sources from which to choose. Increasingly proliferated and persistent small satellites, drones, and other emerging commercial capabilities contribute greatly to the wealth of information by complementing traditional airborne and spaceborne GEOINT collection systems. At the same time, the convergence of sensing, communication, and computation on single platforms, combined with the ubiquity of the internet and mobile devices, has further increased the variety of data available.
Traditional and proven imagery capabilities based on large government and commercial aircraft and spacecraft have been augmented by increasingly capable small satellites that cost less to produce and are easier to launch. Small sats, picosats, and even smaller satellites developed over the past decade have spread new remote sensing capabilities that increase revisit rates and cover larger portions of the electromagnetic spectrum. Closer to Earth, affordable commercial drones with high-resolution imaging, multi/hyper-spectral sensors, high-definition video, and other capabilities have revolutionized all aspects of data collection, from hobby photography to agriculture to archaeology. Small sats also contribute to the U.S. military mission by providing easier and faster access to communication, positioning, navigation, timing, and weather data. As these sensors become more affordable, pervasive, and persistent, new users across industry, academia, and government will be able to leverage increasingly capable systems to improve access to all forms of GEOINT.
Crowdsourcing, or “participatory sensing,” is defined in the 2011 Merriam-Webster’s Dictionary as “the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers.” Crowdsourcing plays a major role in creating information-rich maps, collecting geo-localized human activity, and working collaboratively. This relative newcomer to the GEOINT tool kit has been used effectively in crisis mapping efforts such as DigitalGlobe’s Tomnod, a volunteer, public crowdsourcing community that gained popularity during the 2014 search for Malaysia Airlines Flight 370 and the aftermath of the 2015 Nepal earthquake. In the commercial sector, companies like Findyr, Native, and Spatial Networks provide high-fidelity, street-level, near real-time contextual data from a worldwide, hyper-local audience of participatory geographers.
While intelligence revolutions often rely on the advent of new collection systems, a dominant driver for the future of GEOINT is multisource data persistently generated and processed by intelligent machines. National Geospatial-Intelligence Agency (NGA) Director Robert Cardillo recently named artificial intelligence (AI) and machine learning (ML) technologies a top priority for U.S. GEOINT Community analysis: “If we attempted to manually exploit all of the imagery we’ll collect over the next 20 years, we’d need 8 million imagery analysts.” Cardillo also noted that automation is needed to augment human analysts. The development of ML algorithms for automated change detection to handle the increasing load of imagery data will free up human analysts to continue working on “higher-order thinking to answer broader questions,” said Scot Currie, director of NGA’s Source Mission Integration Office. Such algorithms also have the potential to discover unknown relationships invisible to cognitively biased humans, generate unconventional hypotheses, and anticipate potential outcomes based on continuously learned causal models.
Included in these techniques are automated feature-based image registration, automated change finding, automated change feature extraction and identification, intelligent change recognition, change accuracy assessment, and database updating and visualization. The GEOINT analyst of the near future will operate as a member of a blended human-machine team that leverages the best skills of each to answer more questions with more information over a wider range of issues on a shorter timeline.
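As a rough illustration of the “automated change finding” step named above, the sketch below applies plain image differencing to two tiny grayscale grids that are assumed to be co-registered already. The function name `find_changes`, the threshold value, and the sample grids are hypothetical and chosen only for illustration; operational change detection relies on far more sophisticated registration and learned models.

```python
from typing import List, Tuple

Grid = List[List[int]]  # grayscale pixel values, 0-255


def find_changes(before: Grid, after: Grid, threshold: int = 50) -> List[Tuple[int, int]]:
    """Return (row, col) positions whose brightness changed by more than `threshold`.

    Assumes both grids are already co-registered and have identical shape.
    """
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("grids must have the same shape")
    changed = []
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (p0, p1) in enumerate(zip(row_before, row_after)):
            if abs(p1 - p0) > threshold:
                changed.append((r, c))
    return changed


# Hypothetical 3x3 image chips of the same location on two dates.
before = [
    [10, 12, 11],
    [10, 11, 12],
    [13, 10, 11],
]
after = [
    [10, 12, 11],
    [10, 200, 190],  # a bright new object appears in this row
    [13, 10, 11],
]

changes = find_changes(before, after)
print(changes)
```

Only the flagged pixels would then be passed to a human analyst or a downstream classifier, which is the sense in which automation frees analysts for higher-order questions.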
- This article is part of USGIF’s 2018 State & Future of GEOINT Report. Download the PDF to view the report in its entirety and to read this article with citations.
Living and Working in a Persistent Knowledge Environment
Highlighting The Economist’s description of “data as the new oil,” a valuable commodity driving our economy, NGA Director of Capabilities Dr. Anthony Vinci has challenged industry partners to “turn it into plastic.” The tradecraft of a GEOINT analyst now lies in the ability to quickly synthesize this highly adaptable resource into intricate, creative, and useful products previously unforeseen or unimagined.
In a persistent information world, every object and entity on, above, or below the surface of the Earth may be represented as a vector of its attributes—i.e., all the metadata about that entity. This extends the analytic paradigm to a knowledge environment in which every property of every entity is always available in real time. Analysts will be able to create comprehensive datasets about specific vectors of interest—be it individuals, groups of people, a particular building, or a particular type of infrastructure. To know “everything” in this sense means being able to perform a deep dive on any person, place, or thing and gain insight on how their attributes are distributed spatially. In addition, this new wave of data allows us to dispense with the old pick-and-choose mentality and perform this level of examination on all subjects at the same time. If this capability sounds far-fetched, the proliferation of sensor-enabled, internet-connected mobile devices—the so-called Internet of Things (IoT)—seems poised to introduce a paradigm in the not-too-distant future in which almost every entity on Earth beacons vectorized metadata into a ubiquitous data cloud.
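One way to picture an entity “represented as a vector of its attributes” is a record that pairs a location with a dictionary of metadata. The sketch below is a minimal illustration under that assumption; the `Entity` class, the `select` helper, and every identifier, coordinate, and attribute value are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Entity:
    """One object or entity, described by its location and attribute vector."""
    entity_id: str
    lat: float
    lon: float
    attributes: Dict[str, Any] = field(default_factory=dict)


def select(entities: List[Entity], **criteria: Any) -> List[Entity]:
    """Return every entity whose attributes match all of the given criteria."""
    return [
        e for e in entities
        if all(e.attributes.get(k) == v for k, v in criteria.items())
    ]


# Hypothetical records; identifiers, coordinates, and attributes are invented.
entities = [
    Entity("bldg-001", 30.04, 31.24, {"kind": "building", "use": "hospital"}),
    Entity("bldg-002", 30.06, 31.22, {"kind": "building", "use": "school"}),
    Entity("brdg-001", 30.05, 31.23, {"kind": "bridge"}),
]

# Deep dive on one class of entity, then see how it is distributed spatially.
buildings = select(entities, kind="building")
locations = [(e.lat, e.lon) for e in buildings]
print(locations)
```

Because the same query runs over every record at once, it also hints at how the “pick-and-choose” approach can give way to examining all subjects simultaneously.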
In this world, it is also possible to create a complete dataset about any location on Earth. For a given place, we can gather data about topography, weather and climate, population density and demographics, local populations, recent conflicts, infrastructure, land cover, and 3D renders and imagery of buildings. Aggregating these data allows for a complete snapshot not only of any given area, but of the whole Earth at once. Imagine a spinning Google Earth globe with an infinite number of layers and an infinite level of detail at every altitude from the Mariana Trench to geostationary Earth orbit updated in real time. The challenge for the analyst holding such a globe is simply where to start.
As persistent data providers blanket the Earth and index data in accessible, online repositories, analysts build upon the immense place- and vector-oriented datasets over long periods of time. This exposes movements of population and demographic shifts, changes in weather and climate as well as land cover, destruction or construction of infrastructure, and the movement of conflict hot spots over time. By integrating all of these datasets, we can connect patterns between any number of variables across time. The concept of real time extends to all time.
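The idea of connecting variables across time can be sketched by lining up observations for one place by date. The example below is only an illustration: the `timeline` and `change` helpers, the place name, and all values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (place, year, variable, value); all values are invented for illustration.
observations: List[Tuple[str, int, str, float]] = [
    ("district-a", 2015, "population", 12000),
    ("district-a", 2015, "built_area_km2", 3.1),
    ("district-a", 2017, "population", 15000),
    ("district-a", 2017, "built_area_km2", 3.9),
    ("district-b", 2017, "population", 8000),
]


def timeline(obs: List[Tuple[str, int, str, float]], place: str) -> Dict[int, Dict[str, float]]:
    """Group one place's observations by year so variables line up in time."""
    by_year: Dict[int, Dict[str, float]] = defaultdict(dict)
    for p, year, variable, value in obs:
        if p == place:
            by_year[year][variable] = value
    return dict(sorted(by_year.items()))


def change(series: Dict[int, Dict[str, float]], variable: str) -> float:
    """Difference between the latest and earliest recorded values of a variable."""
    years = [y for y in series if variable in series[y]]
    return series[max(years)][variable] - series[min(years)][variable]


series = timeline(observations, "district-a")
population_growth = change(series, "population")
built_area_growth = change(series, "built_area_km2")
print(series)
```

With both variables on a shared time axis, an analyst can ask whether population growth and construction move together in a given place.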
With persistent knowledge across vector, location, and temporal domains, analysts can instantly exploit extremely high-resolution data about every concept of interest in every location, and refresh it on a daily basis, if not more frequently. However, the question remains, “So what?” It certainly seems interesting to have persistent data, but what can we do with them that we couldn’t do with simple big data? Are there questions we can answer now that we couldn’t before?
Answering New Questions
As anyone with a five-year-old child can attest, the most dreaded word in the English language is “why.” Children approach the world with relentless inquisitiveness, but “why” questions are taxing to answer. Early GEOINT collection and analysis capabilities constrained analysts to answering questions of what, where, and how many, but modern analytic advances open new avenues for who, how, and most importantly, why. The GEOINT environment of the future will reinvigorate the curious five-year-old in all of us.
The ability to rapidly ask questions and instantly receive answers from a near-infinite amalgamation of information gives analysts a Google-like ability to comprehend the world. New analysts will develop a deep understanding of geospatial and cultural issues in a fraction of the time required with infrequent, periodic collection and multiyear analytic efforts. Using app-based micro-tasking capabilities, an intelligence analyst in Virginia might interact in a video chat session with a protester in Cairo to understand why people are protesting and anticipate future areas of violence.
Analysts in the past operated in a sparse data environment where they waited to get data that in some cases was never collected or not processed in time. In an environment of instant, persistent data, it is likely that many knowable facts will be sensed through multiple phenomena that don’t always yield the same interpretation. The pace of weighing, judging, integrating, and verifying information will increase dramatically. Decision-makers will demand an unrelenting operational tempo and expect near-superhuman omniscience. It now falls to analysts and analytics to make sense of everything in context.
Integrating cultural, social, and economic information within GEOINT analysis significantly enhances analyst understanding over object-focused analysis. Human geography, while not technically a new source, is being used in new ways to provide fresh insights. By applying the who, what, when, why, and how of a particular group to geographic locations, analysts can create maps to track the social networks of terrorist groups, documenting complex interactions and relationships. By mapping the evolution and movement of ideas, activities, technologies, and beliefs, analysts develop deep contextual maps that combine “where” and “how” to convey “why” events are occurring (e.g., the rise of the Islamic State).
By integrating information about a specific area, such as who is in charge, what language they use, whom they worship, what they eat, etc., analysts can create information mash-ups on the web that help planners and decision-makers safely and effectively anticipate potential future violence, deliver humanitarian aid, or improve regional stability. Human terrain information, introduced at broad scale during the Iraq and Afghanistan conflicts, will increasingly become part of a standard foundational GEOINT layer included in all cartographic products.
Collaborative analytic teams will extend existing operating procedures based on text-based Jabber sessions to “always-on” telepresence where geographically dispersed analysts interact as though they are in the same room and time zone. Perhaps these teams will even break down collaboration barriers across organizations, time zones, cultures, languages, and experience levels. Multidisciplinary teams working in a persistent knowledge environment can change their mind-set and answer new questions, especially the elusive “why.”
Overcoming New Challenges
While the proliferation of sensors and big data seemingly on demand may lead us to believe omniscience is truly within reach, several distinct challenges currently impede our vision for the future. First, if the data exists, can everyone access it? Should they? Data’s democratization has made excessively large volumes of data available to anyone who can search and download from the internet. But, as the saying goes, you get what you pay for. Thousands of websites offer free investment advice, but can you beat the market when everyone has access to the same free data? For example, the much-touted data.gov repository boasts nearly 200,000 free public datasets, but data are not always well described or easy to navigate.
Even when freely available data points to a logical conclusion, skepticism should arise. We cannot always know the origin of openly available data, be sure it has not been altered, assume scientific correction factors have been applied appropriately to raw data, or confirm metadata has been tagged correctly. Additionally, we may not know the intent of the person who made the data available or the biases that may have been introduced. In short, data veracity can be questioned when one does not fully control the data supply chain. Constant verification and vetting of sources may take over a majority of the analytic time bought back by advanced automated algorithms.
Freely available data’s omnipresence can overwhelm any analytic workflow, even with powerful big data processing, thereby quickly becoming an analyst’s self-imposed analytic quagmire. Many analysts will brave the overwhelming volume to ferret out the insight hidden within this deluge. Data can be conditioned and formats standardized for compatibility. But to what benefit if the combined dataset is impossible to search, filter, fuse, and explore in a realistic time frame?
As commercial market demand for remotely sensed data and knowledge products continues to evolve and expand and barriers to market entry become lower, new vendors continue to emerge. A critical question arises: Can the government afford to pay for data, and will commercial companies survive if it doesn’t? In 2010, NGA awarded two 10-year contracts for commercial imagery to DigitalGlobe and GeoEye with a combined value of $7.3 billion, but two years later, funding shortfalls caused the companies to merge.
Social media harvesting and sentiment mining are popular, but Twitter aggregator Sifter charges $50 per 100,000 tweets. (Twitter estimates there are about 200 billion tweets in a year.) The Intelligence Community’s noble attempt to connect all the dots to ensure the U.S. does not experience a surprise terrorist or military attack underscores the desire to acquire and examine “all” available data. Whether government or commercial, it may be cost-prohibitive to purchase and examine all collected data to ensure competitive advantage; going infinitely and indefinitely global might carry a similarly infinite price tag. Conversely, overly narrowing the focus of collection might limit opportunities to stumble upon the singular “missing dot.”
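To make the scale of that price tag concrete, the back-of-the-envelope calculation below applies the quoted rate to the quoted annual tweet volume. It assumes a flat rate with no volume discount, which is a simplification for illustration rather than a statement about any vendor’s actual pricing at scale.

```python
PRICE_PER_BATCH_USD = 50            # price per batch quoted in the text
TWEETS_PER_BATCH = 100_000          # batch size quoted in the text
TWEETS_PER_YEAR = 200_000_000_000   # annual estimate quoted in the text

# Assumption: the flat per-batch rate applies to the entire yearly volume.
batches = TWEETS_PER_YEAR // TWEETS_PER_BATCH
annual_cost_usd = batches * PRICE_PER_BATCH_USD

print(f"{batches:,} batches -> ${annual_cost_usd:,} per year")
```

Under that assumption, buying a full year of tweets would cost on the order of $100 million, before any storage or analysis costs.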
Additionally, licensing and usage rights that protect commercial business often inhibit redistribution of data to other individuals, departments, or agencies. The U.S. government’s contract with DigitalGlobe limits imagery use to within federal agencies to “minimize the effects on commercial sales.” NGA’s 2017 $14 million contract to San Francisco-based imagery start-up Planet tests a subscription-based model to “access imagery over 25 select regions of interest” for one year. Despite its widespread use as a source of human activity data, the Twitter Terms of Service prevent usage of data “for surveillance purposes” and “in a manner inconsistent with our users’ reasonable expectations of privacy.” Key issues of perpetual data ownership, lineage to source data and processing settings, privacy protections, and long-term archive requirements will challenge traditional concepts of data ownership.
Finally, the ubiquity of spatial data of all kinds raises new privacy concerns. Policies have been developed to govern how different “INTs” can be used, but when source data can be worked into new products and discoveries, protection of citizens from continuous monitoring becomes increasingly difficult. A GEOINT-savvy workforce must also include lawyers, psychologists, law enforcement personnel, and even politicians.
Succeeding in a Persistent World
The democratization of GEOINT and the expectation of omniscient, instant knowledge of every activity, event, and object on Earth put new pressures on the GEOINT workforce. Similar pressure exists in commercial industry, such as in the financial sector, to ensure better knowledge than competitors about issues such as trade volumes, raw material supply, and transportation networks. In the business of intelligence, competitors are threats that evolve across changing geopolitical and economic environments. The stakes are more than financial; they are existential. As GEOINT becomes more persistent, pervasive, and accessible, it will also become increasingly able to answer new questions, develop new products, and enhance the GEOINT workforce with new tradecraft.
Headline Image: Data visualization software is displayed for attendees of the Cyber Capability Expo at the newest SOFWERX facility in Tampa, Fla., October 2017. The expo sought to identify novel and provocative cyber technologies to meet current and future special operations forces requirements. Photo Credit: Master Sgt. Barry Loo