2020 State & Future of GEOINT Report, Contributed

Creating an Integrated, Multi-Source, Accurate, Trusted, and Immersive Cognitive Analytic Environment

Industry and government must develop technologies across a wide spectrum of needs

By: USGIF | April 28, 2020

By Dr. Ann Carbonell, Riverside Research; Bob Gajda, Frontier Enterprise LLC; Johnnie DeLay, L3Harris Technologies; and Alex Fox, Hawkeye 360

The following vignette illustrates a future analysis environment:

The Joint Analysis Center in England alerts the AFRICOM watch desk that Lloyd’s of London is reporting three supertankers off the Horn of Africa suddenly going dark as they cease sending Automatic Identification System (AIS) signals. More than six million barrels of oil and at least 75 crewmen are missing. AFRICOM’s support analyst, Josephine, calls up the last three hours of AIS signals from the region, isolates the three ships, and plots their courses. Simultaneously, she queries for radio frequency (RF) intercepts and observes increases in activity before the AIS signals stopped. Geolocation plots show routes intersecting those of the tankers, suggesting abnormal and suspicious activity from ocean-based transmitters.

Suspecting pirate activity, Josephine alerts the operations desks and begins exploring data in her immersive environment. She brings in historical sensor data covering all nearby coastal regions, and she interacts with other analysts from around the world. Much like players in the online video game “Fortnite,” these analysts collaborate on analytic approaches and allocate workload to the most appropriate person. Social media analysis algorithms reveal two unusual trends: diminished activity among identified “pirate sympathizers” and increased overall chatter from nearby port facilities. Since Josephine has used these tools and data sources many times and knows them to be reliable, she determines that these indicators point to probable locations for intervention.

Coordinating with operations, Josephine requests real-time coverage of the suspected area and downloads recent military, commercial, and historic content to begin building an operational target folder. Her “hyper-dimensional” pattern-of-life tool offers her multiple perspectives that will be useful in operational planning—what “normally” happens in the region, what is “different” right now, and possible weighted scenarios. The race to recover the ships and save the crewmen is on!

Unfortunately, this response scenario is not possible today. Analysts cannot easily, quickly, and reliably reach into the vast set of potential input sources suggested in the vignette above. They cannot fuse collateral sources with conventional reconnaissance and remote sensing sources, or seamlessly interact with dispersed analysis resources through a common immersive interactive framework. Inconsistent data schemas and nonexistent ontologies hamper the few existing fusion and conflation tools. Algorithms and artificial intelligence (AI) tools that sample data and produce products such as trends and anomalies are not widely available, trusted, or used.

“Effective visualization is a key part of the discovery process in the era of ‘big data.’ It is the bridge between the quantitative content of data and human intuition, and thus an essential component of the scientific path from data, into knowledge and understanding.” ^[1]

Because the vignette’s analysis environment does not fully exist today or is incomplete, unintegrated, or unreliable, responsive decisions cannot be made. The result? Lives at risk or lost.

“Instead of improving the world, [biased and unreliable automation tools such as AI/ML] could be making things worse. Not predicting the future; causing the future.”^[2]

The future is about reliable and responsive decision environments operating across all sensing domains and across the electromagnetic spectrum. These environments bring together three-dimensional (positional x,y,z), four-dimensional (temporal), and five-dimensional (e.g., trends, event durations, and connectedness to other data streams) data characteristics to provide analysts with a holistic perspective. Analysts are enabled by machines that are interoperable, are self-learning, operate as human teammates, are trusted sources of information, and effectively assist in determining decisive action.

What needs to be done to create a more integrated, data-fused and conflated, machine-assisted, and accurate “Cognitive Analytic Environment” (CAE),^[3] regardless of sensing domain, an individual system’s accuracy and resolution, the medium, or the passage of time? The analytic community must focus on three critical components:

Content: What content is available with its provenance and trustworthiness understood, how readily useful is it, and what needs to be done to more fully utilize the content?
Analytics: What is the maturity of the analytic technology and what issues need to be resolved?
Trust: We must improve our trust in data, algorithms, tools, and analytic products. What influences user trust and how can trust be enhanced?

Multitude of Sources

As of November 2018, more than 500 U.S. commercial and civil satellites were in orbit,^[4] most performing remote sensing operations in diverse segments of the spectrum and observing the world all day, every day. But sensing is not just performed by satellites. For example, state and city departments continuously monitor street life, traffic, and critical infrastructure in support of quality of life. In London alone, there are an estimated 625,000 cameras in use today.^[5]

And sensing operations collect more than pictures. RF sensing systems can identify and geolocate emitter activity everywhere on Earth every day—monitoring planes, ships, seismic activity, and weather. Social media content usually includes location and time information, bringing new sources into observation domains and offering incredible potential given the volume of data being generated by their users—a number that is continually rising. Today there are 300 million monthly Twitter users, 900 million Instagram users, and 200 million Snapchat users alone.^[6] Open-source data include media items such as news broadcasts, newspaper and magazine articles, and company and government analysis reports.

Remote Sensing (RS) Sources are the most prevalent data, operating across the electromagnetic spectrum, from satellite and airborne platforms, day and night, and in single and multiple bands.^[7] With the advent of small satellites and drones, a “virtual constellation” exists, providing near-continuous, worldwide collection potential. Analytic experience with RS systems is robust, albeit not generally integrated with other systems. Data processing is common and geolocation determination accurate and well defined. However, with the volume of commercial data overwhelming infrastructures, the notion of bringing all the data to the enterprise becomes cost-prohibitive, thus putting more emphasis on upstream distributed processing.

While each system’s data and metadata formats are unique, there is consistency across products that adhere to standards such as the National Imagery Transmission Format (NITF). This is less the case for metadata content. Accessing the sources is a challenge because vendors maintain their own discovery, processing, and distribution systems as well as their own unique business pricing models.

A significant improvement in the integration of RS source data would be a “data concentrator service” through which issues of availability, access, provenance, formatting, geolocation accuracy, privacy, pricing, and metadata could be resolved. For example, “all” electro-optical (EO) sources (or radar, RF, or multispectral sources), regardless of platform, would be processed in a consistent manner, output in a common format, and normalized (by time, resolution, quality). The concentrator would handle the sensor- and platform-specific encoding techniques and output to a common standard so software and other application usage would not be constrained by data format issues. For users looking for “one-stop-shopping,” the ideal CAE would not have to be a data processing system, and instead would provide more focus on analytic processes.
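To make the concentrator idea concrete, the following is a minimal Python sketch, not a design proposed by the authors. It assumes two hypothetical vendor feeds ("vendor_a" and "vendor_b") whose field names, time encodings, and resolution units are invented for illustration. Each feed gets a small adapter that maps its records into one common, time-normalized record type that also carries the provenance of the source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedRecord:
    """Hypothetical common output format for the concentrator."""
    source: str          # provenance: which feed supplied the record
    sensor_type: str     # e.g. "EO", "SAR", "RF"
    time_utc: datetime   # acquisition time, normalized to UTC
    lat: float
    lon: float
    resolution_m: float  # ground resolution, normalized to metres


def from_vendor_a(raw: dict) -> NormalizedRecord:
    # Assumed vendor A layout: epoch seconds, resolution already in metres.
    return NormalizedRecord(
        source="vendor_a",
        sensor_type=raw["type"],
        time_utc=datetime.fromtimestamp(raw["epoch_s"], tz=timezone.utc),
        lat=raw["lat"],
        lon=raw["lon"],
        resolution_m=raw["gsd_m"],
    )


def from_vendor_b(raw: dict) -> NormalizedRecord:
    # Assumed vendor B layout: ISO 8601 time with a UTC offset,
    # resolution in centimetres, position as a (lat, lon) pair.
    lat, lon = raw["position"]
    return NormalizedRecord(
        source="vendor_b",
        sensor_type=raw["sensor"].upper(),
        time_utc=datetime.fromisoformat(raw["acquired"]).astimezone(timezone.utc),
        lat=lat,
        lon=lon,
        resolution_m=raw["gsd_cm"] / 100.0,
    )


# One adapter per feed hides the vendor-specific encoding.
ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def concentrate(feeds: dict) -> list:
    """Convert every raw record to the common format, ordered by time."""
    records = [ADAPTERS[name](raw) for name, raws in feeds.items() for raw in raws]
    return sorted(records, key=lambda r: r.time_utc)
```

Under these assumptions, downstream analytic tools would only ever see NormalizedRecord objects, so supporting a new source means writing one more adapter rather than changing the tools.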

Sources are usually a mixture of two-dimensional and three-dimensional data with an additional time component. But when the sources are exploited as collections of integrated data, they have inherent five-dimensional, or hyper-dimensional, characteristics. We need to concentrate on conflation and fusion, making the data richer and creating new datasets and thereby moving to the hyper-dimensional data space.
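One simple form of conflation is pairing observations from different sources that fall close together in both space and time. The sketch below illustrates this in Python for the AIS and RF data mentioned in the opening vignette. The record fields, the 10 km distance threshold, and the 15-minute time window are all assumptions chosen for illustration, and the brute-force pairing stands in for whatever indexing a real system would use.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def conflate(ais, rf, max_km=10.0, max_gap=timedelta(minutes=15)):
    """Pair each RF detection with AIS reports that are close in space and time.

    Both inputs are lists of dicts with hypothetical keys:
    "id", "time" (datetime), "lat", "lon".
    """
    pairs = []
    for r in rf:
        for a in ais:
            close_in_time = abs(r["time"] - a["time"]) <= max_gap
            if close_in_time and haversine_km(r["lat"], r["lon"], a["lat"], a["lon"]) <= max_km:
                pairs.append((a["id"], r["id"]))
    return pairs
```

Each resulting pair links two records that were previously separate, which is the kind of enrichment that moves the data toward the hyper-dimensional space described above.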

“The more dimensions we can visualize, the higher are our chances of recognizing patterns, correlations, and outliers.” ^[8]

Open-Source Media analysis deepens the analytic potential when analysts integrate existing reports, findings, assessments, and conclusions on the topic under investigation. The challenges around preparing social media data also exist for open-source media, only amplified. The authors believe that a data concentrator could be applied to this domain. Three prominent examples are already in wide use today: AP News summarizes, collates, and produces data on news events; Bloomberg does the same for financial data markets; and LexisNexis serves business information.

Social Media is the “Wild West” compared to RS sources. Geolocation and time are implied in the data, but not necessarily as a specified data element. Standardized product formats do not exist as they do for imagery, video, and geographical information system outputs. Social media sources are a challenge to use because each vendor is a unique provider.

The biggest challenge to using social media data in a CAE is the lack of product definitions and associated standards. Raster, vector, and data cube formats are well established. The CAE will have to establish needs for this information so appropriate formats can be devised. Once formats are established, a data concentrator model could be developed to handle access, processing, formatting, and delivery of such content.

Key Analytic Capabilities

The CAE looks at situations holistically, operates as a human’s teammate, and is trusted. Success starts with driving the configuration from the mission questions and then exploring what we can get out of the data. Today, integration, collaboration, and AI/machine learning (ML) automation research and investment lead from the data and not from the questions being addressed. One could never finish the task of building a CAE for every possible question across all available data. Even when focusing on a specific problem, building the analytic models and conditioning the data is a challenge. A useful CAE needs to be mission-constrained (by topic, region, question, decision intent). A mission constraint that is too broad will be unmanageable in terms of creating concisely defined analysis models, accessing and understanding relevant data, or building visualizations that explain the data.

Analytic models then must be developed for the specific mission being explored. Taxonomies and ontologies must be developed so data from a concentrator can be indexed, fused, and/or conflated for analyst visualization and consumption. Semantic technologies are at the heart of the future geospatial enterprise systems, with mission-specific ontologies providing the basis to understand human reasoning. Mission-specific ontology will drive the development of cognitive solutions and form the core of AI initiatives.

Within this new paradigm, a semantically enabled CAE will allow systems to understand the way analysts reason and how to connect information for them. Through self-learning, such AI systems will simultaneously bolster a user’s ability to understand data and provide the means to discover a whole new world of relationships among the data and entities contained within them. Through the application of semantic technologies, we can consolidate and link widely dispersed information by integrating it into mission-driven knowledge graphs in support of the CAE that can be queried.
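As a rough illustration of a queryable knowledge graph, the Python sketch below stores facts as subject-predicate-object triples and answers pattern queries with wildcards. It is a toy under stated assumptions: the KnowledgeGraph class, the entity names, and the predicates are hypothetical, and a real semantic system would rely on a mission ontology and a dedicated graph store rather than an in-memory set.

```python
class KnowledgeGraph:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical facts linking dispersed sources to the same entity.
kg = KnowledgeGraph()
kg.add("tanker-1", "observed_by", "ais-feed")
kg.add("tanker-1", "last_seen_near", "Horn of Africa")
kg.add("rf-1", "detected_near", "tanker-1")

# Everything known about the tanker, then everything that points at it.
about_tanker = kg.query(subject="tanker-1")
linked_to_tanker = kg.query(obj="tanker-1")
```

The two queries show how linking makes dispersed information reachable from a single entity: one returns what the graph says about the tanker, the other returns observations from other sources that refer to it.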

The third action needed is to provide a robust immersive environment for user applications, querying, and decision-making, and for collaboration with supporting analysts. Analysis functions include visualization, natural language processing, AI/ML, social media analytics, and “conventional” data analytics.

“Visualization is essential in the data mining process, directing the choice of algorithms, and in helping to identify and remove bad data from the analysis.”^[9]

The key technical challenge is not simply tools, but rather tools that will work within the appropriate analytic model. The models will direct the tools to the question being asked, and their products will reflect the on-the-ground situation being provided by the input sources.

“Immersion provides benefits beyond the traditional desktop visualization tools; it leads to demonstrably better perception of datascape geometry, more intuitive data understanding, and a better retention of perceived relationships in the data.”^[10]

The immersive environment is a fully formed collaboration environment. As long ago as 1982, researchers confirmed that collaborative performance exceeded an individual’s performance measure.^[11] Effective “sensemaking” is a process of discussion in which interactive visualization supports social interaction.^[12] In other words, immersive collaboration, especially via virtual reality technologies, provides for “telepresence” collaboration in which participants share a common viewpoint or navigate independently.

Trust in the Environment

Analysts must trust the CAE’s sources of information and its analytic processes and tools for determining decisive action.

Today’s analysis domain is basically a set of linear processes integrated by the user at each step in the process. The analysts have high confidence because they have examined every dataset, requested and reviewed all intermediary products, collated and fused data from collateral sources, and used their judgment on what the data means. Having direct responsibility for each step, analysts trust the outcome. They have a “positive relationship” with the data and data providers because there is validated “judgment and expertise,” and they have seen “consistent” results over time.^[13]

In the future, analysts will be presented with data they have not “touched” and with intermediate analysis products from algorithmic tools that they may not fully understand.

Why do people trust stock market analysis tools? Why do users trust Waze for the best route to their destination? Why do Fantasy Football players rely on opinions from sports experts? How can we create that same kind of trust for the CAE? Not until there is clarity in the “goals” the algorithms are pursuing. Not until the analysts’ “expertise” with the system matches their history with former hands-on systems. Not until there is full “transparency” in how the CAE functions. Not unless the analyst participated in “building the environment.” And not until the analyst has fully “experimented” with the CAE’s processing conditions and values.^[14]

Trust in a decision-support tool is critical. In December 2018, Glenn Gaffney, former director of the CIA’s Science and Technology Directorate, stated, “If we are going to rely on those things [automation, AI, ML, and deep learning] there are some deep questions that need to be addressed … things like explainability of models, and the effect of collection bias and data bias in those models. We are going to have to understand much deeper how they work and what’s really happening….”^[15]

“Deep fakes,” created when audio and video data are manipulated to present false information to a consumer, are a growing concern. Relying on a trusted data concentrator will reduce these risks. The increased evidence of artificially influenced social media data, though, could still bias data from a concentrator. Increasing trust in AI technologies is a key element in accelerating adoption and use across the intelligence and defense communities. Today, the life-cycle management for AI/ML is limited, and there are few ways to measure an algorithm’s trustworthiness. AI standards and related tools, along with AI risk management strategies, need to rapidly evolve to address this limitation and spur innovation. For GEOINT analysts to trust AI, they will need to understand data curation, accuracy, reliability, security, and provenance for the entire life cycle of the data and AI process.

Across the U.S. government, the reliance on AI tools has caused concern. The government has launched an initiative in response to Executive Order 13859^[16] to bring standards to AI technologies in order to ensure there is “… public trust, and public confidence in systems that use AI Technologies.” A key element recognizes that data standards and datasets in standardized formats, including metadata, are vital to ensuring the data’s applicability in making informed decisions. This is especially the case within a hyper-dimensional analytic environment, where accuracy, provenance, reliability, and dependability build trust.

The Way Ahead: A Call to Action

How do we get to a trusted, integrated, collaborative, multisource, accurate, and immersive CAE? First, the government needs to articulate the compelling need for integrated immersive analysis and solicit concepts, ideas, pilots, and prototypes from industry. From these, the government needs to develop road maps across technological boundaries. Then together, industry and government must develop technologies across a wide spectrum of needs. The technologies include:

Visualization of multiple dimensions as a key part of discovering patterns, correlations, and outliers in big data. Collaboration techniques through virtual reality devices should be considered. Immersion provides more intuitive data understanding and better perceived relationships in the data.
Development of mission analysis ontologies with an eye toward the analytic models that will drive AI/ML tools. Mission-specific ontology will drive the development of cognitive solutions and form the core of AI initiatives.
Establishment of data schema, data standards, conflation approaches, and indexing methodologies to develop and advance five-dimensional analytic environments. Approaches to integrate curated, validated, and metadata-complete data with automation tools are crucial.
Development of a “data concentrator service” for all forms of sources that addresses availability, search and discovery, data access, formatting, geolocation accuracy, conflation, and curation. Companies, uniquely or in joint venture, should explore this potential. The authors believe there are business cases supporting such endeavors.
Finally, the government needs to continue pursuing standards for automation and AI per Executive Order 13859. Building trust in analytic outcomes cannot be gained without consistency and transparency. Meanwhile, nontechnical policy and legal issues such as privacy, data rights, and intellectual property protection, as well as liability issues, must be addressed now before their absence creates chaos.

The potential value of a CAE is immense when all relevant data sources are available for analysis and the analysts are collaborating in a common, real-time, and immersive environment. In this environment, tools and algorithms are automatically deployed, providing insights for decisions and analyst feedback, and continuous learning enhances the analytic processes. The human is in full partnership with the environment, forming judgments leading to decisions.

1. C. Donalek, S. Djorgovski, A. Cioc, A. Wang, J. Zhang, E. Lawler, S. Yeh, S. Davidoff, J. Norris, G. Longo, A. Mahabal, M. Graham, and A. Drake. “Immersive and Collaborative Data Visualization Using Virtual Reality Platforms.” 2014 IEEE International Conference on Big Data, 609.
2. Cathy O’Neil, author of “Weapons of Math Destruction,” quoted from “The Secret History of the Future” podcast from August 14, 2019.
3. The “Cognitive Analytic Environment,” or CAE, is a construct proposed by the authors.
4. L. Grego. “Record Number of Satellites in Orbit.” https://allthingsnuclear.org/lgrego/2018satellitedata. January 9, 2019.
5. J. Ratcliffe. https://www.cctv.co.uk/how-many-cctv-cameras-are-there-in-london/. May 29, 2019.
6. J. Clement. https://www.statista.com/statistics/545967/snapchat-app-dau/, https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/, https://www.statista.com/statistics/253577/number-of-monthly-active-instagram-users/. August 14, 2019.
7. The authors include GIS and other mapping data as primarily produced from RS sources. Therefore, the authors include them in this source type.
8. Immersive and Collaborative Data Visualization, 610.
9. Ibid., 609.
10. Ibid.
11. G. W. Hill. “Group Versus Individual Performance: Are N+1 Heads Better than One?” Psychological Bulletin. 91(3):517-539. https://psycnet.apa.org/record/1982-23527-001.
12. J. Heer and M. Agrawala. “Design Considerations for Collaborative Visual Analytics.” Proc. IEEE VAST, 2007.
13. J. Zenger and J. Folkman. https://hbr.org/2019/02/the-3-elements-of-trust. February 5, 2019.
14. A. Raymone. https://www.techrepublic.com/article/infographic-7-ways-to-build-trust-in-data-and-analytics-at-your-company/. November 2, 2016.
15. G. Gaffney, quoted from the “Intelligence Matters” podcast from December 11, 2018.
16. NIST. “U.S. Leadership in AI: A Plan for Federal Engagement in Developing Technical Standards and Related Tools – Draft for Public Review 2-Jul-2019.”