If attendees learned just one thing at GEOINT 2018, it’s that we’re living in an algorithmic world. At past Symposia, artificial intelligence (AI) and machine learning were regarded as emerging trends; this year, they arrived as operational imperatives. And yet, there remains a conspicuous gap between where data science is today—a state of collecting data—and where it needs to go tomorrow: a state of understanding it.

As a discipline, GEOINT is ideally positioned to lead the transition, panelists agreed Wednesday during “Analytics Driving Action,” the final main stage session of this year’s event.

“We’re now rightfully … going from sensing to sense-making, and I have every expectation that this community will be just as revolutionary on that front as it has been historically,” said the panel’s moderator, Dr. Erin Simpson, director of strategic analysis at Northrop Grumman.

Joining Simpson on stage were four panelists: Dr. Sarah Battersby, research scientist at Tableau Software; Auren Hoffman, CEO of SafeGraph; Jeff Jonas, founder, CEO, and chief scientist at Senzing; and Dr. Karen A. Miller, scientist at Los Alamos National Laboratory. Together, the panel spent 45 minutes discussing not the promise of AI and machine learning, but rather the path to realizing it.

That path has two forks, panelists suggested: data and people. Both must be followed to their shared terminus, a future in which data is seamlessly and successfully integrated into both public and private enterprise.

Developing Data

The first fork concerns the data itself, which is commonly described in terms of the four “Vs”: volume, velocity, variety, and veracity.

Typically, organizations become obsessed with volume and velocity. For Hoffman, however, perhaps the most essential “V” is veracity.

“The most important thing about data is that it’s true. … The better the data, the more true the data, the more data you have, the less important the algorithms are,” he said, adding that data should always be viewed with skepticism instead of certainty. “It’s really important that you don’t trust the data. … Even data as simple as weather data is often wrong.”

A favorite illustration among data scientists comes from the University of Washington, where researchers in 2016 deliberately trained a machine learning algorithm to give unreliable results.

“They took a bunch of images and they had a machine learning algorithm classify them: Is this a wolf or is this a dog?” Battersby explained. “They had actually gamed the system and made sure that all of the backgrounds for the wolves were snow.”

The algorithm learned to classify images according to the scenery in the background instead of the animal in the foreground. As a result, it classified images as “wolf”—regardless of the animal—any time there was a light-colored background.

“They showed [the algorithm] to machine learning grad students and said, ‘How well did this do? Do you trust it? Would you use this model?’” Battersby continued. “A third of the machine learning grad students said, ‘Yeah. That seems like a good model.’ … That’s a problem.”
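The shortcut Battersby described can be reproduced in miniature. The sketch below is a toy illustration, not the researchers' actual experiment: the numbers, the two features (`animal_cue` and `background_brightness`), and the simple one-rule classifier are all assumptions invented for this example. Because every training wolf sits on a bright background, the classifier finds that brightness alone explains the training labels perfectly and ignores the animal.

```python
# Toy illustration of a classifier learning the background instead of the animal.
# All values are invented for this example; they are not the study's data.

# Each sample: (animal_cue, background_brightness, label)
# In TRAIN, every wolf is on a bright (snowy) background and every dog is not.
TRAIN = [
    (0.9, 0.95, "wolf"),
    (0.7, 0.90, "wolf"),
    (0.4, 0.85, "wolf"),  # ambiguous animal cue
    (0.8, 0.92, "wolf"),
    (0.3, 0.10, "dog"),
    (0.2, 0.20, "dog"),
    (0.6, 0.15, "dog"),   # ambiguous animal cue
    (0.1, 0.05, "dog"),
]

# In SHIFTED, the backgrounds are swapped: wolves on grass, dogs on snow.
SHIFTED = [
    (0.9, 0.10, "wolf"),
    (0.8, 0.20, "wolf"),
    (0.2, 0.90, "dog"),
    (0.1, 0.95, "dog"),
]


def accuracy(samples, feature, threshold):
    """Share of samples classified correctly by 'wolf if value > threshold'."""
    correct = 0
    for sample in samples:
        predicted = "wolf" if sample[feature] > threshold else "dog"
        if predicted == sample[2]:
            correct += 1
    return correct / len(samples)


def fit_stump(samples):
    """Pick the single feature and threshold with the best training accuracy."""
    best = (None, None, -1.0)
    for feature in (0, 1):
        values = sorted({s[feature] for s in samples})
        # Candidate thresholds are midpoints between neighboring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            score = accuracy(samples, feature, threshold)
            if score > best[2]:
                best = (feature, threshold, score)
    return best


feature, threshold, train_acc = fit_stump(TRAIN)
shifted_acc = accuracy(SHIFTED, feature, threshold)

names = ("animal_cue", "background_brightness")
print("feature chosen:", names[feature])
print("training accuracy:", train_acc)
print("accuracy after backgrounds are swapped:", shifted_acc)
```

On this made-up data the rule looks flawless in training and then fails once the backgrounds are swapped, which is why a high score on familiar data is not enough reason to trust a model.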

While Hoffman, Battersby, and Miller emphasized veracity, Jonas made the case for variety. He asked the audience to consider, for example, a hypothetical organization whose business is protecting the supply chain. It wants to leverage analytics to find bombs in its cargo, but its only piece of data is a manifest.

“No one writes ‘bomb’ on a manifest; you will never find a bomb,” Jonas said. “The remedy is widening the observation space. … If you want to get really high-quality outcomes, you’ve got to blend more diverse kinds of data.”

Harnessing Human Capital

Of course, even the best data is only half of the equation. To achieve analytic excellence, panelists said, organizations must focus equally on the people who manage and use data.

“We’re using these automated methods to create more information, but that new information still needs analysis,” Battersby said. “So when we think about what the future holds, it’s really embracing both the technical challenges of how we do things and how we do them better, and then what is the human capital that’s needed to take advantage of what we’re doing. Because we’ve really dropped the ball if we don’t think about how humans are going to help us make sense of what it is that we’re processing.”

On that note, panelists picked up a baton passed by National Geospatial-Intelligence Agency Director Robert Cardillo and venture capitalist Scott Hartley, who in their GEOINT 2018 keynotes spoke about blending art with science.

In the world of data, that means facilitating a happy marriage between people who can build models and people who can inform them.

“There’s no substitute for domain expertise,” Miller said. “You really have to understand the nuance of the problem at hand and how that maps onto whatever model you’re building.”

For data-driven organizations of all flavors, the way forward is evident in the panel’s conclusion, which focused on recommendations for a planned “AI Center of Excellence” within the Department of Defense.

“What would be the one piece of advice … you might offer to folks who are designing and standing up that new AI center?” Simpson asked.

“If I were to pick one thing, I would say focus on the human capital first,” Battersby said. “Make sure you’re giving people the appropriate training, the appropriate tasks, and really giving them the resources to think about the implications of what they’re doing.”

In other words: The future of data can’t be about algorithms exclusively; it has to be about anthropology, too.


Posted by Matt Alderton