The four components to achieving success in the data deluge era
Big Data is a term that originated in the e-commerce world and resonates with most of us in the Geospatial Intelligence Community. GEOINT professionals have always been tasked to optimize analysis over growing datasets and to be open to new capabilities that better serve the mission. This is why GEOINT was one of the first domains to adopt the new Hadoop-centric open source software frameworks, and why the Community leads the next wave of improvements to Big Data processing.
Most organizations in the federal space have spent several years thinking through the architectural impacts of new, ultra-large data sets, and most have either established a foundational infrastructure to build on or have mapped out a framework they believe will work for their mission set. There is still work to be done here, but since much progress has been made, a shift in attention and resources is occurring. The shift is an increased focus on mission outcomes.
This “mission outcomes” shift can be considered in four key categories: people, process, technology, and security.
The IT personnel working on Big Data solutions will continually need to upgrade their skills, with an increasing focus on solutions coming from the open source community, especially the Apache Software Foundation, which stewards many of the activities around the Hadoop framework of tools. IT experts are critical to success with Big Data, but the real aim is to empower those making mission decisions.
Analysts, operators, and even executive decision makers are increasingly able to interact directly with Big Data holdings. This is a significant shift: the people with mission responsibility can now run their own queries, including interactive ones.
You should never automate a bad process, and focusing on mission outcomes helps reduce the risk of doing so. Processes should be reworked early, when new technologies are first introduced, and doing so may yield tremendous optimization of activities.
For example, if a Big Data solution could enable multiple agencies to share common data sets and better leverage common infrastructure, the cost savings might extend beyond IT. A good look at process may result in massive restructuring of activities and in new options for roles and missions.
As noted, important technological shifts are under way. The movement is toward technologies more closely aligned with mission-focused outcomes, and the technologies that empower the end user matter most.
Analysts are already empowered with solutions that are easier to learn, and this trend is expected to continue. No analyst should have to be a Java programmer to create queries over data. The technology that serves analysts this way may be complex for the IT department to configure, but should be easy on the end user.
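This is the promise of the SQL-on-Hadoop engines in the Apache ecosystem, such as Apache Hive: the analyst writes one declarative statement instead of a Java MapReduce program. As a self-contained illustration that runs anywhere (no cluster required), the sketch below uses Python's built-in sqlite3 module to execute the same style of query an analyst might write; the table and data are invented for the example.

```python
# Sketch: the declarative, SQL-style access analysts get from engines such
# as Apache Hive, illustrated here with Python's built-in sqlite3 so the
# example is runnable without a Hadoop cluster. Table and data are
# hypothetical, for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sightings (region TEXT, sensor TEXT)")
conn.executemany(
    "INSERT INTO sightings VALUES (?, ?)",
    [("north", "sar"), ("north", "eo"), ("south", "eo")],
)

# The analyst's entire "program" is a single declarative statement —
# no Java, no compile step:
query = """
    SELECT region, COUNT(*) AS hits
    FROM sightings
    GROUP BY region
    ORDER BY hits DESC
"""
for region, hits in conn.execute(query):
    print(region, hits)
```

The complexity of distributing that query across a cluster is the IT department's problem to configure; the end user sees only the query.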
Big Data solutions provide cyber defenders with new capabilities, including ways to bring the right data together. In this way, they provide positive enhancements to the overall cybersecurity posture of modern enterprises.
But there are other impacts on security, including challenges. For example, Big Data solutions must be fielded with strong methods for authentication, authorization, and access control, as well as auditing and overall management of the data clusters. These critically important elements must be completed before deployment, and therefore the need for them should be articulated early in the fielding process.
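On a Hadoop cluster, for example, authentication and service-level authorization are switched on in the core-site.xml configuration file. The fragment below is a minimal sketch using standard Hadoop property names; the Kerberos infrastructure it points to must already exist, and a real deployment layers auditing and finer-grained access control on top.

```xml
<!-- core-site.xml: minimal security sketch.
     hadoop.security.authentication: "kerberos" replaces the default
     "simple" (trust-the-client) mode with Kerberos authentication.
     hadoop.security.authorization: enables service-level access checks
     against the policies defined in hadoop-policy.xml. -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>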
Still, there is room for more work to be done in Big Data security. It is not as simple as mandating that Big Data solutions have encryption. Every encryption solution currently available introduces new vulnerabilities. New solutions are on the horizon that will help the Community better protect data, but they are not here just yet. For now, Hadoop clusters should be on owner-controlled networks and data access should be limited to trusted components.
Big Data is a dynamic area for the GEOINT Community. As your company or organization engineers for change, keep the key areas of people, process, technology, and security in mind, and please share with the rest of the community how you face and solve Big Data challenges.