Kirk Borne, a principal data scientist with Booz Allen Hamilton, has learned firsthand about the misconceptions surrounding data scientists.
On Monday at GEOINT Foreword, Borne moderated a concurrent session called “How to Train Your Data Scientist.” He recalled how, as a former astrophysicist and professor at George Mason University, he often fielded calls from recruiters.
On one hand, he said, “They’d interview people who claimed to be data scientists but had no science training. [Job seekers] were rebranding themselves in this hype cycle but were missing a lot of skills.” On the other hand, he would also get calls from recruiters who would say, “Oh, you’re just a teacher.”
Recruiters aren’t alone in misunderstanding the field of data science. One refrain across a pair of data science concurrent sessions held Monday was that “the unicorn,” the proverbial mythical creature who possesses a perfect set of skills and capabilities, doesn’t exist.
Panelists agreed that data scientists must be able not only to perform analysis but also to communicate the results, which is equally important. They also discussed the difference between a “data analyst” and a “data scientist.” Casey Stella, principal architect at Hortonworks, said the latter is expected to work outside his or her comfort zone, for example by investigating what’s behind a database table.
Srinivas Prasad, associate professor of decision sciences at George Washington University, said that understanding the value of the analysis, and being able to communicate how the results contribute to business goals, whether that means increasing crop yields or tripling click-through rates, is critical to data science. He added that data science training doesn’t need to come from academia; passion matters more.
“A very small percentage of people actually finish the online programs, but if someone is committed, the resources are all out there,” Prasad said.
Panelists cited hackathons, Meetup groups, and working with dirty data as some of the best nontraditional ways to learn data science.
Philippe Rigollet, associate professor at MIT, said data scientists learn by shadowing and practicing.
“You look for people who are insatiably curious and fearless, and then you run them through the wringer,” he said.
Borne stressed the importance of data literacy, suggesting that such education begin as early as kindergarten to encourage curiosity in the future workforce.
Acquiring Data Science
In the second concurrent session, “Data Science Acquisition Models,” another set of experts continued the discussion of what makes a data scientist. Panelists said two qualities are essential: the imagination to find new uses for data, and the ability to use inductive reasoning to pull data together, make predictions, and communicate the results.
Saurin Shah, chief data scientist for Booz Allen Hamilton, said he looks for risk-takers when hiring.
“Some [data science] results can be controversial,” he said. “They have to be able to communicate and stand by their results.”
Developing open-source software and participating in competitions were discussed as fruitful training and recruiting tools. For example, the competition website Kaggle rewards problem-solvers who find solutions for government and company data sets.
In the data science field, even training has been “gamified,” Shah said. “Explore Data Science is the first thing I put my staff through,” he added.
Throughout the 40-hour course, both managers and participants can watch the leaderboard and track students’ accuracy rates.
Panelists also discussed best practices and the ideal time to bring data scientists on board (at a project’s inception). They looked ahead at what’s next (more data, deep learning) and agreed that anyone who claims their data is clean before a data scientist gets to it is mistaken.