Defining the Data Scientist

What makes a great data scientist?

As much as any technical skill or advanced mathematics, it's communication and curiosity.

When I searched on the phrase "what makes a great data scientist," more than 200 million hits came back. That number is mind-boggling, but I found it interesting that the very first hit was for Data Scientist School at the University of Kansas, my undergrad alma mater. The next hit was "Six qualities of a great data scientist," published in 2014. I agree with the six qualities but quibble with their order.

Defining a data scientist is difficult because of the potential breadth of the role. That's why I believe in soft skills as much as math and technical skills. The role should also be defined in conjunction with the questions or problems at hand. The depth and complexity of those problems can make a huge difference between looking for a Tableau-driven data scientist and a PhD-level, R-expert, ex-NSA unicorn.

The six qualities from the article are:

  1. Statistical thinking
  2. Technical acumen
  3. Multi-modal communication skills
  4. Curiosity
  5. Creativity
  6. Grit

That's a great list. I'll approach them in the order I find most important and put a slightly different spin on their description.

Communication is the key to success for data scientists and a data science practice. Communication entails both sides of the exchange: listening on one hand and speaking, presenting, updating, or writing on the other. Great data scientists listen to understand the business problem or question and can repeat it back using language the stakeholder can understand. Jargon and algorithmic terminology--hyper-parameterization or regularization, anyone?--do nothing to reassure a client or stakeholder that the solution is the right one. As the article notes, data scientists know how to "edit themselves" to get their points across to others who aren't data scientists.

Curiosity is tied with communication for most important. And oddly, curiosity is one quality that often gets data scientists in hot water. I'll come back to that in the next paragraph. Curiosity is the insatiable urge to ask questions, all the questions, to better understand the goal being worked towards. There are stupid questions, yes, but great data scientists aren't afraid to ask them if, even if only in their own mind, the answer would move them one step closer to a better solution.

The curious are sometimes stigmatized as "difficult to work with" or "thinkers but not doers" because "they ask too many questions". That's an unfortunate side effect of working with data scientists; it's also a quality that is often underappreciated by organizations. This is an opportunity for an organization to better integrate data scientists by framing these qualities as positives. Finally, curiosity must be reined in at times with guard rails. Generally those guard rails are time boundaries. At the earliest part of a data science project there must be very specific time set aside for Q&A, which can surface the hypotheses the data scientist will investigate. Setting clear and distinct time boundaries and sticking with them is a great way to contain the curious without stymieing free thinking.

Creativity is the data scientist's way of getting around the word "no". Here are a few "no" statements that challenge the data scientist:

  • "That data is siloed in that other department. We can't get it."
  • "We'd have to scrape that site to get the static list of locations."
  • "That's not publicly available data."

Creativity is, in this case, finding ways to solve problems that move a project forward. Creativity is not resorting to illegal measures to fashion or gather data. Creativity is the boundless energy to hear that something can't happen and then find a way to do it, work around it, or work with the problem.

Grit is the will to continue when the complications of our environment test us. Great data scientists can look at scope creep, administrative red tape, and other roadblocks as opportunities. They know that they'll never have "all the data" and work within the constraints they're given. The business world is a busy, chaotic place where data scientists accustomed to order and logic can see their dreams go up in smoke with a single "I don't understand." Great data scientists don't take that or the lack of data personally.

Statistical thinking converts data into information; the application of statistics confirms or denies our hypotheses about how things work or don't work. Statistics are the basis for anomaly detection--when results are outside the bounds of normal in a good or bad way--and help data scientists uncover "fishy" data--data that's got problems and could mess up a predictive model, for example.
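To make the anomaly-detection idea concrete, here is a minimal sketch in Python. It is only one simple approach, not a method the article prescribes: it flags any value whose z-score (distance from the mean in standard deviations) exceeds a cutoff. The function name `flag_anomalies`, the cutoff of 2.5, and the `daily_orders` numbers are all made up for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=2.5):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    Hypothetical helper for illustration; the threshold is an assumption
    and should be tuned to the data and the business question.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # Every value is identical, so nothing stands out.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Invented example data: one day looks very different from the rest.
daily_orders = [102, 98, 101, 97, 103, 99, 100, 250, 96, 104, 101, 99]
suspicious = flag_anomalies(daily_orders)
print(suspicious)
```

A flagged value is a prompt for a question, not a verdict: the spike could be a data-entry error or a genuinely great sales day. Note too that a single extreme value inflates the standard deviation, so on small samples a median-based check is often a sturdier choice.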

Technical acumen encompasses a large set of skills: coding and programming, database manipulation, data architecture knowledge, visualization expertise, documentation development, and machine learning or predictive modeling knowledge. Machine learning practitioners should be able to find the data they need, clean and prepare it for modeling, visualize the features and patterns in the data to support model development, create models, optimize a best-choice model, present the results in a clear manner, and document their code for the future. For now, leave out the architectural choices of training, testing, and validating the model; integrating those models into other software or systems; or running those models against live data in the cloud. The baseline technical acumen is enough to get started.

If you're ready to find data scientists, or need some with qualities like those I've outlined here, reach out to me at sam.johnson@bluejacketsol.com. I'd also appreciate hearing how you've defined a data scientist and which skills you've determined work best for you or your organization.

For fun, the image at the top of this article shows the gentleman whom I and many others believe to be the founder of computer science and creator of the definition of artificial intelligence. He's 16 in that picture. Can you name him?

Well written. For me, the soft skills tend to be the skills that can really set a data scientist apart from the crowd. The ability to communicate and bring life to their findings is so critical in getting business leaders and marketing teams to understand and be excited about data science.
