Data Science as a non-binary condition.

There are a lot of bytes floating about the internet making the case for who is a data scientist, including a whole subset of posts about who are "real" data scientists.  I have observed that this philosophy manifests itself in data science training programs as well.  Right now, if I wish to learn about data science, it's an all-or-nothing proposition.  I would place the available programs in three categories:

  1. Online training from companies such as Coursera, Udacity and others
  2. 12-14 week bootcamp-style data science courses
  3. University programs

Many of these programs look amazing, and if I had the time I would love to take one (or all) of them. With that said, all of them assume that their students will emerge as data scientists, and as a result they teach a comprehensive set of skills.

A Different Approach

There are certain professions in which you either are a member or not, and non-members practicing the profession could have disastrous consequences.  Medicine for instance.  You either are a medical doctor or you are not, and when individuals who are not qualified to practice medicine are dispensing medical advice, the results are often disastrous--as best illustrated by the anti-vaccine movement which at this point is being pushed by many celebrities rather than legitimate medical professionals.  Engineering would be another example.  You wouldn't want to drive over a bridge that I designed because well... I have no idea how to do it, even though I probably have the requisite math skills. 

However, these professions all have professional associations and government regulators who determine who has the requisite skills to call themselves members of that profession. This does not exist for data science, for good reason. Data science, as practiced today, is a mixture of several other disciplines and can be applied to nearly any other discipline. The breadth of skills that fall under the data science umbrella is enormous.

In 2013, O'Reilly published a short booklet entitled Analyzing the Analyzers, which identified four main groupings of individuals who considered themselves data scientists:

  • Data Creative
  • Data Developer
  • Data Researcher
  • Data Businessperson

You can see from the graphic below that the skill breakdown is far from homogeneous. Data science differs from professions such as medicine, in which practitioners MUST have a core set of knowledge in order to be members of the profession. As the chart illustrates, that is not really the case for data science. One could be a machine learning master, yet have no experience with big data, and still legitimately be said to be doing data science work. Therefore, I'd like to propose that data science be viewed as a spectrum of skills rather than a binary condition.

In short, I believe that data scientists spill too much ink trying to label individuals as "real" (or as "fake") data scientists. Instead, data scientists should spend time determining what skills actually make up data science, and should recognize that individuals with different proportions of these skills can be included as data scientists.

Teaching Data Science Skills Rather than Data Science

This slight twist on the approach has implications for training. If data science is no longer an exclusive club, but rather a collection of skills, then those skills, or portions thereof, can be taught to individuals who have no interest in becoming data scientists. In other words, data science training could be about teaching data science skills to anyone who does analytic work, so that they can incorporate those skills into their workflow. The goal of this kind of training would not be to mint full data scientists, but rather to teach individuals the data science skills relevant to their profession. This kind of training could be a lot shorter and less comprehensive, but in the end I believe it would be more practical for the thousands of individuals who want to incorporate data science into their work but have neither the time for a full program nor the desire to become data scientists. Mind you, I am not proposing dumbing data science down. Rather, I am suggesting that, in addition to what is currently offered, data science training can be effective if it is tailored to specific audiences and focuses on the techniques that are directly relevant to those audiences.

TL;DR

The data science world today views data science as a binary condition: either you are a data scientist or you are not. However, data science should be viewed as a collection of skills which virtually any professional can incorporate into their workflow. If data science training were viewed in this context, organizations could increase their use of data science by training their current staff in the data science skills that are relevant to their work.

Good article. Agree that thinking about the skills our teams need is more constructive than trying to determine who belongs in the club. In addition to skills, organizations might think about how to foster / teach data science tradecraft that works best for their needs. This helps new team members add value more quickly and avoid mistakes others have already made when attempting to apply their data science skills.

01110111 01101000 01100001 01110100 00111111

Agree, that's what I'm doing with my courses at General Assembly. I don't have time in three months to teach students the mathematical theory behind Lasso Regularization or Gradient Descent, but they will leave knowing what these tools accomplish and when/how to use them.

Fantastic article. Thank you for posting it.

