Data Science as a non-binary condition.
There are a lot of bytes floating about the internet making the case for who is a data scientist, including a whole subset of posts about who are "real" data scientists. I have observed that this philosophy manifests itself in data science training programs as well. Right now, if I wish to learn about data science, it's an all-or-nothing proposition. I would place the available programs in three categories:
- Online training from companies such as Coursera, Udacity, and others
- 12-14 week bootcamp-style data science courses
- University programs
Many of these programs look amazing, and if I had the time I would love to take one (or all) of them. With that said, all of them assume that their students will emerge as data scientists, and as a result they teach a comprehensive set of skills.
A Different Approach
There are certain professions in which you either are a member or you are not, and non-members practicing the profession could have disastrous consequences. Take medicine, for instance. You either are a medical doctor or you are not, and when individuals who are not qualified to practice medicine dispense medical advice, the results are often disastrous--as best illustrated by the anti-vaccine movement, which at this point is being pushed by many celebrities rather than legitimate medical professionals. Engineering would be another example. You wouldn't want to drive over a bridge that I designed because, well... I have no idea how to design one, even though I probably have the requisite math skills.
However, these professions all have professional associations and government regulators who determine who has the requisite skills to call themselves members of that profession. No equivalent exists for data science, for good reason. Data science, as practiced today, is a mixture of several other disciplines and can be applied to nearly any other discipline. The breadth of skills that fall under the data science umbrella is enormous.
In 2013, O'Reilly published a short booklet entitled Analyzing the Analyzers, which identified four main groupings of individuals who considered themselves data scientists:
- Data Creative
- Data Developer
- Data Researcher
- Data Businessperson
You can see from the graphic below that the skill breakdown is far from homogeneous. Data science differs from other professions in this respect: in professions such as medicine, practitioners MUST have a core set of knowledge in order to be a member of that profession. As the chart illustrates, that is not really the case for data science. One could be a machine learning master, yet have no experience with big data, and still legitimately be said to be doing data science work. Therefore, I'd like to propose that data science be viewed as a spectrum of skills rather than a binary condition.
In short, I believe that data scientists spill too much ink trying to label individuals as "real" (or as "fake") data scientists. Instead, data scientists should spend time determining what skills actually make up data science, and should accept that individuals with different proportions of these skills can be included as data scientists.
Teaching Data Science Skills Rather than Data Science
This slight twist on the approach has implications for training. If data science is no longer an exclusive club, but rather a collection of skills, then those skills, or portions thereof, can be taught to individuals who have no interest in becoming data scientists. In other words, data science training could be about teaching anyone who does analytic work data science skills which they can incorporate into their workflow. The goal of this kind of training would not be to mint full data scientists, but rather to teach individuals the data science skills relevant to their profession. This kind of training could be a lot shorter and less comprehensive, but in the end, I believe that it would be more practical for the thousands of individuals out there who want to incorporate data science into their work but who don't really have the time to put into it or the desire to become data scientists. Mind you, I am not proposing dumbing data science down; rather, I am suggesting that, in addition to what is currently offered, data science training can be effective if it is tailored to specific audiences and focuses on the techniques that are directly relevant to those audiences.
TL;DR
The data science world today views data science as a binary condition: either you are a data scientist or you are not. However, data science should be viewed as a collection of skills which virtually any professional can incorporate into their workflow. If data science training were viewed in this context, organizations could increase their use of data science by training their current staff in the data science skills that are relevant to their work.
Good article. Agree that thinking about the skills our teams need is more constructive than trying to determine who belongs in the club. In addition to skills, organizations might think about how to foster / teach data science tradecraft that works best for their needs. This helps new team members add value more quickly and avoid mistakes others have already made when attempting to apply their data science skills.
01110111 01101000 01100001 01110100 00111111
Agree, that's what I'm doing with my courses at General Assembly. I don't have time in three months to teach students the mathematical theory behind Lasso Regularization or Gradient Descent, but they will leave knowing what these tools accomplish and when/how to use them.
Nicely stated
Fantastic article. Thank you for posting it.