Lessons Learned From 400 hours of Independent Data Science Study

My adventure into a self-directed, 'Masters level equivalent' course of Data Science study continues. Having already written about the lessons learned after 200 hours of study, here are my thoughts after 400 hours of work and effort.

What is 400 hours of study? For me, this equates to:

  • 12 completed online courses that are designed to take anywhere from 4 to 11 weeks to complete
  • Several smaller tutorials on VIM, Mode SQL, MATLAB/Octave, PyCharm, Sublime Text, Anaconda, etc.
  • Dozens of YouTube videos, websites, and articles consulted in an effort to find other insights or explanations for the complex topics that I am working on
  • Two personal projects: web scraping, and an initial analysis of actual field data for the University of Illinois
  • Built out the beginnings of a GitHub Data Science portfolio page
  • A handful of networking and other Data Science related social events that I use Meetup to curate

I originally estimated 750 hours of study to achieve my goal.

           "I decided to split the difference and establish a personal goal of 750 hours of study. Assuming that a typical 3-credit class requires 48 hours of effort spread over a 16-week semester, 750 hours of self-directed work equates to about 15 classes, which is about what a typical Masters-level program would require."

I believe this is still a valid estimate of effort, so I'm slightly over halfway in my studies. What lessons have I learned?
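
Before getting to the lessons, here is a quick sanity check on that estimate, sketched in a few lines of Python. The numbers are the ones quoted above (750-hour goal, 48 hours per 3-credit class, 400 hours completed); the function and variable names are purely illustrative.

```python
# Figures come from the article; all names here are illustrative assumptions.
GOAL_HOURS = 750        # personal study goal
HOURS_PER_CLASS = 48    # the article's assumption for one 3-credit class
HOURS_COMPLETED = 400   # hours of study logged so far


def study_progress(completed, goal, hours_per_class):
    """Return (classes the goal equates to, fraction complete, hours remaining)."""
    classes = goal / hours_per_class
    fraction = completed / goal
    remaining = goal - completed
    return classes, fraction, remaining


classes, fraction, remaining = study_progress(
    HOURS_COMPLETED, GOAL_HOURS, HOURS_PER_CLASS
)
print(f"Equivalent classes: {classes:.1f}")
print(f"Progress: {fraction:.0%}")
print(f"Hours remaining: {remaining}")
```

With these inputs the goal works out to roughly 15.6 classes, progress is about 53%, and 350 hours remain, which matches the "about 15 classes" and "slightly over halfway" figures above.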

1. Get grounded. Before jetting off into Machine Learning algorithms or Artificial Intelligence, make sure that you have a very good understanding of Linear Algebra and Multivariate Calculus. Understanding vectors, matrices, and the associated mathematical concepts, formulas, and notation will greatly ease your journey. I chose the Coursera courses Mathematics for Machine Learning: Linear Algebra and Mathematics for Machine Learning: Multivariate Calculus, both taught by Imperial College London. Although I do not have a STEM background, I was initially pretty confident in my math skills. However, I found these two courses to be very difficult, and I really had to work hard with both the lectures and outside material.

2. Along with the above math(s), a good understanding of Statistics and Probability is also very useful. Again, I chose a Coursera offering, the Statistics with R Specialization taught by Duke University. It consists of five courses, but I completed only the first three. My rationale was that I was only seeking the knowledge and didn't need to spend valuable time on a final project if I had successfully completed the coursework. In truth, and to this point, I've found the background in Linear Algebra and Multivariate Calculus to be more beneficial than the Statistics and Probability, but I would definitely not bypass either.

3. I do not have a background in coding, so I hit Python and R pretty hard. I took advantage of Codecademy and DataCamp to step through several Python courses, and I was also using R in the Statistics Specialization, so in effect I was learning and using two languages at the same time. This caused a bit of confusion, and I spent time on GitHub trying to sort out what I was attempting to do. Both R and Python are super-powerful at what they do. I contend that you need to have some capabilities in both, although at this point I prefer Python 3.

4. "Reading is fundamental"... for those of you who are old enough to have heard that saying before. For books, I worked my way through Practical Statistics for Data Scientists, Automate the Boring Stuff with Python, and Learn Python the Hard Way. I also tried to read a daily, curated Medium feed for Data Science, as well as a similar Flipboard feed. As time moved on, I understood more and more of the articles that I was reading, to the point where many of them were reviews of topics that I already felt comfortable with.

5. Once I felt good about the maths, the statistics, and the coding, it was time to jump in the deep end and begin with Machine Learning. For me, the obvious place to start was Coursera's 11-week course, Machine Learning, taught by Andrew Ng of Stanford University. The two criticisms of this course are that it's a bit dated and that the exercises are in MATLAB/Octave. I, for one, really didn't care if it was dated because it is such a seminal course. The MATLAB threw me a bit, as I had to learn another coding language. MathWorks' MATLAB Onramp is the place to start, and it took only a day or so to step through the tutorial to the point where I could do the coding exercises required by the course.

This is a hard course, and my only saving grace was the groundwork that I had laid before beginning it. Andrew would press through derivatives, probabilities, crazy math formulas, and other items rather quickly, but I had seen it all before, which gave me a bit of breathing room and confidence. The coding was a beast, but the more comfortable I became with MATLAB, the easier it got. I finished the 11-week curriculum in about five weeks, and I felt like I had really accomplished something at the end.

6. At this point I felt comfortable beginning to look for a Data Science or Business Intelligence position in my area. There is an endless amount of literature on job search strategies, but to begin I had my resume and my LinkedIn page professionally recrafted. I then posted resumes on Indeed and ZipRecruiter; LinkedIn, Indeed, and ZipRecruiter are the three platforms that I'm currently using to search. The intent is to find a company that is a good fit for both my executive business experience and my new-found Data Science skills, and then to continue to grow and improve. I am far from the best coder or statistician, but I have a keen understanding of many of the hard Data Science skills and believe that I'm at a point where I can greatly contribute to the right company.

Future steps: I still have 350 hours of work to do until I reach my goal. I'm currently enrolled in Coursera's Creating Dashboards and Storytelling with Tableau course. There's a debate about whether Tableau is worth the effort, but the consensus is that you should spend a little time on the platform and add it to your toolkit. A quick scan of Business Intelligence job listings will show that many companies still require the skill.

Next I will step through Coursera's five-course Deep Learning Specialization, again taught by Andrew Ng. Some of this will be a review of the Machine Learning course, but it is a more up-to-date and even broader set of courses.


More articles by Eric Stewart, MA, MBA
