Time and knowledge trade-off
If your work involves data analysis or machine learning, you have probably read many articles on platforms like Medium, Towards Data Science, Analytics Vidhya, etc. But how many times have you tried learning a concept from a book?
Searching for a topic on Google and finding a relevant article takes seconds, which is indeed far easier than hunting for the same material in a book. But not all writers on these platforms are experienced and knowledgeable. While you can almost always find working code there, real knowledge and intuition about the topics are much rarer, and many articles are filled with rookie mistakes. So relying entirely on these platforms, especially when learning a new concept or trying something new, is not a good idea.
For example, consider the well-known concept of data leakage in machine learning. When splitting data into training and test sets, we must ensure that no information about the test set leaks into training. So the first task is always to split off the test data and not even look at it until the testing phase. We do not even visualise this data, to prevent our brains from forming any pattern or hypothesis about it. But what people often miss is that when transforming data (scaling, encoding, etc.), they fit the transformer on the complete dataset before splitting. That is a dangerous mistake: it leaks test-set statistics into training, which can distort your results or, even worse, give you false confidence in your model.
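A minimal sketch of the correct order of operations, using NumPy with a made-up toy dataset (the numbers and split sizes are illustrative assumptions, not from the article): split first, compute the scaling statistics on the training rows only, then reuse those same statistics on the test rows.

```python
import numpy as np

# Hypothetical toy dataset: 10 samples, 1 feature
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(10, 1))

# Split FIRST: set the test rows aside before computing any statistics
X_train, X_test = X[:8], X[8:]

# WRONG: statistics computed on the full dataset include test information
leaky_mean, leaky_std = X.mean(axis=0), X.std(axis=0)

# RIGHT: fit the transform on the training split only...
train_mean, train_std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_scaled = (X_train - train_mean) / train_std

# ...then apply the SAME training statistics to the test split
X_test_scaled = (X_test - train_mean) / train_std
```

With a library like scikit-learn the same principle applies: call `fit` (or `fit_transform`) on the training split only, and plain `transform` on the test split.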
Another prevalent mistake relates to class imbalance in classification problems. Say you are working on a binary classification task with a very large dataset, but the number of positive instances (or instances of any one class) is significantly smaller, which is often the case. In this situation you have to preserve the class balance when you divide your dataset into training, validation and test sets.
These were some common mistakes that I could spot in the code, but with more advanced topics, or topics you are less familiar with, it becomes much harder to protect yourself from these biases and mistakes.
To conclude, this article's overall motive is to make one clear point: before learning any new concept on any platform, re-check and validate its authenticity. The best way is still to follow a good book written by an expert in the field. It has many advantages over reading online, but it also has a cost: you have to pay more time. But as working individuals in machine learning, that's what we do; we make a trade-off. :-)